Serialization and serialization times in 40G/10G and 100G/25G Ethernet

I've recently been involved in discussions about the lowest-latency requirements for a leaf/spine (or Clos) network to host an OpenStack platform.



System architects are striving for the lowest possible RTT for their transactions (block storage and future RDMA scenarios), and the claim was that 100G/25G offers greatly reduced serialization delays compared to 40G/10G. Everyone involved is aware that there are many more factors in the end-to-end game (any of which can hurt or help RTT) than just the NIC and switch-port serialization delays. Still, the topic of serialization delays keeps popping up, as it is one thing that is difficult to optimize without jumping a possibly very costly technology gap.



Somewhat over-simplified (leaving out the encoding schemes), serialization time can be calculated as number-of-bits / bit-rate, which lets us start at ~1.2 μs for 10G (also see wiki.geant.org).



For a 1518-byte frame, i.e. 12,144 bits:

at 10G (assuming 10*10^9 bit/s), this gives us ~1.2 μs
at 25G (assuming 25*10^9 bit/s), this is reduced to ~0.48 μs
at 40G (assuming 40*10^9 bit/s), one might expect ~0.3 μs
at 100G (assuming 100*10^9 bit/s), one might expect ~0.12 μs
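
For reference, the same back-of-the-envelope arithmetic as a minimal Python sketch (nominal line rates only, encoding overhead left out as above):

    # Serialization delay = bits on the wire / nominal line rate
    FRAME_BITS = 1518 * 8  # 12,144 bits for a 1518-byte frame

    for label, rate_bps in [("10G", 10e9), ("25G", 25e9), ("40G", 40e9), ("100G", 100e9)]:
        delay_us = FRAME_BITS / rate_bps * 1e6  # seconds -> microseconds
        print(f"{label:>4}: {delay_us:.2f} us")

    #  10G: 1.21 us
    #  25G: 0.49 us
    #  40G: 0.30 us
    # 100G: 0.12 us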


Now for the interesting bit. At the physical layer, 40G is commonly done as 4 lanes of 10G, and 100G as 4 lanes of 25G. Depending on the QSFP+ or QSFP28 variant, this is sometimes done over 4 pairs of fibre strands, and sometimes it is split by lambdas on a single fibre pair, with the QSFP module doing some xWDM of its own. I do know that there are specs for 1x 40G or 2x 50G or even 1x 100G lanes, but let's leave those aside for the moment.



To estimate serialization delays in the context of multi-lane 40G or 100G, one needs to know how 100G and 40G NICs and switch ports actually "distribute the bits to the (set of) wire(s)", so to speak. What is being done here?



Is it a bit like EtherChannel/LAG, where the NIC/switch port sends the frames of one "flow" (read: the same hash result, for whatever hashing algorithm is used over whatever fields of the frame) across one given channel? In that case, we'd expect serialization delays like 10G and 25G respectively. But essentially, that would make a 40G link just a LAG of 4x10G, reducing single-flow throughput to 1x10G.
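
To illustrate this first option, a toy sketch of hash-based channel selection (the fields and the hash are made up for illustration; this is not any vendor's actual algorithm):

    import zlib

    def pick_channel(src_ip, dst_ip, src_port, dst_port, num_channels=4):
        """Toy flow hash: every frame of one flow maps to the same channel."""
        key = f"{src_ip}-{dst_ip}-{src_port}-{dst_port}".encode()
        return zlib.crc32(key) % num_channels

    # All frames of this flow share one 10G channel, so a single flow
    # can never exceed 1x10G: the LAG-like behaviour described above.
    channel = pick_channel("10.0.0.1", "10.0.0.2", 49152, 3260)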



Is it something like bit-wise round-robin, where each bit is distributed round-robin across the 4 (sub)channels? That might actually result in lower serialization delays because of parallelization, but it raises some questions about in-order delivery (see the sketch after the next option).



Is it something like frame-wise round-robin, where entire Ethernet frames (or other suitably sized chunks of bits) are sent over the 4 channels, distributed in round-robin fashion?
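
Purely as an illustration of these two round-robin variants (the striping unit is what distinguishes them; this is the question's hypothesis, not a claim about what real PHYs do):

    def stripe(units, num_lanes=4):
        """Round-robin distribution: unit i goes to lane i % num_lanes.
        With single bits as units this is the bit-wise variant; with whole
        frames (or fixed-size blocks) it is the frame-wise variant."""
        lanes = [[] for _ in range(num_lanes)]
        for i, unit in enumerate(units):
            lanes[i % num_lanes].append(unit)
        return lanes

    bits = list("110100101100")  # 12 bits of a frame
    print(stripe(bits))
    # [['1', '0', '1'], ['1', '0', '1'], ['0', '1', '0'], ['1', '0', '0']]
    # Each lane carries 3 of the 12 bits, sent in parallel, so the frame is on
    # the wire after only 3 lane bit-times. The receiver must undo this exact
    # interleave, which is where the in-order-delivery question comes from.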



Is it something else entirely, such as...



Thanks for your comments and pointers.

Tags: latency, serialization, rtt, 40g, 100g

asked by Marc 'netztier' Luethi




















1 Answer
          This is overthinking.



The number of lanes used doesn't really matter. Whether you transport 50 Gbit/s over 1, 2, or 5 lanes, the aggregate serialization delay is 20 ps per bit. So you'd get 5 bits every 100 ps, regardless of the number of lanes used.
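
In numbers (a minimal sketch of the same point; the lane counts are just examples):

    AGG_RATE = 50e9  # aggregate rate: 50 Gbit/s

    for lanes in (1, 2, 5):
        lane_rate = AGG_RATE / lanes         # bit/s per lane
        lane_bit_time_ps = 1e12 / lane_rate  # ps per bit on one lane
        print(f"{lanes} lane(s): {lane_bit_time_ps / lanes:.0f} ps/bit aggregate")

    # 1 lane(s): 20 ps/bit aggregate
    # 2 lane(s): 20 ps/bit aggregate
    # 5 lane(s): 20 ps/bit aggregate
    # The frame's serialization time depends only on the aggregate rate.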



          Naturally, 100 Gbit/s has half the delay of 50 Gbit/s and so on, so the faster you serialize, the faster a frame is transmitted.



If you're interested in the internal serialization inside the interface, you'd need to look at the MII variant being used for the speed class. However, this serialization takes place on the fly, or in parallel with the actual MDI serialization - it does take a minute amount of time, but that's up to the actual piece of hardware and probably impossible to predict (something along the lines of 2-5 ps would be my guess for 100 Gbit/s). I wouldn't actually worry about it, as there are much larger factors involved: 10 ps is the order of transmission latency you'd get from an additional 2 millimetres(!) of cable.



Using four lanes of 10 Gbit/s each for 40 Gbit/s is NOT the same as aggregating four 10 Gbit/s links. A 40 Gbit/s link - regardless of the number of lanes - can transport a single 40 Gbit/s stream, which LAGged 10 Gbit/s links can't. Also, the serialization delay of 40G is only 1/4 that of 10G.






answered by Zac67, edited by jonathanjo