Serialization and serialization times in 40G/10G and 100G/25G Ethernet
I've recently been involved in discussions about lowest-latency requirements for a leaf/spine (or Clos) network that will host an OpenStack platform.
The system architects are striving for the lowest possible RTT for their transactions (block storage and future RDMA scenarios), and the claim was that 100G/25G offers greatly reduced serialization delays compared to 40G/10G. Everyone involved is aware that there are many more factors in the end-to-end game (any of which can hurt or help RTT) than just the serialization delays of the NICs and switch ports. Still, the topic of serialization delays keeps popping up, because it is one thing that is difficult to optimize without jumping across a possibly very costly technology gap.
Somewhat over-simplified (leaving out the encoding schemes), serialization time can be calculated as number-of-bits / bit-rate, which gives ~1.2 μs as a starting point for 10G (also see wiki.geant.org).
For a 1518-byte frame (12,144 bits):
at 10G (assuming 10*10^9 bits/s), this will give us ~1.2 μs
at 25G (assuming 25*10^9 bits/s), this would be reduced to ~0.48 μs
at 40G (assuming 40*10^9 bits/s), one might expect to see ~0.3 μs
at 100G (assuming 100*10^9 bits/s), one might expect to see ~0.12 μs
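As a quick sanity check on these figures, here's a minimal Python sketch of the same bits-divided-by-bit-rate calculation (it deliberately ignores encoding overhead such as 64b/66b, just like the numbers above):

```python
# Serialization time = frame size in bits / line rate in bit/s.
# Encoding overhead (e.g. 64b/66b) is deliberately ignored, as above.

FRAME_BITS = 1518 * 8  # a 1518-byte frame = 12,144 bits

for name, rate_bps in [("10G", 10e9), ("25G", 25e9), ("40G", 40e9), ("100G", 100e9)]:
    t_us = FRAME_BITS / rate_bps * 1e6  # seconds -> microseconds
    print(f"{name:>4}: {t_us:.2f} us")

# Prints roughly:
#  10G: 1.21 us
#  25G: 0.49 us
#  40G: 0.30 us
# 100G: 0.12 us
```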
Now for the interesting bit. At the physical layer, 40G is commonly done as 4 lanes of 10G, and 100G as 4 lanes of 25G. Depending on the QSFP+ or QSFP28 variant, this is sometimes done with 4 pairs of fibre strands, and sometimes it is split by lambdas on a single fibre pair, where the QSFP module does some xWDM on its own. I do know that there are specs for 1x 40G or 2x 50G or even 1x 100G lanes, but let's leave those aside for the moment.
To estimate serialization delays in the context of multi-lane 40G or 100G, one needs to know how 100G and 40G NICs and switch ports actually "distribute the bits to the (set of) wire(s)", so to speak. What is being done here? A few possibilities come to mind (a rough numerical comparison follows the list below).
Is it a bit like EtherChannel/LAG? The NIC/switch ports send the frames of one "flow" (read: the same hashing result of whatever hashing algorithm is used over whatever scope of the frame) across one given channel? In that case, we'd expect serialization delays like 10G and 25G, respectively. But essentially, that would make a 40G link just a LAG of 4x10G, reducing single-flow throughput to 1x10G.
Is it something like bit-wise round-robin? Each bit is distributed round-robin across the 4 (sub)channels? That might actually result in lower serialization delays because of the parallelization, but it raises some questions about in-order delivery.
Is it something like frame-wise round-robin? Entire Ethernet frames (or other suitably sized chunks of bits) are sent over the 4 channels, distributed in round-robin fashion?
Is it something else entirely, such as...
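To make the difference between these variants concrete, here's a rough back-of-the-envelope sketch in Python for a 1518-byte frame on a 4x10G link. It's a hypothetical model of variant 1 versus variants 2/3, purely for illustration; it doesn't claim to describe how any particular NIC or switch actually behaves:

```python
# Hypothetical comparison: single-flow serialization time on a 4-lane
# 40G link (4 x 10 Gbit/s), depending on how bits are spread over lanes.

FRAME_BITS = 1518 * 8   # 12,144 bits
LANES = 4
LANE_RATE = 10e9        # bit/s per lane

# Variant 1 (LAG-like): the whole frame of a flow stays on a single lane,
# so it serializes at the lane rate only.
t_lag_us = FRAME_BITS / LANE_RATE * 1e6

# Variants 2/3 (bit- or chunk-wise striping): the frame's bits are spread
# over all lanes, so the last bit leaves after total bits / aggregate rate.
t_striped_us = FRAME_BITS / (LANES * LANE_RATE) * 1e6

print(f"single lane (LAG-like)  : {t_lag_us:.2f} us")     # ~1.21 us, i.e. 10G-class
print(f"striped over all 4 lanes: {t_striped_us:.2f} us")  # ~0.30 us, i.e. 40G-class
```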
Thanks for your comments and pointers.
Tags: latency, serialization, rtt, 40g, 100g
1 Answer
This is overthinking.
The number of lanes used doesn't really matter. Whether you transport 50 Gbit/s over 1, 2, or 5 lanes, the serialization delay is 20 ps/bit. So, you'd get 5 bits every 100 ps, regardless of the lanes used.
Naturally, 100 Gbit/s has half the delay of 50 Gbit/s and so on, so the faster you serialize, the faster a frame is transmitted.
If you're interested in the internal serialization inside the interface, you'd need to look at the MII variant used for the speed class. However, this serialization takes place on the fly, in parallel with the actual MDI serialization - it does take a minute amount of time, but that depends on the actual piece of hardware and is probably impossible to predict (something along the lines of 2-5 ps would be my guess for 100 Gbit/s). I wouldn't actually worry about it, as there are much larger factors involved: 10 ps is on the order of the transmission latency you'd get from an additional 2 millimetres(!) of cable.
Using four lanes of 10 Gbit/s each for 40 Gbit/s is NOT the same as aggregating four 10 Gbit/s links. A 40 Gbit/s link - regardless of the number of lanes - can transport a single 40 Gbit/s stream, which LAGged 10 Gbit/s links can't. Also, the serialization delay of a frame at 40G is only 1/4 of that at 10G.
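Just to put numbers on the per-bit delay argument above, here is a small sketch of the arithmetic (assumed figures only; it ignores encoding overhead and any hardware-specific gearbox delays):

```python
# Per-bit serialization delay at a given aggregate line rate, plus the
# cable-length comparison from the answer above (assumed ~5 ns/m in fibre).

def per_bit_delay_ps(rate_bps):
    """Time to serialize a single bit, in picoseconds."""
    return 1.0 / rate_bps * 1e12

print(per_bit_delay_ps(50e9))    # 20.0 ps/bit at 50 Gbit/s -> 5 bits per 100 ps
print(per_bit_delay_ps(100e9))   # 10.0 ps/bit at 100 Gbit/s

# Propagation delay of ~2 mm of extra fibre at roughly 5 ns per metre:
print(0.002 * 5e-9 * 1e12)       # ~10 ps
```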