[PVE-User] Confusing about Bond 802.3ad

iperf will open a single stream by default. An 802.3ad bond will only send a specific stream across a single link in the lag.

Try the -P flag to add parallel streams with iperf and see how the performance looks.

Post by Gilberto Nunes
Hello
I have a T-Link switch T1600G-28TS and wanna use 4 NIC, all 1GB, to create
a LAG...
I have success to create it, indeed, however, when use iperf3 to check the
performance, found that the traffic do not transpass 1GB/s???
This 802.3ad do no suppose to agrengate the speed of all available NIC??
In the switch, I also have VLAN ( no trunk) which are used to communicate
only with the clusters and storages...
iperf3 --bind 10.10.10.100 -s
iperf3 -c 10.10.10.100
Need same advice.
Thanks a lot.
---
Gilberto Nunes Ferreira
(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram
Skype: gilberto.nunes36
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

Dietmar Maurer

2018-08-24 06:01:35 UTC

Post by Gilberto Nunes
This 802.3ad do no suppose to agrengate the speed of all available NIC??

No, not really. One connection is limited to 1GB. If you start more
parallel connections you can gain more speed.

Gilberto Nunes

2018-08-24 10:01:28 UTC

So what bond mode I suppose to use in order to get more speed? I mean how
to join the nic to get 4 GB? I will use Ceph!
I know I should use 10gb but I dont have it right now.

Thanks

Post by Gilberto Nunes
This 802.3ad do no suppose to agrengate the speed of all available NIC??

No, not really. One connection is limited to 1GB. If you start more
parallel connections you can gain more speed.

Uwe Sauter

2018-08-24 10:45:11 UTC

If using standard 802.3ad (LACP) you will always get only the performance of a single link between one host and another.

Using "bond-xmit-hash-policy layer3+4" might get you a better performance but is not standard LACP.

Post by Gilberto Nunes
So what bond mode I suppose to use in order to get more speed? I mean how
to join the nic to get 4 GB? I will use Ceph!
I know I should use 10gb but I dont have it right now.
Thanks

Post by Gilberto Nunes
This 802.3ad do no suppose to agrengate the speed of all available NIC??

No, not really. One connection is limited to 1GB. If you start more
parallel connections you can gain more speed.

_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

2018-08-24 10:52:10 UTC

Hi,

Yes, it is our undertanding that if the hardware (switch) supports it,
"bond-xmit-hash-policy layer3+4" gives you best spread.

But it will still give you 4 'lanes' of 1GB. Ceph will connect using
different ports, ip's etc, en each connection should use a different
lane, so altogether, you should see a network throughput that
(theoretically) could be as high as 4GB.

That is how we understand it.

So the load balance is now based on Layer4 instead of L3.

Besides these details, I think what you are doing should work nicely.

MJ

If using standard 802.3ad (LACP) you will always get only the performance of a single link between one host and another.
Using "bond-xmit-hash-policy layer3+4" might get you a better performance but is not standard LACP.

Post by Gilberto Nunes
This 802.3ad do no suppose to agrengate the speed of all available NIC??

No, not really. One connection is limited to 1GB. If you start more
parallel connections you can gain more speed.

_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

Josh Knight

2018-08-24 15:02:04 UTC

Depending on your topology/configuration, you could try to use bond-rr mode
in Linux instead of 802.3ad.

Bond-rr mode is the only mode that will put pkts for the same mac/ip/port
tuple across multiple interfaces. This will work well for UDP but TCP may
suffer performance issues because pkts can end up out of order and trigger
TCP retransmits. There are some examples on this page, you may need to do
some testing before deploying it to ensure it does what you want.

https://wiki.linuxfoundation.org/networking/bonding#bonding-driver-options

As others have stated, you can adjust the hashing, but a single flow
(mac/ip/port combination) will still end up limited to 1Gbps without using
round robin mode.

Post by mj
Hi,
Yes, it is our undertanding that if the hardware (switch) supports it,
"bond-xmit-hash-policy layer3+4" gives you best spread.
But it will still give you 4 'lanes' of 1GB. Ceph will connect using
different ports, ip's etc, en each connection should use a different
lane, so altogether, you should see a network throughput that
(theoretically) could be as high as 4GB.
That is how we understand it.

So the load balance is now based on Layer4 instead of L3.
Besides these details, I think what you are doing should work nicely.
MJ

If using standard 802.3ad (LACP) you will always get only the

performance of a single link between one host and another.

Using "bond-xmit-hash-policy layer3+4" might get you a better

performance but is not standard LACP.

Post by Gilberto Nunes
So what bond mode I suppose to use in order to get more speed? I mean

how

Post by Gilberto Nunes
to join the nic to get 4 GB? I will use Ceph!
I know I should use 10gb but I dont have it right now.
Thanks

Post by Gilberto Nunes
This 802.3ad do no suppose to agrengate the speed of all available

NIC??

Post by Dietmar Maurer
No, not really. One connection is limited to 1GB. If you start more
parallel connections you can gain more speed.

_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

Gilberto Nunes

2018-08-24 15:19:50 UTC

So I try balance-rr with LAG in the switch and still get 1 GB

pve-ceph02:~# iperf3 -c 10.10.10.100
Connecting to host 10.10.10.100, port 5201
[ 4] local 10.10.10.110 port 52674 connected to 10.10.10.100 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 116 MBytes 974 Mbits/sec 32 670
KBytes
[ 4] 1.00-2.00 sec 112 MBytes 941 Mbits/sec 3 597
KBytes
[ 4] 2.00-3.00 sec 112 MBytes 941 Mbits/sec 3 509
KBytes
[ 4] 3.00-4.00 sec 112 MBytes 941 Mbits/sec 0 660
KBytes
[ 4] 4.00-5.00 sec 112 MBytes 941 Mbits/sec 6 585
KBytes
[ 4] 5.00-6.00 sec 112 MBytes 941 Mbits/sec 0 720
KBytes
[ 4] 6.00-7.00 sec 112 MBytes 942 Mbits/sec 3 650
KBytes
[ 4] 7.00-8.00 sec 112 MBytes 941 Mbits/sec 4 570
KBytes
[ 4] 8.00-9.00 sec 112 MBytes 941 Mbits/sec 0 708
KBytes
[ 4] 9.00-10.00 sec 112 MBytes 941 Mbits/sec 8 635
KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.10 GBytes 945 Mbits/sec 59 sender
[ 4] 0.00-10.00 sec 1.10 GBytes 942 Mbits/sec
receiver

iperf Done.

---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36

Post by Josh Knight
Depending on your topology/configuration, you could try to use bond-rr mode
in Linux instead of 802.3ad.
Bond-rr mode is the only mode that will put pkts for the same mac/ip/port
tuple across multiple interfaces. This will work well for UDP but TCP may
suffer performance issues because pkts can end up out of order and trigger
TCP retransmits. There are some examples on this page, you may need to do
some testing before deploying it to ensure it does what you want.
https://wiki.linuxfoundation.org/networking/bonding#bonding-driver-options
As others have stated, you can adjust the hashing, but a single flow
(mac/ip/port combination) will still end up limited to 1Gbps without using
round robin mode.

So the load balance is now based on Layer4 instead of L3.
Besides these details, I think what you are doing should work nicely.
MJ

If using standard 802.3ad (LACP) you will always get only the

performance of a single link between one host and another.

Using "bond-xmit-hash-policy layer3+4" might get you a better

performance but is not standard LACP.

Post by Gilberto Nunes
So what bond mode I suppose to use in order to get more speed? I mean

how

Post by Gilberto Nunes
to join the nic to get 4 GB? I will use Ceph!
I know I should use 10gb but I dont have it right now.
Thanks

Post by Gilberto Nunes
This 802.3ad do no suppose to agrengate the speed of all available

NIC??

Post by Dietmar Maurer
No, not really. One connection is limited to 1GB. If you start more
parallel connections you can gain more speed.

_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

Josh Knight

2018-08-24 15:57:48 UTC

I don't know your topology, I'm assuming you're going from nodeA ->
switch -> nodeB ? Make sure that entire path is using RR. You could
verify this with interface counters on the various hops. If a single hop
is not doing it correctly, it will limit the throughput.

Post by Gilberto Nunes
So I try balance-rr with LAG in the switch and still get 1 GB
pve-ceph02:~# iperf3 -c 10.10.10.100
Connecting to host 10.10.10.100, port 5201
[ 4] local 10.10.10.110 port 52674 connected to 10.10.10.100 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 116 MBytes 974 Mbits/sec 32 670
KBytes
[ 4] 1.00-2.00 sec 112 MBytes 941 Mbits/sec 3 597
KBytes
[ 4] 2.00-3.00 sec 112 MBytes 941 Mbits/sec 3 509
KBytes
[ 4] 3.00-4.00 sec 112 MBytes 941 Mbits/sec 0 660
KBytes
[ 4] 4.00-5.00 sec 112 MBytes 941 Mbits/sec 6 585
KBytes
[ 4] 5.00-6.00 sec 112 MBytes 941 Mbits/sec 0 720
KBytes
[ 4] 6.00-7.00 sec 112 MBytes 942 Mbits/sec 3 650
KBytes
[ 4] 7.00-8.00 sec 112 MBytes 941 Mbits/sec 4 570
KBytes
[ 4] 8.00-9.00 sec 112 MBytes 941 Mbits/sec 0 708
KBytes
[ 4] 9.00-10.00 sec 112 MBytes 941 Mbits/sec 8 635
KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.10 GBytes 945 Mbits/sec 59
sender
[ 4] 0.00-10.00 sec 1.10 GBytes 942 Mbits/sec
receiver
iperf Done.
---
Gilberto Nunes Ferreira
(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram
Skype: gilberto.nunes36

Post by Josh Knight
Depending on your topology/configuration, you could try to use bond-rr

mode

Post by Josh Knight
in Linux instead of 802.3ad.
Bond-rr mode is the only mode that will put pkts for the same mac/ip/port
tuple across multiple interfaces. This will work well for UDP but TCP

may

Post by Josh Knight
suffer performance issues because pkts can end up out of order and

trigger

Post by Josh Knight
TCP retransmits. There are some examples on this page, you may need to

Post by Josh Knight
some testing before deploying it to ensure it does what you want.

https://wiki.linuxfoundation.org/networking/bonding#bonding-driver-options

Post by Josh Knight
As others have stated, you can adjust the hashing, but a single flow
(mac/ip/port combination) will still end up limited to 1Gbps without

using

Post by Josh Knight
round robin mode.

So the load balance is now based on Layer4 instead of L3.
Besides these details, I think what you are doing should work nicely.
MJ

If using standard 802.3ad (LACP) you will always get only the

performance of a single link between one host and another.

Using "bond-xmit-hash-policy layer3+4" might get you a better

performance but is not standard LACP.

Post by Gilberto Nunes
So what bond mode I suppose to use in order to get more speed? I

mean

Post by mj
how

Post by Gilberto Nunes
to join the nic to get 4 GB? I will use Ceph!
I know I should use 10gb but I dont have it right now.
Thanks

Post by Gilberto Nunes
This 802.3ad do no suppose to agrengate the speed of all available

NIC??

Post by Dietmar Maurer
No, not really. One connection is limited to 1GB. If you start more
parallel connections you can gain more speed.

_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

Mark Adams

2018-08-24 16:20:19 UTC

also, balance-rr through a switch requires each nic to be on a seperate
vlan. You probably need to remove your lacp config also but this depends on
switch model and configuration. so safest idea is remove it.

so I think you have 3 nodes....

for example:

node1:
ens0 on port 1 vlan 10
ens1 on port 4 vlan 11
ens2 on port 7 vlan 12
ens3 on port 10 vlan 13

node2:
ens0 on port 2 vlan 10
ens1 on port 5 vlan 11
ens2 on port 8 vlan 12
ens3 on port 11 vlan 13

node3:
ens0 on port 3 vlan 10
ens1 on port 6 vlan 11
ens2 on port 9 vlan 12
ens3 on port 12 vlan 13

then I belive your iperf test will return ~3Gbps... i seem to remember
performance doesnt get much better than this but I cant remember why.

Also can't say if this is a good setup for ceph performance..

Cheers

Post by Josh Knight
I don't know your topology, I'm assuming you're going from nodeA ->
switch -> nodeB ? Make sure that entire path is using RR. You could
verify this with interface counters on the various hops. If a single hop
is not doing it correctly, it will limit the throughput.
On Fri, Aug 24, 2018 at 11:20 AM Gilberto Nunes <

Post by Josh Knight
Depending on your topology/configuration, you could try to use bond-rr

mode

Post by Josh Knight
in Linux instead of 802.3ad.
Bond-rr mode is the only mode that will put pkts for the same

mac/ip/port

Post by Josh Knight
tuple across multiple interfaces. This will work well for UDP but TCP

may

Post by Josh Knight
suffer performance issues because pkts can end up out of order and

trigger

Post by Josh Knight
TCP retransmits. There are some examples on this page, you may need to

Post by Josh Knight
some testing before deploying it to ensure it does what you want.

https://wiki.linuxfoundation.org/networking/bonding#bonding-driver-options

Post by Josh Knight
As others have stated, you can adjust the hashing, but a single flow
(mac/ip/port combination) will still end up limited to 1Gbps without

using

Post by Josh Knight
round robin mode.

Post by mj
Hi,
Yes, it is our undertanding that if the hardware (switch) supports

it,

Post by mj
"bond-xmit-hash-policy layer3+4" gives you best spread.
But it will still give you 4 'lanes' of 1GB. Ceph will connect using
different ports, ip's etc, en each connection should use a different
lane, so altogether, you should see a network throughput that
(theoretically) could be as high as 4GB.
That is how we understand it.
You can also try something on the switch, like we did on our

Procurve chassis(config)# show trunk
Load Balancing Method: L3-based (default)
Port | Name Type | Group Type
---- + -------------------------------- --------- + ------

--------

So the load balance is now based on Layer4 instead of L3.
Besides these details, I think what you are doing should work nicely.
MJ

If using standard 802.3ad (LACP) you will always get only the

performance of a single link between one host and another.

Using "bond-xmit-hash-policy layer3+4" might get you a better

performance but is not standard LACP.

Post by Gilberto Nunes
So what bond mode I suppose to use in order to get more speed? I

mean

Post by mj
how

Post by Gilberto Nunes
to join the nic to get 4 GB? I will use Ceph!
I know I should use 10gb but I dont have it right now.
Thanks

Post by Gilberto Nunes
This 802.3ad do no suppose to agrengate the speed of all

available

Post by mj
NIC??

Post by Dietmar Maurer
No, not really. One connection is limited to 1GB. If you start

Post by Dietmar Maurer
parallel connections you can gain more speed.

_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

2018-08-24 20:36:25 UTC

Hi Mark,

Post by Mark Adams
also, balance-rr through a switch requires each nic to be on a seperate
vlan. You probably need to remove your lacp config also but this depends on
switch model and configuration. so safest idea is remove it.

<snip>

Post by Mark Adams
then I belive your iperf test will return ~3Gbps... i seem to remember
performance doesnt get much better than this but I cant remember why.
Also can't say if this is a good setup for ceph performance..

Gilberto Nunes

2018-08-24 20:59:49 UTC

I can get 3 gbps. At least 1.3 gbps.
Don't know why!

Post by mj
Hi Mark,

<snip>
then I belive your iperf test will return ~3Gbps... i seem to remember

Post by Mark Adams
performance doesnt get much better than this but I cant remember why.
Also can't say if this is a good setup for ceph performance..

This is really interesting info, i did not know this. Someone has tried
this with ceph? Any experiences to share..?
Strange that performence turns out to be ~3Gbps, instead of the expected
4...
Anyone with more information on this subject?
Have a nice weekend all!
MJ
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

Josh Knight

2018-08-24 21:15:15 UTC

Just guessing here, if the switch doesn't support rr on its port channels,
then using separate VLANs instead of bundles on the switch is essentially
wiring nodeA to nodeB. That way you don't hit the port channel hashing on
the switch and you keep the rr as-is from A to B.

I would also try using UDP mode on iperf to see if it's TCP retransmission
that's preventing you from getting closer to 4Gbps. Another useful tool is
maisezahn for traffic generation, though it is more complex to run.

Post by Gilberto Nunes
I can get 3 gbps. At least 1.3 gbps.
Don't know why!

Post by mj
Hi Mark,

<snip>
then I belive your iperf test will return ~3Gbps... i seem to remember

Post by Mark Adams
performance doesnt get much better than this but I cant remember why.
Also can't say if this is a good setup for ceph performance..

_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

Josh Knight

2018-08-24 21:16:30 UTC

sorry, should say "mausezahn". It's a part of Netsniff
http://netsniff-ng.org/

Post by Josh Knight
Just guessing here, if the switch doesn't support rr on its port channels,
then using separate VLANs instead of bundles on the switch is essentially
wiring nodeA to nodeB. That way you don't hit the port channel hashing on
the switch and you keep the rr as-is from A to B.
I would also try using UDP mode on iperf to see if it's TCP retransmission
that's preventing you from getting closer to 4Gbps. Another useful tool is
maisezahn for traffic generation, though it is more complex to run.

Post by Gilberto Nunes
I can get 3 gbps. At least 1.3 gbps.
Don't know why!

Post by mj
Hi Mark,

Post by Mark Adams
also, balance-rr through a switch requires each nic to be on a seperate
vlan. You probably need to remove your lacp config also but this

depends

Post by Mark Adams
on
switch model and configuration. so safest idea is remove it.

<snip>
then I belive your iperf test will return ~3Gbps... i seem to remember

Post by Mark Adams
performance doesnt get much better than this but I cant remember why.
Also can't say if this is a good setup for ceph performance..

_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user

Mark Adams

2018-08-24 21:58:36 UTC

That is it, as I understand it Josh. you basically need to turn your switch
in to X seperate switches so each nodes nic, is running on a "seperate"
network.

if you were to do the same thing physically without any config, with 3
nodes, you would need to have as many seperate switches as you wanted nics
in the balance-rr.

I understand mikrotik support balance-rr, but tbh I don't even count them
as a normal switch manufacturer. Their game is routers.... I don't know any
other switches which have support for balance-rr?

as for the 3Gbps limit I mentioned earlier with balance-rr (no matter how
many nics you have)... I don't know if that was just an issue of the day as
cheap 10Gbps came along and the need evaporated for me. I would love to
know if anyone has a test setup to try it though.

Cheers

Post by Gilberto Nunes
I can get 3 gbps. At least 1.3 gbps.
Don't know why!

Post by mj
Hi Mark,

Post by Mark Adams
also, balance-rr through a switch requires each nic to be on a

seperate

Post by Mark Adams
vlan. You probably need to remove your lacp config also but this

depends

Post by Mark Adams
on
switch model and configuration. so safest idea is remove it.

<snip>
then I belive your iperf test will return ~3Gbps... i seem to remember

Post by Mark Adams
performance doesnt get much better than this but I cant remember why.
Also can't say if this is a good setup for ceph performance..

This is really interesting info, i did not know this. Someone has tried
this with ceph? Any experiences to share..?
Strange that performence turns out to be ~3Gbps, instead of the

expected