Discussion:
[PVE-User] Confusing about Bond 802.3ad
Gilberto Nunes
2018-08-24 01:33:39 UTC
Hello

I have a TP-Link T1600G-28TS switch and want to use 4 NICs, all 1 Gbps,
to create a LAG...
I managed to create it, but when I use iperf3 to check the performance, I
found that the traffic does not exceed 1 Gbps. Isn't 802.3ad supposed to
aggregate the speed of all the available NICs?

On the switch I also have VLANs (no trunk), which are used to communicate
only with the cluster and storage networks...

The iperf3 performance was like this:
Server:
iperf3 --bind 10.10.10.100 -s
Client:
iperf3 -c 10.10.10.100

Need some advice.

Thanks a lot.
---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36
Chance Ellis
2018-08-24 02:54:19 UTC
iperf will open a single stream by default. An 802.3ad bond will only send a given stream across a single link in the LAG.

Try the -P flag to add parallel streams with iperf and see how the performance looks.
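
For example, something like this (same server/client addresses as in the
original post) should exercise several links at once:

iperf3 --bind 10.10.10.100 -s
iperf3 -c 10.10.10.100 -P 4

Note that with the default layer2 hash policy all four streams can still
land on the same member link; a layer3+4 hash (discussed further down the
thread) spreads them by TCP port.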
Dietmar Maurer
2018-08-24 06:01:35 UTC
Post by Gilberto Nunes
Isn't 802.3ad supposed to aggregate the speed of all the available NICs?
No, not really. One connection is limited to 1 Gbps. If you start more
parallel connections you can gain more speed.
Gilberto Nunes
2018-08-24 10:01:28 UTC
So what bond mode am I supposed to use in order to get more speed? I
mean, how do I join the NICs to get 4 Gbps? I will use Ceph!
I know I should use 10 Gbps, but I don't have it right now.

Thanks
Uwe Sauter
2018-08-24 10:45:11 UTC
If using standard 802.3ad (LACP) you will always get only the performance of a single link between one host and another.

Using "bond-xmit-hash-policy layer3+4" might get you better performance, but it is not standard LACP.
mj
2018-08-24 10:52:10 UTC
Hi,

Yes, it is our understanding that if the hardware (switch) supports it,
"bond-xmit-hash-policy layer3+4" gives you the best spread.

But it will still give you 4 'lanes' of 1 Gbps. Ceph will connect using
different ports, IPs etc., and each connection should use a different
lane, so altogether you should see a network throughput that
(theoretically) could be as high as 4 Gbps.

That is how we understand it.

You can also try something on the switch, like we did on ours:
Procurve chassis(config)# show trunk
Load Balancing Method: L3-based (default)
Port | Name Type | Group Type
---- + -------------------------------- --------- + ------ --------
D1 | Link to prn004 - 1 10GbE-T | Trk1 LACP
D2 | Link to prn004 - 2 10GbE-T | Trk1 LACP
D3 | Link to prn005 - 1 10GbE-T | Trk2 LACP
D4 | Link to prn005 - 2 10GbE-T | Trk2 LACP
Procurve chassis(config)# trunk-load-balance L4
So the load balance is now based on Layer4 instead of L3.

Besides these details, I think what you are doing should work nicely.

MJ
Josh Knight
2018-08-24 15:02:04 UTC
Depending on your topology/configuration, you could try to use balance-rr
(bond-rr) mode in Linux instead of 802.3ad.

Balance-rr is the only mode that will send packets for the same
MAC/IP/port tuple across multiple interfaces. This works well for UDP,
but TCP may suffer performance issues because packets can end up out of
order and trigger TCP retransmits. There are some examples on this page;
you may need to do some testing before deploying it to ensure it does
what you want.

https://wiki.linuxfoundation.org/networking/bonding#bonding-driver-options

As others have stated, you can adjust the hashing, but a single flow
(MAC/IP/port combination) will still end up limited to 1 Gbps without
using round-robin mode.
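
On the Linux side a minimal balance-rr bond in /etc/network/interfaces
might look like this (the NIC names and the 10.10.10.x address are just
placeholders taken from the thread; the switch side needs matching
changes, as discussed further down the thread):

auto bond0
iface bond0 inet static
        address 10.10.10.110
        netmask 255.255.255.0
        bond-slaves eno1 eno2 eno3 eno4
        bond-mode balance-rr
        bond-miimon 100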
Gilberto Nunes
2018-08-24 15:19:50 UTC
So I tried balance-rr with the LAG on the switch and still get 1 Gbps:

pve-ceph02:~# iperf3 -c 10.10.10.100
Connecting to host 10.10.10.100, port 5201
[  4] local 10.10.10.110 port 52674 connected to 10.10.10.100 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   116 MBytes   974 Mbits/sec   32    670 KBytes
[  4]   1.00-2.00   sec   112 MBytes   941 Mbits/sec    3    597 KBytes
[  4]   2.00-3.00   sec   112 MBytes   941 Mbits/sec    3    509 KBytes
[  4]   3.00-4.00   sec   112 MBytes   941 Mbits/sec    0    660 KBytes
[  4]   4.00-5.00   sec   112 MBytes   941 Mbits/sec    6    585 KBytes
[  4]   5.00-6.00   sec   112 MBytes   941 Mbits/sec    0    720 KBytes
[  4]   6.00-7.00   sec   112 MBytes   942 Mbits/sec    3    650 KBytes
[  4]   7.00-8.00   sec   112 MBytes   941 Mbits/sec    4    570 KBytes
[  4]   8.00-9.00   sec   112 MBytes   941 Mbits/sec    0    708 KBytes
[  4]   9.00-10.00  sec   112 MBytes   941 Mbits/sec    8    635 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.10 GBytes   945 Mbits/sec   59         sender
[  4]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec              receiver

iperf Done.



---
Gilberto Nunes Ferreira

(47) 3025-5907
(47) 99676-7530 - Whatsapp / Telegram

Skype: gilberto.nunes36
Josh Knight
2018-08-24 15:57:48 UTC
I don't know your topology; I'm assuming you're going from nodeA ->
switch -> nodeB? Make sure that the entire path is using RR. You could
verify this with interface counters on the various hops. If a single hop
is not doing it correctly, it will limit the throughput.
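
Two quick checks on the Linux side (the bond and NIC names are
placeholders):

cat /proc/net/bonding/bond0   # shows the active bonding mode and the slaves
ip -s link show eno1          # per-NIC TX/RX byte and packet counters

With balance-rr working, the TX counters of all slaves should grow at
roughly the same rate during an iperf3 run.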
Mark Adams
2018-08-24 16:20:19 UTC
Also, balance-rr through a switch requires each NIC to be on a separate
VLAN. You probably need to remove your LACP config as well, but this
depends on the switch model and configuration, so the safest idea is to
remove it.

So, I think you have 3 nodes...

For example:

node1:
ens0 on port 1 vlan 10
ens1 on port 4 vlan 11
ens2 on port 7 vlan 12
ens3 on port 10 vlan 13

node2:
ens0 on port 2 vlan 10
ens1 on port 5 vlan 11
ens2 on port 8 vlan 12
ens3 on port 11 vlan 13

node3:
ens0 on port 3 vlan 10
ens1 on port 6 vlan 11
ens2 on port 9 vlan 12
ens3 on port 12 vlan 13

Then I believe your iperf test will return ~3 Gbps... I seem to remember
that performance doesn't get much better than this, but I can't remember
why.

Also, I can't say if this is a good setup for Ceph performance...

Cheers
mj
2018-08-24 20:36:25 UTC
Hi Mark,
This is really interesting info; I did not know this. Has someone tried
this with Ceph? Any experiences to share?

Strange that performance turns out to be ~3 Gbps, instead of the expected
4...

Anyone with more information on this subject?

Have a nice weekend all!

MJ
Gilberto Nunes
2018-08-24 20:59:49 UTC
I can't get 3 Gbps. At most around 1.3 Gbps.
Don't know why!
Josh Knight
2018-08-24 21:15:15 UTC
Just guessing here, if the switch doesn't support rr on its port channels,
then using separate VLANs instead of bundles on the switch is essentially
wiring nodeA to nodeB. That way you don't hit the port channel hashing on
the switch and you keep the rr as-is from A to B.

I would also try using UDP mode on iperf to see if it's TCP retransmission
that's preventing you from getting closer to 4Gbps. Another useful tool is
maisezahn for traffic generation, though it is more complex to run.
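
For example (iperf3's UDP mode defaults to only 1 Mbit/s, so the target
bandwidth has to be given explicitly; the address is the one used earlier
in the thread):

iperf3 -c 10.10.10.100 -u -b 4G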
Josh Knight
2018-08-24 21:16:30 UTC
Sorry, that should say "mausezahn". It's part of the netsniff-ng toolkit:
http://netsniff-ng.org/
Mark Adams
2018-08-24 21:58:36 UTC
That's it, as I understand it, Josh. You basically need to turn your
switch into X separate switches so that each node's NIC is running on a
"separate" network.

If you were to do the same thing physically without any config, with 3
nodes you would need as many separate switches as you wanted NICs in the
balance-rr bond.

I understand MikroTik supports balance-rr, but to be honest I don't even
count them as a normal switch manufacturer; their game is routers... I
don't know of any other switches which support balance-rr.

As for the ~3 Gbps limit I mentioned earlier with balance-rr (no matter
how many NICs you have)... I don't know if that was just an issue of the
day, as cheap 10 Gbps came along and the need evaporated for me. I would
love to know if anyone has a test setup to try it, though.

Cheers