Discussion:
[PVE-User] Cluster network via directly connected interfaces?
Frank Thommen
2018-11-22 18:29:56 UTC
Please excuse me if this is too basic, but after reading
https://pve.proxmox.com/wiki/Cluster_Manager I wondered whether the
cluster/corosync network could be built from directly connected network
interfaces, i.e. not like this:

+-------+
| pve01 |----------+
+-------+          |
                   |
+-------+     +----------------+
| pve02 |-----| network switch |
+-------+     +----------------+
                   |
+-------+          |
| pve03 |----------+
+-------+


but like this:

+-------+
| pve01 |---+
+-------+   |
    |       |
+-------+   |
| pve02 |   |
+-------+   |
    |       |
+-------+   |
| pve03 |---+
+-------+

(all connections 1 Gbit; there are currently no plans to extend beyond
three nodes)

I can't see any drawback in that solution. It would remove one layer of
hardware dependency and a potential SPOF (the switch). If we don't trust
the interfaces, we could configure a second network with the three
remaining interfaces.
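
Per node that would be roughly something like this in
/etc/network/interfaces (interface names and addresses are only
placeholders; each direct link would get its own small subnet), e.g. on
pve01:

# direct link to pve02
auto eth1
iface eth1 inet static
    address 10.0.12.1
    netmask 255.255.255.252

# direct link to pve03
auto eth2
iface eth2 inet static
    address 10.0.13.1
    netmask 255.255.255.252

though I'm not sure yet how corosync would best be bound to such
per-link subnets.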

Is such a "direct-connection" topology feasible? Recommended? Strictly
not recommended?

I am currently just planning and thinking; there is no cluster (or
even a Proxmox server) in place yet.

Cheers
frank
Mark Schouten
2018-11-22 18:34:04 UTC
Other than limited throughput, I can’t think of a problem. But limited throughput might cause unforeseen situations.

Mark Schouten
Post by Frank Thommen
+-------+
| pve01 |----------+
+-------+          |
                   |
+-------+     +----------------+
| pve02 |-----| network switch |
+-------+     +----------------+
                   |
+-------+          |
| pve03 |----------+
+-------+
+-------+
| pve01 |---+
+-------+   |
    |       |
+-------+   |
| pve02 |   |
+-------+   |
    |       |
+-------+   |
| pve03 |---+
+-------+
(all connections 1Gbit, there are currently not plans to extend over three nodes)
I can't see any drawback in that solution. It would remove one layer of hardware dependency and potential spof (the switch). If we don't trust the interfaces, we might be able to configure a second network with the three remaining interfaces.
Is such a "direct-connection" topology feasible? Recommended? Strictly not recommended?
I am currently just planning and thinking and there is no cluster (or even a PROXMOX server) in place.
Cheers
frank
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
Frank Thommen
2018-11-22 18:37:29 UTC
But the throughput would be higher when using a switch, would it? It's
still just 1Gbit

frank
Post by Mark Schouten
Other than limited throughput, I can’t think of a problem. But limited throughput might cause unforeseen situations.
Mark Schouten
Post by Frank Thommen
+-------+
| pve01 |----------+
+-------+          |
                   |
+-------+     +----------------+
| pve02 |-----| network switch |
+-------+     +----------------+
                   |
+-------+          |
| pve03 |----------+
+-------+
+-------+
| pve01 |---+
+-------+   |
    |       |
+-------+   |
| pve02 |   |
+-------+   |
    |       |
+-------+   |
| pve03 |---+
+-------+
(all connections 1Gbit, there are currently not plans to extend over three nodes)
I can't see any drawback in that solution. It would remove one layer of hardware dependency and potential spof (the switch). If we don't trust the interfaces, we might be able to configure a second network with the three remaining interfaces.
Is such a "direct-connection" topology feasible? Recommended? Strictly not recommended?
I am currently just planning and thinking and there is no cluster (or even a PROXMOX server) in place.
Cheers
frank
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
--
Frank Thommen | HD-HuB / DKFZ Heidelberg
| ***@uni-heidelberg.de
| MMK: +49-6221-54-3637 (Mo-Mi, Fr)
| IPMB: +49-6221-54-5823 (Do)
Frank Thommen
2018-11-22 18:42:19 UTC
What I /really/ meant was "but the throughput would /not/ be higher when
using a switch"...
But the throughput would be higher when using a switch, would it?  It's
still just 1Gbit
frank
Post by Mark Schouten
Other than limited throughput, I can’t think of a problem. But limited
throughput might cause unforeseen situations.
Mark Schouten
On 22 Nov 2018 at 19:30, Frank Thommen wrote:
Please excuse, if this is too basic, but after reading
https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if the
cluster/corosync network could be built by directly connected network
+-------+
| pve01 |----------+
+-------+          |
                    |
+-------+     +----------------+
| pve02 |-----| network switch |
+-------+     +----------------+
                    |
+-------+          |
| pve03 |----------+
+-------+
+-------+
| pve01 |---+
+-------+   |
     |       |
+-------+   |
| pve02 |   |
+-------+   |
     |       |
+-------+   |
| pve03 |---+
+-------+
(all connections 1Gbit, there are currently not plans to extend over three nodes)
I can't see any drawback in that solution.  It would remove one layer
of hardware dependency and potential spof (the switch).  If we don't
trust the interfaces, we might be able to configure a second network
with the three remaining interfaces.
Is such a "direct-connection" topology feasible?  Recommended?
Strictly not recommended?
I am currently just planning and thinking and there is no cluster (or
even a PROXMOX server) in place.
Cheers
frank
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
--
Frank Thommen | HD-HuB / DKFZ Heidelberg
| ***@uni-heidelberg.de
| MMK: +49-6221-54-3637 (Mo-Mi, Fr)
| IPMB: +49-6221-54-5823 (Do)
Uwe Sauter
2018-11-22 18:51:14 UTC
FYI:

I had such a thing working. What you need to keep in mind is that you should configure both interfaces per host on the
same (software) bridge and keep STP on… that way, when you lose the link from node A to node B, the traffic will go
through node C.

+--------------------+
|                    |
| Node A   br0       |
|         /   \      |
|       eth0   eth1  |
+------/-----------\-+
      /             \
+----/------+  +-----\----+
|  eth1     |  |    eth0  |
|  /        |  |       \  |
| br0--eth0-----eth1--br0 |
|   Node B  |  |  Node C  |
+-----------+  +----------+
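
In /etc/network/interfaces terms a minimal sketch of that would look
something like the following (interface names and the address are just
placeholders):

auto eth0
iface eth0 inet manual

auto eth1
iface eth1 inet manual

auto br0
iface br0 inet static
    address 10.15.15.50
    netmask 255.255.255.0
    bridge_ports eth0 eth1
    bridge_stp on
    bridge_fd 15

The bridge_stp on is the important part; without it the three bridges
form a loop.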
What I /really/ meant was "but the throughput would /not/ be higher when using a switch"...
But the throughput would be higher when using a switch, would it?  It's still just 1Gbit
frank
Post by Mark Schouten
Other than limited throughput, I can’t think of a problem. But limited throughput might cause unforeseen situations.
Mark Schouten
Please excuse, if this is too basic, but after reading https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if
+-------+
| pve01 |----------+
+-------+          |
                    |
+-------+     +----------------+
| pve02 |-----| network switch |
+-------+     +----------------+
                    |
+-------+          |
| pve03 |----------+
+-------+
+-------+
| pve01 |---+
+-------+   |
     |       |
+-------+   |
| pve02 |   |
+-------+   |
     |       |
+-------+   |
| pve03 |---+
+-------+
(all connections 1Gbit, there are currently not plans to extend over three nodes)
I can't see any drawback in that solution.  It would remove one layer of hardware dependency and potential spof (the
switch).  If we don't trust the interfaces, we might be able to configure a second network with the three remaining
interfaces.
Is such a "direct-connection" topology feasible?  Recommended? Strictly not recommended?
I am currently just planning and thinking and there is no cluster (or even a PROXMOX server) in place.
Cheers
frank
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
Frank Thommen
2018-11-22 18:55:28 UTC
Good point. Thanks a lot
frank
Post by Uwe Sauter
I had such a thing working. What you need to keep in mind is that you
should configure both interfaces per host on the same (software) bridge
and keep STP on… that way when you loose the link from node A to node B
the traffic will be going through node C.
+--------------------+
|                    |
| Node A   br0       |
|         /   \      |
|       eth0   eth1  |
+------/-----------\-+
      /             \
+----/------+  +-----\----+
|  eth1     |  |    eth0  |
|  /        |  |       \  |
| br0--eth0-----eth1--br0 |
|   Node B  |  |  Node C  |
+-----------+  +----------+
Post by Frank Thommen
What I /really/ meant was "but the throughput would /not/ be higher
when using a switch"...
Post by Frank Thommen
But the throughput would be higher when using a switch, would it?
It's still just 1Gbit
frank
Post by Mark Schouten
Other than limited throughput, I can’t think of a problem. But
limited throughput might cause unforeseen situations.
Mark Schouten
On 22 Nov 2018 at 19:30, Frank Thommen wrote:
Please excuse, if this is too basic, but after reading
https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if the
cluster/corosync network could be built by directly connected
+-------+
| pve01 |----------+
+-------+          |
                    |
+-------+     +----------------+
| pve02 |-----| network switch |
+-------+     +----------------+
                    |
+-------+          |
| pve03 |----------+
+-------+
+-------+
| pve01 |---+
+-------+   |
     |       |
+-------+   |
| pve02 |   |
+-------+   |
     |       |
+-------+   |
| pve03 |---+
+-------+
(all connections 1Gbit, there are currently not plans to extend over three nodes)
I can't see any drawback in that solution.  It would remove one
layer of hardware dependency and potential spof (the switch).  If
we don't trust the interfaces, we might be able to configure a
second network with the three remaining interfaces.
Is such a "direct-connection" topology feasible?  Recommended?
Strictly not recommended?
I am currently just planning and thinking and there is no cluster
(or even a PROXMOX server) in place.
Cheers
frank
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
Uwe Sauter
2018-11-22 19:12:56 UTC
And one other thing.

I don't think that multicast traffic is possible in this setup, so you need to configure corosync to use unicast UDP.
Make this change after creating the cluster on the first node but before joining any other nodes; that is the easiest
point in time for that change.

/etc/pve/corosync.conf:

totem {
  […]
  config_version: <n+1>   # increment by one for every change
  transport: udpu
}

And, as you already mentioned, such a setup won't scale; three nodes is the only size at which it is sensible.


Do you plan to use Ceph?
Good point.  Thanks a lot
frank
Post by Uwe Sauter
I had such a thing working. What you need to keep in mind is that you should configure both interfaces per host on the
same (software) bridge and keep STP on… that way when you loose the link from node A to node B the traffic will be
going through node C.
+--------------------+
|                    |
| Node A   br0       |
|         /   \      |
|       eth0   eth1  |
+------/-----------\-+
       /             \
+----/------+  +-----\----+
|  eth1     |  |    eth0  |
|  /        |  |       \  |
| br0--eth0-----eth1--br0 |
|   Node B  |  |  Node C  |
+-----------+  +----------+
What I /really/ meant was "but the throughput would /not/ be higher when using a switch"...
But the throughput would be higher when using a switch, would it? It's still just 1Gbit
frank
Post by Mark Schouten
Other than limited throughput, I can’t think of a problem. But limited throughput might cause unforeseen situations.
Mark Schouten
Please excuse, if this is too basic, but after reading https://pve.proxmox.com/wiki/Cluster_Manager I wondered, if
+-------+
| pve01 |----------+
+-------+          |
                    |
+-------+     +----------------+
| pve02 |-----| network switch |
+-------+     +----------------+
                    |
+-------+          |
| pve03 |----------+
+-------+
+-------+
| pve01 |---+
+-------+   |
     |       |
+-------+   |
| pve02 |   |
+-------+   |
     |       |
+-------+   |
| pve03 |---+
+-------+
(all connections 1Gbit, there are currently not plans to extend over three nodes)
I can't see any drawback in that solution.  It would remove one layer of hardware dependency and potential spof
(the switch).  If we don't trust the interfaces, we might be able to configure a second network with the three
remaining interfaces.
Is such a "direct-connection" topology feasible?  Recommended? Strictly not recommended?
I am currently just planning and thinking and there is no cluster (or even a PROXMOX server) in place.
Cheers
frank
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
Thomas Lamprecht
2018-11-22 20:06:30 UTC
 +-------+
 | pve01 |----------+
 +-------+          |
                    |
 +-------+     +----------------+
 | pve02 |-----| network switch |
 +-------+     +----------------+
                    |
 +-------+          |
 | pve03 |----------+
 +-------+
 +-------+
 | pve01 |---+
 +-------+   |
     |       |
 +-------+   |
 | pve02 |   |
 +-------+   |
     |       |
 +-------+   |
 | pve03 |---+
 +-------+
(all connections 1Gbit, there are currently not plans to extend over three nodes)
I can't see any drawback in that solution.  It would remove one layer of hardware dependency and potential spof (the switch).  If we don't trust the interfaces, we might be able to configure a second network with the three remaining interfaces.
Is such a "direct-connection" topology feasible?  Recommended? Strictly not recommended?
A full mesh is certainly not bad. For the cluster network (corosync), latency is the key;
bandwidth isn't really needed much. So this is surely not bad.
We also use a 10G (or 40G, not sure) full mesh for a Ceph cluster network - you
save yourself a not-too-cheap switch and get full bandwidth and good latency.
The limiting factor is that this gets quite complex for bigger clusters, but besides
that it doesn't really have any drawbacks for the cluster interconnects, AFAICT.

For multicast you will need to try it; as Uwe said, I'm currently not sure. It could work, since
Linux can route multicast just fine (mrouter), but I don't remember exactly anymore -
sorry.
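
If you do test it, running omping on all three nodes at the same time should
tell you quickly whether multicast gets through the mesh; something along the
lines of the test from the Proxmox multicast notes (hostnames are placeholders):

omping -c 10000 -i 0.001 -F -q pve01 pve02 pve03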

But if you try it, it would be great if you reported back. Otherwise, unicast is always an option
at those cluster sizes - you really shouldn't have a problem as long as you do not put
storage traffic together with corosync (cluster) traffic on the same net (corosync gets too many
latency spikes then).
I am currently just planning and thinking and there is no cluster (or even a PROXMOX server) in place.
Cheers
frank
Stefan M. Radman
2018-11-23 08:13:52 UTC
You might want to have a look at
https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server
This is what (I think) Thomas Lamprecht is referring to and it should also be usable for corosync.

The advantage of this configuration over the bridged solution used by Uwe Sauter is the zero convergence time of the topology.
A bridged solution using the standard Linux bridge might break your corosync ring for a long time (20-50 seconds) during STP state transitions.
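
From memory, the routed variant described there boils down to per-node
/etc/network/interfaces entries roughly like the following (addresses and
interface names are placeholders; on the other two nodes the address and
routes change accordingly), e.g. on the first node:

auto eth1
iface eth1 inet static
    address 10.15.15.50
    netmask 255.255.255.0
    up ip route add 10.15.15.51/32 dev eth1
    down ip route del 10.15.15.51/32

auto eth2
iface eth2 inet static
    address 10.15.15.50
    netmask 255.255.255.0
    up ip route add 10.15.15.52/32 dev eth2
    down ip route del 10.15.15.52/32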

Disclaimer: I have tried neither of the two solutions.

Cheers
Stefan
lists
2018-11-23 09:24:28 UTC
Hi,
Post by Stefan M. Radman
You might want to have a look at
https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server
We are running that config (method 2) and we have never noticed any
multicast issues.

MJ
Stefan M. Radman
2018-11-23 11:30:51 UTC
Hi Ronny

That's the first time I've heard of a routing protocol in the corosync context.
Doesn't that add a whole lot of complexity to the setup?
Would it work with corosync multicast?

Stefan
Personally, if I were to try and experiment with something non-default, I would use OSPF+BFD, either with bird or quagga (a rough sketch follows below).
- you get quick failovers due to BFD.
- you can equal-cost multipath the links to utilize multiple ports between servers.
- all links are active, so you do not have a "passive" link as you do with STP.
- there is no needless duplication of data, so you do not get the 50% bandwidth loss of a broadcast bond.
- you need to use corosync with targeted (unicast) UDP towards specific loopback addresses.
- traffic takes the shortest path, so always towards the correct server.
- you can very easily expand beyond 3 nodes if you have enough ports, or move the OSPF domain onto a switch if needed. This also easily converts to a multi-switch config to maintain HA with no SPOF.
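
A very rough, untested bird (1.x) sketch of that idea - a /32 corosync
address on lo per node, OSPF with BFD over the two mesh ports; all names,
addresses and timer values are made up:

# /etc/bird/bird.conf (sketch only)
router id 10.15.15.50;            # this node's loopback /32

protocol kernel { export all; }   # push OSPF-learned routes into the kernel
protocol device { }

protocol bfd {
    interface "eth*" {
        min rx interval 50 ms;    # fast failure detection on the mesh links
        min tx interval 50 ms;
    };
}

protocol ospf {
    area 0 {
        interface "eth1", "eth2" {
            type pointopoint;     # the two direct links to the neighbours
            bfd yes;
        };
        interface "lo" {
            stub yes;             # announce the corosync /32, no adjacency on lo
        };
    };
}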
Happy experimentation!
Best regards
Ronny Aasen
Post by Frank Thommen
+-------+
| pve01 |----------+
+-------+          |
                   |
+-------+     +----------------+
| pve02 |-----| network switch |
+-------+     +----------------+
                   |
+-------+          |
| pve03 |----------+
+-------+
+-------+
| pve01 |---+
+-------+   |
    |       |
+-------+   |
| pve02 |   |
+-------+   |
    |       |
+-------+   |
| pve03 |---+
+-------+
(all connections 1Gbit, there are currently not plans to extend over three nodes)
I can't see any drawback in that solution. It would remove one layer of hardware dependency and potential spof (the switch). If we don't trust the interfaces, we might be able to configure a second network with the three remaining interfaces.
Is such a "direct-connection" topology feasible? Recommended? Strictly not recommended?
I am currently just planning and thinking and there is no cluster (or even a PROXMOX server) in place.
Cheers
frank
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
_______________________________________________
pve-user mailing list
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user