Discussion:
[PVE-User] Proxmox Kernel / Ceph Integration
Marcus Haarmann
2018-07-27 09:02:05 UTC
Permalink
Hi experts,

we are using a Proxmox cluster with an underlying ceph storage.
Versions are pve 5.2-2 with kernel 4.15.18-1-pve and ceph luminous 12.2.5
We are running a couple of VMs and also containers there.
3 virtual NICs (as bonds, balance-alb), ceph uses 2 bonded 10GBit interfaces (public/cluster separated)

During the nightly backup, the backup stalls. In parallel, we get lots of messages like these in dmesg:
[137612.371311] libceph: mon0 192.168.16.31:6789 session established
[137643.090541] libceph: mon0 192.168.16.31:6789 session lost, hunting for new mon
[137643.091383] libceph: mon1 192.168.16.32:6789 session established
[137673.810526] libceph: mon1 192.168.16.32:6789 session lost, hunting for new mon
[137673.811388] libceph: mon2 192.168.16.34:6789 session established
[137704.530567] libceph: mon2 192.168.16.34:6789 session lost, hunting for new mon
[137704.531363] libceph: mon0 192.168.16.31:6789 session established
[137735.250593] libceph: mon0 192.168.16.31:6789 session lost, hunting for new mon
[137735.251352] libceph: mon1 192.168.16.32:6789 session established
[137765.970608] libceph: mon1 192.168.16.32:6789 session lost, hunting for new mon
[137765.971544] libceph: mon0 192.168.16.31:6789 session established
[137796.690605] libceph: mon0 192.168.16.31:6789 session lost, hunting for new mon
[137796.691412] libceph: mon1 192.168.16.32:6789 session established

We have been searching for the cause for a while, since a blocked backup is not easy to recover from (unblocking does not help;
only stopping and migrating to a different server helps, since the rbd device seems to stay blocked).
It appears to be related to the ceph messages.
We found the following patch related to these messages (the bug it fixes may lead to a blocking state in the kernel):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b4c443d139f1d2b5570da475f7a9cbcef86740c

We tried to patch the kernel ourselves, but this was not successful.

Although I presume the real error is caused by a network problem, it would be nice to have an
official backport of this patch in the pve kernel.
Can anybody do that? (It is only one line of code.)

We are trying to modify the bonding mode, because the network connection seems to be unstable;
maybe this solves the issue.

Thank you very much and best regards,

Marcus Haarmann
Thomas Lamprecht
2018-07-27 09:12:24 UTC
Permalink
Hi,
Post by Marcus Haarmann
Hi experts,
we are using a Proxmox cluster with an underlying ceph storage.
Versions are pve 5.2-2 with kernel 4.15.18-1-pve and ceph luminous 12.2.5
We are running a couple of VMs and also containers there.
3 virtual NICs (as bonds, balance-alb), ceph uses 2 bonded 10GBit interfaces (public/cluster separated)
[137612.371311] libceph: mon0 192.168.16.31:6789 session established
[137643.090541] libceph: mon0 192.168.16.31:6789 session lost, hunting for new mon
[137643.091383] libceph: mon1 192.168.16.32:6789 session established
[137673.810526] libceph: mon1 192.168.16.32:6789 session lost, hunting for new mon
[137673.811388] libceph: mon2 192.168.16.34:6789 session established
[137704.530567] libceph: mon2 192.168.16.34:6789 session lost, hunting for new mon
[137704.531363] libceph: mon0 192.168.16.31:6789 session established
[137735.250593] libceph: mon0 192.168.16.31:6789 session lost, hunting for new mon
[137735.251352] libceph: mon1 192.168.16.32:6789 session established
[137765.970608] libceph: mon1 192.168.16.32:6789 session lost, hunting for new mon
[137765.971544] libceph: mon0 192.168.16.31:6789 session established
[137796.690605] libceph: mon0 192.168.16.31:6789 session lost, hunting for new mon
[137796.691412] libceph: mon1 192.168.16.32:6789 session established
We have been searching for the cause for a while, since a blocked backup is not easy to recover from (unblocking does not help;
only stopping and migrating to a different server helps, since the rbd device seems to stay blocked).
It appears to be related to the ceph messages.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b4c443d139f1d2b5570da475f7a9cbcef86740c
We tried to patch the kernel ourselves, but this was not successful.
Was porting the patch not successful, or did the patch not work as
expected?
Post by Marcus Haarmann
Although I presume the real error is caused by a network problem, it would be nice to have an
official backport of this patch in the pve kernel.
Can anybody do that? (It is only one line of code.)
A single line can also wreak havoc just fine ;-)
This one sounds harmless, regression-wise. But it would be
really good to first know whether the patch addresses the issue at all.
Post by Marcus Haarmann
We are trying to modify the bonding mode, because the network connection seems to be unstable;
maybe this solves the issue.
Sounds like it's worth a shot, if you already know that the network may
not be fully stable, as you may want to do something about that sooner
or later anyway.

cheers,
Thomas
Marcus Haarmann
2018-07-27 10:05:16 UTC
Permalink
Hi,

we tried building the patched kernel ourselves and the messages vanished, yes.
However, the error situation was more unstable than before, so we went back to an official version of the kernel,
because the build process was "rough" for us ... (we do not do this kind of thing very often).
Since this patch has been officially backported to several kernels (obviously by a ceph team member who knows what he is doing),
but not to 4.15, I would presume it should not make things worse.
The situation we see here seems to be the same as described: a mon connection is lost for some reason, and the kernel
ceph client seems to get stuck in a kind of endless loop because of two identical timers (the timestamps in the dmesg output
from my first mail also show each mon session being dropped after about 30 seconds, which would fit).
That is what the patch addresses.

But you are right, we should find out why the mon connection is lost in the first place (it happens during a backup -> high I/O on the network,
which might cause the mon connection loss).
The next step is to change the bond to plain failover (active-backup) instead of balance-alb (more conservative ...).
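Roughly, our ceph cluster bond would then look like this in /etc/network/interfaces (an untested sketch; eth4/eth5 and the address are from our setup, the ceph public bond would be changed the same way):

auto bond2
iface bond2 inet static
address 192.168.17.31
netmask 255.255.255.0
slaves eth4 eth5
bond_miimon 100
bond_mode active-backup
pre-up (ifconfig eth4 mtu 8996 && ifconfig eth5 mtu 8996)
mtu 8996
#ceph cluster net, active-backup: one slave carries traffic, the other is a pure standby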

Marcus Haarmann


Von: "Thomas Lamprecht" <***@proxmox.com>
An: "pve-user" <pve-***@pve.proxmox.com>, "Marcus Haarmann" <***@midoco.de>
Gesendet: Freitag, 27. Juli 2018 11:12:24
Betreff: Re: [PVE-User] Proxmox Kernel / Ceph Integration

Hi,
Post by Marcus Haarmann
Hi experts,
we are using a Proxmox cluster with an underlying ceph storage.
Versions are pve 5.2-2 with kernel 4.15.18-1-pve and ceph luminous 12.2.5
We are running a couple of VM and also Containers there.
3 virtual NIC (as bond balance-alb), ceph uses 2 bonded 10GBit interfaces (public/cluster separated)
[137612.371311] libceph: mon0 192.168.16.31:6789 session established
[137643.090541] libceph: mon0 192.168.16.31:6789 session lost, hunting for new mon
[137643.091383] libceph: mon1 192.168.16.32:6789 session established
[137673.810526] libceph: mon1 192.168.16.32:6789 session lost, hunting for new mon
[137673.811388] libceph: mon2 192.168.16.34:6789 session established
[137704.530567] libceph: mon2 192.168.16.34:6789 session lost, hunting for new mon
[137704.531363] libceph: mon0 192.168.16.31:6789 session established
[137735.250593] libceph: mon0 192.168.16.31:6789 session lost, hunting for new mon
[137735.251352] libceph: mon1 192.168.16.32:6789 session established
[137765.970608] libceph: mon1 192.168.16.32:6789 session lost, hunting for new mon
[137765.971544] libceph: mon0 192.168.16.31:6789 session established
[137796.690605] libceph: mon0 192.168.16.31:6789 session lost, hunting for new mon
[137796.691412] libceph: mon1 192.168.16.32:6789 session established
We are searching for the issue for a while, since the blocking backup is not easy to overcome (unblocking does not help,
only stop and migrate to a different server, since the rbd device seems to block).
It seems to be related to the ceph messages.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b4c443d139f1d2b5570da475f7a9cbcef86740c
We have tried to patch the kernel ourselfes, but this was not successful.
Porting the patch was not successful or the patch did not worked as
expected?
Post by Marcus Haarmann
Although I presume the real error situation is related to a network problem, it would be nice to have an
official backport of this patch in the pve kernel.
Anybody can do that ? (only one line of code)
A single line can also wreak havoc just fine ;-)
But this one seems/sounds harmless, regression-wise. But it would be
really good to first know if the patch addresses the issue at all.
Post by Marcus Haarmann
We are trying to modify the bonding mode because the network connection seems to be unstable,
maybe this solves the issue.
Sounds like it's worth a shot, if you already know that the network may
not be fully stable, as you may want to do something about that sooner
or later anyway.

cheers,
Thomas
Adam Thompson
2018-07-27 13:24:37 UTC
Permalink
Post by Marcus Haarmann
Hi experts,
we are using a Proxmox cluster with an underlying ceph storage.
Versions are pve 5.2-2 with kernel 4.15.18-1-pve and ceph luminous 12.2.5
We are running a couple of VMs and also containers there.
3 virtual NICs (as bonds, balance-alb), ceph uses 2 bonded 10GBit
interfaces (public/cluster separated)
I have a thought, but need to know which network subnets are attached to
which bondX interfaces.
Also, you mention you have 3 "virtual NIC" in ALB mode. Is this a
V-in-V situation?
What bonding mode are you using for the two 10GE interfaces you dedicate
to CEPH?
(Feel free to just paste /etc/network/interfaces if that's easier than
typing it all out - just make notes about which i/f does what.)
-Adam
Marcus Haarmann
2018-07-27 15:42:13 UTC
Permalink
Hi Adam,

here is the setup:

auto lo
iface lo inet loopback

iface eth0 inet manual

iface eth1 inet manual

iface eth2 inet manual

iface eth3 inet manual

iface eth4 inet manual

iface eth5 inet manual

auto bond0
iface bond0 inet manual
slaves eth0 eth1
bond_miimon 100
bond_mode balance-alb
#frontside

auto bond1
iface bond1 inet static
address 192.168.16.31
netmask 255.255.255.0
slaves eth2 eth3
bond_miimon 100
bond_mode balance-alb
pre-up (ifconfig eth2 mtu 8996 && ifconfig eth3 mtu 8996)
mtu 8996
#corosync

auto bond2
iface bond2 inet static
address 192.168.17.31
netmask 255.255.255.0
slaves eth4 eth5
bond_miimon 100
bond_mode balance-alb
pre-up (ifconfig eth4 mtu 8996 && ifconfig eth5 mtu 8996)
mtu 8996
#ceph

auto vmbr0
iface vmbr0 inet static
address 192.168.19.31
netmask 255.255.255.0
gateway 192.168.19.1
bridge_ports bond0
bridge_stp off
bridge_fd 0



bond0/vmbr0 is used by the VMs (frontend side)
bond1 is ceph public net
bond2 is ceph cluster net
corosync is running in 192.168.16.x
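For completeness, the matching network split on the ceph side would be configured roughly like this in ceph.conf (a sketch with only the network-related entries, derived from the subnets above):

[global]
# ceph public net on bond1, cluster net on bond2
public_network = 192.168.16.0/24
cluster_network = 192.168.17.0/24
mon_host = 192.168.16.31,192.168.16.32,192.168.16.34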

Marcus Haarmann



Von: "Adam Thompson" <***@athompso.net>
An: "pve-user" <pve-***@pve.proxmox.com>
CC: "Marcus Haarmann" <***@midoco.de>
Gesendet: Freitag, 27. Juli 2018 15:24:37
Betreff: Re: [PVE-User] Proxmox Kernel / Ceph Integration
Post by Marcus Haarmann
Hi experts,
we are using a Proxmox cluster with an underlying ceph storage.
Versions are pve 5.2-2 with kernel 4.15.18-1-pve and ceph luminous 12.2.5
We are running a couple of VM and also Containers there.
3 virtual NIC (as bond balance-alb), ceph uses 2 bonded 10GBit
interfaces (public/cluster separated)
I have a thought, but need to know which network subnets are attached to
which bondX interfaces.
Also, you mention you have 3 "virtual NIC" in ALB mode. Is this a
V-in-V situation?
What bonding mode are you using for the two 10GE interfaces you dedicate
to CEPH?
(Feel free to just paste /etc/network/interfaces if that's easier than
typing it all out - just make notes about which i/f does what.)
-Adam
