Uwe Sauter
2018-05-02 20:27:39 UTC
Hi all,
I updated my cluster this morning (version info at the end of this mail) and rebooted all hosts sequentially, live-migrating VMs
between hosts. (Six hosts connected via 10GbE, all participating in a Ceph cluster.)
Since then I have been seeing hung storage tasks inside the VMs (e.g. jbd2 on VMs with ext4, or xfsaild on VMs with XFS).
This goes so far that auditd fills the dmesg buffer with messages like:
[14109.375608] audit_log_start: 23 callbacks suppressed
[14109.376496] audit: audit_backlog=70 > audit_backlog_limit=64
[14109.377213] audit: audit_lost=2274 audit_rate_limit=0 audit_backlog_limit=64
[14109.377954] audit: backlog limit exceeded
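The backlog messages are most likely a symptom rather than the cause: audit records are generated faster than they can be drained while I/O is stalled. As a stop-gap to quiet the dmesg noise, the kernel audit backlog limit can be raised inside the guest; the value 8192 below is an arbitrary example, not a tuned recommendation:

```shell
# Raise the kernel audit backlog limit at runtime (requires root):
auditctl -b 8192

# Verify the new setting:
auditctl -s | grep backlog_limit

# To make it persistent, add "-b 8192" to /etc/audit/rules.d/audit.rules
# (or /etc/audit/audit.rules on older setups) and restart auditd.
```

This only prevents lost audit records; it does not address the underlying storage stalls.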
Performance is massively reduced on those VMs. The VMs all run up-to-date CentOS 7.4 with the QEMU guest agent running.
This also happens after a VM is shut down and started again.
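For anyone wanting to check whether their guests are affected in the same way, this is roughly how I would look for the stuck kernel threads (a generic sketch, not specific to this bug):

```shell
# Inside an affected guest: list processes in uninterruptible sleep
# (D state) -- stalled jbd2/xfsaild threads show up here.
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'

# If the kernel hung-task watchdog fired, its warnings are in dmesg
# (grep exits non-zero when nothing matched, hence the || true):
dmesg 2>/dev/null | grep -i "blocked for more than" || true
```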
Does anyone else see this? Any thoughts on the cause, or proposals for a fix?
Thanks,
Uwe
# pveversion -v
proxmox-ve: 5.1-43 (running kernel: 4.15.15-1-pve)
pve-manager: 5.1-52 (running version: 5.1-52/ba597a64)
pve-kernel-4.13: 5.1-44
pve-kernel-4.15: 5.1-3
pve-kernel-4.15.15-1-pve: 4.15.15-6
pve-kernel-4.13.16-2-pve: 4.13.16-47
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
ceph: 12.2.4-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-30
libpve-guest-common-perl: 2.0-15
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-19
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-2
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-15
pve-cluster: 5.0-26
pve-container: 2.0-22
pve-docs: 5.1-17
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-3
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.7-pve1~bpo9