Uwe Sauter
2018-05-09 09:51:53 UTC
Hi,
Since kernel 4.15.x was released in pve-no-subscription, I have been seeing I/O
performance regressions that lead to 100% iowait in VMs, dropped (audit) log
records, and general instability.
All VMs that show this behavior run up-to-date CentOS 7 on Ceph-backed storage
with kvm64 as the CPU type.
This behavior appears as soon as one or more hosts are running kernel 4.15.x
(I tried 4.15.15 and 4.15.17), which leads me to conclude that it is related
to the combination of this kernel and Ceph (and not to the Meltdown/Spectre
patches that are included in those kernels).
Once all hosts are booted back into kernel 4.13.16, the situation calms down
almost immediately and the VMs go back to running with a low iowait percentage.
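To put numbers on the iowait difference between the two host kernels, something like the following could be run inside an affected VM. This is only a minimal sketch, assuming the standard Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names and the 5-second sampling interval are illustrative choices, not part of any existing tool.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    # Sum user..steal only; guest time is already counted in user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_cpu_sample(path="/proc/stat"):
    """Read one sample of the aggregate CPU counters."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


if __name__ == "__main__":
    first = read_cpu_sample()
    time.sleep(5)  # illustrative sampling interval
    second = read_cpu_sample()
    print("iowait: %.1f%%" % iowait_percent(first, second))
```

Running this in the same VM once with the hosts on 4.15.x and once with them on 4.13.16 would give comparable figures to attach to a report.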
VM kernels have not been changed in the two weeks since 4.15.x was released.
I played around with the "PCID" CPU flag for the VMs but cannot say whether this
had any positive or negative effect on the issue.
Does anyone else see this behavior?
Any suggestions on further debugging?
Thanks,
Uwe
####### hardware ########
4x dual-socket Xeon E5-2670 (Sandy Bridge), 64GB RAM, 3 Ceph OSD disks
2x dual-socket Xeon E5606 (Westmere), 96GB RAM, 6 Ceph OSD disks
10GbE connection between all hosts
#########################
######### pveversion -v #########
proxmox-ve: 5.1-43 (running kernel: 4.13.16-2-pve)
pve-manager: 5.1-52 (running version: 5.1-52/ba597a64)
pve-kernel-4.13: 5.1-44
pve-kernel-4.15: 5.1-4
pve-kernel-4.15.17-1-pve: 4.15.17-8
pve-kernel-4.13.16-2-pve: 4.13.16-48
ceph: 12.2.5-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-31
libpve-guest-common-perl: 2.0-15
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-21
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-17
pve-cluster: 5.0-27
pve-container: 2.0-22
pve-docs: 5.1-17
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-3
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.8-pve1~bpo9
#################################