Discussion:
[PVE-User] CPU soft lookup
Lars Wilke
2011-02-26 13:30:10 UTC
Hi,

I am experiencing reproducible KVM VM crashes/hangs (and once a lost network config) when doing backups
via vzdump on the hypervisor. Most of the time the VM just gets stuck and I have to shut it down via
qm stop. Note that the problem only occurs with the VM I am backing up, and sometimes with that same VM when
copying large files on the HV node. The VM serves NFS and some databases and has around 150GB of data
which needs to be backed up every time. The other 3 VMs never crashed, but once in a while I find the same
warning in their logs. I guess they don't crash because they are considerably smaller
in terms of used disk space.

I found this bug report https://bugs.launchpad.net/ubuntu/+source/linux/+bug/579276
and it contains some links to reports from Red Hat.
I am not sure whether the proposed patches fix my problem, but these fixes are all in newer kernel
branches. So my question is whether it would be worth trying the 2.6.35 kernel on the HV. And what about
the VMs, do I need a newer/patched kernel there too?

Feb 26 13:25:17 be01 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [swapper:0]
Feb 26 13:25:17 be01 kernel: CPU 2:
Feb 26 13:25:17 be01 kernel: Modules linked in: nfsd exportfs nfs_acl auth_rpcgss ipv6 xfrm_nalgo crypto_api act_police cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_connlimit xt_realm iptable_raw xt_comment xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_TOS ipt_tos ipt_TCPMSS ipt_SAME ipt_REJECT ipt_REDIRECT ipt_recent ipt_owner ipt_NETMAP ipt_MASQUERADE ipt_iprange ipt_hashlimit ipt_ECN ipt_ecn ipt_DSCP ipt_dscp ipt_CLUSTERIP ipt_ah ipt_addrtype ip_nat_tftp ip_nat_snmp_basic ip_nat_sip ip_nat_pptp ip_nat_irc ip_nat_h323 ip_nat_ftp ip_nat_amanda ip_conntrack_tftp ip_conntrack_sip ip_conntrack_pptp ip_conntrack_netbios_ns ip_conntrack_irc ip_conntrack_h323 ip_conntrack_ftp ts_kmp ip_conntrack_amanda xt_tcpmss xt_pkttype xt_physdev bridge xt_NFQUEUE xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_DSCP xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat ip_nat ip_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables lockd sunrpc xfs
Feb 26 13:25:17 be01 kernel: dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy joydev virtio_blk virtio_balloon virtio_net i2c_piix4 virtio_pci i2c_core virtio_ring ide_cd serio_raw virtio pcspkr cdrom dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Feb 26 13:25:17 be01 kernel: Pid: 0, comm: swapper Not tainted 2.6.18-194.32.1.el5 #1
Feb 26 13:25:17 be01 kernel: RIP: 0010:[<ffffffff8006b36b>] [<ffffffff8006b36b>] default_idle+0x29/0x50
Feb 26 13:25:17 be01 kernel: RSP: 0018:ffff81021fc67ef0 EFLAGS: 00000246
Feb 26 13:25:17 be01 kernel: RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
Feb 26 13:25:17 be01 kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff8030a718
Feb 26 13:25:17 be01 kernel: RBP: ffff81021fc1c270 R08: ffff81021fc66000 R09: 000000000000003e
Feb 26 13:25:17 be01 kernel: R10: ffff81021fcc0038 R11: 0000000000000000 R12: 00000000000fc133
Feb 26 13:25:17 be01 kernel: R13: 000022062c42fc61 R14: ffff8101639ff080 R15: ffff81021fc1c080
Feb 26 13:25:17 be01 kernel: FS: 0000000000000000(0000) GS:ffff81021fc1be40(0000) knlGS:0000000000000000
Feb 26 13:25:17 be01 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 26 13:25:17 be01 kernel: CR2: 00002b8470339000 CR3: 000000020ce1a000 CR4: 00000000000006e0
Feb 26 13:25:17 be01 kernel:
Feb 26 13:25:17 be01 kernel: Call Trace:
Feb 26 13:25:17 be01 kernel: [<ffffffff800492c4>] cpu_idle+0x95/0xb8
Feb 26 13:25:17 be01 kernel: [<ffffffff80077991>] start_secondary+0x498/0x4a7
Feb 26 13:25:17 be01 kernel:

The VMs are all CentOS 5.5.
Each of the HV nodes runs 2 KVM VMs, which are more or less identical.
No OpenVZ is used; the two HV nodes share the storage via an LSI SAS HBA with
15K RPM disks. The VMs use the deadline IO scheduler and the HVs the default CFQ one.
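The ionice settings used with vzdump and cstream in this thread only take effect on block devices whose active scheduler is CFQ; the deadline scheduler (used in the guests here) ignores I/O priorities. A minimal sketch, assuming the standard sysfs layout where the active scheduler is shown in brackets, for checking which scheduler each block device on the host actually uses. The SYSFS override is a hypothetical convenience for testing, not a standard variable.

```sh
#!/bin/sh
# Print the active I/O scheduler of a block device, e.g. "cfq" or "deadline".
# The kernel lists all available schedulers and marks the active one in
# brackets, e.g. "noop anticipatory deadline [cfq]".
# SYSFS is a hypothetical override (defaults to /sys).
active_scheduler() {
    dev=$1
    file="${SYSFS:-/sys}/block/$dev/queue/scheduler"
    [ -r "$file" ] || return 1
    # Keep only the bracketed entry.
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file"
}

# List every block device whose active scheduler is not CFQ,
# i.e. where ionice classes/levels will not be honored.
check_all() {
    for path in "${SYSFS:-/sys}"/block/*; do
        dev=${path##*/}
        sched=$(active_scheduler "$dev") || continue
        [ -n "$sched" ] || continue
        if [ "$sched" != cfq ]; then
            echo "$dev: $sched (ionice classes/levels not honored)"
        fi
    done
}
```

Running check_all on the HV node prints nothing if every device uses CFQ.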

# pveversion -v
pve-manager: 1.7-11 (pve-manager/1.7/5470)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.7-30
pve-kernel-2.6.32-4-pve: 2.6.32-30
pve-kernel-2.6.18-2-pve: 2.6.18-5
qemu-server: 1.1-28
pve-firmware: 1.0-10
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-10
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-3
ksm-control-daemon: 1.0-4

Debian Version: 5.0.8

VM configuration:
name: be01
ide2: none,media=cdrom
bootdisk: ide0
ostype: l26
ide0: kvm-share1:vm-102-disk-1,cache=none
memory: 8192
sockets: 2
onboot: 1
description:
cores: 2
vlan2: virtio=AA:A0:9F:11:67:E1
virtio0: kvm-share1:vm-102-disk-2,cache=none
boot: c
freeze: 0
cpuunits: 200000
acpi: 1
kvm: 1
vlan1: virtio=BE:30:52:BF:27:36
virtio1: data-share1:vm-102-disk-1,cache=none
virtio2: kvm-share1:vm-102-disk-3,cache=none
args: -balloon virtio

Backup is done like this:
nice -n 14 vzdump --snapshot --size 2048 --compress --stdexcludes --ionice 7 --bwlimit 6148 --dumpdir /mnt 102

I first tried without nice, bwlimit and ionice, and that got me into trouble really fast.
Repeated experiments showed these to be usable values for the moment, but I still sometimes get the kernel warnings
shown above.

When copying large files I use this; otherwise I sometimes hit the same problem as during backups:
nice -n 14 cstream -i <input> -t 6148000 -o <output> &
ionice -c 2 -n 7 -p "$!"
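As an alternative to starting cstream in the background and adjusting it afterwards with ionice -p "$!", the same priorities can be applied before the copy starts, which avoids a short window at the default I/O priority. The sketch wraps the same cstream, nice and ionice options in a hypothetical throttled_copy function; the DRY_RUN switch is only an illustration aid, not part of any of these tools. cstream's -t value is in bytes per second.

```sh
#!/bin/sh
# Hypothetical helper: copy SRC to DST at RATE bytes/s (cstream -t unit)
# with low CPU priority (nice 14) and the lowest best-effort I/O priority
# (ionice class 2, level 7), both applied before cstream starts.
# Set DRY_RUN=1 to print the command instead of running it.
throttled_copy() {
    if [ $# -lt 2 ]; then
        echo "usage: throttled_copy SRC DST [BYTES_PER_SEC]" >&2
        return 2
    fi
    src=$1
    dst=$2
    rate=${3:-6148000}
    # Build the command in the function's positional parameters so that
    # paths containing spaces stay intact.
    set -- ionice -c 2 -n 7 nice -n 14 cstream -i "$src" -o "$dst" -t "$rate"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Usage, with hypothetical paths: `throttled_copy /var/tmp/big.img /mnt/big.img 6148000 &`.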

Btw, how can I apply I/O limits to the VMs? I would like to limit their network and disk resource usage,
especially disk usage, since shared storage is used.
IIUC I could use cgroups to limit block I/O bandwidth and network usage.
Would anybody here mind sharing their experiences with doing so?
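For the disk side of the cgroups question, a minimal sketch using the cgroup v1 blkio controller's throttling files. Several assumptions apply: blkio.throttle.write_bps_device first appeared in mainline 2.6.37, so the 2.6.32-4-pve kernel in this thread may not provide it at all; the mount point /sys/fs/cgroup/blkio, the group name and the device name are hypothetical; CGROOT and SYSFS are overrides added for testing. Rules take the form "MAJOR:MINOR BYTES_PER_SEC", and the v1 tasks file moves one thread at a time, so every thread of the kvm process has to be written. Network limiting is not covered here.

```sh
#!/bin/sh
# Print a throttle rule "MAJOR:MINOR BYTES_PER_SEC" for a block device name,
# reading the device number from sysfs (SYSFS defaults to /sys).
blkio_rule() {
    devnum=$(cat "${SYSFS:-/sys}/block/$1/dev") || return 1
    printf '%s %s\n' "$devnum" "$2"
}

# limit_write_bps GROUP DEVICE BYTES_PER_SEC TID...
# Creates GROUP under the (assumed) blkio mount point, sets its write
# bandwidth limit on DEVICE and moves each thread ID into the group.
limit_write_bps() {
    group=$1 dev=$2 bps=$3
    shift 3
    dir="${CGROOT:-/sys/fs/cgroup/blkio}/$group"
    mkdir -p "$dir" || return 1
    blkio_rule "$dev" "$bps" > "$dir/blkio.throttle.write_bps_device" || return 1
    for tid in "$@"; do
        # The v1 "tasks" file accepts one thread ID per write.
        echo "$tid" >> "$dir/tasks" || return 1
    done
}
```

A possible invocation, with a hypothetical group name and KVMPID holding the kvm process ID: `limit_write_bps vm102 sda 6148000 $(ls /proc/$KVMPID/task)`.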

thanks
--lars
Giovanni Toraldo
2011-02-26 14:33:27 UTC
Post by Lars Wilke
Feb 26 13:25:17 be01 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [swapper:0]
Don't be misled by the message itself; AFAIK it's common behavior
inside a VM when the host is under very high load (e.g. during
backups).

It would be a real problem if you got those messages on the host
machine (where a stuck CPU can be a symptom of a hardware or firmware
issue).
--
Giovanni Toraldo
http://gionn.net/
Lars Wilke
2011-02-26 20:19:45 UTC
Post by Giovanni Toraldo
Post by Lars Wilke
Feb 26 13:25:17 be01 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [swapper:0]
Don't be misled by the message itself; AFAIK it's common behavior
inside a VM when the host is under very high load (e.g. during
backups).
Hm, OK, this might explain why the message sometimes appears and the VM
keeps running. But then all of a sudden it freezes and never comes back
to life again.
Post by Giovanni Toraldo
It would be a real problem if you got those messages on the host
machine (where a stuck CPU can be a symptom of a hardware or firmware
issue).
Now that you mention it, two things come to mind. First,
the problematic VMs all have one or more virtio HDDs, and second, on the HV node I
saw this in the logs when the VM finally froze to death:

Feb 26 02:45:59 s2 kernel: kvm D ffff8801e6e96000 0 5143 1 0x00000000
Feb 26 02:45:59 s2 kernel: ffff8801ee88c000 0000000000000082 0003520007f53df8 0000000000000000
Feb 26 02:45:59 s2 kernel: 0000000000000001 ffffffff81508580 000000000000fa40 ffff8801275fbfd8
Feb 26 02:45:59 s2 kernel: 0000000000016940 0000000000016940 ffff8801e6e96000 ffff8801e6e962f8
Feb 26 02:45:59 s2 kernel: Call Trace:
Feb 26 02:45:59 s2 kernel: [<ffffffff81247b1c>] ? dm_table_unplug_all+0x4b/0xb4
Feb 26 02:45:59 s2 kernel: [<ffffffff810165b1>] ? read_tsc+0xa/0x20
Feb 26 02:45:59 s2 kernel: [<ffffffff810b6d99>] ? sync_page+0x0/0x46
Feb 26 02:45:59 s2 kernel: [<ffffffff81313d95>] ? io_schedule+0x9b/0xfc
Feb 26 02:45:59 s2 kernel: [<ffffffff810b6dda>] ? sync_page+0x41/0x46
Feb 26 02:45:59 s2 kernel: [<ffffffff813142bf>] ? __wait_on_bit+0x41/0x70
Feb 26 02:45:59 s2 kernel: [<ffffffff810b6f5e>] ? wait_on_page_bit+0x6b/0x71
Feb 26 02:45:59 s2 kernel: [<ffffffff81066960>] ? wake_bit_function+0x0/0x23
Feb 26 02:45:59 s2 kernel: [<ffffffff810c0f2d>] ? shrink_page_list+0x14e/0x632
Feb 26 02:45:59 s2 kernel: [<ffffffff8105b8d4>] ? del_timer_sync+0xc/0x16
Feb 26 02:45:59 s2 kernel: [<ffffffff810165b1>] ? read_tsc+0xa/0x20
Feb 26 02:45:59 s2 kernel: [<ffffffff81314114>] ? schedule_timeout+0xad/0xdd
Feb 26 02:45:59 s2 kernel: [<ffffffff8106dd7f>] ? ktime_get_ts+0x68/0xb2
Feb 26 02:45:59 s2 kernel: [<ffffffff8109bc02>] ? delayacct_end+0x74/0x7f
Feb 26 02:45:59 s2 kernel: [<ffffffff8131326d>] ? io_schedule_timeout+0xdc/0x106
Feb 26 02:45:59 s2 kernel: [<ffffffff81066932>] ? autoremove_wake_function+0x0/0x2e
Feb 26 02:45:59 s2 kernel: [<ffffffff810c1ce4>] ? shrink_list+0x533/0x772
Feb 26 02:45:59 s2 kernel: [<ffffffff810b8759>] ? mempool_alloc+0x5e/0x10c
Feb 26 02:45:59 s2 kernel: [<ffffffff81114b36>] ? bio_alloc_bioset+0x45/0xb7
Feb 26 02:45:59 s2 kernel: [<ffffffff81245b6d>] ? clone_bio+0x44/0xce
Feb 26 02:45:59 s2 kernel: [<ffffffff810c21af>] ? shrink_zone+0x28c/0x367
Feb 26 02:45:59 s2 kernel: [<ffffffff8103fc01>] ? update_curr+0xa2/0x10e
Feb 26 02:45:59 s2 kernel: [<ffffffff8100f64b>] ? __switch_to+0xd0/0x297
Feb 26 02:45:59 s2 kernel: [<ffffffff8117f0fa>] ? rb_erase+0x1b2/0x279
Feb 26 02:45:59 s2 kernel: [<ffffffff810c2689>] ? zone_reclaim+0x276/0x357
Feb 26 02:45:59 s2 kernel: [<ffffffff810c0163>] ? isolate_pages_global+0x0/0x20f
Feb 26 02:45:59 s2 kernel: [<ffffffff810bb26e>] ? zone_watermark_ok+0x20/0xb1
Feb 26 02:45:59 s2 kernel: [<ffffffff810bc558>] ? get_page_from_freelist+0x1ae/0x68d
Feb 26 02:45:59 s2 kernel: [<ffffffff8104a47e>] ? try_to_wake_up+0x2c4/0x2d6
Feb 26 02:45:59 s2 kernel: [<ffffffff8103a946>] ? __wake_up_common+0x44/0x73
Feb 26 02:45:59 s2 kernel: [<ffffffff810bcdaa>] ? __alloc_pages_nodemask+0x128/0x6aa
Feb 26 02:45:59 s2 kernel: [<ffffffffa05756d4>] ? __apic_accept_irq+0x183/0x228 [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffff810e96b2>] ? new_slab+0x4b/0x236
Feb 26 02:45:59 s2 kernel: [<ffffffff810e9a69>] ? __slab_alloc+0x1cc/0x388
Feb 26 02:45:59 s2 kernel: [<ffffffffa0569c54>] ? mmu_topup_memory_caches+0x145/0x183 [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffffa0569c54>] ? mmu_topup_memory_caches+0x145/0x183 [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffff810e9e03>] ? kmem_cache_alloc+0x7f/0x139
Feb 26 02:45:59 s2 kernel: [<ffffffffa0569c54>] ? mmu_topup_memory_caches+0x145/0x183 [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffffa055cbb0>] ? cpuid_maxphyaddr+0xc/0x1f [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffffa056d168>] ? tdp_page_fault+0x1e/0xfb [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffffa056e19a>] ? kvm_mmu_page_fault+0x19/0x88 [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffffa056512b>] ? kvm_arch_vcpu_ioctl_run+0x7ed/0xa44 [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffffa05579d1>] ? kvm_vcpu_ioctl+0xf1/0x4e6 [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffff810402b9>] ? set_next_entity+0x34/0x56
Feb 26 02:45:59 s2 kernel: [<ffffffff810418df>] ? pick_next_task_fair+0xca/0xd6
Feb 26 02:45:59 s2 kernel: [<ffffffff81047c4a>] ? finish_task_switch+0x3a/0xaf
Feb 26 02:45:59 s2 kernel: [<ffffffff810fd25a>] ? vfs_ioctl+0x21/0x6c
Feb 26 02:45:59 s2 kernel: [<ffffffff810fd7a8>] ? do_vfs_ioctl+0x48d/0x4cb
Feb 26 02:45:59 s2 kernel: [<ffffffff8107c86e>] ? sys_futex+0x113/0x131
Feb 26 02:45:59 s2 kernel: [<ffffffff810fd823>] ? sys_ioctl+0x3d/0x5c
Feb 26 02:45:59 s2 kernel: [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
Feb 26 02:45:59 s2 kernel: gzip D ffff88021383c800 0 11340 11337 0x00000000
Feb 26 02:45:59 s2 kernel: ffffffff81491c30 0000000000000086 0000000000000001 ffff8801ee8c7048
Feb 26 02:45:59 s2 kernel: ffff880014cf1000 ffff88036341a400 000000000000fa40 ffff880014d65fd8
Feb 26 02:45:59 s2 kernel: 0000000000016940 0000000000016940 ffff88021383c800 ffff88021383caf8
Feb 26 02:45:59 s2 kernel: Call Trace:
Feb 26 02:45:59 s2 kernel: [<ffffffff81247b1c>] ? dm_table_unplug_all+0x4b/0xb4
Feb 26 02:45:59 s2 kernel: [<ffffffff810165b1>] ? read_tsc+0xa/0x20
Feb 26 02:45:59 s2 kernel: [<ffffffff8111181b>] ? sync_buffer+0x0/0x40
Feb 26 02:45:59 s2 kernel: [<ffffffff81313d95>] ? io_schedule+0x9b/0xfc
Feb 26 02:45:59 s2 kernel: [<ffffffff81111856>] ? sync_buffer+0x3b/0x40
Feb 26 02:45:59 s2 kernel: [<ffffffff813142bf>] ? __wait_on_bit+0x41/0x70
Feb 26 02:45:59 s2 kernel: [<ffffffff8111181b>] ? sync_buffer+0x0/0x40
Feb 26 02:45:59 s2 kernel: [<ffffffff81314359>] ? out_of_line_wait_on_bit+0x6b/0x77
Feb 26 02:45:59 s2 kernel: [<ffffffff81066960>] ? wake_bit_function+0x0/0x23
Feb 26 02:45:59 s2 kernel: [<ffffffff811118a7>] ? bh_submit_read+0x3e/0x4e
Feb 26 02:45:59 s2 kernel: [<ffffffffa05eb566>] ? read_block_bitmap+0x7a/0x140 [ext2]
Feb 26 02:45:59 s2 kernel: [<ffffffffa05ec144>] ? ext2_new_blocks+0x1f9/0x56c [ext2]
Feb 26 02:45:59 s2 kernel: [<ffffffff81110d8f>] ? __getblk+0x26/0x29a
Feb 26 02:45:59 s2 kernel: [<ffffffffa05eefe5>] ? ext2_get_branch+0x98/0x11b [ext2]
Feb 26 02:45:59 s2 kernel: [<ffffffffa05efabe>] ? ext2_get_block+0x38f/0x701 [ext2]
Feb 26 02:45:59 s2 kernel: [<ffffffff811102a6>] ? alloc_buffer_head+0x3d/0x42
Feb 26 02:45:59 s2 kernel: [<ffffffff81112230>] ? __block_prepare_write+0x14c/0x2c0
Feb 26 02:45:59 s2 kernel: [<ffffffffa05ef72f>] ? ext2_get_block+0x0/0x701 [ext2]
Feb 26 02:45:59 s2 kernel: [<ffffffff811124ff>] ? block_write_begin+0x7a/0xc7
Feb 26 02:45:59 s2 kernel: [<ffffffffa05ef71e>] ? ext2_write_begin+0x22/0x27 [ext2]
Feb 26 02:45:59 s2 kernel: [<ffffffffa05ef72f>] ? ext2_get_block+0x0/0x701 [ext2]
Feb 26 02:45:59 s2 kernel: [<ffffffff810b798a>] ? generic_file_buffered_write+0x118/0x278
Feb 26 02:45:59 s2 kernel: [<ffffffff810b7e9b>] ? __generic_file_aio_write+0x25f/0x293
Feb 26 02:45:59 s2 kernel: [<ffffffff810f891d>] ? pipe_read+0x39c/0x3af
Feb 26 02:45:59 s2 kernel: [<ffffffff810b7f28>] ? generic_file_aio_write+0x59/0x9f
Feb 26 02:45:59 s2 kernel: [<ffffffff810f1282>] ? do_sync_write+0xce/0x113
Feb 26 02:45:59 s2 kernel: [<ffffffff81066932>] ? autoremove_wake_function+0x0/0x2e
Feb 26 02:45:59 s2 kernel: [<ffffffff81313c93>] ? thread_return+0xdc/0x143
Feb 26 02:45:59 s2 kernel: [<ffffffff810f1c82>] ? vfs_write+0xa9/0x102
Feb 26 02:45:59 s2 kernel: [<ffffffff810f1dee>] ? sys_write+0x49/0xc1
Feb 26 02:45:59 s2 kernel: [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
Feb 26 02:45:59 s2 kernel: flush-254:12 D ffff8802fc954000 0 15099 2 0x00000000
Feb 26 02:45:59 s2 kernel: ffff8801ee88e000 0000000000000046 0001120014d6bbc0 0000000000000010
Feb 26 02:45:59 s2 kernel: ffff880014cf1000 ffff88036341a400 000000000000fa40 ffff88002b1d7fd8
Feb 26 02:45:59 s2 kernel: 0000000000016940 0000000000016940 ffff8802fc954000 ffff8802fc9542f8
Feb 26 02:45:59 s2 kernel: Call Trace:
Feb 26 02:45:59 s2 kernel: [<ffffffff81247b1c>] ? dm_table_unplug_all+0x4b/0xb4
Feb 26 02:45:59 s2 kernel: [<ffffffff810165b1>] ? read_tsc+0xa/0x20
Feb 26 02:45:59 s2 kernel: [<ffffffff8111181b>] ? sync_buffer+0x0/0x40
Feb 26 02:45:59 s2 kernel: [<ffffffff81313d95>] ? io_schedule+0x9b/0xfc
Feb 26 02:45:59 s2 kernel: [<ffffffff81111856>] ? sync_buffer+0x3b/0x40
Feb 26 02:45:59 s2 kernel: [<ffffffff813141c2>] ? __wait_on_bit_lock+0x3f/0x84
Feb 26 02:45:59 s2 kernel: [<ffffffff8111181b>] ? sync_buffer+0x0/0x40
Feb 26 02:45:59 s2 kernel: [<ffffffff81314272>] ? out_of_line_wait_on_bit_lock+0x6b/0x77
Feb 26 02:45:59 s2 kernel: [<ffffffff81066960>] ? wake_bit_function+0x0/0x23
Feb 26 02:45:59 s2 kernel: [<ffffffff81112be8>] ? __block_write_full_page+0x159/0x2b0
Feb 26 02:45:59 s2 kernel: [<ffffffff811119e5>] ? end_buffer_async_write+0x0/0x13b
Feb 26 02:45:59 s2 kernel: [<ffffffff81114cf0>] ? blkdev_get_block+0x0/0x57
Feb 26 02:45:59 s2 kernel: [<ffffffff810bd336>] ? __writepage+0xa/0x2d
Feb 26 02:45:59 s2 kernel: [<ffffffff810bd9d2>] ? write_cache_pages+0x20b/0x327
Feb 26 02:45:59 s2 kernel: [<ffffffff810bd32c>] ? __writepage+0x0/0x2d
Feb 26 02:45:59 s2 kernel: [<ffffffff8110b376>] ? writeback_single_inode+0xe7/0x2da
Feb 26 02:45:59 s2 kernel: [<ffffffff8110c07c>] ? writeback_inodes_wb+0x424/0x4ff
Feb 26 02:45:59 s2 kernel: [<ffffffff8110c283>] ? wb_writeback+0x12c/0x1ab
Feb 26 02:45:59 s2 kernel: [<ffffffff8105b8bf>] ? try_to_del_timer_sync+0x63/0x6c
Feb 26 02:45:59 s2 kernel: [<ffffffff8110c4f9>] ? wb_do_writeback+0x14f/0x165
Feb 26 02:45:59 s2 kernel: [<ffffffff8110c540>] ? bdi_writeback_task+0x31/0xaa
Feb 26 02:45:59 s2 kernel: [<ffffffff810cc2d0>] ? bdi_start_fn+0x0/0xd0
Feb 26 02:45:59 s2 kernel: [<ffffffff810cc340>] ? bdi_start_fn+0x70/0xd0
Feb 26 02:45:59 s2 kernel: [<ffffffff810cc2d0>] ? bdi_start_fn+0x0/0xd0
Feb 26 02:45:59 s2 kernel: [<ffffffff81066666>] ? kthread+0xc0/0xca
Feb 26 02:45:59 s2 kernel: [<ffffffff81011c6a>] ? child_rip+0xa/0x20
Feb 26 02:45:59 s2 kernel: [<ffffffff810665a6>] ? kthread+0x0/0xca
Feb 26 02:45:59 s2 kernel: [<ffffffff81011c60>] ? child_rip+0x0/0x20
M***@mWare.ca
2011-02-26 15:14:55 UTC
Post by Lars Wilke
I am experiencing reproducible KVM VM crashes/hangs (and once a lost network config) when doing backups
via vzdump on the hypervisor. Most of the time the VM just gets stuck and I have to shut it down via
qm stop. Note that the problem only occurs with the VM I am backing up, and sometimes with that same VM when
copying large files on the HV node.
What CPU are you using? I've recently discovered that the 2000-series
Opterons can be very unreliable under high IO loads with KVM. I upgraded
my KVM hosts to Xeons and the problem went away instantly.

Subsequently, I had a kernel guy (bcrl) spend some time trying to fix
it, but the crashes were never consistent. In fact, they were almost never the
same twice. After a month, we gave up and will use the hardware for
other purposes. (They're great for OpenVZ VMs.)
A great way to reproduce the crash would be to run a virus scan inside a
Windows XP KVM. FreeBSD VMs would also go down frequently. Linux VMs
were a lot more reliable, but would die occasionally too.

Myke

PS: I never really got much reaction from this list to my woes, so
either y'all think I'm half-cocked crazy, or it's just not interesting
to anyone else ;)
Lars Wilke
2011-02-26 20:10:21 UTC
Post by M***@mWare.ca
Post by Lars Wilke
I am experiencing reproducible KVM VM crashes/hangs (and once a lost network
config) when doing backups via vzdump on the hypervisor. Most of the time
the VM just gets stuck and I have to shut it down via
qm stop. Note that the problem only occurs with the VM I am backing up, and
sometimes with that same VM when copying large files on the HV node.
What CPU are you using? I've recently discovered that the
2000-series Opterons can be very unreliable under high IO loads with
KVM. I upgraded my KVM hosts to Xeons and the problem went away
instantly.
model name : Intel(R) Xeon(R) CPU E5502 @ 1.87GHz

The CPUs seem to work OK. It just seems the VM freezes and that's it.
I have to use qm stop to take the VM offline.

cheers
--lars
Dietmar Maurer
2011-02-27 08:20:00 UTC
You are using software raid?

- Dietmar
-----Original Message-----
Sent: Saturday, 26 February 2011 14:30
Subject: [PVE-User] CPU soft lookup
_______________________________________________
pve-user mailing list
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
Lars Wilke
2011-02-27 12:09:44 UTC
Post by Dietmar Maurer
You are using software raid?
No, raid10 is implemented at the controller level.
SAS1064ET PCI-Express Fusion-MPT SAS

The machine is an Intel Modular Server with two server boards
and 5 SAS 2.5" HDDs (1 hot spare). Only one storage controller is
installed.

cheers
--lars
Dietmar Maurer
2011-02-28 06:39:28 UTC
-----Original Message-----
Sent: Sunday, 27 February 2011 13:10
To: Dietmar Maurer
Subject: Re: [PVE-User] CPU soft lookup
Post by Dietmar Maurer
You are using software raid?
No, raid10 is implemented at the controller level.
SAS1064ET PCI-Express Fusion-MPT SAS
So why is there a "dm_raid45" module loaded?
Lars Wilke
2011-02-28 14:21:39 UTC
Post by Lars Wilke
No, raid10 is implemented at the controller level.
SAS1064ET PCI-Express Fusion-MPT SAS
So why is there a "dm_raid45" module loaded?
Hm, this is indeed strange. There is no software RAID at all.

These are the currently loaded dm modules:
[***@be01 ~]# lsmod|grep dm
dm_multipath           56793  0
scsi_dh                42177  1 dm_multipath
dm_raid45              99657  0
dm_message             36289  1 dm_raid45
dm_region_hash         46145  1 dm_raid45
dm_mem_cache           38977  1 dm_raid45
dm_snapshot            52233  0
dm_zero                35265  0
dm_mirror              54737  0
dm_log                 44993  3 dm_raid45,dm_region_hash,dm_mirror
dm_mod                101521  19 dm_multipath,dm_raid45,dm_snapshot,dm_zero,dm_mirror,dm_log

[***@be01 ~]# pvs
  PV         VG         Fmt  Attr PSize   PFree
  /dev/hda2  VolGroup00 lvm2 a-    11.88G      0
  /dev/vda1  VolGroup01 lvm2 a-   100.00G  5.00G
  /dev/vdb1  data       lvm2 a-    40.00G 10.00G
  /dev/vdc1  data       lvm2 a-    40.00G 20.00G

[***@be01 ~]# cat /proc/mdstat
Personalities :
unused devices: <none>

mdadm is not even installed on the system.

I can unload the module without any problem.
I guess there is some magic in the boot process which loads the dm_raid45 module.
All the other VMs have this module loaded, too.
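lsmod's "Used by" count comes from the third field of /proc/modules, and dm_raid45 shows a count of 0 in the listing above. A minimal sketch, assuming the standard /proc/modules format, that unloads a module only when that count is zero; PROC_MODULES is a hypothetical override for testing. Note that modprobe -r also tries to remove dependencies that become unused, so dm_message, dm_region_hash and dm_mem_cache would likely go along with dm_raid45, while dm_log stays because dm_mirror still uses it. This does not explain what loads the module at boot.

```sh
#!/bin/sh
# Print the use count of a loaded module from /proc/modules
# (fields: name size refcount deps state address).
# Returns 1 if the module is not loaded. PROC_MODULES defaults to /proc/modules.
module_refcount() {
    awk -v m="$1" '$1 == m { print $3; found = 1 } END { exit !found }' \
        "${PROC_MODULES:-/proc/modules}"
}

# Unload a module only if nothing is using it.
unload_if_unused() {
    count=$(module_refcount "$1") || { echo "$1: not loaded"; return 0; }
    if [ "$count" -eq 0 ]; then
        modprobe -r "$1"
    else
        echo "$1: in use (refcount $count), leaving it loaded"
    fi
}
```

On the guest, `unload_if_unused dm_raid45` would then call modprobe -r only if the count is still 0.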
Dietmar Maurer
2011-02-28 15:57:24 UTC
-----Original Message-----
Sent: Monday, 28 February 2011 15:22
To: Dietmar Maurer
Subject: Re: [PVE-User] CPU soft lookup
Post by Lars Wilke
No, raid10 is implemented at the controller level.
SAS1064ET PCI-Express Fusion-MPT SAS
So why is there a "dm_raid45" module loaded?
Hm, this is indeed strange. There is no software RAID at all.
These are the currently loaded dm modules:
dm_multipath 56793 0
scsi_dh 42177 1 dm_multipath
dm_raid45 99657 0
AFAIK dm_raid45 is software raid (dm raid). So what kind of storage are you using?
Or is that loaded by the multipath tools?

- D
Lars Wilke
2011-02-28 17:05:39 UTC
Post by Dietmar Maurer
-----Original Message-----
Sent: Montag, 28. Februar 2011 15:22
To: Dietmar Maurer
Subject: Re: [PVE-User] CPU soft lookup
Post by Lars Wilke
No, raid10 is implemented at the controller level.
SAS1064ET PCI-Express Fusion-MPT SAS
So why is there a "dm_raid45" module loaded?
Hm, this is indeed strange. There is no software RAID at all.
These are the currently loaded dm modules:
dm_multipath 56793 0
scsi_dh 42177 1 dm_multipath
dm_raid45 99657 0
AFAIK dm_raid45 is software raid (dm raid). So what kind of storage are you using?
The system is an Intel Modular Server with two blades and one storage
controller. The HDDs for the VM are LVM volumes.
Post by Dietmar Maurer
Or is that loaded by the multipath tools?
I am not sure; multipathd is disabled in the init system.
Anyway, the module unloads without a problem, so it is definitely not in use.
Lars Wilke
2011-02-27 12:25:34 UTC
Hi,

just a quick update from me.

I have now also tried vzdump without --compress; still no luck.

Tonight I will try this:
# ionice -c 2 -n 7 nice -n 19 vzdump --snapshot --size 2048 --stdexcludes --ionice 7 --bwlimit 6148 --dumpdir /mnt

/mnt holds an external USB disk which is encrypted via LUKS.

If that fails too, I guess my last option would be to investigate cgroups.

Btw, I noticed that kvmtrace is not working.

# kvmtrace -D /tmp -w 2 -o bla
KVM_TRACE_ENABLE: Operation not supported

thanks
--lars
Robert Fantini
2011-02-27 12:49:42 UTC
This is sort of a shot in the dark, but have you tried using the 2.6.18-5
kernel?

Check the forums at Proxmox and OpenVZ for more info.

For us, not using 2.6.18-5 caused containers which could not shut down, hung
backup processes, etc.
Lars Wilke
2011-02-28 14:31:35 UTC
Post by Robert Fantini
This is sort of a shot in the dark, but have you tried using the 2.6.18-5
kernel?
Check the forums at Proxmox and OpenVZ for more info.
For us, not using 2.6.18-5 caused containers which could not shut down,
hung backup processes, etc.
Hm, I remember having problems with 2.6.32 and OpenVZ containers myself and
switching back to 2.6.18. That was probably half a year ago. But here I use
KVM exclusively. I initially started with 2.6.18 and switched to 2.6.32,
seeing better performance in doing so. I guess before going back to 2.6.18,
I would first try 2.6.35 :)

thanks
--lars
Dietmar Maurer
2011-02-28 06:45:41 UTC
Post by Lars Wilke
Btw. I noticed that kvmtrace is not working.
The KVM team removed that tool from the sources anyway.

- Dietmar