[PVE-User] CPU soft lockup

Dietmar Maurer dietmar at proxmox.com
Sun Feb 27 09:20:00 CET 2011


Are you using software RAID?

- Dietmar

> -----Original Message-----
> From: pve-user-bounces at pve.proxmox.com [mailto:pve-user-
> bounces at pve.proxmox.com] On Behalf Of Lars Wilke
> Sent: Samstag, 26. Februar 2011 14:30
> To: pve-user at pve.proxmox.com
> Subject: [PVE-User] CPU soft lockup
> 
> Hi,
> 
> I am experiencing reproducible KVM VM crashes/hangs, and once a lost network
> config, when doing backups via vzdump on the hypervisor. Most of the time the
> VM just gets stuck and I have to shut it down via qm stop. Note that the problem
> only occurs with the VM that I am backing up, and sometimes with this VM when
> copying large files on the HV node. The VM serves NFS and some databases and
> has around 150 GB of data which need to be backed up every time. The other 3
> VMs have never crashed, but once in a while I find the same warning in their
> logs. I guess the reason they do not crash is that they are considerably smaller
> in terms of used disk space.
> 
> I found this bug report
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/579276
> and it contains some links to reports from Red Hat.
> I am not sure whether the proposed patches fix my problem, but these fixes are
> all in newer kernel branches. My question is whether it would be worth trying
> the 2.6.35 kernel on the HV. And what about the VMs, do I need a newer/patched
> kernel there too?
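> 
> If it is worth a try, I would expect the upgrade on the HV to look roughly like
> the following; the 2.6.35 package name is my assumption, by analogy to the
> proxmox-ve-2.6.32 meta package, so please check the pve repository first:
> 
>   # on the HV node; package name assumed, verify with: apt-cache search pve-kernel
>   apt-get update
>   apt-get install proxmox-ve-2.6.35
>   reboot
> 
> For a first test I would leave the CentOS 5.5 guests on their stock 2.6.18 kernel.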
> 
> Feb 26 13:25:17 be01 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [swapper:0]
> Feb 26 13:25:17 be01 kernel: CPU 2:
> Feb 26 13:25:17 be01 kernel: Modules linked in: nfsd exportfs nfs_acl auth_rpcgss ipv6 xfrm_nalgo crypto_api act_police cls_fw cls_u32 sch_htb sch_hfsc sch_ingress sch_sfq xt_connlimit xt_realm iptable_raw xt_comment xt_policy ipt_ULOG ipt_TTL ipt_ttl ipt_TOS ipt_tos ipt_TCPMSS ipt_SAME ipt_REJECT ipt_REDIRECT ipt_recent ipt_owner ipt_NETMAP ipt_MASQUERADE ipt_iprange ipt_hashlimit ipt_ECN ipt_ecn ipt_DSCP ipt_dscp ipt_CLUSTERIP ipt_ah ipt_addrtype ip_nat_tftp ip_nat_snmp_basic ip_nat_sip ip_nat_pptp ip_nat_irc ip_nat_h323 ip_nat_ftp ip_nat_amanda ip_conntrack_tftp ip_conntrack_sip ip_conntrack_pptp ip_conntrack_netbios_ns ip_conntrack_irc ip_conntrack_h323 ip_conntrack_ftp ts_kmp ip_conntrack_amanda xt_tcpmss xt_pkttype xt_physdev bridge xt_NFQUEUE xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_DSCP xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat ip_nat ip_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables lockd sunrpc xfs
> Feb 26 13:25:17 be01 kernel: dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy joydev virtio_blk virtio_balloon virtio_net i2c_piix4 virtio_pci i2c_core virtio_ring ide_cd serio_raw virtio pcspkr cdrom dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> Feb 26 13:25:17 be01 kernel: Pid: 0, comm: swapper Not tainted 2.6.18-194.32.1.el5 #1
> Feb 26 13:25:17 be01 kernel: RIP: 0010:[<ffffffff8006b36b>]  [<ffffffff8006b36b>] default_idle+0x29/0x50
> Feb 26 13:25:17 be01 kernel: RSP: 0018:ffff81021fc67ef0  EFLAGS: 00000246
> Feb 26 13:25:17 be01 kernel: RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
> Feb 26 13:25:17 be01 kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff8030a718
> Feb 26 13:25:17 be01 kernel: RBP: ffff81021fc1c270 R08: ffff81021fc66000 R09: 000000000000003e
> Feb 26 13:25:17 be01 kernel: R10: ffff81021fcc0038 R11: 0000000000000000 R12: 00000000000fc133
> Feb 26 13:25:17 be01 kernel: R13: 000022062c42fc61 R14: ffff8101639ff080 R15: ffff81021fc1c080
> Feb 26 13:25:17 be01 kernel: FS:  0000000000000000(0000) GS:ffff81021fc1be40(0000) knlGS:0000000000000000
> Feb 26 13:25:17 be01 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> Feb 26 13:25:17 be01 kernel: CR2: 00002b8470339000 CR3: 000000020ce1a000 CR4: 00000000000006e0
> Feb 26 13:25:17 be01 kernel:
> Feb 26 13:25:17 be01 kernel: Call Trace:
> Feb 26 13:25:17 be01 kernel:  [<ffffffff800492c4>] cpu_idle+0x95/0xb8
> Feb 26 13:25:17 be01 kernel:  [<ffffffff80077991>] start_secondary+0x498/0x4a7
> Feb 26 13:25:17 be01 kernel:
> 
> The VMs are all CentOS 5.5.
> Each HV node runs 2 KVM VMs which are more or less identical.
> No OpenVZ is used; the two HV nodes share the storage via an LSI SAS HBA with
> 15K RPM disks. The VMs use the deadline IO scheduler and the HVs use the
> default CFQ one.
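> 
> In case it matters, the scheduler inside a guest can be checked and switched at
> runtime like this (vda is just an example device; persistently it is set via
> elevator=deadline on the guest kernel command line):
> 
>   cat /sys/block/vda/queue/scheduler        # the active scheduler is shown in [brackets]
>   echo deadline > /sys/block/vda/queue/scheduler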
> 
> # pveversion -v
> pve-manager: 1.7-11 (pve-manager/1.7/5470)
> running kernel: 2.6.32-4-pve
> proxmox-ve-2.6.32: 1.7-30
> pve-kernel-2.6.32-4-pve: 2.6.32-30
> pve-kernel-2.6.18-2-pve: 2.6.18-5
> qemu-server: 1.1-28
> pve-firmware: 1.0-10
> libpve-storage-perl: 1.0-16
> vncterm: 0.9-2
> vzctl: 3.0.24-1pve4
> vzdump: 1.2-10
> vzprocps: 2.0.11-1dso2
> vzquota: 3.0.11-1
> pve-qemu-kvm: 0.13.0-3
> ksm-control-daemon: 1.0-4
> 
> Debian Version: 5.0.8
> 
> VM configuration:
> name: be01
> ide2: none,media=cdrom
> bootdisk: ide0
> ostype: l26
> ide0: kvm-share1:vm-102-disk-1,cache=none
> memory: 8192
> sockets: 2
> onboot: 1
> description:
> cores: 2
> vlan2: virtio=AA:A0:9F:11:67:E1
> virtio0: kvm-share1:vm-102-disk-2,cache=none
> boot: c
> freeze: 0
> cpuunits: 200000
> acpi: 1
> kvm: 1
> vlan1: virtio=BE:30:52:BF:27:36
> virtio1: data-share1:vm-102-disk-1,cache=none
> virtio2: kvm-share1:vm-102-disk-3,cache=none
> args: -balloon virtio
> 
> Backup is done like this:
> nice -n 14 vzdump --snapshot --size 2048 --compress --stdexcludes --ionice 7 \
>   --bwlimit 6148 --dumpdir /mnt 102
> 
> At first I tried without nice, bwlimit and ionice, which got me into trouble
> really fast. Repeated experiments showed these to be usable values for the
> moment, but I still sometimes get the kernel warnings shown above.
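> 
> To double-check that the limits actually apply, the IO class of the running
> backup can be queried like this (child processes inherit the class, so checking
> the vzdump PID should be enough):
> 
>   pgrep -l vzdump           # PID of the running vzdump
>   ionice -p <pid>           # prints e.g. "best-effort: prio 7"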
> 
> When copying large files I use the following, otherwise I sometimes run into
> the same problem as when doing backups:
> nice -n 14 cstream -i <input> -t 6148000 -o <output> & ionice -c 2 -n 7 -p "$!"
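> 
> Spelled out with comments (the cstream -t value is bytes per second, which
> roughly matches the 6148 KB/s vzdump bwlimit above; the file names are only
> placeholders):
> 
>   # throttled, CPU- and IO-deprioritized copy of one large file
>   nice -n 14 cstream -i /path/to/source -t 6148000 -o /path/to/target &
>   ionice -c 2 -n 7 -p "$!"    # put the background copy into best-effort class, lowest prio
>   wait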
> 
> By the way, how can I apply IO limits to the VMs? I would like to limit the
> allowed network and disk resource usage, especially disk usage, since shared
> storage is used. As far as I understand, I could use cgroups to limit block IO
> bandwidth and network usage. Is anybody here willing to share their experience
> with doing so? A rough sketch of what I have in mind is below.
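> 
> For the disk side I was thinking of the blkio throttling controller along these
> lines; as far as I can tell the blkio.throttle.* files need a newer kernel than
> the 2.6.32 pve one (2.6.37+), and the device numbers and the qemu-server pidfile
> path below are assumptions on my part:
> 
>   # create a cgroup for VM 102 (mount point is arbitrary)
>   mkdir -p /cgroup/blkio
>   mount -t cgroup -o blkio none /cgroup/blkio
>   mkdir /cgroup/blkio/vm102
>   # cap writes to the shared LUN (8:16 is only an example major:minor) at ~6 MB/s
>   echo "8:16 6291456" > /cgroup/blkio/vm102/blkio.throttle.write_bps_device
>   # move the kvm process of VM 102 into the group
>   echo $(cat /var/run/qemu-server/102.pid) > /cgroup/blkio/vm102/tasks
> 
> For the network side I would probably rather shape traffic with tc on the VM's
> tap interface than with cgroups.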
> 
> thanks
>    --lars
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user




