[PVE-User] CPU soft lockup

Lars Wilke l.wilke at it-betrieb.de
Sat Feb 26 21:19:45 CET 2011


* Giovanni Toraldo wrote:
> 2011/2/26 Lars Wilke <l.wilke at it-betrieb.de>:
> > Feb 26 13:25:17 be01 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [swapper:0]
>
> Don't be misled by the message itself; AFAIK it's common behavior
> inside a VM when the host is under very high load (e.g. during
> backups).

Hm, OK, this might explain why the message sometimes appears and the VM
keeps running. But then, all of a sudden, it freezes and never comes back
to life again.
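Why the warning can be harmless inside a guest, roughly: the kernel's soft-lockup detector records when a per-CPU watchdog thread last ran and, from a periodic timer, warns if that timestamp is older than a threshold. If the host does not schedule a vCPU for a while (for instance under heavy backup I/O), the guest's clock still moves on, so when the vCPU resumes the timer check can see a large gap before the watchdog thread gets a chance to run again. The Python model below is a hypothetical simplification: the class, its methods and the 10 s threshold are illustrative assumptions rather than kernel code, and the real thresholds differ between kernel versions.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model of a per-CPU soft-lockup watchdog (not kernel code).
@dataclass
class CpuWatchdog:
    threshold_s: float        # assumed warning threshold in seconds
    touch_ts: float = 0.0     # last time the watchdog thread ran on this CPU

    def touch(self, now: float) -> None:
        """Called whenever the watchdog thread gets scheduled on this CPU."""
        self.touch_ts = now

    def check(self, now: float) -> Optional[str]:
        """Called from the periodic timer; warns if the thread has not run lately."""
        stuck_for = now - self.touch_ts
        if stuck_for > self.threshold_s:
            return f"soft lockup: stuck for {int(stuck_for)}s"
        return None


wd = CpuWatchdog(threshold_s=10.0)
wd.touch(0.0)          # watchdog thread runs at guest time 0
print(wd.check(5.0))   # normal tick: no warning
# The host deschedules the vCPU for ~12 s; on resume the timer fires first.
print(wd.check(12.0))  # warning, although the guest itself was not looping
wd.touch(12.0)         # watchdog thread runs again, guest carries on
print(wd.check(13.0))  # no warning
```

In this model a single warning followed by normal operation matches the "message appears and the VM keeps running" case; it says nothing about the later hard freeze.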

> It would be a real problem if you got those messages on the host
> machine (where a stuck CPU can be a symptom of a hardware or firmware
> issue).

Now that you mention it, two things come to mind. First, the
problematic VMs all have one or more VIRTIO HDDs, and on the HV node I
saw this in the logs when the VM finally froze to death:

Feb 26 02:45:59 s2 kernel: kvm           D ffff8801e6e96000     0  5143      1 0x00000000
Feb 26 02:45:59 s2 kernel: ffff8801ee88c000 0000000000000082 0003520007f53df8 0000000000000000
Feb 26 02:45:59 s2 kernel: 0000000000000001 ffffffff81508580 000000000000fa40 ffff8801275fbfd8
Feb 26 02:45:59 s2 kernel: 0000000000016940 0000000000016940 ffff8801e6e96000 ffff8801e6e962f8
Feb 26 02:45:59 s2 kernel: Call Trace:
Feb 26 02:45:59 s2 kernel: [<ffffffff81247b1c>] ? dm_table_unplug_all+0x4b/0xb4
Feb 26 02:45:59 s2 kernel: [<ffffffff810165b1>] ? read_tsc+0xa/0x20
Feb 26 02:45:59 s2 kernel: [<ffffffff810b6d99>] ? sync_page+0x0/0x46
Feb 26 02:45:59 s2 kernel: [<ffffffff81313d95>] ? io_schedule+0x9b/0xfc
Feb 26 02:45:59 s2 kernel: [<ffffffff810b6dda>] ? sync_page+0x41/0x46
Feb 26 02:45:59 s2 kernel: [<ffffffff813142bf>] ? __wait_on_bit+0x41/0x70
Feb 26 02:45:59 s2 kernel: [<ffffffff810b6f5e>] ? wait_on_page_bit+0x6b/0x71
Feb 26 02:45:59 s2 kernel: [<ffffffff81066960>] ? wake_bit_function+0x0/0x23
Feb 26 02:45:59 s2 kernel: [<ffffffff810c0f2d>] ? shrink_page_list+0x14e/0x632
Feb 26 02:45:59 s2 kernel: [<ffffffff8105b8d4>] ? del_timer_sync+0xc/0x16
Feb 26 02:45:59 s2 kernel: [<ffffffff810165b1>] ? read_tsc+0xa/0x20
Feb 26 02:45:59 s2 kernel: [<ffffffff81314114>] ? schedule_timeout+0xad/0xdd
Feb 26 02:45:59 s2 kernel: [<ffffffff8106dd7f>] ? ktime_get_ts+0x68/0xb2
Feb 26 02:45:59 s2 kernel: [<ffffffff8109bc02>] ? delayacct_end+0x74/0x7f
Feb 26 02:45:59 s2 kernel: [<ffffffff8131326d>] ? io_schedule_timeout+0xdc/0x106
Feb 26 02:45:59 s2 kernel: [<ffffffff81066932>] ? autoremove_wake_function+0x0/0x2e
Feb 26 02:45:59 s2 kernel: [<ffffffff810c1ce4>] ? shrink_list+0x533/0x772
Feb 26 02:45:59 s2 kernel: [<ffffffff810b8759>] ? mempool_alloc+0x5e/0x10c
Feb 26 02:45:59 s2 kernel: [<ffffffff81114b36>] ? bio_alloc_bioset+0x45/0xb7
Feb 26 02:45:59 s2 kernel: [<ffffffff81245b6d>] ? clone_bio+0x44/0xce
Feb 26 02:45:59 s2 kernel: [<ffffffff810c21af>] ? shrink_zone+0x28c/0x367
Feb 26 02:45:59 s2 kernel: [<ffffffff8103fc01>] ? update_curr+0xa2/0x10e
Feb 26 02:45:59 s2 kernel: [<ffffffff8100f64b>] ? __switch_to+0xd0/0x297
Feb 26 02:45:59 s2 kernel: [<ffffffff8117f0fa>] ? rb_erase+0x1b2/0x279
Feb 26 02:45:59 s2 kernel: [<ffffffff810c2689>] ? zone_reclaim+0x276/0x357
Feb 26 02:45:59 s2 kernel: [<ffffffff810c0163>] ? isolate_pages_global+0x0/0x20f
Feb 26 02:45:59 s2 kernel: [<ffffffff810bb26e>] ? zone_watermark_ok+0x20/0xb1
Feb 26 02:45:59 s2 kernel: [<ffffffff810bc558>] ? get_page_from_freelist+0x1ae/0x68d
Feb 26 02:45:59 s2 kernel: [<ffffffff8104a47e>] ? try_to_wake_up+0x2c4/0x2d6
Feb 26 02:45:59 s2 kernel: [<ffffffff8103a946>] ? __wake_up_common+0x44/0x73
Feb 26 02:45:59 s2 kernel: [<ffffffff810bcdaa>] ? __alloc_pages_nodemask+0x128/0x6aa
Feb 26 02:45:59 s2 kernel: [<ffffffffa05756d4>] ? __apic_accept_irq+0x183/0x228 [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffff810e96b2>] ? new_slab+0x4b/0x236
Feb 26 02:45:59 s2 kernel: [<ffffffff810e9a69>] ? __slab_alloc+0x1cc/0x388
Feb 26 02:45:59 s2 kernel: [<ffffffffa0569c54>] ? mmu_topup_memory_caches+0x145/0x183 [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffffa0569c54>] ? mmu_topup_memory_caches+0x145/0x183 [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffff810e9e03>] ? kmem_cache_alloc+0x7f/0x139
Feb 26 02:45:59 s2 kernel: [<ffffffffa0569c54>] ? mmu_topup_memory_caches+0x145/0x183 [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffffa055cbb0>] ? cpuid_maxphyaddr+0xc/0x1f [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffffa056d168>] ? tdp_page_fault+0x1e/0xfb [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffffa056e19a>] ? kvm_mmu_page_fault+0x19/0x88 [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffffa056512b>] ? kvm_arch_vcpu_ioctl_run+0x7ed/0xa44 [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffffa05579d1>] ? kvm_vcpu_ioctl+0xf1/0x4e6 [kvm]
Feb 26 02:45:59 s2 kernel: [<ffffffff810402b9>] ? set_next_entity+0x34/0x56
Feb 26 02:45:59 s2 kernel: [<ffffffff810418df>] ? pick_next_task_fair+0xca/0xd6
Feb 26 02:45:59 s2 kernel: [<ffffffff81047c4a>] ? finish_task_switch+0x3a/0xaf
Feb 26 02:45:59 s2 kernel: [<ffffffff810fd25a>] ? vfs_ioctl+0x21/0x6c
Feb 26 02:45:59 s2 kernel: [<ffffffff810fd7a8>] ? do_vfs_ioctl+0x48d/0x4cb
Feb 26 02:45:59 s2 kernel: [<ffffffff8107c86e>] ? sys_futex+0x113/0x131
Feb 26 02:45:59 s2 kernel: [<ffffffff810fd823>] ? sys_ioctl+0x3d/0x5c
Feb 26 02:45:59 s2 kernel: [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
Feb 26 02:45:59 s2 kernel: gzip          D ffff88021383c800     0 11340  11337 0x00000000
Feb 26 02:45:59 s2 kernel: ffffffff81491c30 0000000000000086 0000000000000001 ffff8801ee8c7048
Feb 26 02:45:59 s2 kernel: ffff880014cf1000 ffff88036341a400 000000000000fa40 ffff880014d65fd8
Feb 26 02:45:59 s2 kernel: 0000000000016940 0000000000016940 ffff88021383c800 ffff88021383caf8
Feb 26 02:45:59 s2 kernel: Call Trace:
Feb 26 02:45:59 s2 kernel: [<ffffffff81247b1c>] ? dm_table_unplug_all+0x4b/0xb4
Feb 26 02:45:59 s2 kernel: [<ffffffff810165b1>] ? read_tsc+0xa/0x20
Feb 26 02:45:59 s2 kernel: [<ffffffff8111181b>] ? sync_buffer+0x0/0x40
Feb 26 02:45:59 s2 kernel: [<ffffffff81313d95>] ? io_schedule+0x9b/0xfc
Feb 26 02:45:59 s2 kernel: [<ffffffff81111856>] ? sync_buffer+0x3b/0x40
Feb 26 02:45:59 s2 kernel: [<ffffffff813142bf>] ? __wait_on_bit+0x41/0x70
Feb 26 02:45:59 s2 kernel: [<ffffffff8111181b>] ? sync_buffer+0x0/0x40
Feb 26 02:45:59 s2 kernel: [<ffffffff81314359>] ? out_of_line_wait_on_bit+0x6b/0x77
Feb 26 02:45:59 s2 kernel: [<ffffffff81066960>] ? wake_bit_function+0x0/0x23
Feb 26 02:45:59 s2 kernel: [<ffffffff811118a7>] ? bh_submit_read+0x3e/0x4e
Feb 26 02:45:59 s2 kernel: [<ffffffffa05eb566>] ? read_block_bitmap+0x7a/0x140 [ext2]
Feb 26 02:45:59 s2 kernel: [<ffffffffa05ec144>] ? ext2_new_blocks+0x1f9/0x56c [ext2]
Feb 26 02:45:59 s2 kernel: [<ffffffff81110d8f>] ? __getblk+0x26/0x29a
Feb 26 02:45:59 s2 kernel: [<ffffffffa05eefe5>] ? ext2_get_branch+0x98/0x11b [ext2]
Feb 26 02:45:59 s2 kernel: [<ffffffffa05efabe>] ? ext2_get_block+0x38f/0x701 [ext2]
Feb 26 02:45:59 s2 kernel: [<ffffffff811102a6>] ? alloc_buffer_head+0x3d/0x42
Feb 26 02:45:59 s2 kernel: [<ffffffff81112230>] ? __block_prepare_write+0x14c/0x2c0
Feb 26 02:45:59 s2 kernel: [<ffffffffa05ef72f>] ? ext2_get_block+0x0/0x701 [ext2]
Feb 26 02:45:59 s2 kernel: [<ffffffff811124ff>] ? block_write_begin+0x7a/0xc7
Feb 26 02:45:59 s2 kernel: [<ffffffffa05ef71e>] ? ext2_write_begin+0x22/0x27 [ext2]
Feb 26 02:45:59 s2 kernel: [<ffffffffa05ef72f>] ? ext2_get_block+0x0/0x701 [ext2]
Feb 26 02:45:59 s2 kernel: [<ffffffff810b798a>] ? generic_file_buffered_write+0x118/0x278
Feb 26 02:45:59 s2 kernel: [<ffffffff810b7e9b>] ? __generic_file_aio_write+0x25f/0x293
Feb 26 02:45:59 s2 kernel: [<ffffffff810f891d>] ? pipe_read+0x39c/0x3af
Feb 26 02:45:59 s2 kernel: [<ffffffff810b7f28>] ? generic_file_aio_write+0x59/0x9f
Feb 26 02:45:59 s2 kernel: [<ffffffff810f1282>] ? do_sync_write+0xce/0x113
Feb 26 02:45:59 s2 kernel: [<ffffffff81066932>] ? autoremove_wake_function+0x0/0x2e
Feb 26 02:45:59 s2 kernel: [<ffffffff81313c93>] ? thread_return+0xdc/0x143
Feb 26 02:45:59 s2 kernel: [<ffffffff810f1c82>] ? vfs_write+0xa9/0x102
Feb 26 02:45:59 s2 kernel: [<ffffffff810f1dee>] ? sys_write+0x49/0xc1
Feb 26 02:45:59 s2 kernel: [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
Feb 26 02:45:59 s2 kernel: flush-254:12  D ffff8802fc954000     0 15099      2 0x00000000
Feb 26 02:45:59 s2 kernel: ffff8801ee88e000 0000000000000046 0001120014d6bbc0 0000000000000010
Feb 26 02:45:59 s2 kernel: ffff880014cf1000 ffff88036341a400 000000000000fa40 ffff88002b1d7fd8
Feb 26 02:45:59 s2 kernel: 0000000000016940 0000000000016940 ffff8802fc954000 ffff8802fc9542f8
Feb 26 02:45:59 s2 kernel: Call Trace:
Feb 26 02:45:59 s2 kernel: [<ffffffff81247b1c>] ? dm_table_unplug_all+0x4b/0xb4
Feb 26 02:45:59 s2 kernel: [<ffffffff810165b1>] ? read_tsc+0xa/0x20
Feb 26 02:45:59 s2 kernel: [<ffffffff8111181b>] ? sync_buffer+0x0/0x40
Feb 26 02:45:59 s2 kernel: [<ffffffff81313d95>] ? io_schedule+0x9b/0xfc
Feb 26 02:45:59 s2 kernel: [<ffffffff81111856>] ? sync_buffer+0x3b/0x40
Feb 26 02:45:59 s2 kernel: [<ffffffff813141c2>] ? __wait_on_bit_lock+0x3f/0x84
Feb 26 02:45:59 s2 kernel: [<ffffffff8111181b>] ? sync_buffer+0x0/0x40
Feb 26 02:45:59 s2 kernel: [<ffffffff81314272>] ? out_of_line_wait_on_bit_lock+0x6b/0x77
Feb 26 02:45:59 s2 kernel: [<ffffffff81066960>] ? wake_bit_function+0x0/0x23
Feb 26 02:45:59 s2 kernel: [<ffffffff81112be8>] ? __block_write_full_page+0x159/0x2b0
Feb 26 02:45:59 s2 kernel: [<ffffffff811119e5>] ? end_buffer_async_write+0x0/0x13b
Feb 26 02:45:59 s2 kernel: [<ffffffff81114cf0>] ? blkdev_get_block+0x0/0x57
Feb 26 02:45:59 s2 kernel: [<ffffffff810bd336>] ? __writepage+0xa/0x2d
Feb 26 02:45:59 s2 kernel: [<ffffffff810bd9d2>] ? write_cache_pages+0x20b/0x327
Feb 26 02:45:59 s2 kernel: [<ffffffff810bd32c>] ? __writepage+0x0/0x2d
Feb 26 02:45:59 s2 kernel: [<ffffffff8110b376>] ? writeback_single_inode+0xe7/0x2da
Feb 26 02:45:59 s2 kernel: [<ffffffff8110c07c>] ? writeback_inodes_wb+0x424/0x4ff
Feb 26 02:45:59 s2 kernel: [<ffffffff8110c283>] ? wb_writeback+0x12c/0x1ab
Feb 26 02:45:59 s2 kernel: [<ffffffff8105b8bf>] ? try_to_del_timer_sync+0x63/0x6c
Feb 26 02:45:59 s2 kernel: [<ffffffff8110c4f9>] ? wb_do_writeback+0x14f/0x165
Feb 26 02:45:59 s2 kernel: [<ffffffff8110c540>] ? bdi_writeback_task+0x31/0xaa
Feb 26 02:45:59 s2 kernel: [<ffffffff810cc2d0>] ? bdi_start_fn+0x0/0xd0
Feb 26 02:45:59 s2 kernel: [<ffffffff810cc340>] ? bdi_start_fn+0x70/0xd0
Feb 26 02:45:59 s2 kernel: [<ffffffff810cc2d0>] ? bdi_start_fn+0x0/0xd0
Feb 26 02:45:59 s2 kernel: [<ffffffff81066666>] ? kthread+0xc0/0xca
Feb 26 02:45:59 s2 kernel: [<ffffffff81011c6a>] ? child_rip+0xa/0x20
Feb 26 02:45:59 s2 kernel: [<ffffffff810665a6>] ? kthread+0x0/0xca
Feb 26 02:45:59 s2 kernel: [<ffffffff81011c60>] ? child_rip+0x0/0x20
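All three host-side traces (kvm, gzip and flush-254:12) are tasks in state D (uninterruptible sleep), and each passes through io_schedule and dm_table_unplug_all, so they appear to be waiting on block I/O through device-mapper rather than spinning. When collecting several such dumps, a small Python sketch like the one below can group the frames by task for comparison. It assumes the header field order noted in the comment; the regular expressions, the BlockedTask type and both functions are hypothetical helpers, not part of any Proxmox or kernel tool. Every frame here is marked with "?", which means the unwinder found the address on the stack but could not confirm it as part of a reliable call chain, so the frame lists are hints rather than an exact call path.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Task header, e.g. "kernel: kvm  D ffff8801e6e96000  0  5143  1 0x00000000".
# Assumed field order: comm, state, saved PC, free stack, PID, PPID, flags.
TASK_RE = re.compile(
    r"kernel: (?P<comm>\S+)\s+(?P<state>[A-Z])\s+[0-9a-f]{16}\s+\d+\s+"
    r"(?P<pid>\d+)\s+(?P<ppid>\d+)\s+0x[0-9a-f]+\s*$"
)
# Stack frame, e.g. "[<ffffffff81313d95>] ? io_schedule+0x9b/0xfc"; "?" optional.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?:\? )?(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


@dataclass
class BlockedTask:
    comm: str
    state: str
    pid: int
    ppid: int
    frames: List[str] = field(default_factory=list)


def parse_blocked_tasks(log: str) -> List[BlockedTask]:
    """Attach each stack frame to the most recent task header line."""
    tasks: List[BlockedTask] = []
    for line in log.splitlines():
        header = TASK_RE.search(line)
        if header:
            tasks.append(BlockedTask(header["comm"], header["state"],
                                     int(header["pid"]), int(header["ppid"])))
            continue
        frame = FRAME_RE.search(line)
        if frame and tasks:
            tasks[-1].frames.append(frame["func"])
    return tasks


def tasks_waiting_in(tasks: List[BlockedTask], func: str) -> List[str]:
    """Names of tasks whose trace contains the given function."""
    return sorted(t.comm for t in tasks if func in t.frames)
```

For example, feeding the log above to parse_blocked_tasks and then calling tasks_waiting_in(tasks, "io_schedule") should list flush-254:12, gzip and kvm.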