[PVE-User] Random kernel panics of my KVM VMs

Alexandre DERUMIER aderumier at odiso.com
Tue Aug 15 20:53:08 CEST 2017


Can you post your vmid.conf?
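For reference, the file lives at /etc/pve/qemu-server/<vmid>.conf on the
node; the scsihw and scsiX lines are the interesting ones. A hypothetical
example, not the actual config:

----8<----
# /etc/pve/qemu-server/<vmid>.conf -- hypothetical example values.
# "scsihw: lsi" selects QEMU's emulated LSI 53C895A controller, which
# is what the guest's sym53c8xx_2 driver talks to.
scsihw: lsi
scsi0: local:100/vm-100-disk-1.qcow2,size=32G
memory: 4096
cores: 2
----8<----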

----- Original Message -----
From: "Bill Arlofski" <waa-pveuser at revpol.com>
To: "proxmoxve" <pve-user at pve.proxmox.com>
Sent: Tuesday, August 15, 2017 07:02:12
Subject: [PVE-User] Random kernel panics of my KVM VMs

Hello everyone. 

I am not sure this is the right place to ask, but I am also not sure where
to start, so this list seemed like a good place. I am happy for any pointers
to a better place to turn for a solution. :)

For quite some time now I have been having random kernel panics on random VMs. 

I have a two-node cluster, currently running a fairly recent PVE version:

PVE Manager Version pve-manager/5.0-23/af4267bf 
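I can also post the full package list from the nodes if that is useful;
pveversion provides it:

----8<----
# On each Proxmox node: print manager, kernel, and package versions
pveversion -v
----8<----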

Now, these kernel panics have persisted through several VM kernel upgrades, 
and even through the 4.x to 5.x Proxmox upgrade several weeks ago. In 
addition, I have moved VMs from one Proxmox node to the other to no avail, 
which seems to rule out a hardware problem on either node. 

Also, it does not matter whether the VMs have their (QCOW2) disks on a 
Proxmox node's local hardware RAID storage or on the Synology NFS-connected 
storage. 

I am trying to verify the storage angle by moving a few VMs that seem to 
panic more often than others back to local hardware RAID storage on one node 
as I write this email... 
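For what it's worth, the moves can be done from the node's shell with qm
move_disk; a minimal sketch, where "local-raid" is a placeholder storage ID
and not necessarily what I have configured:

----8<----
# Move the VM's first SCSI disk onto node-local RAID-backed storage
qm move_disk <vmid> scsi0 local-raid
----8<----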

Typically the kernel panics occur during the nightly backups of the VMs, but I 
cannot say that this is always when they occur. I _can_ say that the kernel 
panic always reports the sym53c8xx_2 module as the culprit though... 
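As I understand it, sym53c8xx_2 is the guest driver for the LSI 53C895A
controller that QEMU emulates when scsihw is set to lsi, so one thing I may
try (sketched here, untested) is switching a VM to the virtio-scsi
controller and seeing whether the panics stop:

----8<----
# On the Proxmox node: check which emulated SCSI controller a VM uses
qm config <vmid> | grep -E 'scsihw|scsi[0-9]'

# Switch to virtio-scsi (the guest kernel needs virtio_scsi support;
# takes effect on the next full VM stop/start)
qm set <vmid> --scsihw virtio-scsi-pci
----8<----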

I have set up remote kernel logging on one VM (a setup sketch follows the 
trace below), and here is the kernel panic it reported: 

----8<---- 
[138539.201838] Kernel panic - not syncing: assertion "i && 
sym_get_cam_status(cp->cmd) == DID_SOFT_ERROR" failed: file 
"drivers/scsi/sym53c8xx_2/sym_hipd.c", line 3399 
[138539.201838] 
[138539.201838] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.34-gentoo #5 
[138539.201838] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 
[138539.201838] ffff88023fd03d90 ffffffff813a2408 ffff8800bb842700 
ffffffff81c51450 
[138539.201838] ffff88023fd03e10 ffffffff8111ff3f ffff880200000020 
ffff88023fd03e20 
[138539.201838] ffff88023fd03db8 ffffffff813c70f3 ffffffff81c517b0 
ffffffff81c51400 
[138539.201838] Call Trace: 
[138539.201838] <IRQ> [138539.201838] [<ffffffff813a2408>] dump_stack+0x4d/0x65 
[138539.201838] [<ffffffff8111ff3f>] panic+0xca/0x203 
[138539.201838] [<ffffffff813c70f3>] ? swiotlb_unmap_sg_attrs+0x43/0x60 
[138539.201838] [<ffffffff815ff3af>] sym_interrupt+0x1bff/0x1dd0 
[138539.201838] [<ffffffff8163e888>] ? e1000_clean+0x358/0x880 
[138539.201838] [<ffffffff815f8fc7>] sym53c8xx_intr+0x37/0x80 
[138539.201838] [<ffffffff8109fa78>] __handle_irq_event_percpu+0x38/0x1a0 
[138539.201838] [<ffffffff8109fbfe>] handle_irq_event_percpu+0x1e/0x50 
[138539.201838] [<ffffffff8109fc57>] handle_irq_event+0x27/0x50 
[138539.201838] [<ffffffff810a2b39>] handle_fasteoi_irq+0x89/0x160 
[138539.201838] [<ffffffff8101ea5e>] handle_irq+0x6e/0x120 
[138539.201838] [<ffffffff81079315>] ? atomic_notifier_call_chain+0x15/0x20 
[138539.201838] [<ffffffff8101e346>] do_IRQ+0x46/0xd0 
[138539.201838] [<ffffffff818dafff>] common_interrupt+0x7f/0x7f 
[138539.201838] <EOI> [138539.201838] [<ffffffff818d9e5b>] ? 
default_idle+0x1b/0xd0 
[138539.201838] [<ffffffff81025eea>] arch_cpu_idle+0xa/0x10 
[138539.201838] [<ffffffff818da22e>] default_idle_call+0x1e/0x30 
[138539.201838] [<ffffffff81097105>] cpu_startup_entry+0xd5/0x1c0 
[138539.201838] [<ffffffff8103cd98>] start_secondary+0xe8/0xf0 
[138539.201838] Shutting down cpus with NMI 
[138539.201838] Kernel Offset: disabled 
[138539.201838] ---[ end Kernel panic - not syncing: assertion "i && 
sym_get_cam_status(cp->cmd) == DID_SOFT_ERROR" failed: file 
"drivers/scsi/sym53c8xx_2/sym_hipd.c", line 3399 
----8<---- 
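In case it helps anyone reproduce the logging setup: I captured the above
over the network, since a panicking guest cannot flush its logs to disk. A
minimal netconsole sketch, with placeholder IPs, MAC, and NIC name (not my
actual values):

----8<----
# Inside the guest: stream kernel messages over UDP to a remote log
# host. Format: src-port@src-ip/dev,dst-port@dst-ip/dst-mac
modprobe netconsole \
    netconsole=6666@192.168.1.20/eth0,514@192.168.1.10/aa:bb:cc:dd:ee:ff
----8<----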

The dmesg output on the Proxmox nodes does not show any issues at the times 
of these VM kernel panics. 

I appreciate any comments, questions, or some direction on this. 

Thank you, 

Bill 


-- 
Bill Arlofski 
Reverse Polarity, LLC 
http://www.revpol.com/blogs/waa 
------------------------------- 
He picks up scraps of information 
He's adept at adaptation 

--[ Not responsible for anything below this line ]-- 
_______________________________________________ 
pve-user mailing list 
pve-user at pve.proxmox.com 
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user 



