[PVE-User] Random kernel panics of my KVM VMs

Yannis Milios yannis.milios at gmail.com
Tue Aug 15 20:54:48 CEST 2017


Have you tried changing the VM SCSI controller to something other than
LSI? Does that help?
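
For example, switching a VM to the VirtIO SCSI controller is a one-liner
on the PVE node (the VM ID 100 below is a placeholder, the guest kernel
needs virtio_scsi support, and the change takes effect on the next VM
start):

    qm set 100 --scsihw virtio-scsi-pci

The sym53c8xx panic in your trace comes from the guest driver for the
emulated LSI controller, so a different controller model would take that
driver out of the picture entirely.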

Yannis

On Tue, 15 Aug 2017 at 08:02, Bill Arlofski <waa-pveuser at revpol.com> wrote:

>
> Hello everyone.
>
> I am not sure this is the right place to ask, but I am also not sure where
> to start, so this list seemed like a good place. I am happy for any
> direction as to the best place to turn to for a solution. :)
>
> For quite some time now I have been having random kernel panics on random
> VMs.
>
> I have a two-node cluster, currently running a pretty current PVE version:
>
> PVE Manager Version pve-manager/5.0-23/af4267bf
>
> Now, these kernel panics have continued through several VM kernel upgrades,
> and even continue after the 4.x to 5.x Proxmox upgrade several weeks ago.
> In addition, I have moved VMs from one Proxmox node to the other to no
> avail, ruling out hardware on one node or the other.
>
> Also, it does not matter whether the VMs have their (QCOW2) disks on the
> Proxmox node's local hardware RAID storage or on the Synology NFS-connected
> storage.
>
> I am trying to verify this by moving a few VMs that seem to panic more
> often than others back to some local hardware RAID storage on one node as
> I write this email...
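>
> (In case it is useful to anyone following along, the moves themselves are
> just the usual disk move; the VM ID, disk slot, and target storage name
> below are placeholders for the real ones:
>
>     qm move_disk 101 scsi0 local-raid
>
> which copies the image to the target storage and switches the VM config
> over to it.)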
>
> Typically the kernel panics occur during the nightly backups of the VMs,
> but I
> cannot say that this is always when they occur. I _can_ say that the kernel
> panic always reports the sym53c8xx_2 module as the culprit though...
>
> I have set up remote kernel logging on one VM and here is the kernel panic
> reported:
>
> ----8<----
> [138539.201838] Kernel panic - not syncing: assertion "i &&
> sym_get_cam_status(cp->cmd) == DID_SOFT_ERROR" failed: file
> "drivers/scsi/sym53c8xx_2/sym_hipd.c", line 3399
> [138539.201838]
> [138539.201838] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.34-gentoo #5
> [138539.201838] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> [138539.201838]  ffff88023fd03d90 ffffffff813a2408 ffff8800bb842700
> ffffffff81c51450
> [138539.201838]  ffff88023fd03e10 ffffffff8111ff3f ffff880200000020
> ffff88023fd03e20
> [138539.201838]  ffff88023fd03db8 ffffffff813c70f3 ffffffff81c517b0
> ffffffff81c51400
> [138539.201838] Call Trace:
> [138539.201838]  <IRQ> [138539.201838]  [<ffffffff813a2408>]
> dump_stack+0x4d/0x65
> [138539.201838]  [<ffffffff8111ff3f>] panic+0xca/0x203
> [138539.201838]  [<ffffffff813c70f3>] ? swiotlb_unmap_sg_attrs+0x43/0x60
> [138539.201838]  [<ffffffff815ff3af>] sym_interrupt+0x1bff/0x1dd0
> [138539.201838]  [<ffffffff8163e888>] ? e1000_clean+0x358/0x880
> [138539.201838]  [<ffffffff815f8fc7>] sym53c8xx_intr+0x37/0x80
> [138539.201838]  [<ffffffff8109fa78>] __handle_irq_event_percpu+0x38/0x1a0
> [138539.201838]  [<ffffffff8109fbfe>] handle_irq_event_percpu+0x1e/0x50
> [138539.201838]  [<ffffffff8109fc57>] handle_irq_event+0x27/0x50
> [138539.201838]  [<ffffffff810a2b39>] handle_fasteoi_irq+0x89/0x160
> [138539.201838]  [<ffffffff8101ea5e>] handle_irq+0x6e/0x120
> [138539.201838]  [<ffffffff81079315>] ?
> atomic_notifier_call_chain+0x15/0x20
> [138539.201838]  [<ffffffff8101e346>] do_IRQ+0x46/0xd0
> [138539.201838]  [<ffffffff818dafff>] common_interrupt+0x7f/0x7f
> [138539.201838]  <EOI> [138539.201838]  [<ffffffff818d9e5b>] ?
> default_idle+0x1b/0xd0
> [138539.201838]  [<ffffffff81025eea>] arch_cpu_idle+0xa/0x10
> [138539.201838]  [<ffffffff818da22e>] default_idle_call+0x1e/0x30
> [138539.201838]  [<ffffffff81097105>] cpu_startup_entry+0xd5/0x1c0
> [138539.201838]  [<ffffffff8103cd98>] start_secondary+0xe8/0xf0
> [138539.201838] Shutting down cpus with NMI
> [138539.201838] Kernel Offset: disabled
> [138539.201838] ---[ end Kernel panic - not syncing: assertion "i &&
> sym_get_cam_status(cp->cmd) == DID_SOFT_ERROR" failed: file
> "drivers/scsi/sym53c8xx_2/sym_hipd.c", line 3399
> ----8<----
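>
> (For anyone who wants to capture panics the same way: a kernel netconsole
> setup along these lines works for remote logging; the addresses, ports,
> interface name, and target MAC below are all placeholders:
>
>     modprobe netconsole \
>         netconsole=6665@192.168.1.50/eth0,514@192.168.1.10/aa:bb:cc:dd:ee:ff
>
> with something listening for UDP syslog traffic on the target host.)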
>
> The dmesg output on the Proxmox nodes does not show any issues during the
> times of these VM kernel panics.
>
> I appreciate any comments, questions, or some direction on this.
>
> Thank you,
>
> Bill
>
>
> --
> Bill Arlofski
> Reverse Polarity, LLC
> http://www.revpol.com/blogs/waa
> -------------------------------
> He picks up scraps of information
> He's adept at adaptation
>
> --[ Not responsible for anything below this line ]--
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>
-- 
Sent from Gmail Mobile


