[PVE-User] VMs With Multiple Interfaces Rebooting

Alwin Antreich alwin at antreich.com
Wed Nov 27 10:38:59 CET 2024


Hi JR,


November 25, 2024 at 4:08 PM, "JR Richardson" <jmr.richardson at gmail.com> wrote:


> > > Super stable environment for many years through software and hardware
> > > upgrades, few issues to speak of, then without warning one of my
> > > hypervisors in a 3-node group crashed with a memory DIMM error. Cluster
> > > HA took over and restarted the VMs on the other two nodes in the group
> > > as expected. The problem quickly materialized as the VMs started
> > > rebooting quickly, a lot of network issues and notice of migration
> > > pending. I could not lock down exactly what the root cause was. Notable
> > 
> > This sounds like it wanted to balance the load. Do you have CRS active and/or static load scheduling?
> > 
> CRS option is set to basic, not dynamic.

OK, basic. And I meant: is rebalancing active? :)
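
For reference, you can check what CRS is actually configured to do on the cluster. A rough sketch only; the exact option names (e.g. ha-rebalance-on-start) can differ between PVE versions, so please verify against the docs for your release:

  # datacenter-wide options, including the crs: line
  cat /etc/pve/datacenter.cfg

  # or query the same options via the CLI/API
  pvesh get /cluster/options

  # an example crs line might look like this (values are illustrative):
  # crs: ha=basic,ha-rebalance-on-start=1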

> 
> 2024-11-21T18:37:38.248094-06:00 vvepve13 pve-ha-lrm[4337]: <root at pam>
> end task UPID:vvepve13:000010F2:00007AEA:673FD24A:qmstart:13101:root at pam:
> OK
> 2024-11-21T18:37:38.254144-06:00 vvepve13 pve-ha-lrm[4337]: service
> status vm:13101 started
> 2024-11-21T18:37:44.256824-06:00 vvepve13 QEMU[3794]: kvm:
> ../accel/kvm/kvm-all.c:1836: kvm_irqchip_commit_routes: Assertion `ret
> == 0' failed.
This doesn't look good. I'd assume this is VM 13101, which failed right after starting and was consequently moved to the other remaining node (or vice versa).

But this doesn't explain the WHY. You will need to look further into the logs to see what else transpired during this time.
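
If it helps, this is roughly how I would pull the relevant window out of the journal on the affected nodes (a sketch only; adjust the time range and VMID to your case):

  # HA activity (CRM + LRM) around the time of the crash
  journalctl -u pve-ha-crm -u pve-ha-lrm --since "2024-11-21 18:30" --until "2024-11-21 19:00"

  # full journal for the same window, to catch the QEMU/KVM assertion and whatever surrounds it
  journalctl --since "2024-11-21 18:30" --until "2024-11-21 19:00"

  # current HA view of the services
  ha-manager status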

Cheers,
Alwin


