[PVE-User] VMs With Multiple Interfaces Rebooting

JR Richardson jmr.richardson at gmail.com
Fri Nov 22 17:59:03 CET 2024


Hi Mark,

Found this error during log review:
" vvepve13 pvestatd[1468]: VM 13113 qmp command failed - VM 13113 qmp
command 'query-proxmox-support' failed - unable to connect to VM 13113 qmp
socket - timeout after 51 retries"

HA was sending a shutdown to the VM after not being able to verify that the VM
was running. I initially thought this was networking related, but as I
investigate further, this seems like a bug in 'qm'. So strange: I have been
running on this version for months, doing migrations and spinning up new VMs
without any issues.
Thanks
JR


Hi JR,

What do you mean by "reboot"? Does the VM crash so that it is powered down
from an HA point of view and started back up? Or does the VM OS reboot
gracefully?


Mark Schouten

> On 22 Nov 2024 at 07:18, JR Richardson <jmr.richardson at gmail.com> wrote:
> 
> Hey Folks,
> 
> Just wanted to share an experience I recently had. Cluster parameters:
> 7 nodes, 2 HA Groups (3 nodes and 4 nodes), shared storage.
> Server specs:
> CPU(s): 40 x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2 sockets)
> Kernel Version: Linux 6.8.12-1-pve (2024-08-05T16:17Z)
> Manager Version: pve-manager/8.2.4/faa83925c9641325
> 
> This has been a super stable environment for many years, through software
> and hardware upgrades, with few issues to speak of. Then, without warning,
> one of my hypervisors in the 3-node group crashed with a memory DIMM error.
> Cluster HA took over and restarted the VMs on the other two nodes in the
> group as expected. The problem quickly materialized: the VMs started
> rebooting repeatedly, with a lot of network issues and notices of pending
> migrations. I could not pin down exactly what the root cause was. Notably,
> these particular VMs all have multiple network interfaces. After several
> hours of not being able to get the existing VMs stable, I tried spinning up
> new VMs, to no avail; the reboots persisted on the new VMs as well.
> This seemed to affect only the VMs that had been on the failed hypervisor;
> all other VMs across the cluster were fine.
> 
> I have not installed any third-party monitoring software. I found a few
> posts in the forum about that, but it was not my issue.
> 
> In an act of desperation, I performed a dist-upgrade and this solved 
> the issue straight away.
> Kernel Version: Linux 6.8.12-4-pve (2024-11-06T15:04Z)
> Manager Version: pve-manager/8.3.0/c1689ccb1065a83b
> 
> Hope this is helpful. If anyone has ideas on why this happened, I welcome
> any responses.
> 
> Thanks.
> 
> JR

