[PVE-User] VMs With Multiple Interfaces Rebooting
Alwin Antreich
alwin at antreich.com
Mon Nov 25 06:32:16 CET 2024
On November 22, 2024 7:16:53 AM GMT+01:00, JR Richardson <jmr.richardson at gmail.com> wrote:
>Hey Folks,
>
>Just wanted to share an experience I recently had. Cluster parameters:
>7 nodes, 2 HA Groups (3 nodes and 4 nodes), shared storage.
>Server Specs:
>CPU(s) 40 x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2 Sockets)
>Kernel Version Linux 6.8.12-1-pve (2024-08-05T16:17Z)
>Manager Version pve-manager/8.2.4/faa83925c9641325
>
>Super stable environment for many years through software and hardware
>upgrades, few issues to speak of. Then, without warning, one of my
>hypervisors in the 3-node group crashed with a memory DIMM error. Cluster
>HA took over and restarted the VMs on the other two nodes in the group
>as expected. The problem quickly materialized as the VMs started
>rebooting repeatedly, with a lot of network issues and notices of
>migration pending. I could not pin down exactly what the root cause was.
This sounds like the HA stack wanted to balance the load. Do you have CRS active and/or static load scheduling?
>Notably, these particular VMs all have multiple network interfaces. After
>several hours of not being able to get the current VMs stable, I tried
>spinning up new VMs, to no avail; the reboots persisted on the new VMs.
>This seemed to affect only the VMs that were on the hypervisor that
>failed; all other VMs across the cluster were fine.
>
>I have not installed any third-party monitoring software. I found a few
>posts in the forum about it, but that was not my issue.
>
>In an act of desperation, I performed a dist-upgrade and this solved
>the issue straight away.
>Kernel Version Linux 6.8.12-4-pve (2024-11-06T15:04Z)
>Manager Version pve-manager/8.3.0/c1689ccb1065a83b
The upgrade likely restarted the pve-ha-lrm service, which could have broken the migration cycle.
The systemd logs should give you a clue as to what was happening; the HA stack logs its actions on the given node.
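As a rough sketch of how one might pull those logs, the snippet below builds a journalctl call for the two HA units (pve-ha-lrm and pve-ha-crm) and narrows the output to a single HA service. The helper names, the time window and the VM ID are placeholders chosen for illustration, and it assumes the HA stack refers to guests as "vm:<vmid>" in its log messages.

```python
import subprocess
from typing import List, Optional

# systemd units of the Proxmox HA stack (local and cluster resource manager).
HA_UNITS = ["pve-ha-lrm", "pve-ha-crm"]


def build_journalctl_cmd(since: str, until: Optional[str] = None) -> List[str]:
    """Build the journalctl argument list for the HA units in a time window."""
    cmd = ["journalctl", "--no-pager", "--output=short-iso"]
    for unit in HA_UNITS:
        cmd += ["-u", unit]
    cmd += ["--since", since]
    if until is not None:
        cmd += ["--until", until]
    return cmd


def filter_service(lines: List[str], vmid: int) -> List[str]:
    """Keep only the lines mentioning the HA service ID of one VM.

    Assumption: the guest shows up as 'vm:<vmid>' in the log text.
    The trailing check avoids matching vm:1001 when looking for vm:100.
    """
    needle = f"vm:{vmid}"
    kept = []
    for line in lines:
        pos = line.find(needle)
        while pos != -1:
            end = pos + len(needle)
            if end == len(line) or not line[end].isdigit():
                kept.append(line)
                break
            pos = line.find(needle, end)
    return kept


if __name__ == "__main__":
    # Placeholder window and VM ID; adjust to the time of the node failure.
    cmd = build_journalctl_cmd("2024-11-21 00:00", "2024-11-22 08:00")
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    for entry in filter_service(result.stdout.splitlines(), 100):
        print(entry)
```

Running this on each of the surviving nodes in the HA group should show what the LRM and CRM decided for a given guest around the time of the crash.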
Cheers,
Alwin
Hi JR,