[PVE-User] Quorum lost cos of storage backbone problems

Tue Mar 28 11:33:20 CEST 2017

Hi,

The cluster network is a separate network, so I agree with your theory that the node was overloaded by many writes to the unreachable NFS server.
In the meantime we were able to reboot the node, and everything looks better now. As a result, the VMs were restarted.

Next time I will give
	systemctl restart corosync pve-cluster
a try. Hopefully this will not reset any running VMs.
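Whether the restart leaves running guests alone can be checked instead of hoped for. Below is a minimal bash sketch, assuming it runs on the affected node and that `qm list` prints a header line followed by one row per local VM with STATUS in the third column. The function names, the `--run` flag and the 10-second wait are illustrative choices made for this sketch, not anything Proxmox provides.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): record running VMs, restart the cluster
# stack, and report any VM that was running before but is not afterwards.
set -u

# Read `qm list` output on stdin; print VMIDs whose STATUS (3rd field) is
# "running". The header line is skipped.
running_vmids() {
    awk 'NR > 1 && $3 == "running" { print $1 }' | sort -n
}

# Print every VMID from the first list that is missing from the second.
stopped_since() {
    local before=$1 after=$2 id
    for id in $before; do
        # -x: match the whole line, -F: treat the VMID as a fixed string
        printf '%s\n' "$after" | grep -qxF "$id" || echo "$id"
    done
}

main() {
    local before after stopped
    before=$(qm list | running_vmids)
    systemctl restart corosync pve-cluster
    sleep 10   # arbitrary settle time, an assumption of this sketch
    after=$(qm list | running_vmids)
    stopped=$(stopped_since "$before" "$after")
    if [ -n "$stopped" ]; then
        echo "VMs no longer running: $stopped"
        return 1
    fi
    echo "All previously running VMs are still running."
}

# Only touch the cluster when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    main
fi
```

Invoked as `./check-restart.sh --run` (file name hypothetical), it exits non-zero and lists the affected VMIDs if any guest stopped during the restart.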

Immo
-----Original Message-----
From: pve-user [mailto:pve-user-bounces at pve.proxmox.com] On Behalf Of Thomas Lamprecht
Sent: Tuesday, March 28, 2017 7:57 AM
To: pve-user at pve.proxmox.com
Subject: Re: [PVE-User] Quorum lost cos of storage backbone problems

Hi,

On 03/23/2017 05:20 PM, IMMO WETZEL wrote:
> Our storage backbone had some problems, and during this one of the nodes lost its quorum, maybe because many VMs on this host had a lot of NFS-mounted disks.
> How can I bring back the host into the Cluster without rebooting?

Was the cluster network on the storage backbone (I assume you mean network here)?

If not, the loss could be the result of heavy load on the node caused by the outage.
Otherwise this would be odd, as quorum does not depend directly on the running (or failed) VMs.

I'd first check whether the problematic node can reach the other nodes via multicast [1]; if it can, restarting corosync and, if needed, pve-cluster should do it:

systemctl restart corosync pve-cluster
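The check-then-restart sequence above can be sketched as a small bash script. The addresses in `NODES` are hypothetical placeholders, the `omping` call is a multicast test along the lines of the one described in [1] (it has to be started on all listed nodes at roughly the same time), and the `DRY_RUN` switch, on by default, is an addition of this sketch so the commands can be reviewed before anything is actually restarted.

```shell
#!/usr/bin/env bash
# Sketch: bring a node that lost quorum back without rebooting.
# Assumptions (not from the original mail): NODES lists the cluster-network
# addresses of all nodes, including this one; DRY_RUN=1 only prints commands.
set -u

NODES=${NODES:-"10.0.0.11 10.0.0.12 10.0.0.13"}   # hypothetical addresses
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

rejoin_cluster() {
    # 1. Multicast test; omping must run on every listed node concurrently.
    # shellcheck disable=SC2086  # word splitting of $NODES is intended
    run omping -c 10000 -i 0.001 -F -q $NODES || return 1
    # 2. Restart corosync first, then pve-cluster.
    run systemctl restart corosync || return 1
    run systemctl restart pve-cluster || return 1
    # 3. Show whether the node is quorate again.
    run pvecm status
}

rejoin_cluster
```

Once the dry-run output looks right, running it with `DRY_RUN=0` and the real node addresses in `NODES` executes the same steps in order and stops at the first one that fails.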

cheers,
Thomas

[1] http://pve.proxmox.com/pve-docs/chapter-pvecm.html#cluster-network-requirements

_______________________________________________
pve-user mailing list
pve-user at pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user