[PVE-User] Expected fencing behavior on a bifurcated 4-node HA cluster

Adam Carheden carheden at ucar.edu
Wed May 3 17:29:27 CEST 2017


On 05/03/2017 01:46 AM, Alexandre DERUMIER wrote:
> Maybe this is because a node only reboots if an HA VM is present on it?
> 
> 
> If you had HA VMs on all 4 nodes, I think all of the nodes would have been rebooted by the watchdog (as you lost quorum on all 4 nodes).
That must be it. I have one HA VM and a few non-HA VMs, all just
testing. I think the takeaway is not to run an even number of HA nodes
in production (or to use a corosync QDevice as Thomas suggests).

Am I correct that all PVE nodes contribute to the quorum voting even if
they're not part of an HA group?

My production cluster will have 6 nodes (more redundancy, same
datacenter, less network risk). To prevent a cluster-wide shutdown in
production, when I'll have lots more HA VMs, can I just add an old
cheap-o box as a 7th node for quorum and not put it in any HA groups?
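
To check my arithmetic, here is a minimal sketch of the quorum math. It
assumes one vote per node and a simple-majority threshold of
floor(n/2) + 1, which matches the "Quorum: 3" reported for 4 expected
votes in the pvecm output quoted below. The function names are made up
for illustration and are not part of any PVE or corosync tool.

```python
def quorum(expected_votes: int) -> int:
    """Majority threshold: more than half of the expected votes."""
    return expected_votes // 2 + 1


def quorate_partitions(split: tuple[int, ...]) -> list[int]:
    """Return the partition sizes that still hold quorum after a split.

    `split` lists how many single-vote nodes end up in each partition,
    e.g. (2, 2) for a 4-node cluster cut in half.
    """
    total = sum(split)
    return [votes for votes in split if votes >= quorum(total)]


if __name__ == "__main__":
    # 4 nodes split 2/2, 6 nodes split 3/3, 7 nodes split 4/3
    for split in [(2, 2), (3, 3), (4, 3)]:
        total = sum(split)
        print(f"{total} nodes split {split}: quorum={quorum(total)}, "
              f"quorate partitions={quorate_partitions(split)}")
```

If that assumption holds, an even split of 4 or 6 nodes leaves no
partition quorate, while a 7th vote guarantees that one side of any
two-way split keeps quorum.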

Alternatively, is there a way to exclude one of my 6 nodes from the
quorum voting? In a Ceph cluster, quorum is determined by the nodes running
the monitor service, and not all nodes have to run the monitor service.
Is there an equivalent "no monitor" configuration in PVE?

Thanks

> 
> ----- Original Message -----
> From: "Adam Carheden" <carheden at ucar.edu>
> To: "proxmoxve" <pve-user at pve.proxmox.com>
> Sent: Tuesday, May 2, 2017 17:40:37
> Subject: [PVE-User] Expected fencing behavior on a bifurcated 4-node HA cluster
> 
> What's supposed to happen if two nodes in a 4-node HA cluster go offline? 
> 
> 
> I have a 4-node test cluster, two nodes are in one server room and the 
> other two in another server room. I had HA inadvertently tested for me 
> this morning due to an unexpected network issue and watchdog rebooted 
> two of the nodes. 
> 
> I think this is the expected behavior, and certainly seems like what I 
> want to happen. However, quorum is 3, not 2, so why didn't all 4 nodes 
> reboot? 
> 
> # pvecm status 
> Quorum information 
> ------------------ 
> Date: Tue May 2 09:35:23 2017 
> Quorum provider: corosync_votequorum 
> Nodes: 4 
> Node ID: 0x00000001 
> Ring ID: 4/524 
> Quorate: Yes 
> 
> Votequorum information 
> ---------------------- 
> Expected votes: 4 
> Highest expected: 4 
> Total votes: 4 
> Quorum: 3 
> Flags: Quorate 
> 
> Membership information 
> ---------------------- 
> Nodeid Votes Name 
> 0x00000004 1 192.168.0.11 
> 0x00000003 1 192.168.0.203 
> 0x00000001 1 192.168.0.204 (local) 
> 0x00000002 1 192.168.0.206 
> 
> # ha-manager status 
> quorum OK 
> master node3 (active, Tue May 2 09:35:24 2017) 
> lrm node1 (idle, Tue May 2 09:35:27 2017) 
> lrm node2 (active, Tue May 2 09:35:26 2017) 
> lrm node3 (idle, Tue May 2 09:35:23 2017) 
> lrm node3 (idle, Tue May 2 09:35:23 2017) 
> 
> Somehow proxmox was smart enough to keep two of the nodes online, but 
> with a quorum of 3 neither group should have had quorum. How does it 
> decide which group to keep online? 
> 
> Thanks 
> 


