[PVE-User] HA scalability and predictability

Mon Dec 10 04:45:27 CET 2018

Hello,

I am still investigating PVE as a replacement for a large Ganeti cluster.

Some years ago we built our own HA VM cluster based on Pacemaker, libvirt
(KVM) and DRBD.
While this worked well, it also showed the limitations of Pacemaker, and of
the LRM in particular. Things got pretty sluggish with 60 VMs and a total of
120 resources.
This cluster will have about 800 VMs. Has anybody run this number of HA
VMs with PVE, and what was their experience?

Secondly, it is an absolute requirement that a node failure results in
a predictable and restricted failover.
I.e. the cluster will have an n+1 (or n+2) redundancy with at least one
node being essentially a hot spare.
Failovers should only go to the spare(s), never to another compute node.

I presume an HA group with the nodes "node1:2,node8:1" and "restricted 1"
should do the trick here.
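
To make that concrete, here is a rough sketch of the layout I have in mind:
one restricted HA group per compute node, with the home node at priority 2
and each spare at priority 1. It only prints the commands, it does not run
them. The "ha_" group name prefix and the node1..node7 plus node8 naming are
made up for illustration, and the ha-manager groupadd options are used as I
understand them from the documentation, so please correct me if I have them
wrong.

```python
def build_group_commands(compute_nodes, spares, prefix="ha_"):
    """Return one `ha-manager groupadd` command string per compute node.

    Each group lists the compute node with priority 2 and every spare
    with priority 1. The group name prefix is a hypothetical choice.
    """
    commands = []
    for node in compute_nodes:
        # Home node first (priority 2), then the spare(s) (priority 1).
        members = [f"{node}:2"] + [f"{spare}:1" for spare in spares]
        node_list = ",".join(members)
        commands.append(
            f"ha-manager groupadd {prefix}{node} "
            f"--nodes {node_list} --restricted 1"
        )
    return commands


if __name__ == "__main__":
    # Assumed layout: node1..node7 are compute nodes, node8 is the spare.
    compute = [f"node{i}" for i in range(1, 8)]
    for cmd in build_group_commands(compute, ["node8"]):
        print(cmd)
```

For an n+2 setup the second spare would simply be appended to the spares
list, so every group ends up with both spares at priority 1.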

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Rakuten Communications