[pve-devel] [RFC ha-manager 3/3] always fence nodes on dead LRM

Dietmar Maurer dietmar at proxmox.com
Fri Apr 29 08:30:08 CEST 2016



> On April 26, 2016 at 10:55 AM Thomas Lamprecht <t.lamprecht at proxmox.com> wrote:
> 
> 
> Fixes a recovery failure if a node starts up with a dead/broken LRM
> but a working corosync.
> 
> So while it's quorate, it doesn't do anything, but the CRM won't fence
> it, as our "last_online" timestamp only checks whether the node is
> quorate, not whether the HA manager is actually working.
> 
> This can be reproduced by taking an active node with services and
> simply disabling the LRM:
> $ systemctl disable pve-ha-lrm
> and then rebooting it (this simulates a broken update/reboot).
> The node comes up again and gains quorum, but the LRM does not
> start, so no service gets started/migrated/...; fencing is
> appropriate in such a situation.

Why? And how does it solve the problem? Doesn't this end in an endless
reboot cycle?
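The two positions can be made concrete with a toy model. This is a minimal Python sketch under stated assumptions, not the ha-manager code (which is Perl): `Node`, `fence_quorum_only`, `fence_with_lrm_check`, `simulate`, the `FENCE_DELAY`/`REBOOT_TIME` values and the "fresh grace period on rejoin" rule are all hypothetical, and fencing is modelled simply as "the node resets and rejoins after a fixed time". It shows the patch's point (a quorum-only check never fences a quorate node whose LRM is dead) and the objection (if the LRM stays disabled across reboots, the stricter check fences the node again after every rejoin).

```python
from dataclasses import dataclass

FENCE_DELAY = 60   # hypothetical grace period in seconds
REBOOT_TIME = 30   # hypothetical time a fenced node needs to rejoin the quorum


@dataclass
class Node:
    lrm_alive: bool = False  # LRM disabled, e.g. via `systemctl disable pve-ha-lrm`
    last_online: int = 0     # last time the node was quorate
    last_healthy: int = 0    # last time the node was quorate AND had a live LRM
    down_until: int = -1     # while rebooting, the node is out of the quorum


def fence_quorum_only(node: Node, now: int) -> bool:
    # Current behaviour per the patch text: only quorum feeds "last_online".
    return now - node.last_online > FENCE_DELAY


def fence_with_lrm_check(node: Node, now: int) -> bool:
    # Proposed behaviour: the node must also have a working LRM.
    return now - node.last_healthy > FENCE_DELAY


def simulate(should_fence, duration: int) -> int:
    """Return how many times the node gets fenced within `duration` seconds."""
    node = Node()
    fences = 0
    for now in range(duration):
        rebooting = now < node.down_until
        if not rebooting:
            node.last_online = now  # quorate again
            if node.lrm_alive:
                node.last_healthy = now
            if now == node.down_until:
                # Hypothetical: a node rejoining after fencing gets a fresh grace period.
                node.last_healthy = now
            if should_fence(node, now):
                fences += 1
                # Fenced: the node resets; the LRM is still disabled afterwards.
                node.down_until = now + REBOOT_TIME
    return fences


if __name__ == "__main__":
    print(simulate(fence_quorum_only, 600))     # -> 0
    print(simulate(fence_with_lrm_check, 600))  # -> 6
```

In this model the stricter predicate fences the node roughly every `FENCE_DELAY + REBOOT_TIME` seconds for as long as the LRM stays disabled, which is the reboot cycle the reply asks about; the quorum-only predicate never fences it, which is the recovery failure the patch describes.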
