[PVE-User] HA Fencing

Mark Adams mark at openvs.co.uk
Tue Dec 5 10:25:50 CET 2017


On 5 December 2017 at 08:52, Thomas Lamprecht <t.lamprecht at proxmox.com>
wrote:

> Hi,
>
> On 12/04/2017 07:51 PM, Mark Adams wrote:
> > On 17 November 2017 at 10:55, Thomas Lamprecht <t.lamprecht at proxmox.com>
> > wrote:
> >> On 11/16/2017 07:20 PM, Mark Adams wrote:
> >>> Hi all,
> >>>
> >>> It looks like in newer versions of proxmox, the only fencing type
> >>> advised is watchdog. Is that the case?
> >>>
> >>
> >> Yes, since PVE 4.0 watchdog fencing is the norm.
> >> There is a patch set of mine which implements the use of external fence
> >> devices, but it has seen no review. I should probably dust it off, look
> >> over it and resend it, it's about time we finally get this feature.
> >>
> >
> > I think you should definitely get this feature in - I would even say it
> > is necessary for an enterprise HA setup?
> >
>
> Not really necessary. Watchdog-based fencing is no less secure than
> traditional fence devices. In fact, as there's much less to configure
> and far fewer protocols involved, I'd say it's the opposite. I.e., you
> do not have to send a command over TCP/IP to a device to fence a node.
> That path has multiple problem points: link problems, high load
> delaying fencing, fence devices with a setup that is not well tested,
> at least not under failure conditions, ...
> A watchdog, which triggers as soon as the node stops resetting it,
> independent of link failures and cluster load, is the safer bet here.
> Watchdogs are often the norm in highly secure, critical embedded
> systems too, not without reason.
> It's the difference between an emergency shutdown button and a
> dead-man's switch.
>

AFAIK an external fence device is the only way to know for sure that your
server has actually been fenced when it is not contactable by other means,
for instance because of some network issue on the host.

Yes, the watchdog should make the machine that goes offline fence itself,
but still, the only way to know for sure that the machine is dead is to
power it off, right?
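
To make the dead-man-switch idea concrete, here is a toy model of it. This
is only a sketch and not Proxmox code: the class and function names, the
timeout and the safety margin are all hypothetical. It assumes the watchdog
hardware itself works, in which case the remaining nodes never observe the
reset and can only reason from elapsed time.

```python
class DeadManSwitch:
    """Toy model of a watchdog timer (hypothetical, not Proxmox code)."""

    def __init__(self, timeout_s, now=0.0):
        self.timeout_s = timeout_s
        self.last_pet = now

    def pet(self, now):
        # The node resets the timer only while it considers itself healthy.
        self.last_pet = now

    def expired(self, now):
        # Once the timer runs out, the node is reset locally; no command
        # from another node over the network is involved.
        return now - self.last_pet >= self.timeout_s


def safe_recovery_time(last_seen, timeout_s, margin_s):
    # Assumption: peers wait for the watchdog timeout plus a margin after
    # the node was last seen before they recover its services.
    return last_seen + timeout_s + margin_s
```

With a timeout of 60 and a last reset at t=10, expired(69) is False and
expired(70) is True. The price of this scheme is the waiting time, which is
where an external fence device could be faster.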


> Maybe you didn't even mean the reliability standpoint, but rather that a
> better best-case SLA could be possible with fence devices?
>

This does make a difference too; it could fail over in seconds with faster
fencing.


>
> But nonetheless I agree that we should really get it in. I'll try to
> pick up the series before this month ends, after the Cluster over API
> stuff is in.
>

Thanks, it would be great to see it in.


>
> cheers,
> Thomas
>


