[PVE-User] HA Changes and Cleanups

Thomas Lamprecht t.lamprecht at proxmox.com
Thu Nov 24 10:05:20 CET 2016


Hi all,

Following the discussion about our HA stack on the pve-user list in October,
we made some changes which should hopefully address some of the problems and
reduce some common pitfalls.

* What has changed or is new:

pct shutdown / qm shutdown and the Shutdown button in the web interface now
work as expected: when triggered, the HA service is shut down and not
automatically started again. If a restart is what you want, the 'reset'
functionality is still available.

We now provide better feedback about the actual state of an HA service.
For example, 'started' is only shown once the local resource manager (LRM)
has confirmed that the service really started; until then we show 'starting',
so it is clearer what is currently happening.

We merged the GUI's 'Resource' tab into the 'HA' tab, so related information
is now in one place. This should give a better overview of the current
situation.
Note that some fields in the resource grid are hidden by default. To show
them, click one of the tiny triangles in the column headers:
https://i.imgsafe.org/6a271a3cc4.png

The built-in documentation was improved as well.

We also reworked the request states for services. There are now:

* started (replaces 'enabled')
The CRM tries to start the resource. The service state is set to started
after a successful start. On node failures, or when the start fails, it
tries to recover the resource. If everything fails, the service state is
set to error.

* stopped (new)
The CRM tries to keep the resource in the stopped state, but it still
tries to relocate the resource on node failures.

* disabled
The CRM tries to put the resource in the stopped state, but does not
try to relocate the resource on node failures. The main purpose
of this state is error recovery, because it is the only way to
move a resource out of the error state.


So the states for general use should now be 'started' and 'stopped'; with
these it is clear what the HA stack will do.
'disabled' should mainly be used to recover a service that is in the error
state.
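To make the difference between the three request states concrete, here is a
small Python toy model of the rules listed above. It is only an illustration
under simplifying assumptions: the function names and the single-step
transition are hypothetical, it is not how pve-ha-manager is implemented, and
it skips the intermediate steps the CRM goes through while stopping or
relocating a service.

```python
from enum import Enum


class RequestState(Enum):
    """The three request states described above (toy model, not pve-ha-manager code)."""
    STARTED = "started"    # replaces the old 'enabled'
    STOPPED = "stopped"    # new
    DISABLED = "disabled"


def relocate_on_node_failure(request: RequestState) -> bool:
    """Whether the CRM tries to relocate the resource when its node fails."""
    # 'started' and 'stopped' resources are relocated; 'disabled' ones are not.
    return request in (RequestState.STARTED, RequestState.STOPPED)


def next_service_state(current: str, request: RequestState,
                       lrm_confirmed: bool = False,
                       attempts_exhausted: bool = False) -> str:
    """Hypothetical single CRM step returning the service state that is shown."""
    if current == "error":
        # Only a 'disabled' request moves a service out of the error state.
        return "stopped" if request is RequestState.DISABLED else "error"
    if request is RequestState.STARTED:
        # All start/recovery attempts failed: give up and mark the service as error.
        if attempts_exhausted:
            return "error"
        # 'started' is only reported once the LRM confirmed the start.
        return "started" if lrm_confirmed else "starting"
    # Both 'stopped' and 'disabled' aim for a stopped service.
    return "stopped"
```

For example, a service in the error state stays there while its request
state is 'started' and only leaves it once the request state is set to
'disabled'.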

The ha-manager enabled/disabled commands were removed. They were not part of
the API, so this should only affect users who called them directly.
You can use `ha-manager set SID --state REQUEST_STATE` instead.
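If you scripted the removed commands, a thin wrapper around the replacement
call could look like the following Python sketch. The helper names are
hypothetical, `vm:100` is just an example service ID, and the list of
accepted states only covers the three request states described in this mail.

```python
import subprocess
from typing import List

# Request states described in this announcement (assumption: no others accepted).
VALID_REQUEST_STATES = ("started", "stopped", "disabled")


def ha_set_state_cmd(sid: str, state: str) -> List[str]:
    """Build the argv for `ha-manager set SID --state REQUEST_STATE` (hypothetical helper)."""
    if state not in VALID_REQUEST_STATES:
        raise ValueError(f"unknown request state: {state!r}")
    return ["ha-manager", "set", sid, "--state", state]


def ha_set_state(sid: str, state: str) -> None:
    """Run the command on a cluster node; raises CalledProcessError if it fails."""
    subprocess.run(ha_set_state_cmd(sid, state), check=True)
```

On a cluster node, `ha_set_state("vm:100", "stopped")` would then replace a
former call of the removed command. Building the argument list separately
keeps the validation testable without touching the cluster.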

* What is still to come:

An 'ignore' request state, in which the service is not touched by HA but
stays in the resource configuration; this was requested a few times.
I have WIP patches ready, but nothing is merged yet.

Somewhat less confusing task execution logs.

Allowing hard stopping of a VM/CT under HA.

I hope this addresses part of the feedback we got.
Many thanks to the community for the feedback, to Dietmar, who did a lot
of the work mentioned above, and to Dominik for his help with the UI.

Users who want to test these changes can use the new packages we pushed to
the pvetest repository yesterday evening (CET).
The changes are included in the packages:
pve-ha-manager >= 1.0-38
pve-manager >= 4.3-11

Happy testing and feel free to provide feedback.

cheers,
Thomas