[PVE-User] HA Changes and Cleanups

mj lists at merit.unu.edu
Thu Nov 24 10:22:11 CET 2016


Hi Thomas,

Thank you for these improvements.

(I did not participate much in the following discussion, but I was the 
one who started the thread "[PVE-User] HA question")

MJ

On 11/24/2016 10:05 AM, Thomas Lamprecht wrote:
> Hi all,
>
> regarding the discussion about our HA stack on the pve-user list in October
> we made some changes, which - hopefully - should address some problems and
> reduce some common pitfalls.
>
> * What has changed or is new:
>
> pct shutdown / qm shutdown and the Shutdown button in the web interface
> now work as expected: if triggered, the HA service will be shut down and
> not automatically started again. If that is needed there is still the
> 'reset' functionality.
>
> We now provide better feedback about the actual state of an HA service.
> E.g. 'started' will only be shown if the local resource manager confirmed
> that the service really started, else we show 'starting' so that it's
> clearer what's currently happening.
>
> We merged the GUI's 'Resource' tab into the 'HA' tab, so related
> information is now placed together. This should give a better overview of
> the current situation.
> Note, there are some fields in the resource grid which are hidden by
> default; to show them click on one of the tiny triangles in the column
> headers: https://i.imgsafe.org/6a271a3cc4.png
>
> Improved the built-in documentation.
>
> We also reworked the request states for services. There are now:
>
> * started (replaces 'enabled')
> The CRM tries to start the resource. Service state is set to started
> after successful start. On node failures, or when start fails, it tries
> to recover the resource. If everything fails, service state is set to
> error.
>
> * stopped (new)
> The CRM tries to keep the resource in stopped state, but it still
> tries to relocate the resources on node failures.
>
> * disabled
> The CRM tries to put the resource in stopped state, but does not
> try to relocate the resources on node failures. The main purpose
> of this state is error recovery, because it is the only way to
> move a resource out of the error state.
>
>
> So the generally used ones should now be 'started' and 'stopped'; here
> it's clear what the HA stack will do.
> 'disabled' should mainly be used to recover a service which is in the
> error state.
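
The difference between the three request states can be summarised as a small lookup. The sketch below only encodes the descriptions quoted above; it does not query or reflect the actual CRM implementation, and the function names `crm_target` and `crm_relocates_on_node_failure` are made up for illustration.

```shell
#!/bin/sh
# Illustration only: encodes the request-state descriptions from the
# announcement. These helpers are hypothetical and not part of any PVE package.

# Run state the CRM tries to reach for a given request state.
crm_target() {
    case "$1" in
        started) echo running ;;
        stopped|disabled) echo stopped ;;
        *) echo "unknown request state: $1" >&2; return 1 ;;
    esac
}

# Whether the CRM still recovers/relocates the resource on node failure.
crm_relocates_on_node_failure() {
    case "$1" in
        started|stopped) echo yes ;;
        disabled) echo no ;;
        *) echo "unknown request state: $1" >&2; return 1 ;;
    esac
}

# Print one summary line per request state.
for s in started stopped disabled; do
    printf '%-9s target=%-8s relocate_on_node_failure=%s\n' \
        "$s" "$(crm_target "$s")" "$(crm_relocates_on_node_failure "$s")"
done
```

Read this way, 'stopped' and 'disabled' aim for the same run state and differ only in whether the resource follows a failed node.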
>
> ha-manager enabled/disabled was removed; this was not in the API, so it
> should only affect users who called it directly.
> You can use `ha-manager set SID --state REQUEST_STATE` instead.
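
As a minimal sketch of the replacement call: the wrapper below only prints the `ha-manager set` command for a given service ID and request state, so it can be reviewed before being run on a node. The function name `ha_set_state_cmd` and the example SID `vm:100` are assumptions for illustration. The helper restricts itself to the three request states named in the announcement; the real CLI may accept others.

```shell
#!/bin/sh
# Hypothetical helper (not part of pve-ha-manager): prints the command that
# replaces the removed 'ha-manager enabled/disabled' subcommands.
ha_set_state_cmd() {
    sid=$1
    state=$2
    case "$state" in
        started|stopped|disabled) ;;
        *)
            echo "unknown request state: $state" >&2
            return 1
            ;;
    esac
    # Dry run: print the command instead of executing it.
    echo "ha-manager set $sid --state $state"
}

# 'vm:100' is an assumed example SID; use one from your own HA resource config.
ha_set_state_cmd vm:100 started
ha_set_state_cmd vm:100 disabled
```

Printing rather than executing keeps the sketch safe to try on a machine without an HA setup; on a real node the printed line can be run as is.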
>
> * What is still to come:
>
> An 'ignore' request state in which the service will not be touched by HA
> but is still in the resource configuration - this was requested a few
> times. I have WIP patches ready but nothing merged yet.
>
> A bit less confusion on task execution logs.
>
> Allowing hard stopping of a VM/CT under HA.
>
> I hope this addresses some part of the feedback we got.
> Many thanks to the community for the feedback and to Dietmar who did a lot
> of the above-mentioned work and also Dominik for his help with the UI.
>
> Users who want to test these changes can use the new packages we pushed to
> pvetest yesterday evening CET.
> The changes are included in the packages:
> pve-ha-manager >= 1.0-38
> pve-manager >= 4.3-11
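
To check whether a node already has these versions, something along the following lines could be used. It is a sketch, assuming a Debian-based node where `dpkg-query` and `dpkg --compare-versions` are available; the helper names `version_ge` and `check_pkg` are made up here, and the `sort -V` fallback only approximates Debian version ordering.

```shell
#!/bin/sh
# version_ge A B: succeeds if version A is greater than or equal to version B.
version_ge() {
    if command -v dpkg >/dev/null 2>&1; then
        dpkg --compare-versions "$1" ge "$2"
    else
        # Fallback approximation: the smaller of the two must be B.
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    fi
}

# check_pkg NAME MINIMUM: report whether the installed package is new enough.
check_pkg() {
    pkg=$1
    required=$2
    installed=$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null) || installed=""
    if [ -z "$installed" ]; then
        echo "$pkg: not installed"
        return 1
    fi
    if version_ge "$installed" "$required"; then
        echo "$pkg: $installed (ok, >= $required)"
    else
        echo "$pkg: $installed (too old, need >= $required)"
        return 1
    fi
}

# Minimum versions as quoted in the announcement.
check_pkg pve-ha-manager 1.0-38 || true   # keep going so both are reported
check_pkg pve-manager 4.3-11 || true
```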
>
> Happy testing and feel free to provide feedback.
>
> cheers,
> Thomas
>
>
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user


