[pve-devel] [PATCH docs] ha-manager: error fixes and small additions

Thomas Lamprecht t.lamprecht at proxmox.com
Fri Apr 29 10:51:03 CEST 2016



On 04/29/2016 08:11 AM, Dietmar Maurer wrote:
> applied, but have some suggestions:
>
>> +Updates
>> +~~~~~~~
>> +When updating the ha-manager you should do one node after the other, never
>> +all at once. 
> Why exactly? It would be interesting to know such things, so please explain.

OK, I'll send an additional patch which includes an explanation.
It boils down to this: we freeze services, and thus a manager needs to
be available. If the current manager restarts first, the other nodes
may have to wait for a new one. That problem was greatly improved by
the CRM lock release on restart patches, but it can still be
problematic.
In the past (without the lock release patches) it could even trigger a
systemd timeout on restart, killing the CRM and, as a consequence,
triggering a node reset. This does not happen anymore, but we should
probably lengthen the systemd kill timeout for this unit; I'll take a
closer look at that.



>
>> Further you have to ensure that no service located at the node
>> +is in the error state, a node with erroneous service is not able to be
>> upgraded
>> +and if tried nonetheless it may even trigger a Node reset when doing so!
> Why is that not possible? Looks like a bug to me?


Services in the error state cannot be frozen, so the LRM won't
restart, release its lock, or close the watchdog => problematic.

Services in the error state cannot be touched, as it is unclear what
really happened to them (or at least we cannot recover them), and they
need manual intervention.

We could, however, not count them towards the active services in the
LRM, so the LRM could restart with those and they would stay in the
error state, i.e. untouched.
This seems like a reasonable idea to me, thoughts?
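
To make the idea concrete, here is a rough sketch in Python. It is not
the actual ha-manager code; the state names ("stopped", "freeze",
"error"), the Service class and the helper blocking_services() are
assumptions made up for illustration only. It shows how skipping
error-state services would leave nothing that blocks an LRM restart.

```python
from dataclasses import dataclass


@dataclass
class Service:
    sid: str    # service ID, e.g. "vm:100"
    state: str  # hypothetical state name, e.g. "started", "freeze", "error"


def blocking_services(services, ignore_error):
    """Return the IDs of services that would keep the LRM from restarting.

    Assumption: stopped or frozen services never block a restart.
    Error-state services cannot be frozen, so they block unless
    ignore_error is set, in which case they are simply left untouched.
    """
    blocking = []
    for svc in services:
        if svc.state in ("stopped", "freeze"):
            continue
        if svc.state == "error" and ignore_error:
            continue  # not counted as active; stays in the error state
        blocking.append(svc.sid)
    return blocking


services = [Service("vm:100", "freeze"), Service("vm:101", "error")]
current = blocking_services(services, ignore_error=False)
proposed = blocking_services(services, ignore_error=True)
```

With the current behaviour the erroneous service would be listed as
blocking; with the proposed change the list would be empty and the LRM
could restart.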

>
>> +When dealing with erroneous services first check what happened to them, then
>> +bring them in a secure state, after that disable or remove them from HA. 
>> +Only after that you may start upgrading a Nodes LRM and CRM.
>> +
>>  Fencing
>>  -------
>>  
>> -- 
>> 2.1.4
>>
>>
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel at pve.proxmox.com
>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>



