[pve-devel] [RFC ha-manager v2 5/7] allow LRM lock stealing for fenced nodes

Thomas Lamprecht t.lamprecht at proxmox.com
Sat Mar 12 21:41:37 CET 2016



On 03/12/2016 01:39 PM, Dietmar Maurer wrote:
>> We are only allowed to recover (=steal) a service when we have its
>> LRM's lock, as this guarantees us that even if said LRM comes up
>> again during the steal operation, the LRM cannot start the services
>> while the service config still belongs to it for a short time.
>>
>> This is important, otherwise we have a possible race for the resource,
>> which can result in a service being started both on the old (restarted)
>> node and on the node the service was recovered to, which is really
>> bad!
> I don't really understand that. Wouldn't it be safer to simply wait
> for the LRM lock after fencing?

This has the disadvantage that we lose the "faster fencing" feature;
not really a big issue, but it would be nice to have.

Also, removing the lock of the fenced node and then acquiring it
"legally" is safe, and also needed, IMO.

Also, I remember running into issues when the fence agent was set to
reset mode: the fenced node came back online after a few seconds and
got the lock while we were in the middle of the recovery process. That
would be bad (even if it's a corner case with low probability).
I'll test that explicitly on Monday :)
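To illustrate the reset-mode race described above, here is a toy model. It is a sketch under stated assumptions, not the real implementation. `LockStore`, `lrm_lock`, `simulate`, the node names and the lock naming scheme are all hypothetical; none of them is the pmxcfs or pve-ha-manager API. The model compares only deleting the fenced node's lock with deleting it and then acquiring it "legally", so that the CRM holds it for the whole recovery.

```python
from typing import Dict, List, Optional


class LockStore:
    """Hypothetical in-memory stand-in for a cluster-wide lock table.

    Each lock name maps to its current owner. acquire() succeeds only if the
    lock is free or already held by the caller. This is NOT the pmxcfs API.
    """

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def acquire(self, name: str, owner: str) -> bool:
        current = self._owners.get(name)
        if current is None or current == owner:
            self._owners[name] = owner
            return True
        return False

    def delete(self, name: str) -> None:
        self._owners.pop(name, None)

    def owner(self, name: str) -> Optional[str]:
        return self._owners.get(name)


def lrm_lock(node: str) -> str:
    # Hypothetical naming scheme: one lock per node's LRM.
    return f"lrm_lock_{node}"


def simulate(steal_and_hold: bool) -> List[str]:
    """Return the nodes that end up running the (single) service."""
    store = LockStore()
    node_lock = lrm_lock("node1")
    store.acquire(node_lock, "node1")   # node1's LRM held its lock, then node1 was fenced (reset)
    running: List[str] = []             # after the reset nothing runs the service
    service_owner = "node1"             # service config still points to node1

    # CRM starts recovery by removing the fenced node's lock.
    store.delete(node_lock)
    if steal_and_hold:
        # "Legal" acquisition: the CRM now owns node1's lock during recovery.
        assert store.acquire(node_lock, "crm")

    # Race: node1 comes back after its reset before the config is moved.
    if store.acquire(node_lock, "node1") and service_owner == "node1":
        running.append("node1")         # its LRM starts the service again

    # CRM finishes recovery: config moves to node2, which starts the service.
    service_owner = "node2"
    running.append("node2")
    return running


if __name__ == "__main__":
    print(simulate(steal_and_hold=False))  # ['node1', 'node2'] -> started twice
    print(simulate(steal_and_hold=True))   # ['node2']
```

In this model, only deleting the lock leaves a window in which the returning LRM can re-acquire it and start the service a second time. Deleting and then acquiring the lock closes that window, because the returning LRM's acquire fails while the CRM holds the lock.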
