[pve-devel] [PATCH ha-manager 1/2] LRM: do not count erroneous service as active

Thomas Lamprecht t.lamprecht at proxmox.com
Fri Apr 29 17:23:58 CEST 2016

On 04/29/2016 04:49 PM, Dietmar Maurer wrote:
>> diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
>> index fcbc33f..b7a3201 100644
>> --- a/src/PVE/HA/LRM.pm
>> +++ b/src/PVE/HA/LRM.pm
>> @@ -180,6 +180,8 @@ sub active_service_count {
>>  	next if !defined($req_state);
>>  	next if $req_state eq 'stopped';
>>  	next if $req_state eq 'freeze';
>> +	# we mustn't touch erroneous services, so they are not active
>> +	next if $req_state eq 'error';
>>  	$count++;
>>      }
> I think it is not as easy, because we use active_service_count() from
> several places, and we want to count errors sometimes ....
> I am not 100% sure.

Yes, we use it in three places:
* when determining if we should try to acquire our agent_lock and make
the state transition from wait_for_agent_lock => active. Here its safe
as we cannot do anything with the erroneous states anyhow, and as the
only way to move out from error state is disabling the service (thus
marking it as stopped) has the result that it still is not counted as
active, as we skip stopped also, so this is safe and maybe also wanted.

* when restarting the LRM to see if are allowed to finish the restart,
i.e. if all services are either frozen ore stopped and with this patch
we would also allow services in error state which is the primary intent
of this patch => safe (as such services wouldn't get inactive in any way
resulting in a timeout and kill of the lrm)

* when shutting down a LRM in lost_agent_lock state, here it has the
effect that with this patch the node won't get shot by its watchdog if
it has erroneous services but not active ones, this is not only save but
wanted, IMO. It makes no sense letting the watchdog trigger when we have
no active services besides the one in the error state, as when
restarting it does not make anything better (error services won't get
touched after all).

Those are the three cases, if we say that error services should not get
touched this is safer than previously and wanted, IMO, but maybe its
better I sleep a weekend over it in case i forgot something. :)

Thanks for the reviews!

More information about the pve-devel mailing list