[pve-devel] [PATCH ha-manager 3/6] Env/PVE2: get_node_info: ensure quorate and actual info is used

Thomas Lamprecht t.lamprecht at proxmox.com
Wed Nov 8 11:40:36 CET 2017


On 11/08/2017 07:01 AM, Dietmar Maurer wrote:
> Is this whole thing related to this patch:
> 
> https://git.proxmox.com/?p=pve-cluster.git;a=commitdiff;h=7bac9ca573ad13f527663d27f1a9177279d69b76
> 
> ?
> 

Yes.

> More questions below:
> 
>> On November 7, 2017 at 3:27 PM Thomas Lamprecht <t.lamprecht at proxmox.com>
>> wrote:
>>
>>
>> Do not trust member information if not quorate and if quorate ensure
>> member information is up to date.
>>
>> Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
>> ---
>>  src/PVE/HA/Env/PVE2.pm | 22 ++++++++++++----------
>>  1 file changed, 12 insertions(+), 10 deletions(-)
>>
>> diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
>> index 8baf2d0..2db56af 100644
>> --- a/src/PVE/HA/Env/PVE2.pm
>> +++ b/src/PVE/HA/Env/PVE2.pm
>> @@ -177,17 +177,19 @@ sub get_node_info {
>>  
>>      my ($node_info, $quorate) = ({}, 0);
>>  
>> +    if (PVE::Cluster::check_cfs_quorum(1)) {
>> +	$quorate = 1;
>> +
>> +	PVE::Cluster::cfs_update();
> 
> Why? We do the update in loop_start_hook()
> 
> IMHO this should return all information available, even if we are not quorate.
> You need to decide if you trust that somewhere else. I think about something
> like this:
> 

A restart of pmxcfs happening shortly before the loop_start_hook
could leave the status empty, and this was a (non-ideal)
solution to make that less likely.

Maybe we should record if the cfs_update did not work and go
into a lost lock state if this happens? There is no point in doing
any "real work" if the status is missing.
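
A rough, untested sketch of that idea, just to make it concrete. All
names here (Sketch::Env, cfs_update_failed, next_state, the 'lost_lock'
label) are made up for illustration and do not exist in ha-manager. The
callback stands in for PVE::Cluster::cfs_update() and is assumed to die
on failure, which may not match how the real function reports errors:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the HA environment; not the real PVE2.pm.
package Sketch::Env;

sub new {
    my ($class, $update_cb) = @_;
    # $update_cb is a placeholder for PVE::Cluster::cfs_update()
    return bless { update_cb => $update_cb, cfs_update_failed => 0 }, $class;
}

# Called once at the start of each work loop iteration. Remembers
# whether the status update died instead of silently ignoring it.
sub loop_start_hook {
    my ($self) = @_;
    $self->{cfs_update_failed} = 0;
    eval { $self->{update_cb}->(); 1 } or do {
        $self->{cfs_update_failed} = 1;
    };
    return;
}

sub cfs_update_failed {
    my ($self) = @_;
    return $self->{cfs_update_failed};
}

package main;

# Decide in which state this iteration should continue: a failed
# status update is treated the same as a lost lock, so no "real work"
# is done on missing status data.
sub next_state {
    my ($env, $has_lock) = @_;
    return 'lost_lock' if $env->cfs_update_failed() || !$has_lock;
    return 'active';
}
```

With this, get_node_info() would not need its own cfs_update() call; the
caller would check the recorded flag once per loop iteration.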

> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
> index 25a7398..57410cb 100644
> --- a/src/PVE/HA/Manager.pm
> +++ b/src/PVE/HA/Manager.pm
> @@ -361,7 +361,12 @@ sub manage {
> 
>      my ($haenv, $ms, $ns, $ss) = ($self->{haenv}, $self->{ms}, $self->{ns}, $self->{ss});
> 
> -    $ns->update($haenv->get_node_info());
> +    my ($node_info, $quorate) = $haenv->get_node_info();
> +    if (!$quorate) {
> +       $haenv->log('info', "master lost quorum"); # fixme: I am not sure what to log here
> +       return;
> +    }
> +    $ns->update($node_info);
> 
>      if (!$ns->node_is_online($haenv->nodename())) {
>         $haenv->log('info', "master seems offline");
> 




