[pve-devel] [PATCH ha-manager 3/6] Env/PVE2: get_node_info: ensure quorate and actual info is used
Thomas Lamprecht
t.lamprecht at proxmox.com
Wed Nov 8 11:40:36 CET 2017
On 11/08/2017 07:01 AM, Dietmar Maurer wrote:
> Is this whole thing related to this patch:
>
> https://git.proxmox.com/?p=pve-cluster.git;a=commitdiff;h=7bac9ca573ad13f527663d27f1a9177279d69b76
>
> ?
>
Yes.
> More questions below:
>
>> On November 7, 2017 at 3:27 PM Thomas Lamprecht <t.lamprecht at proxmox.com>
>> wrote:
>>
>>
>> Do not trust member information if not quorate and if quorate ensure
>> member information is up to date.
>>
>> Signed-off-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
>> ---
>> src/PVE/HA/Env/PVE2.pm | 22 ++++++++++++----------
>> 1 file changed, 12 insertions(+), 10 deletions(-)
>>
>> diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm
>> index 8baf2d0..2db56af 100644
>> --- a/src/PVE/HA/Env/PVE2.pm
>> +++ b/src/PVE/HA/Env/PVE2.pm
>> @@ -177,17 +177,19 @@ sub get_node_info {
>>
>> my ($node_info, $quorate) = ({}, 0);
>>
>> + if (PVE::Cluster::check_cfs_quorum(1)) {
>> + $quorate = 1;
>> +
>> + PVE::Cluster::cfs_update();
>
> Why? We do the update in loop_start_hook()
> IMHO this should return all information available, even if we are not quorate.
> You need to decide if you trust that somewhere else. I think about something
> like this:
>
A restart of pmxcfs happening shortly before the loop_start_hook
could cause the status to be empty, and this was a (non-ideal)
solution to make that less likely.
Maybe we should record if the cfs_update did not work and go
into a lost lock state if this happens? No point in doing
any "real work" if the status is missing?
> diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
> index 25a7398..57410cb 100644
> --- a/src/PVE/HA/Manager.pm
> +++ b/src/PVE/HA/Manager.pm
> @@ -361,7 +361,12 @@ sub manage {
>
> my ($haenv, $ms, $ns, $ss) = ($self->{haenv}, $self->{ms}, $self->{ns}, $self->{ss});
>
> - $ns->update($haenv->get_node_info());
> + my ($node_info, $quorate) = $haenv->get_node_info();
> + if (!$quorate) {
> + $haenv->log('info', "master lost quorum"); # fixme: I am not sure what to log here
> + return;
> + }
> + $ns->update($node_info);
>
> if (!$ns->node_is_online($haenv->nodename())) {
> $haenv->log('info', "master seems offline");
>
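
The idea raised above (record whether the status update worked, and do no
"real work" when it did not or when quorum is missing) could be sketched
roughly as follows. This is only a self-contained illustration: MockEnv,
refresh_status, the third return value of get_node_info and may_manage are
all hypothetical names made up for this sketch, not part of ha-manager or
pve-cluster.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the HA environment (not the real PVE2.pm).
package MockEnv;

sub new {
    my ($class, %args) = @_;
    my $self = {
        quorate   => $args{quorate},
        update_ok => $args{update_ok},
        members   => $args{members} // {},
    };
    return bless $self, $class;
}

# Mimics a status refresh that can fail, e.g. right after a pmxcfs restart.
sub refresh_status {
    my ($self) = @_;
    die "status not available\n" if !$self->{update_ok};
    return 1;
}

# Returns the member info, the quorum flag and whether the refresh worked.
sub get_node_info {
    my ($self) = @_;
    my $fresh = eval { $self->refresh_status() } ? 1 : 0;
    return ($self->{members}, $self->{quorate}, $fresh);
}

package main;

# Decide whether the manager should do any real work in this round.
sub may_manage {
    my ($env) = @_;
    my ($node_info, $quorate, $fresh) = $env->get_node_info();
    return 0 if !$quorate;     # no quorum: do not trust member info
    return 0 if !$fresh;       # refresh failed: treat like a lost lock
    return 0 if !%$node_info;  # empty status: nothing to act on
    return 1;
}
```

With this split, get_node_info still hands back everything it has, and the
caller decides in one place whether to trust it.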