[pve-devel] [PATCH cluster] fix #3596: handle delnode of offline node

Fabian Ebner f.ebner at proxmox.com
Fri Nov 12 13:46:24 CET 2021


Am 12.11.21 um 13:14 schrieb Thomas Lamprecht:
> On 12.11.21 12:50, Fabian Ebner wrote:
>> Am 12.11.21 um 09:45 schrieb Fabian Grünbichler:
>>> the recommended way is to first shutdown, then delnode, and never let it
>>> come back online, in which case corosync-cfgtool won't be able to kill
>>> the removed (offline) node.
>>>
>>> also, the order was wrong - if we first update corosync.conf to remove
>>> the node entry from the nodelist, corosync doesn't know about the nodeid
>>> anymore, so killing will fail even if the node is still online.
>>>
>>> Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
>>> ---
>>>    data/PVE/API2/ClusterConfig.pm | 8 ++++++--
>>>    1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/data/PVE/API2/ClusterConfig.pm b/data/PVE/API2/ClusterConfig.pm
>>> index 8f4a5bb..5a6a1ac 100644
>>> --- a/data/PVE/API2/ClusterConfig.pm
>>> +++ b/data/PVE/API2/ClusterConfig.pm
>>> @@ -485,9 +485,13 @@ __PACKAGE__->register_method ({
>>>              delete $nodelist->{$node};
>>>    -        PVE::Corosync::update_nodelist($conf, $nodelist);
>>> +        # allowed to fail when node is already shut down!
>>> +        eval {
>>> +        PVE::Tools::run_command(['corosync-cfgtool','-k', $nodeid])
>>> +            if defined($nodeid);
>>> +        };
>>>    
>>
>> But what if it fails for a different reason than 'CS_ERR_NOT_EXIST'? Shouldn't we match the error?
> 
> at least that examples is like ENOENT on unlink, an OK error (user could
> have -k'illed it before that).
> 

My example is when it's *not* that error ;)
With the patch we treat all errors as OK.





More information about the pve-devel mailing list