[pve-devel] [PATCH cluster] fix #3596: handle delnode of offline node

Fabian Ebner f.ebner at proxmox.com
Fri Nov 12 12:50:46 CET 2021


On 12.11.21 at 09:45, Fabian Grünbichler wrote:
> the recommended way is to first shut down, then delnode, and never let it
> come back online, in which case corosync-cfgtool won't be able to kill
> the removed (offline) node.
> 
> also, the order was wrong - if we first update corosync.conf to remove
> the node entry from the nodelist, corosync doesn't know about the nodeid
> anymore, so killing will fail even if the node is still online.
> 
> Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
> ---
>   data/PVE/API2/ClusterConfig.pm | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/data/PVE/API2/ClusterConfig.pm b/data/PVE/API2/ClusterConfig.pm
> index 8f4a5bb..5a6a1ac 100644
> --- a/data/PVE/API2/ClusterConfig.pm
> +++ b/data/PVE/API2/ClusterConfig.pm
> @@ -485,9 +485,13 @@ __PACKAGE__->register_method ({
>   
>   	    delete $nodelist->{$node};
>   
> -	    PVE::Corosync::update_nodelist($conf, $nodelist);
> +	    # allowed to fail when node is already shut down!
> +	    eval {
> +		PVE::Tools::run_command(['corosync-cfgtool','-k', $nodeid])
> +		    if defined($nodeid);
> +	    };
>   

But what if it fails for a reason other than 'CS_ERR_NOT_EXIST'?
Shouldn't we match the error and only ignore that case?
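A minimal sketch of what matching the error could look like. It is untested against a real cluster: kill_node_tolerant and the $run callback are hypothetical names, not existing PVE code, and it assumes corosync-cfgtool reports 'CS_ERR_NOT_EXIST' on stderr when the node is already offline. With PVE::Tools::run_command, that text would presumably have to be captured via its errfunc option, since (as far as I remember) the exception itself only carries the exit code.

```perl
use strict;
use warnings;

# Hypothetical helper (sketch only): ask corosync to kill $nodeid, but ignore
# only the failure we expect for an already offline node.
# $run stands in for the real command runner: it gets the command as an array
# ref plus a reference to a string that receives stderr, and dies on failure.
sub kill_node_tolerant {
    my ($run, $nodeid) = @_;
    return if !defined($nodeid);    # nothing to kill, same as the patch

    my $stderr = '';
    eval { $run->(['corosync-cfgtool', '-k', $nodeid], \$stderr) };
    my $err = $@;

    return if !$err;                             # kill succeeded
    return if $stderr =~ /\bCS_ERR_NOT_EXIST\b/; # assumed: node already gone
    die $err;                                    # any other failure propagates
}

# Simulated runners; the stderr strings are made up for illustration.
my $ok    = sub { return };
my $gone  = sub {
    my ($cmd, $stderr_ref) = @_;
    $$stderr_ref = "simulated: CS_ERR_NOT_EXIST\n";
    die "command '@$cmd' failed: exit code 1\n";
};
my $other = sub {
    my ($cmd, $stderr_ref) = @_;
    $$stderr_ref = "simulated: some other error\n";
    die "command '@$cmd' failed: exit code 1\n";
};

kill_node_tolerant($ok, 3);      # succeeds
kill_node_tolerant($gone, 3);    # error swallowed
eval { kill_node_tolerant($other, 3) };
print $@ ? "propagated: $@" : "swallowed\n";
# prints: propagated: command 'corosync-cfgtool -k 3' failed: exit code 1
```

With this shape, any error besides the expected one is re-raised, so update_nodelist would not run and the delnode call would fail visibly instead of silently continuing.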

> -	    PVE::Tools::run_command(['corosync-cfgtool','-k', $nodeid]) if defined($nodeid);
> +	    PVE::Corosync::update_nodelist($conf, $nodelist);
>   	};
>   
>   	$config_change_lock->($code);
> 