[pve-devel] [PATCH v2 cluster 3/5] Add verification and fallback to cluster join/addnode

Thomas Lamprecht t.lamprecht at proxmox.com
Tue Dec 17 07:09:56 CET 2019


On 12/4/19 3:04 PM, Stefan Reiter wrote:
> Verify that the config of the new node is valid and compatible with the
> cluster (i.e. that the links for the new node match the currently
> configured nodes).
> 
> Additionally, fallback is provided via a new parameter to addnode,
> 'new_node_ip'. Previously, fallback was handled on the joining node, by
> setting it's local IP as 'link0', however, a cluster with only one link,
> but numbered 1-7 is still valid, and a fallback is possible, but the old
> code would now fail.
> 
> Instead, pass the locally resolved IP via a seperate parameter
> (resolving the IP on the cluster side is impractical, as IP resolution
> could fail or provide a wrong IP for Corosync).
> 
> For compatibility reasons, allow fallback to occur via the old link0
> method as well, but mark with FIXME for future removal.
> 
> Fallback fails in case the cluster has more than one link, in this case
> only the user can know which NIC/IP corresponds to which cluster link.
> 
> Signed-off-by: Stefan Reiter <s.reiter at proxmox.com>
> ---
>  data/PVE/API2/ClusterConfig.pm | 54 +++++++++++++++++++++++++++++++++-
>  data/PVE/CLI/pvecm.pm          |  6 ++++
>  data/PVE/Cluster/Setup.pm      |  6 ++++
>  3 files changed, 65 insertions(+), 1 deletion(-)
> 
> diff --git a/data/PVE/API2/ClusterConfig.pm b/data/PVE/API2/ClusterConfig.pm
> index 2c5bb03..10dcd6d 100644
> --- a/data/PVE/API2/ClusterConfig.pm
> +++ b/data/PVE/API2/ClusterConfig.pm
> @@ -207,6 +207,12 @@ __PACKAGE__->register_method ({
>  		description => "Do not throw error if node already exists.",
>  		optional => 1,
>  	    },
> +	    new_node_ip => {
> +		type => 'string',
> +		description => "IP Address of node to add. Used as fallback if no links are given.",
> +		format => 'ip',
> +		optional => 1,
> +	    },
>  	}),
>      },
>      returns => {
> @@ -268,6 +274,50 @@ __PACKAGE__->register_method ({
>  		$check_duplicate_addr->($link);
>  	    }
>  
> +	    # Make sure that the newly added node has links compatible with the
> +	    # rest of the cluster. To start, extract all links that currently
> +	    # exist. Check any node, all have the same links here (because of
> +	    # verify_conf above).
> +	    my $node_options = (values %$nodelist)[0];
> +	    my $cluster_links = {};
> +	    foreach my $opt (keys %$node_options) {
> +		next if $opt !~ $PVE::Corosync::link_addr_re;
> +		$cluster_links->{$2} = $node_options->{$opt};
> +	    }
> +
> +	    if (%$links) {
> +		# verify specified links exist and none are missing
> +		for my $linknum (0..$PVE::Corosync::max_link_iter) {
> +		    my $have_cluster = defined($cluster_links->{$linknum});
> +		    my $have_new = defined($links->{$linknum});

$have_cluster_link
$have_new_link (or _node_ ?) 

else it's slightly confusing, i.e., as you'd check for if there's a cluster

> +
> +		    die "corosync: link $linknum exists in cluster config but wasn't specified for new node\n"
> +			if $have_cluster && !$have_new;
> +		    die "corosync: link $linknum specified for new node doesn't exist in cluster config\n"
> +			if !$have_cluster && $have_new;
> +		}
> +	    } else {
> +		# FIXME: Probably a good idea to remove this once we're sure
> +		# noone would join a node old enough to need this to a recent
> +		# cluster

# FIXME: remove in 8.0 or 7.0 if we need to switch clustering tech yet again ;)

> +		$param->{new_node_ip} = $cluster_links->{0}->{address}
> +		    if !$param->{new_node_ip} && $cluster_links->{0}
> +		    && scalar(%$cluster_links) == 1;

<insert post-if for multiple line spanning statement nitpick-stanza here>

> +
> +		# when called without any link parameters, fall back to
> +		# new_node_ip, if all existing nodes only have a single link too
> +		die "no links and no fallback ip (new_node_ip/link0) given, cannot join cluster\n"
> +		    if !$param->{new_node_ip};

but this and the other branch of the if above does not covers the:

* single link passed (e.g., the 0 one)
* that one isn't in the config
* but there's just a single link defined in the config (on another ID)
  with fitting network
-> magically switch the passed ID to the config ID


> +
> +		if (%$cluster_links == 1) {

please use scalar() to be more explicit, or even scalar(keys %$cluster_links) ?

> +		    my $linknum = (keys %$cluster_links)[0];
> +		    $links->{$linknum} = { address => $param->{new_node_ip} };
> +		} else {
> +		    die "no links given but multiple links found on other nodes,"
> +		      . " fallback only supported on single-link clusters\n";
> +		}
> +	    }
> +
>  	    if (defined(my $res = $nodelist->{$name})) {
>  		$param->{nodeid} = $res->{nodeid} if !$param->{nodeid};
>  		$param->{votes} = $res->{quorum_votes} if !defined($param->{votes});
> @@ -494,7 +544,9 @@ __PACKAGE__->register_method ({
>      path => 'join',
>      method => 'POST',
>      protected => 1,
> -    description => "Joins this node into an existing cluster.",
> +    description => "Joins this node into an existing cluster. If no links are"
> +		 . " given, default to IP resolved by node's hostname on single"
> +		 . " link (fallback fails for clusters with multiple links).",
>      parameters => {
>  	additionalProperties => 0,
>  	properties => PVE::Corosync::add_corosync_link_properties({
> diff --git a/data/PVE/CLI/pvecm.pm b/data/PVE/CLI/pvecm.pm
> index df0b1b9..364d9b7 100755
> --- a/data/PVE/CLI/pvecm.pm
> +++ b/data/PVE/CLI/pvecm.pm
> @@ -401,6 +401,12 @@ __PACKAGE__->register_method ({
>  		    $links->{$link}, get_standard_option('corosync-link'));
>  	    }
>  
> +	    # this will be used as fallback if no links are specified
> +	    push @$cmd, '--new_node_ip', $local_ip_address;
> +

you can only add that after 5/5 + guarding, else this breaks joining to a slightly
outdated node, which I want really to avoid.. If it'd be a user select-able switch
which triggers this, OK, but plainly adding a parameter isn't.

For the ssh way you could always fallback to just link0 if you implement the logic
I already suggested.

> +	    print "No local links given, will attempt fallback to $local_ip_address\n"
> +		if !%$links;

"No cluster network links passed explicitly, fallback to local node IP '$local_ip_address'\n"

> +
>  	    if (system (@$cmd) != 0) {
>  		my $cmdtxt = join (' ', @$cmd);
>  		die "unable to add node: command failed ($cmdtxt)\n";
> diff --git a/data/PVE/Cluster/Setup.pm b/data/PVE/Cluster/Setup.pm
> index 71a637f..8aa7f4b 100644
> --- a/data/PVE/Cluster/Setup.pm
> +++ b/data/PVE/Cluster/Setup.pm
> @@ -686,6 +686,12 @@ sub join {
>  	$args->{"link$link"} = PVE::Corosync::print_corosync_link($links->{$link});
>      }
>  
> +    # this will be used as fallback if no links are specified
> +    $args->{new_node_ip} = $local_ip_address;

same as above, needs to be guarded somehow.

> +
> +    print "No local links given, will attempt fallback to $local_ip_address\n"
> +	if !%$links;

same as above for the error message

> +
>      print "Request addition of this node\n";
>      my $res = eval { $conn->post("/cluster/config/nodes/$nodename", $args); };
>      if (my $err = $@) {
> 





More information about the pve-devel mailing list