[pve-devel] [PATCH cluster 3/3] Revert "pvecm: lock corosync config on addition and deletion"
Thomas Lamprecht
t.lamprecht at proxmox.com
Mon Nov 6 08:42:49 CET 2017
On 11/02/2017 02:22 PM, Dominik Csapak wrote:
> This reverts commit b778c013a2fc245008512365ac0b1a37287b30b3.
> but keeps some minor cleanups
>
> locking the corosync conf for node addition/deletion was in general a
> good idea, but this does not work correctly
IMO it is (was) not only a good idea but *must* be done; betting
against possible race conditions and their odds is just not an option
here, even if the chance is low.
And it's not really true that it does not work correctly, the locking
itself worked really well, there were "just" a few problems when
unlocking in specific scenarios ;)
> after creating a cluster, when adding the second node
> we get a cluster lock for corosync.conf for the change in the file
> but want to release it before the second node participates in the cluster,
> thus we have no quorum and cannot remove the lock, which
> prevents adding a third node until the timeout of the lock runs out
> (currently 120s)
>
> this happens every time the quorate partition has exactly (n+1)/2
> nodes (where n is the current (odd) number of nodes; e.g., when
> having a 3 node cluster with a quorate partition of 2) and a new
> node is added
Your patch does not fix the root cause of this symptom. With a
non-ideal cluster state (i.e., nodes offline) where the quorate
partition has just enough members to be quorate, i.e., losing one
member or increasing/decreasing the total member count by one would
lose quorum, you will always run into problems when adding or
deleting a node (a member of the quorate partition) here; the
lock failure is "just" a symptom of all this, as is the missing
quorum after a botched addition.
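To spell the arithmetic out (a minimal standalone example; quorum_of
is just the plain majority rule, floor(votes/2) + 1, which corosync
uses by default - it is not a helper from our code base):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # default majority rule: floor(total votes / 2) + 1
    sub quorum_of { my ($total) = @_; return int($total / 2) + 1; }

    my $online = 2; # members of the quorate partition
    my $total  = 3; # configured cluster nodes

    printf "before add: need %d of %d votes, have %d -> %s\n",
        quorum_of($total), $total, $online,
        $online >= quorum_of($total) ? "quorate" : "no quorum";

    # adding a node raises the expected votes before the new node
    # actually joins, so the held lock can no longer be released
    $total++;
    printf "after add:  need %d of %d votes, have %d -> %s\n",
        quorum_of($total), $total, $online,
        $online >= quorum_of($total) ? "quorate" : "no quorum";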
So I'd rather do something like:
* get the current (quorate) partition member count at the start
* if we are a cluster with > 1 node, assert that the requested
  add/delete op alone cannot provoke a quorum loss
  - if the assertion holds, continue with the op in a locked context
  - else, only continue if force was passed as a parameter, and warn
    accordingly
* else, if we are a single node cluster, we may do the locking
  locally with an flock, as there is no other cluster member which
  could meddle with the config - besides, naturally, someone/something
  on the local host, which the flock is enough protection against
With this we ensure, in a bit more general way, that a user does not
navigate himself into a possibly hard to resolve situation.
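A rough sketch of what I mean (get_quorate_member_count() and $adding
are made up for illustration, we would get the real numbers from the
quorum API; the delete case also glosses over whether the removed
node is itself part of the quorate partition, and the local flock via
PVE::Tools::lock_file is just one way to do it):

    # assumed context: $nodelist, $param and the $code closure as in
    # the current pvecm.pm
    my $members = get_quorate_member_count(); # hypothetical helper
    my $total = scalar(keys %$nodelist);

    if ($total > 1) {
        # expected total after the op: +1 for add, -1 for delete
        my $new_total = $adding ? $total + 1 : $total - 1;
        my $new_quorum = int($new_total / 2) + 1;

        if ($members < $new_quorum) {
            die "refusing operation: cluster would lose quorum (use --force to override)\n"
                if !$param->{force};
            warn "continuing despite possible quorum loss, '--force' was set\n";
        }

        # cluster-wide lock, as before
        PVE::Cluster::cfs_lock_file('corosync.conf', 10, $code);
        die $@ if $@;
    } else {
        # single node cluster: no other member can meddle with the
        # config, a local flock is protection enough
        PVE::Tools::lock_file('/var/lock/pvecm.lock', 10, $code);
        die $@ if $@;
    }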
Does this sound good to you? I wouldn't mind looking into it,
if you don't have time to do it. :)
> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
> ---
> this patch is best viewed with the '-w' parameter
> data/PVE/CLI/pvecm.pm | 180 ++++++++++++++++++++++++--------------------------
> 1 file changed, 85 insertions(+), 95 deletions(-)
>
> diff --git a/data/PVE/CLI/pvecm.pm b/data/PVE/CLI/pvecm.pm
> index 9707114..3c64461 100755
> --- a/data/PVE/CLI/pvecm.pm
> +++ b/data/PVE/CLI/pvecm.pm
> @@ -309,90 +309,85 @@ __PACKAGE__->register_method ({
> PVE::Cluster::check_cfs_quorum();
> - my $code = sub {
> - my $conf = PVE::Cluster::cfs_read_file("corosync.conf");
> - my $nodelist = PVE::Corosync::nodelist($conf);
> - my $totem_cfg = PVE::Corosync::totem_config($conf);
> -
> - my $name = $param->{node};
> -
> - # ensure we do not reuse an address, that can crash the whole cluster!
> - my $check_duplicate_addr = sub {
> - my $addr = shift;
> - return if !defined($addr);
> -
> - while (my ($k, $v) = each %$nodelist) {
> - next if $k eq $name; # allows re-adding a node if force is set
> - if ($v->{ring0_addr} eq $addr || ($v->{ring1_addr} && $v->{ring1_addr} eq $addr)) {
> - die "corosync: address '$addr' already defined by node '$k'\n";
> - }
> + my $conf = PVE::Cluster::cfs_read_file("corosync.conf");
> + my $nodelist = PVE::Corosync::nodelist($conf);
> + my $totem_cfg = PVE::Corosync::totem_config($conf);
> +
> + my $name = $param->{node};
> +
> + # ensure we do not reuse an address, that can crash the whole cluster!
> + my $check_duplicate_addr = sub {
> + my $addr = shift;
> + return if !defined($addr);
> +
> + while (my ($k, $v) = each %$nodelist) {
> + next if $k eq $name; # allows re-adding a node if force is set
> + if ($v->{ring0_addr} eq $addr || ($v->{ring1_addr} && $v->{ring1_addr} eq $addr)) {
> + die "corosync: address '$addr' already defined by node '$k'\n";
> }
> - };
> + }
> + };
> - &$check_duplicate_addr($param->{ring0_addr});
> - &$check_duplicate_addr($param->{ring1_addr});
> + &$check_duplicate_addr($param->{ring0_addr});
> + &$check_duplicate_addr($param->{ring1_addr});
> - $param->{ring0_addr} = $name if !$param->{ring0_addr};
> + $param->{ring0_addr} = $name if !$param->{ring0_addr};
> - die "corosync: using 'ring1_addr' parameter needs a configured ring 1 interface!\n"
> - if $param->{ring1_addr} && !defined($totem_cfg->{interface}->{1});
> + die "corosync: using 'ring1_addr' parameter needs a configured ring 1 interface!\n"
> + if $param->{ring1_addr} && !defined($totem_cfg->{interface}->{1});
> - die "corosync: ring 1 interface configured but 'ring1_addr' parameter not defined!\n"
> - if defined($totem_cfg->{interface}->{1}) && !defined($param->{ring1_addr});
> + die "corosync: ring 1 interface configured but 'ring1_addr' parameter not defined!\n"
> + if defined($totem_cfg->{interface}->{1}) && !defined($param->{ring1_addr});
> - if (defined(my $res = $nodelist->{$name})) {
> - $param->{nodeid} = $res->{nodeid} if !$param->{nodeid};
> - $param->{votes} = $res->{quorum_votes} if !defined($param->{votes});
> + if (defined(my $res = $nodelist->{$name})) {
> + $param->{nodeid} = $res->{nodeid} if !$param->{nodeid};
> + $param->{votes} = $res->{quorum_votes} if !defined($param->{votes});
> - if ($res->{quorum_votes} == $param->{votes} &&
> - $res->{nodeid} == $param->{nodeid}) {
> - print "node $name already defined\n";
> - if ($param->{force}) {
> - exit (0);
> - } else {
> - exit (-1);
> - }
> + if ($res->{quorum_votes} == $param->{votes} &&
> + $res->{nodeid} == $param->{nodeid}) {
> + print "node $name already defined\n";
> + if ($param->{force}) {
> + exit (0);
> } else {
> - die "can't add existing node\n";
> + exit (-1);
> }
> - } elsif (!$param->{nodeid}) {
> - my $nodeid = 1;
> -
> - while(1) {
> - my $found = 0;
> - foreach my $v (values %$nodelist) {
> - if ($v->{nodeid} eq $nodeid) {
> - $found = 1;
> - $nodeid++;
> - last;
> - }
> - }
> - last if !$found;
> - };
> -
> - $param->{nodeid} = $nodeid;
> + } else {
> + die "can't add existing node\n";
> }
> + } elsif (!$param->{nodeid}) {
> + my $nodeid = 1;
> +
> + while(1) {
> + my $found = 0;
> + foreach my $v (values %$nodelist) {
> + if ($v->{nodeid} eq $nodeid) {
> + $found = 1;
> + $nodeid++;
> + last;
> + }
> + }
> + last if !$found;
> + };
> - $param->{votes} = 1 if !defined($param->{votes});
> + $param->{nodeid} = $nodeid;
> + }
> - PVE::Cluster::gen_local_dirs($name);
> + $param->{votes} = 1 if !defined($param->{votes});
> - eval { PVE::Cluster::ssh_merge_keys(); };
> - warn $@ if $@;
> + PVE::Cluster::gen_local_dirs($name);
> - $nodelist->{$name} = {
> - ring0_addr => $param->{ring0_addr},
> - nodeid => $param->{nodeid},
> - name => $name,
> - };
> - $nodelist->{$name}->{ring1_addr} = $param->{ring1_addr} if $param->{ring1_addr};
> - $nodelist->{$name}->{quorum_votes} = $param->{votes} if $param->{votes};
> + eval { PVE::Cluster::ssh_merge_keys(); };
> + warn $@ if $@;
> - PVE::Corosync::update_nodelist($conf, $nodelist);
> + $nodelist->{$name} = {
> + ring0_addr => $param->{ring0_addr},
> + nodeid => $param->{nodeid},
> + name => $name,
> };
> + $nodelist->{$name}->{ring1_addr} = $param->{ring1_addr} if $param->{ring1_addr};
> + $nodelist->{$name}->{quorum_votes} = $param->{votes} if $param->{votes};
> - PVE::Cluster::cfs_lock_file('corosync.conf', 10, $code);
> - die $@ if $@;
> + PVE::Corosync::update_nodelist($conf, $nodelist);
> exit (0);
> }});
> @@ -419,41 +414,36 @@ __PACKAGE__->register_method ({
> PVE::Cluster::check_cfs_quorum();
> - my $code = sub {
> - my $conf = PVE::Cluster::cfs_read_file("corosync.conf");
> - my $nodelist = PVE::Corosync::nodelist($conf);
> -
> - my $node;
> - my $nodeid;
> -
> - foreach my $tmp_node (keys %$nodelist) {
> - my $d = $nodelist->{$tmp_node};
> - my $ring0_addr = $d->{ring0_addr};
> - my $ring1_addr = $d->{ring1_addr};
> - if (($tmp_node eq $param->{node}) ||
> - (defined($ring0_addr) && ($ring0_addr eq $param->{node})) ||
> - (defined($ring1_addr) && ($ring1_addr eq $param->{node}))) {
> - $node = $tmp_node;
> - $nodeid = $d->{nodeid};
> - last;
> - }
> + my $conf = PVE::Cluster::cfs_read_file("corosync.conf");
> + my $nodelist = PVE::Corosync::nodelist($conf);
> +
> + my $node;
> + my $nodeid;
> +
> + foreach my $tmp_node (keys %$nodelist) {
> + my $d = $nodelist->{$tmp_node};
> + my $ring0_addr = $d->{ring0_addr};
> + my $ring1_addr = $d->{ring1_addr};
> + if (($tmp_node eq $param->{node}) ||
> + (defined($ring0_addr) && ($ring0_addr eq $param->{node})) ||
> + (defined($ring1_addr) && ($ring1_addr eq $param->{node}))) {
> + $node = $tmp_node;
> + $nodeid = $d->{nodeid};
> + last;
> }
> + }
> - die "Node/IP: $param->{node} is not a known host of the cluster.\n"
> + die "Node/IP: $param->{node} is not a known host of the cluster.\n"
> if !defined($node);
> - my $our_nodename = PVE::INotify::nodename();
> - die "Cannot delete myself from cluster!\n" if $node eq $our_nodename;
> + my $our_nodename = PVE::INotify::nodename();
> + die "Cannot delete myself from cluster!\n" if $node eq $our_nodename;
> - delete $nodelist->{$node};
> + delete $nodelist->{$node};
> - PVE::Corosync::update_nodelist($conf, $nodelist);
> -
> - run_command(['corosync-cfgtool','-k', $nodeid]) if defined($nodeid);
> - };
> + PVE::Corosync::update_nodelist($conf, $nodelist);
> - PVE::Cluster::cfs_lock_file('corosync.conf', 10, $code);
> - die $@ if $@;
> + run_command(['corosync-cfgtool','-k', $nodeid]) if defined($nodeid);
> return undef;
> }});