[pve-devel] [PATCH v2 cluster 1/5] corosync: add verify_conf

Mon Dec 16 15:58:17 CET 2019

On 12/12/19 4:21 PM, Thomas Lamprecht wrote:
> On 12/4/19 3:04 PM, Stefan Reiter wrote:
>> It does some basic sanity checking, warns the user about encryption
>> settings and unresolved hostnames, and finally makes sure that all nodes
>> have the same links configured (as well as comparing the configured
>> links to specified interfaces, if there are any).
>>
>> A corosync.conf that has been created and modified strictly through our
>> API should *always* be valid.
> 
> True, but you also need to stay backward compatible with what our API
> produced since 4.0 and what was allowed to be upgraded to 6.0.
> Did you check if the possibilities then, with the pve-upgrade checker
> were restrictive enough as you're here?
> 

That's the reason for the

if (%$interfaces) {
     ...
} else {
     ...
}

block. New versions always have a totem->interface block, but old 
versions omitted it. Old configs are checked differently, but will still 
be valid (as long as the links are consistent between nodes).

> Also, I'd still like to avoid error-ing if the configuration could worked,
> but was a bit garbled up.. This is then still seen as regression for a
> lot, and pitch forks may be lit and raised, so I'd like to try hard to
> not break those setups, even if we can argue that it's not directly
> from our API. If it's at least somewhat reasonable it should not be
> an error. (did not checked all checks to closely, so you may already be
> flexible enough, so this is more asking if you checked :) )
> 

verify_conf only "errors" under the following conditions:

* no nodes found (corosync won't run, obviously)
* no totem found (corosync won't run, and currently our tools don't 
really handle this well either)
* links are mismatched between nodes or nodes and totem

The last one is the case that initially inspired the function. 
Technically, corosync does run with mismatched links (e.g. [0]), but 
throws some nasty looking errors into syslog. However, the code in patch 
3/5 (verification and fallback behaviour) falls apart a bit with such 
configs.

The idea is to tell the user that his config is wrong, instead of 
automagically doing the wrong thing™. We only call this on cluster join, 
so existing clusters are unaffected anyway (although I though about 
maybe calling it in 'pvecm status', but we wouldn't need to die there, 
just show the user).

[0]: "mismatched" corosync.conf (shortened):
nodelist {
   node {
     name: prod1
     nodeid: 1
     ring0_addr: 192.168.0.1
   }
   node {
     name: prod2
     nodeid: 2
     ring0_addr: 192.168.0.2
     ring1_addr: 192.168.1.2
   }
}

totem {
   interface {
     linknumber: 0
     linknumber: 4
   }
   ...
}

TL;DR: Yes, I checked ;) But above should give you the full picture, if 
there are any concerns/things I missed.

>> +    push @warnings, "warning: authentication/encryption is not explicitly enabled"
>> +	. " (secauth / crypto_cipher / crypto_hash)"
> 
> you do not check the crypto_hash?
> 

Hm, true, I just reused the code from pve5to6.pm:408. But I should have 
noticed that nonetheless...

I'll fix it here and in the 5to6 tool, or was there a reason to do it 
this way there?

>> +	if (!defined($totem->{secauth}) || $totem->{secauth} ne 'on') &&
>> +	   (!defined($totem->{crypto_cipher}) || $totem->{crypto_cipher} eq 'none');
> 
> This style above is NAK'd ^^
> Post-if only if the statement it guards is on a single, not long, line.
> 
> Either full-blown `if () { .. }` or split it up and do:
> 
> push @warnings, "warning: authentication (secauth) is not explicitly enabled"
>      if (!defined($totem->{secauth}) || $totem->{secauth} ne 'on');
> 
> push @warnings, "warning: encryption (crypto_cipher) is not explicitly enabled"
>      if (!defined($totem->{secauth}) || $totem->{secauth} ne 'on');
> 

Ok

> 
>> +
>> +    my $interfaces = $totem->{interface};
>> +
>> +    my $verify_link_ip = sub {
>> +	my ($key, $link, $node) = @_;
>> +	my ($resolved_ip, undef) = resolve_hostname_like_corosync($link, $conf);
>> +	if (defined($resolved_ip)) {
>> +	    push @warnings, "warning: $key '$link' for node '$node' resolves to"
>> +		. " '$resolved_ip' - consider replacing it with the currently"
>> +		. " resolved IP address for stability"
>> +		if $resolved_ip ne $link;
> 
> same here, better to just use full-blown if (...) {…}
> 
>> +	} else {
>> +	    push @warnings, "warning: unable to resolve $key '$link' for node '$node'"
>> +	      . " to an IP address according to Corosync's resolve strategy -"
>> +	      . " cluster could fail on restart!";
>> +	}
>> +    };
>> +
>> +    # sort for output order stability
>> +    my @node_names = sort keys %$nodelist;
>> +
>> +    my $node_links = {};
>> +    foreach my $node (@node_names) {
>> +	my $options = $nodelist->{$node};
>> +	foreach my $opt (keys %$options) {
>> +	    next if $opt !~ $link_addr_re;
> 
> maybe a ("private") helper method which return both, type and id, would be nicer,
> else one asks what $1, or $2 is:
> 
> my ($linktype, $linkid) = $parse_link_entry->($opt) or next;
> 

I think I use something similar in a later patch, so might even make 
sense to do it as a public helper. Need to check, but I'll change this 
part either way.

> (that with the next is guessed, needs to ensure it has no bad side-effects
> in combination with the my ())
>