[pve-devel] [PATCH v2 cluster 1/5] corosync: add verify_conf
Stefan Reiter
s.reiter at proxmox.com
Mon Dec 16 15:58:17 CET 2019
On 12/12/19 4:21 PM, Thomas Lamprecht wrote:
> On 12/4/19 3:04 PM, Stefan Reiter wrote:
>> It does some basic sanity checking, warns the user about encryption
>> settings and unresolved hostnames, and finally makes sure that all nodes
>> have the same links configured (as well as comparing the configured
>> links to specified interfaces, if there are any).
>>
>> A corosync.conf that has been created and modified strictly through our
>> API should *always* be valid.
>
> True, but you also need to stay backward compatible with what our API
> produced since 4.0 and what was allowed to be upgraded to 6.0.
> Did you check if the possibilities then, with the pve-upgrade checker
> were restrictive enough as you're here?
>
That's the reason for the
if (%$interfaces) {
...
} else {
...
}
block. New versions always have a totem->interface block, but old
versions omitted it. Old configs are checked differently, but will still
be valid (as long as the links are consistent between nodes).
> Also, I'd still like to avoid error-ing if the configuration could worked,
> but was a bit garbled up.. This is then still seen as regression for a
> lot, and pitch forks may be lit and raised, so I'd like to try hard to
> not break those setups, even if we can argue that it's not directly
> from our API. If it's at least somewhat reasonable it should not be
> an error. (did not checked all checks to closely, so you may already be
> flexible enough, so this is more asking if you checked :) )
>
verify_conf only "errors" under the following conditions:
* no nodes found (corosync won't run, obviously)
* no totem found (corosync won't run, and currently our tools don't
really handle this well either)
* links are mismatched between nodes or nodes and totem
The last one is the case that initially inspired the function.
Technically, corosync does run with mismatched links (e.g. [0]), but
throws some nasty looking errors into syslog. However, the code in patch
3/5 (verification and fallback behaviour) falls apart a bit with such
configs.
The idea is to tell the user that his config is wrong, instead of
automagically doing the wrong thing™. We only call this on cluster join,
so existing clusters are unaffected anyway (although I though about
maybe calling it in 'pvecm status', but we wouldn't need to die there,
just show the user).
[0]: "mismatched" corosync.conf (shortened):
nodelist {
node {
name: prod1
nodeid: 1
ring0_addr: 192.168.0.1
}
node {
name: prod2
nodeid: 2
ring0_addr: 192.168.0.2
ring1_addr: 192.168.1.2
}
}
totem {
interface {
linknumber: 0
linknumber: 4
}
...
}
TL;DR: Yes, I checked ;) But above should give you the full picture, if
there are any concerns/things I missed.
>> + push @warnings, "warning: authentication/encryption is not explicitly enabled"
>> + . " (secauth / crypto_cipher / crypto_hash)"
>
> you do not check the crypto_hash?
>
Hm, true, I just reused the code from pve5to6.pm:408. But I should have
noticed that nonetheless...
I'll fix it here and in the 5to6 tool, or was there a reason to do it
this way there?
>> + if (!defined($totem->{secauth}) || $totem->{secauth} ne 'on') &&
>> + (!defined($totem->{crypto_cipher}) || $totem->{crypto_cipher} eq 'none');
>
> This style above is NAK'd ^^
> Post-if only if the statement it guards is on a single, not long, line.
>
> Either full-blown `if () { .. }` or split it up and do:
>
> push @warnings, "warning: authentication (secauth) is not explicitly enabled"
> if (!defined($totem->{secauth}) || $totem->{secauth} ne 'on');
>
> push @warnings, "warning: encryption (crypto_cipher) is not explicitly enabled"
> if (!defined($totem->{secauth}) || $totem->{secauth} ne 'on');
>
Ok
>
>> +
>> + my $interfaces = $totem->{interface};
>> +
>> + my $verify_link_ip = sub {
>> + my ($key, $link, $node) = @_;
>> + my ($resolved_ip, undef) = resolve_hostname_like_corosync($link, $conf);
>> + if (defined($resolved_ip)) {
>> + push @warnings, "warning: $key '$link' for node '$node' resolves to"
>> + . " '$resolved_ip' - consider replacing it with the currently"
>> + . " resolved IP address for stability"
>> + if $resolved_ip ne $link;
>
> same here, better to just use full-blown if (...) {…}
>
>> + } else {
>> + push @warnings, "warning: unable to resolve $key '$link' for node '$node'"
>> + . " to an IP address according to Corosync's resolve strategy -"
>> + . " cluster could fail on restart!";
>> + }
>> + };
>> +
>> + # sort for output order stability
>> + my @node_names = sort keys %$nodelist;
>> +
>> + my $node_links = {};
>> + foreach my $node (@node_names) {
>> + my $options = $nodelist->{$node};
>> + foreach my $opt (keys %$options) {
>> + next if $opt !~ $link_addr_re;
>
> maybe a ("private") helper method which return both, type and id, would be nicer,
> else one asks what $1, or $2 is:
>
> my ($linktype, $linkid) = $parse_link_entry->($opt) or next;
>
I think I use something similar in a later patch, so might even make
sense to do it as a public helper. Need to check, but I'll change this
part either way.
> (that with the next is guessed, needs to ensure it has no bad side-effects
> in combination with the my ())
>
More information about the pve-devel
mailing list