[pve-devel] [PATCH v2 cluster 1/5] corosync: add verify_conf
Fabian Grünbichler
f.gruenbichler at proxmox.com
Mon Dec 16 18:09:45 CET 2019
On December 16, 2019 3:58 pm, Stefan Reiter wrote:
> On 12/12/19 4:21 PM, Thomas Lamprecht wrote:
>> On 12/4/19 3:04 PM, Stefan Reiter wrote:
>>> It does some basic sanity checking, warns the user about encryption
>>> settings and unresolved hostnames, and finally makes sure that all nodes
>>> have the same links configured (as well as comparing the configured
>>> links to specified interfaces, if there are any).
>>>
>>> A corosync.conf that has been created and modified strictly through our
>>> API should *always* be valid.
>>
>> True, but you also need to stay backward compatible with what our API
>> produced since 4.0 and what was allowed to be upgraded to 6.0.
>> Did you check if the possibilities then, with the pve-upgrade checker
>> were restrictive enough as you're here?
>>
>
> That's the reason for the
>
> if (%$interfaces) {
> ...
> } else {
> ...
> }
>
> block. New versions always have a totem->interface block, but old
> versions omitted it. Old configs are checked differently, but will still
> be valid (as long as the links are consistent between nodes).
>
>> Also, I'd still like to avoid error-ing if the configuration could worked,
>> but was a bit garbled up.. This is then still seen as regression for a
>> lot, and pitch forks may be lit and raised, so I'd like to try hard to
>> not break those setups, even if we can argue that it's not directly
>> from our API. If it's at least somewhat reasonable it should not be
>> an error. (did not checked all checks to closely, so you may already be
>> flexible enough, so this is more asking if you checked :) )
>>
>
> verify_conf only "errors" under the following conditions:
>
> * no nodes found (corosync won't run, obviously)
> * no totem found (corosync won't run, and currently our tools don't
> really handle this well either)
> * links are mismatched between nodes or nodes and totem
>
> The last one is the case that initially inspired the function.
> Technically, corosync does run with mismatched links (e.g. [0]), but
> throws some nasty looking errors into syslog. However, the code in patch
> 3/5 (verification and fallback behaviour) falls apart a bit with such
> configs.
>
> The idea is to tell the user that his config is wrong, instead of
> automagically doing the wrong thing™. We only call this on cluster join,
> so existing clusters are unaffected anyway (although I though about
> maybe calling it in 'pvecm status', but we wouldn't need to die there,
> just show the user).
>
> [0]: "mismatched" corosync.conf (shortened):
> nodelist {
> node {
> name: prod1
> nodeid: 1
> ring0_addr: 192.168.0.1
> }
> node {
> name: prod2
> nodeid: 2
> ring0_addr: 192.168.0.2
> ring1_addr: 192.168.1.2
> }
> }
>
> totem {
> interface {
> linknumber: 0
> linknumber: 4
> }
> ...
> }
>
> TL;DR: Yes, I checked ;) But above should give you the full picture, if
> there are any concerns/things I missed.
>
>>> + push @warnings, "warning: authentication/encryption is not explicitly enabled"
>>> + . " (secauth / crypto_cipher / crypto_hash)"
>>
>> you do not check the crypto_hash?
>>
>
> Hm, true, I just reused the code from pve5to6.pm:408. But I should have
> noticed that nonetheless...
>
> I'll fix it here and in the 5to6 tool, or was there a reason to do it
> this way there?
IIRC, crypto_cipher depends on crypto_hash being set, and we want
encryption and not just authentication ;)
>
>>> + if (!defined($totem->{secauth}) || $totem->{secauth} ne 'on') &&
>>> + (!defined($totem->{crypto_cipher}) || $totem->{crypto_cipher} eq 'none');
>>
>> This style above is NAK'd ^^
>> Post-if only if the statement it guards is on a single, not long, line.
>>
>> Either full-blown `if () { .. }` or split it up and do:
>>
>> push @warnings, "warning: authentication (secauth) is not explicitly enabled"
>> if (!defined($totem->{secauth}) || $totem->{secauth} ne 'on');
>>
>> push @warnings, "warning: encryption (crypto_cipher) is not explicitly enabled"
>> if (!defined($totem->{secauth}) || $totem->{secauth} ne 'on');
>>
>
> Ok
>
>>
>>> +
>>> + my $interfaces = $totem->{interface};
>>> +
>>> + my $verify_link_ip = sub {
>>> + my ($key, $link, $node) = @_;
>>> + my ($resolved_ip, undef) = resolve_hostname_like_corosync($link, $conf);
>>> + if (defined($resolved_ip)) {
>>> + push @warnings, "warning: $key '$link' for node '$node' resolves to"
>>> + . " '$resolved_ip' - consider replacing it with the currently"
>>> + . " resolved IP address for stability"
>>> + if $resolved_ip ne $link;
>>
>> same here, better to just use full-blown if (...) {…}
>>
>>> + } else {
>>> + push @warnings, "warning: unable to resolve $key '$link' for node '$node'"
>>> + . " to an IP address according to Corosync's resolve strategy -"
>>> + . " cluster could fail on restart!";
>>> + }
>>> + };
>>> +
>>> + # sort for output order stability
>>> + my @node_names = sort keys %$nodelist;
>>> +
>>> + my $node_links = {};
>>> + foreach my $node (@node_names) {
>>> + my $options = $nodelist->{$node};
>>> + foreach my $opt (keys %$options) {
>>> + next if $opt !~ $link_addr_re;
>>
>> maybe a ("private") helper method which return both, type and id, would be nicer,
>> else one asks what $1, or $2 is:
>>
>> my ($linktype, $linkid) = $parse_link_entry->($opt) or next;
>>
>
> I think I use something similar in a later patch, so might even make
> sense to do it as a public helper. Need to check, but I'll change this
> part either way.
>
>> (that with the next is guessed, needs to ensure it has no bad side-effects
>> in combination with the my ())
>>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>
More information about the pve-devel
mailing list