[pve-devel] [RFC cluster 1/2] pvecm: updatecerts: allow specifying time to wait for quorum via CLI argument

Fiona Ebner f.ebner at proxmox.com
Thu Jun 29 15:59:33 CEST 2023


This is useful for the updatecerts call triggered via the ExecStartPre
hook for pveproxy.service.

When starting a node that's part of a cluster, there is a time window
between the start of pve-cluster.service and when quorum is reached
(from the node's perspective). pveproxy.service is ordered after
pve-cluster.service, but that does not prevent the ExecStartPre hook
from being executed before the node is part of the quorate partition.
The pvecm updatecerts command won't do anything without quorum.

In particular, it might happen that the base directories for observed
files will not get created during/after the upgrade from Proxmox VE 7
to 8 (reported in the community forum [0] and reproduced right away in
a virtual test cluster).

This parameter makes it possible to increase the chances that the hook
executes successfully.

[0]: https://forum.proxmox.com/threads/129644/

Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
---
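
Note: the bounded wait described in the commit message can be sketched
in isolation as below. This is only an illustration under assumptions,
not the code from the patch: wait_for_quorum and its $check/$pause
arguments are hypothetical names, and the quorum check is a stub
standing in for PVE::Cluster::check_cfs_quorum(1). Unlike the loop in
the patch, the sketch performs one extra check after the final pause.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the patch: poll a quorum check until it
# succeeds or the timeout (in seconds) has elapsed.
#   $timeout - maximum number of seconds to wait (0 means check once)
#   $check   - code reference returning true once quorum is reached; in
#              pvecm this role is played by PVE::Cluster::check_cfs_quorum(1)
#   $pause   - code reference that waits one second (replaceable in tests)
sub wait_for_quorum {
    my ($timeout, $check, $pause) = @_;
    $pause //= sub { sleep(1) };

    for (my $i = 0; $i < $timeout; $i++) {
        return 1 if $check->();
        $pause->();
    }

    # One last check, so quorum reached during the final pause still counts.
    return $check->() ? 1 : 0;
}

# Usage with a stub that reports quorum on the third poll and a no-op pause.
my $polls = 0;
my $ok = wait_for_quorum(5, sub { return ++$polls >= 3 }, sub { });
print $ok ? "quorate after $polls polls\n" : "no quorum\n";
```

With the patch applied, the corresponding CLI call would look like
"pvecm updatecerts --quorum-wait-seconds 10", where the value 10 is just
an illustrative choice.
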
 src/PVE/CLI/pvecm.pm | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/PVE/CLI/pvecm.pm b/src/PVE/CLI/pvecm.pm
index 564dc99..94f1e83 100755
--- a/src/PVE/CLI/pvecm.pm
+++ b/src/PVE/CLI/pvecm.pm
@@ -6,7 +6,7 @@ use warnings;
 use Cwd qw(getcwd);
 use File::Path;
 use File::Basename;
-use PVE::Tools qw(run_command);
+use PVE::Tools qw(extract_param run_command);
 use PVE::Cluster;
 use PVE::INotify;
 use PVE::JSONSchema qw(get_standard_option);
@@ -566,12 +566,33 @@ __PACKAGE__->register_method ({
 		type => 'boolean',
 		optional => 1,
 	    },
+	    'quorum-wait-seconds' => {
+		description => "Wait for quorum for this many seconds.",
+		type => 'integer',
+		minimum => 0,
+		optional => 1,
+	    },
 	},
     },
     returns => { type => 'null' },
     code => sub {
 	my ($param) = @_;
 
+	my $quorum_wait = extract_param($param, 'quorum-wait-seconds');
+
+	if ($quorum_wait && !PVE::Cluster::check_cfs_quorum(1)) {
+	    print "waiting for quorum...";
+	    STDOUT->flush();
+	    for (my $i = 0; $i < $quorum_wait; $i++) {
+		if (PVE::Cluster::check_cfs_quorum(1)) {
+		    print "OK";
+		    last;
+		}
+		sleep(1);
+	    }
+	    print "\n";
+	}
+
 	# we get called by the pveproxy.service ExecStartPre and as we do
 	# IO (on /etc/pve) which can hang (uninterruptedly D state). That'd be
 	# no-good for ExecStartPre as it fails the whole service in this case
-- 
2.39.2





