[pve-devel] [RFC cluster 1/2] pvecm: updatecerts: allow specifying time to wait for quorum via CLI argument

Thomas Lamprecht t.lamprecht at proxmox.com
Thu Jun 29 16:26:54 CEST 2023


On 29/06/2023 at 15:59, Fiona Ebner wrote:
> Useful for the updatecerts call triggered via the ExecStartPre hook
> for pveproxy.service.
> 
> When starting a node that's part of a cluster, there is a time window
> between the start of pve-cluster.service and when quorum is reached
> (from the node's perspective). pveproxy.service is ordered after
> pve-cluster.service, but that does not prevent the ExecStartPre hook
> from being executed before the node is part of the quorate partition.
> The pvecm updatecerts command won't do anything without quorum.
> 
> In particular, it might happen that the base directories for observed
> files will not get created during/after the upgrade from Proxmox VE 7
> to 8 (reported in the community forum [0] and reproduced right away in
> a virtual test cluster).
> 
> This parameter makes it possible to increase the chances of the hook
> executing successfully.
> 
> [0]: https://forum.proxmox.com/threads/129644/
> 
> Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
> ---
>  src/PVE/CLI/pvecm.pm | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 


Hmm, I would just do something like (untested and needs importing Time::HiRes):


@@ -576,6 +578,10 @@ __PACKAGE__->register_method ({
        # IO (on /etc/pve) which can hang (uninterruptedly D state). That'd be
        # no-good for ExecStartPre as it fails the whole service in this case
        PVE::Tools::run_fork_with_timeout(30, sub {
+           for (my $i = 0; !PVE::Cluster::check_cfs_quorum(1); $i++) {
+               print "waiting for pmxcfs mount to appear and get quorate...\n" if $i % 50 == 0;
+               usleep(100 * 1000);
+           }
            PVE::Cluster::Setup::updatecerts_and_ssh($param->@{qw(force silent)});
            PVE::Cluster::prepare_observed_file_basedirs();
        });


After all, any user or tooling calling this wants it to happen, so waiting until
the timeout seems sensible enough as a hard-coded default to me.
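
For illustration, the polling idea can be tried outside of PVE with a small
standalone sketch. Everything here is an assumption for demonstration only:
wait_for() is a hypothetical helper, and $fake_quorum merely stands in for the
quorum check. In the snippet above the surrounding run_fork_with_timeout()
call bounds the wait; the sketch uses an explicit deadline instead, because it
runs on its own.

```perl
use strict;
use warnings;
use Time::HiRes qw(usleep time);

# Hypothetical helper (not part of PVE): poll $check every $interval_ms
# milliseconds until it returns true or $timeout_s seconds have elapsed.
# Returns 1 if the check succeeded, 0 on timeout.
sub wait_for {
    my ($check, $timeout_s, $interval_ms, $on_wait) = @_;
    my $deadline = time() + $timeout_s;
    for (my $i = 0; !$check->(); $i++) {
        return 0 if time() >= $deadline;
        # Only report every 50th iteration to avoid flooding the log.
        $on_wait->($i) if $on_wait && $i % 50 == 0;
        usleep($interval_ms * 1000);
    }
    return 1;
}

# Stand-in for the real quorum check: becomes true on the third call.
my $calls = 0;
my $fake_quorum = sub { return ++$calls >= 3 };

my $ok = wait_for($fake_quorum, 2, 10, sub { print "waiting for quorum...\n" });
print $ok ? "quorate after $calls checks\n" : "timed out\n";
```

Keeping the counter increment solely in the for-loop header means the message
interval is simply 50 times the sleep interval.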




