[pve-devel] [PATCH pve-manager 5/8] fix #4759: ceph: configure keyring for ceph-crash.service

Fabian Grünbichler f.gruenbichler at proxmox.com
Wed Jan 31 14:17:53 CET 2024


On January 30, 2024 7:40 pm, Max Carrara wrote:
> when creating the cluster's first monitor.
> 
> Signed-off-by: Max Carrara <m.carrara at proxmox.com>
> ---
>  PVE/API2/Ceph/MON.pm | 28 +++++++++++++++++++++++++++-
>  PVE/Ceph/Services.pm | 12 ++++++++++--
>  PVE/Ceph/Tools.pm    | 38 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 75 insertions(+), 3 deletions(-)
> 
> diff --git a/PVE/API2/Ceph/MON.pm b/PVE/API2/Ceph/MON.pm
> index 1e959ef3..8d75f5d1 100644
> --- a/PVE/API2/Ceph/MON.pm
> +++ b/PVE/API2/Ceph/MON.pm
> @@ -459,11 +459,37 @@ __PACKAGE__->register_method ({
>  	    });
>  	    die $@ if $@;
>  	    # automatically create manager after the first monitor is created
> +	    # and set up keyring and config for ceph-crash.service
>  	    if ($is_first_monitor) {
>  		PVE::API2::Ceph::MGR->createmgr({
>  		    node => $param->{node},
>  		    id => $param->{node}
> -		})
> +		});
> +
> +		PVE::Cluster::cfs_lock_file('ceph.conf', undef, sub {
> +		    my $cfg = cfs_read_file('ceph.conf');
> +
> +		    if ($cfg->{'client.crash'}) {
> +			return undef;
> +		    }
> +
> +		    $cfg->{'client.crash'}->{keyring} = '/etc/pve/ceph/$cluster.$name.keyring';
> +
> +		    cfs_write_file('ceph.conf', $cfg);
> +		});
> +		die $@ if $@;
> +
> +		eval {
> +		    PVE::Ceph::Tools::get_or_create_crash_keyring();
> +		};
> +		warn "Unable to configure keyring for ceph-crash.service: $@" if $@;

the order here should maybe be switched around? first handle the
keyring, then put it in the config? otherwise there is a (small) window
where ceph.conf references a keyring file that doesn't exist yet.
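something along these lines (rough sketch only, reusing the helper
names from the patch, error handling as in the original):

```perl
# sketch: create the keyring first, then reference it in ceph.conf,
# so the config never points at a not-yet-existing file
eval {
    PVE::Ceph::Tools::get_or_create_crash_keyring();
};
warn "Unable to configure keyring for ceph-crash.service: $@" if $@;

PVE::Cluster::cfs_lock_file('ceph.conf', undef, sub {
    my $cfg = cfs_read_file('ceph.conf');

    return undef if $cfg->{'client.crash'};

    $cfg->{'client.crash'}->{keyring} = '/etc/pve/ceph/$cluster.$name.keyring';

    cfs_write_file('ceph.conf', $cfg);
});
die $@ if $@;
```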

> +
> +		print "enabling service 'ceph-crash.service'\n";
> +		PVE::Ceph::Services::ceph_service_cmd('enable', 'crash');

shouldn't this already be handled by default?

> +		print "starting service 'ceph-crash.service'\n";
> +		# ceph-crash already runs by default,
> +		# this makes sure the keyring is used
> +		PVE::Ceph::Services::ceph_service_cmd('restart', 'crash');

this should probably be a try-restart to avoid starting it if the admin
explicitly disabled and/or stopped it.

but - AFAICT, the ceph-crash script that is executed by the service
boils down to (as a forked process!) "ceph -n XXX ..." where XXX is tried
in sequence as client.crash.$HOST, client.crash, client.admin, so a
service restart shouldn't even be needed, since each fresh ceph (client)
process will pick up the config changes anyway?
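if the restart is kept at all, something like this would leave an
explicitly stopped unit alone (assuming ceph_service_cmd passes the verb
through to systemctl, where 'try-restart' only acts on running units):

```perl
# 'try-restart' only restarts the unit if it is currently running,
# so a deliberately stopped/disabled ceph-crash is not started
PVE::Ceph::Services::ceph_service_cmd('try-restart', 'crash');
```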

>  	    }
>  	};
>  
> diff --git a/PVE/Ceph/Services.pm b/PVE/Ceph/Services.pm
> index e0f31e8e..5f5986f9 100644
> --- a/PVE/Ceph/Services.pm
> +++ b/PVE/Ceph/Services.pm

> [..]

> diff --git a/PVE/Ceph/Tools.pm b/PVE/Ceph/Tools.pm
> index 3acef11b..cf9f2ed4 100644
> --- a/PVE/Ceph/Tools.pm
> +++ b/PVE/Ceph/Tools.pm

> [..]

> +    my $output = $rados->mon_command({
> +	prefix => 'auth get-or-create',
> +	entity => 'client.crash',
> +	caps => [
> +	    mon => 'profile crash',
> +	    mgr => 'profile crash',
> +	],
> +	format => 'plain',
> +    });
> +
> +    if (! -d $pve_ceph_cfgdir) {
> +	mkdir $pve_ceph_cfgdir;
> +    }
> +
> +    PVE::Tools::file_set_contents($pve_ceph_crash_key_path, $output);
> +
> +    return $pve_ceph_crash_key_path;
> +}
> +

we have another helper for creating a keyring (and another inline call
to ceph-authtool when creating a monitor), should we unify them?
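e.g. a shared helper along these lines (hypothetical name and
signature, just to illustrate the shape):

```perl
# hypothetical: one helper shared by the crash-keyring and the
# monitor-creation code paths - caller picks entity, caps and target path
sub get_or_create_keyring {
    my ($rados, $entity, $caps, $path) = @_;

    my $output = $rados->mon_command({
	prefix => 'auth get-or-create',
	entity => $entity,
	caps => $caps,
	format => 'plain',
    });

    PVE::Tools::file_set_contents($path, $output);

    return $path;
}
```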



