[pve-devel] [PATCH v4 pve-manager 15/16] fix #4759: ceph: configure ceph-crash.service and its key

Max Carrara m.carrara at proxmox.com
Tue Mar 19 18:41:23 CET 2024


On Tue Mar 19, 2024 at 11:04 AM CET, Fabian Grünbichler wrote:
> On March 5, 2024 4:07 pm, Max Carrara wrote:
> > Due to Ceph dropping privileges when running the 'ceph-crash' daemon
> > [0], it is necessary to allow the daemon to authenticate with its
> > cluster in a safe manner.
> > 
> > In order to avoid exposing sensitive keyrings or somehow escalating
> > its privileges again, 'ceph-crash' is therefore provided with its own
> > keyring in the '/etc/pve/ceph' directory. This directory, due to being
> > on 'pmxcfs', may be read by members of the 'www-data' group, which
> > 'ceph-crash' is made part of [1].
> > 
> > 
> > Expected Configuration
> > ----------------------
> > 
> >  1. A keyring file named '/etc/pve/ceph/ceph.client.crash.keyring'
> >     exists
> >  2. A section named 'client.crash' exists in '/etc/pve/ceph.conf'
> >  3. The 'client.crash' section has a key named 'keyring' which
> >     references the keyring file as '/etc/pve/ceph/$cluster.$name.keyring'
> >  4. The 'client.crash' section has *no* key named 'key'
> > 
> > 
> > New Clusters
> > ------------
> > 
> > The keyring file is created and the conf file is updated after the first
> > monitor has been created (when calling `pveceph mon create`).
> > 
> > 
> > Existing Clusters
> > -----------------
> > 
> > A new helper script creates and configures the 'client.crash' keyring in
> > `postinst`, if:
> >  * Ceph is installed
> >  * Ceph is initialized ('/etc/pve/ceph.conf' and '/etc/pve/ceph' exist)
> >  * Connection to RADOS is successful
> > 
> > If the above conditions are met, the helper script ensures that the
> > existing configuration matches the expected configuration mentioned
> > above.
> > 
> > The configuration is not changed if it is already as expected.
> > 
> > The helper script may be called again manually if the `postinst` hook
> > fails. It is installed to '/usr/share/pve-manager/helpers/pve-init-ceph-crash'.
> > 
> > 
> > Existing `client.crash` Key
> > ---------------------------
> > 
> > If a key named 'client.crash' already exists within the cluster, it is
> > reused and not regenerated. 
> > 
> > 
> > [0]: https://github.com/ceph/ceph/pull/48713
> > [1]: https://git.proxmox.com/?p=ceph.git;a=commitdiff;h=f72c698a55905d93e9a0b7b95674616547deba8a
> > 
> > Signed-off-by: Max Carrara <m.carrara at proxmox.com>
> > ---
> > Changes v1 --> v2:
> >   * fix 'keyring' key being appended to 'client.crash' section even
> >     if it already exists and configured correctly
> > Changes v2 --> v3:
> >   * avoid needless second 'pmxcfs' lock on 'ceph.conf' when configuring
> >     'client.crash' keyring after first monitor was created
> >   * change name of helper sub that creates the ceph.crash keyring
> >   * the helper sub now implicitly expects the /etc/pve/ceph directory to
> >     exist (as the previous commit enforces its existence)
> >   * remove BASH part of postinst hook related to `ceph-crash`
> >   * add Perl helper script and call that instead in postinst hook
> >   * reword commit message
> > Changes v3 --> v4:
> >   * ensure key named 'key' does not exist for 'client.crash' section
> >     in helper script (thanks Friedrich!)
> >   * restart ceph-crash.service if it wasn't disabled (thanks Friedrich!)
> >   * add empty line before and after making changes in `update_ceph_conf`
> >     in 'postinst' hook
> >   * clean up the helper script; logic is easier to follow now
> >   * increase verbosity of output of helper script, showing what's
> >     exactly done when invoked
> > 
> >  PVE/API2/Ceph/MON.pm    |   8 +++
> >  PVE/Ceph/Tools.pm       |  35 +++++++++++
> >  bin/Makefile            |   1 +
> >  bin/pve-init-ceph-crash | 129 ++++++++++++++++++++++++++++++++++++++++
> >  debian/postinst         |  14 +++++
> >  5 files changed, 187 insertions(+)
> >  create mode 100755 bin/pve-init-ceph-crash
> > 
> > diff --git a/PVE/API2/Ceph/MON.pm b/PVE/API2/Ceph/MON.pm
> > index 1e959ef3..8b7ce376 100644
> > --- a/PVE/API2/Ceph/MON.pm
> > +++ b/PVE/API2/Ceph/MON.pm
> > @@ -450,6 +450,14 @@ __PACKAGE__->register_method ({
> >  			)
> >  		    };
> >  		    warn "$@" if $@;
> > +
> > +		    print "Configuring keyring for ceph-crash.service\n";
> > +		    eval {
> > +			PVE::Ceph::Tools::create_or_update_crash_keyring_file();
> > +			$cfg->{'client.crash'}->{keyring} = '/etc/pve/ceph/$cluster.$name.keyring';
> > +			cfs_write_file('ceph.conf', $cfg);
> > +		    };
> > +		    warn "Unable to configure keyring for ceph-crash.service: $@" if $@;
> >  		}
> >  
> >  		eval { PVE::Ceph::Services::ceph_service_cmd('enable', $monsection) };
> > diff --git a/PVE/Ceph/Tools.pm b/PVE/Ceph/Tools.pm
> > index 735bb116..9b97171e 100644
> > --- a/PVE/Ceph/Tools.pm
> > +++ b/PVE/Ceph/Tools.pm
> > @@ -20,6 +20,7 @@ my $pve_ceph_cfgpath = "/etc/pve/$ccname.conf";
> >  my $ceph_cfgpath = "$ceph_cfgdir/$ccname.conf";
> >  my $pve_ceph_cfgdir = "/etc/pve/ceph";
> >  
> > +my $pve_ceph_crash_key_path = "$pve_ceph_cfgdir/$ccname.client.crash.keyring";
> >  my $pve_mon_key_path = "/etc/pve/priv/$ccname.mon.keyring";
> >  my $pve_ckeyring_path = "/etc/pve/priv/$ccname.client.admin.keyring";
> >  my $ckeyring_path = "/etc/ceph/ceph.client.admin.keyring";
> > @@ -45,6 +46,7 @@ my $config_values = {
> >  
> >  my $config_files = {
> >      pve_ceph_cfgpath => $pve_ceph_cfgpath,
> > +    pve_ceph_crash_key_path => $pve_ceph_crash_key_path,
> >      pve_mon_key_path => $pve_mon_key_path,
> >      pve_ckeyring_path => $pve_ckeyring_path,
> >      ceph_bootstrap_osd_keyring => $ceph_bootstrap_osd_keyring,
> > @@ -420,6 +422,39 @@ sub get_or_create_admin_keyring {
> >      return $pve_ckeyring_path;
> >  }
> >  
> > +# is also used in `pve-init-ceph-crash` helper
> > +sub create_or_update_crash_keyring_file {
> > +    my ($rados) = @_;
> > +
> > +    if (!defined($rados)) {
> > +	$rados = PVE::RADOS->new();
> > +    }
> > +
> > +    my $output = $rados->mon_command({
> > +	prefix => 'auth get-or-create',
> > +	entity => 'client.crash',
> > +	caps => [
> > +	    mon => 'profile crash',
> > +	    mgr => 'profile crash',
> > +	],
> > +	format => 'plain',
> > +    });
> > +
> > +    if (-f $pve_ceph_crash_key_path) {
> > +	my $contents = PVE::Tools::file_get_contents($pve_ceph_crash_key_path);
> > +
> > +	if ($contents ne $output) {
> > +	    PVE::Tools::file_set_contents($pve_ceph_crash_key_path, $output);
> > +	    return 1;
> > +	}
> > +    } else {
> > +	PVE::Tools::file_set_contents($pve_ceph_crash_key_path, $output);
> > +	return 1;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> >  # get ceph-volume managed osds
> >  sub ceph_volume_list {
> >      my $result = {};
> > diff --git a/bin/Makefile b/bin/Makefile
> > index 06d8148e..b221e4b1 100644
> > --- a/bin/Makefile
> > +++ b/bin/Makefile
> > @@ -83,6 +83,7 @@ install: $(SCRIPTS) $(CLI_MANS) $(SERVICE_MANS) $(BASH_COMPLETIONS) $(ZSH_COMPLE
> >  	install -m 0755 $(SCRIPTS) $(BINDIR)
> >  	install -d $(USRSHARE)/helpers
> >  	install -m 0755 pve-startall-delay $(USRSHARE)/helpers
> > +	install -m 0755 pve-init-ceph-crash $(USRSHARE)/helpers
> >  	install -d $(MAN1DIR)
> >  	install -m 0644 $(CLI_MANS) $(MAN1DIR)
> >  	install -d $(MAN8DIR)
> > diff --git a/bin/pve-init-ceph-crash b/bin/pve-init-ceph-crash
> > new file mode 100755
> > index 00000000..c8e9217d
> > --- /dev/null
> > +++ b/bin/pve-init-ceph-crash
> > @@ -0,0 +1,129 @@
> > +#!/usr/bin/perl
> > +
> > +use strict;
> > +use warnings;
> > +
> > +use List::Util qw(first);
> > +
> > +use PVE::Ceph::Tools;
> > +use PVE::Cluster;
> > +use PVE::RADOS;
> > +use PVE::RPCEnvironment;
> > +
> > +my $ceph_cfg_file = 'ceph.conf';
> > +my $keyring_value = '/etc/pve/ceph/$cluster.$name.keyring';
> > +
> > +my $entity = 'client.crash';
>
> nit: this could be inlined?

Inlined as in use 'client.crash' as the key directly? Eh, I'm more a fan
of using a variable here, as constantly spelling it out while changing a
bunch of things gets a little painful ...

If you meant that I should put it in `try_adapt_cfg` - sure, I missed
that that's the only `sub` in which it's being used, woops!

>
> > +
> > +
> > +sub try_adapt_cfg {
> > +    my ($cfg) = @_;
> > +
> > +    my $changed = 0;
> > +    print("Checking whether the configuration for '$entity' needs to be updated.\n");
> > +
> > +    my $add_keyring = sub {
> > +	print("Setting keyring path to '$keyring_value'.\n");
> > +	$cfg->{$entity}->{keyring} = $keyring_value;
> > +	$changed = 1;
>
> this can be removed, you always return right after calling this sub..
>
> > +    };
> > +
> > +    if (!exists($cfg->{$entity})) {
> > +	print("Adding missing section for '$entity'.\n");
> > +	$add_keyring->();
> > +	return $changed;
>
> and all of these can be changed to uncondtionally return 1

Fair!

>
> > +    }
> > +
> > +    if (exists($cfg->{$entity}->{key})) {
> > +	print("Removing existing usage of key.\n");
> > +	delete($cfg->{$entity}->{key});
> > +	$changed = 1;
> > +    }
> > +
> > +    if (!exists($cfg->{$entity}->{keyring})) {
> > +	print("Keyring path is missing from configuration.\n");
> > +	$add_keyring->();
> > +	return $changed;
> > +    }
> > +
> > +    my $current_keyring_value = $cfg->{$entity}->{keyring};
> > +    if ($current_keyring_value ne $keyring_value) {
> > +	print("Current keyring path differs from expected path.\n");
> > +	$add_keyring->();
> > +	return $changed;
> > +    }
> > +
> > +    return $changed;
>
> $changed only serves the purpose of returning 1 iff only the 'key'
> value/entry was removed. so it could be renamed to reflect that ;)

Good point.

>
> > +}
> > +
> > +
> > +sub main {
> > +    my $rpcenv = PVE::RPCEnvironment->init('cli');
> > +
> > +    $rpcenv->init_request();
> > +    $rpcenv->set_language($ENV{LANG});
> > +    $rpcenv->set_user('root at pam');
>
> why do we need an rpcenv here?

Double-checked, just to be sure: `librados-perl` requires an
`RPCEnvironment` to do some handling regarding forks -
`RPCEnvironment::get()` will die if no env was initialized.

>
> > +
> > +    die "Not running as root. Exiting.\n" if $> != 0;
> > +
> > +    if (!PVE::Ceph::Tools::check_ceph_installed('ceph_bin', 1)) {
> > +        print("Ceph is not installed. No action required.\n");
> > +        exit 0;
> > +    }
> > +
> > +    eval {
> > +        PVE::Ceph::Tools::check_ceph_inited();
> > +    };
> > +    if ($@) {
> > +        print("Ceph is not initialized. No action required.\n");
> > +        exit 0;
> > +    }
> > +
> > +    my $rados = eval { PVE::RADOS->new() };
> > +    $@ = ''; # warnings are displayed regardless
>
> what's this line for?

Now that you mention it, I think this was a remnant from the previous
series, where the error handling was different. Will most likely remove
this in v5.

>
> > +    my $ceph_crash_key_path = PVE::Ceph::Tools::get_config('pve_ceph_crash_key_path');
> > +
> > +    PVE::Cluster::cfs_lock_file($ceph_cfg_file, undef, sub {
>
> if you let this sub return a boolean value (worked/failed)
>
> > +        my $cfg = PVE::Cluster::cfs_read_file($ceph_cfg_file);
> > +
> > +	if (!defined($rados)) {
> > +	    my $has_mon_config = defined(first { m/mon\..*/ } keys $cfg->%*);
> > +	    if ($has_mon_config) {
> > +		die "Connection to RADOS failed even though a monitor is configured.\n" .
> > +		    "Please verify whether your configuration is correct.\n"
> > +	    }
> > +
> > +	    print(
> > +		"Connection to RADOS failed and no monitor is configured. ".
> > +		"Assume that things are fine. No action required.\n"
> > +	    );
>
> not sure if it wouldn't be better to check mon_host here? that's the
> thing actually needed to contact the mon, everything else is optional..
> and even that could be moved to DNS, but we don't support that for
> PVE-managed clusters..

Hmm... I'll see what I can do / how it behaves.

>
> > +	    return;
> > +	}
> > +
> > +	my $updated_keyring = PVE::Ceph::Tools::create_or_update_crash_keyring_file($rados);
> > +
> > +	if ($updated_keyring) {
> > +	    print("Keyring file '$ceph_crash_key_path' was updated.\n");
> > +	}
> > +
> > +	my $changed = try_adapt_cfg($cfg);
> > +
> > +	if ($changed) {
> > +	    print("Committing updated configuration.\n");
> > +	    PVE::Cluster::cfs_write_file($ceph_cfg_file, $cfg);
> > +	    print("Successfully updated configuration for 'ceph-crash.service'.\n");
> > +	} else {
> > +	    print("Configuration does not need to be updated.\n");
> > +	}
> > +    });
>
> then you can just check the return value (undef means cfs_lock set an
> error, true means it worked, false means it failed without dieing, or
> whatever you want to return)

Thanks for the suggestion, will incorporate this in v5!

>
> > +    if ($@) {
> > +	warn("Failed to configure keyring for 'ceph-crash.service'. Error:\n");
>
> nit: I think I'd prefer the "Error: to be on the next line with the error message.."

ACK ;)

>
> > +	warn("$@");
> > +	exit 1;
> > +    }
> > +}
> > +
> > +main();
> > +
> > +0;
>
> why?

Not sure anymore... my bad! I think this was part of some module testing
that I was doing (where it's a convention to put a `1;` at the end of
the file to confirm that the module loaded successfully). Regardless,
this shouldn't be here, so will remove it in v5.

>
> > diff --git a/debian/postinst b/debian/postinst
> > index 61b50f97..c36afa13 100755
> > --- a/debian/postinst
> > +++ b/debian/postinst
> > @@ -82,10 +82,24 @@ EOF
> >  
> >  update_ceph_conf() {
> >      CEPH_CONF_DIR='/etc/pve/ceph'
> > +    UNIT='ceph-crash.service'
> > +
> > +    echo ""
> >  
> >      if test ! -d "${CEPH_CONF_DIR}"; then
> >          mkdir -p "${CEPH_CONF_DIR}"
> >      fi
> > +
> > +    # Don't fail in case user has "exotic" configuration where RADOS
> > +    # isn't available on all nodes for some reason
> > +    /usr/share/pve-manager/helpers/pve-init-ceph-crash || true
> > +
> > +    if systemctl -q is-enabled "$UNIT"; then
>
> like Friedrich said, please silence this (`2>/dev/null`)

ACK!

>
> > +        echo "Restarting '$UNIT'"
> > +        deb-systemd-invoke restart "$UNIT"
>
> || true please!

ACK! .. and thanks for the thorough review!

>
> > +    fi
> > +
> > +    echo ""
> >  }
> >  
> >  migrate_apt_auth_conf() {
> > -- 
> > 2.39.2
> > 
> > 
> > 
> > _______________________________________________
> > pve-devel mailing list
> > pve-devel at lists.proxmox.com
> > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> > 
> > 
> > 
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel at lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel





More information about the pve-devel mailing list