[pve-devel] [PATCH v4 pve-manager 15/16] fix #4759: ceph: configure ceph-crash.service and its key

Tue Mar 19 11:04:23 CET 2024

On March 5, 2024 4:07 pm, Max Carrara wrote:
> Due to Ceph dropping privileges when running the 'ceph-crash' daemon
> [0], it is necessary to allow the daemon to authenticate with its
> cluster in a safe manner.
> 
> In order to avoid exposing sensitive keyrings or somehow escalating
> its privileges again, 'ceph-crash' is therefore provided with its own
> keyring in the '/etc/pve/ceph' directory. This directory, due to being
> on 'pmxcfs', may be read by members of the 'www-data' group, which
> 'ceph-crash' is made part of [1].
> 
> 
> Expected Configuration
> ----------------------
> 
>  1. A keyring file named '/etc/pve/ceph/ceph.client.crash.keyring'
>     exists
>  2. A section named 'client.crash' exists in '/etc/pve/ceph.conf'
>  3. The 'client.crash' section has a key named 'keyring' which
>     references the keyring file as '/etc/pve/ceph/$cluster.$name.keyring'
>  4. The 'client.crash' section has *no* key named 'key'
> 
> 
> New Clusters
> ------------
> 
> The keyring file is created and the conf file is updated after the first
> monitor has been created (when calling `pveceph mon create`).
> 
> 
> Existing Clusters
> -----------------
> 
> A new helper script creates and configures the 'client.crash' keyring in
> `postinst`, if:
>  * Ceph is installed
>  * Ceph is initialized ('/etc/pve/ceph.conf' and '/etc/pve/ceph' exist)
>  * Connection to RADOS is successful
> 
> If the above conditions are met, the helper script ensures that the
> existing configuration matches the expected configuration mentioned
> above.
> 
> The configuration is not changed if it is already as expected.
> 
> The helper script may be called again manually if the `postinst` hook
> fails. It is installed to '/usr/share/pve-manager/helpers/pve-init-ceph-crash'.
> 
> 
> Existing `client.crash` Key
> ---------------------------
> 
> If a key named 'client.crash' already exists within the cluster, it is
> reused and not regenerated. 
> 
> 
> [0]: https://github.com/ceph/ceph/pull/48713
> [1]: https://git.proxmox.com/?p=ceph.git;a=commitdiff;h=f72c698a55905d93e9a0b7b95674616547deba8a
> 
> Signed-off-by: Max Carrara <m.carrara at proxmox.com>
> ---
> Changes v1 --> v2:
>   * fix 'keyring' key being appended to 'client.crash' section even
>     if it already exists and configured correctly
> Changes v2 --> v3:
>   * avoid needless second 'pmxcfs' lock on 'ceph.conf' when configuring
>     'client.crash' keyring after first monitor was created
>   * change name of helper sub that creates the ceph.crash keyring
>   * the helper sub now implicitly expects the /etc/pve/ceph directory to
>     exist (as the previous commit enforces its existence)
>   * remove BASH part of postinst hook related to `ceph-crash`
>   * add Perl helper script and call that instead in postinst hook
>   * reword commit message
> Changes v3 --> v4:
>   * ensure key named 'key' does not exist for 'client.crash' section
>     in helper script (thanks Friedrich!)
>   * restart ceph-crash.service if it wasn't disabled (thanks Friedrich!)
>   * add empty line before and after making changes in `update_ceph_conf`
>     in 'postinst' hook
>   * clean up the helper script; logic is easier to follow now
>   * increase verbosity of output of helper script, showing what's
>     exactly done when invoked
> 
>  PVE/API2/Ceph/MON.pm    |   8 +++
>  PVE/Ceph/Tools.pm       |  35 +++++++++++
>  bin/Makefile            |   1 +
>  bin/pve-init-ceph-crash | 129 ++++++++++++++++++++++++++++++++++++++++
>  debian/postinst         |  14 +++++
>  5 files changed, 187 insertions(+)
>  create mode 100755 bin/pve-init-ceph-crash
> 
> diff --git a/PVE/API2/Ceph/MON.pm b/PVE/API2/Ceph/MON.pm
> index 1e959ef3..8b7ce376 100644
> --- a/PVE/API2/Ceph/MON.pm
> +++ b/PVE/API2/Ceph/MON.pm
> @@ -450,6 +450,14 @@ __PACKAGE__->register_method ({
>  			)
>  		    };
>  		    warn "$@" if $@;
> +
> +		    print "Configuring keyring for ceph-crash.service\n";
> +		    eval {
> +			PVE::Ceph::Tools::create_or_update_crash_keyring_file();
> +			$cfg->{'client.crash'}->{keyring} = '/etc/pve/ceph/$cluster.$name.keyring';
> +			cfs_write_file('ceph.conf', $cfg);
> +		    };
> +		    warn "Unable to configure keyring for ceph-crash.service: $@" if $@;
>  		}
>  
>  		eval { PVE::Ceph::Services::ceph_service_cmd('enable', $monsection) };
> diff --git a/PVE/Ceph/Tools.pm b/PVE/Ceph/Tools.pm
> index 735bb116..9b97171e 100644
> --- a/PVE/Ceph/Tools.pm
> +++ b/PVE/Ceph/Tools.pm
> @@ -20,6 +20,7 @@ my $pve_ceph_cfgpath = "/etc/pve/$ccname.conf";
>  my $ceph_cfgpath = "$ceph_cfgdir/$ccname.conf";
>  my $pve_ceph_cfgdir = "/etc/pve/ceph";
>  
> +my $pve_ceph_crash_key_path = "$pve_ceph_cfgdir/$ccname.client.crash.keyring";
>  my $pve_mon_key_path = "/etc/pve/priv/$ccname.mon.keyring";
>  my $pve_ckeyring_path = "/etc/pve/priv/$ccname.client.admin.keyring";
>  my $ckeyring_path = "/etc/ceph/ceph.client.admin.keyring";
> @@ -45,6 +46,7 @@ my $config_values = {
>  
>  my $config_files = {
>      pve_ceph_cfgpath => $pve_ceph_cfgpath,
> +    pve_ceph_crash_key_path => $pve_ceph_crash_key_path,
>      pve_mon_key_path => $pve_mon_key_path,
>      pve_ckeyring_path => $pve_ckeyring_path,
>      ceph_bootstrap_osd_keyring => $ceph_bootstrap_osd_keyring,
> @@ -420,6 +422,39 @@ sub get_or_create_admin_keyring {
>      return $pve_ckeyring_path;
>  }
>  
> +# is also used in `pve-init-ceph-crash` helper
> +sub create_or_update_crash_keyring_file {
> +    my ($rados) = @_;
> +
> +    if (!defined($rados)) {
> +	$rados = PVE::RADOS->new();
> +    }
> +
> +    my $output = $rados->mon_command({
> +	prefix => 'auth get-or-create',
> +	entity => 'client.crash',
> +	caps => [
> +	    mon => 'profile crash',
> +	    mgr => 'profile crash',
> +	],
> +	format => 'plain',
> +    });
> +
> +    if (-f $pve_ceph_crash_key_path) {
> +	my $contents = PVE::Tools::file_get_contents($pve_ceph_crash_key_path);
> +
> +	if ($contents ne $output) {
> +	    PVE::Tools::file_set_contents($pve_ceph_crash_key_path, $output);
> +	    return 1;
> +	}
> +    } else {
> +	PVE::Tools::file_set_contents($pve_ceph_crash_key_path, $output);
> +	return 1;
> +    }
> +
> +    return 0;
> +}
> +
>  # get ceph-volume managed osds
>  sub ceph_volume_list {
>      my $result = {};
> diff --git a/bin/Makefile b/bin/Makefile
> index 06d8148e..b221e4b1 100644
> --- a/bin/Makefile
> +++ b/bin/Makefile
> @@ -83,6 +83,7 @@ install: $(SCRIPTS) $(CLI_MANS) $(SERVICE_MANS) $(BASH_COMPLETIONS) $(ZSH_COMPLE
>  	install -m 0755 $(SCRIPTS) $(BINDIR)
>  	install -d $(USRSHARE)/helpers
>  	install -m 0755 pve-startall-delay $(USRSHARE)/helpers
> +	install -m 0755 pve-init-ceph-crash $(USRSHARE)/helpers
>  	install -d $(MAN1DIR)
>  	install -m 0644 $(CLI_MANS) $(MAN1DIR)
>  	install -d $(MAN8DIR)
> diff --git a/bin/pve-init-ceph-crash b/bin/pve-init-ceph-crash
> new file mode 100755
> index 00000000..c8e9217d
> --- /dev/null
> +++ b/bin/pve-init-ceph-crash
> @@ -0,0 +1,129 @@
> +#!/usr/bin/perl
> +
> +use strict;
> +use warnings;
> +
> +use List::Util qw(first);
> +
> +use PVE::Ceph::Tools;
> +use PVE::Cluster;
> +use PVE::RADOS;
> +use PVE::RPCEnvironment;
> +
> +my $ceph_cfg_file = 'ceph.conf';
> +my $keyring_value = '/etc/pve/ceph/$cluster.$name.keyring';
> +
> +my $entity = 'client.crash';

nit: this could be inlined?

> +
> +
> +sub try_adapt_cfg {
> +    my ($cfg) = @_;
> +
> +    my $changed = 0;
> +    print("Checking whether the configuration for '$entity' needs to be updated.\n");
> +
> +    my $add_keyring = sub {
> +	print("Setting keyring path to '$keyring_value'.\n");
> +	$cfg->{$entity}->{keyring} = $keyring_value;
> +	$changed = 1;

this can be removed, you always return right after calling this sub..

> +    };
> +
> +    if (!exists($cfg->{$entity})) {
> +	print("Adding missing section for '$entity'.\n");
> +	$add_keyring->();
> +	return $changed;

and all of these can be changed to uncondtionally return 1

> +    }
> +
> +    if (exists($cfg->{$entity}->{key})) {
> +	print("Removing existing usage of key.\n");
> +	delete($cfg->{$entity}->{key});
> +	$changed = 1;
> +    }
> +
> +    if (!exists($cfg->{$entity}->{keyring})) {
> +	print("Keyring path is missing from configuration.\n");
> +	$add_keyring->();
> +	return $changed;
> +    }
> +
> +    my $current_keyring_value = $cfg->{$entity}->{keyring};
> +    if ($current_keyring_value ne $keyring_value) {
> +	print("Current keyring path differs from expected path.\n");
> +	$add_keyring->();
> +	return $changed;
> +    }
> +
> +    return $changed;

$changed only serves the purpose of returning 1 iff only the 'key'
value/entry was removed. so it could be renamed to reflect that ;)

> +}
> +
> +
> +sub main {
> +    my $rpcenv = PVE::RPCEnvironment->init('cli');
> +
> +    $rpcenv->init_request();
> +    $rpcenv->set_language($ENV{LANG});
> +    $rpcenv->set_user('root at pam');

why do we need an rpcenv here?

> +
> +    die "Not running as root. Exiting.\n" if $> != 0;
> +
> +    if (!PVE::Ceph::Tools::check_ceph_installed('ceph_bin', 1)) {
> +        print("Ceph is not installed. No action required.\n");
> +        exit 0;
> +    }
> +
> +    eval {
> +        PVE::Ceph::Tools::check_ceph_inited();
> +    };
> +    if ($@) {
> +        print("Ceph is not initialized. No action required.\n");
> +        exit 0;
> +    }
> +
> +    my $rados = eval { PVE::RADOS->new() };
> +    $@ = ''; # warnings are displayed regardless

what's this line for?

> +    my $ceph_crash_key_path = PVE::Ceph::Tools::get_config('pve_ceph_crash_key_path');
> +
> +    PVE::Cluster::cfs_lock_file($ceph_cfg_file, undef, sub {

if you let this sub return a boolean value (worked/failed)

> +        my $cfg = PVE::Cluster::cfs_read_file($ceph_cfg_file);
> +
> +	if (!defined($rados)) {
> +	    my $has_mon_config = defined(first { m/mon\..*/ } keys $cfg->%*);
> +	    if ($has_mon_config) {
> +		die "Connection to RADOS failed even though a monitor is configured.\n" .
> +		    "Please verify whether your configuration is correct.\n"
> +	    }
> +
> +	    print(
> +		"Connection to RADOS failed and no monitor is configured. ".
> +		"Assume that things are fine. No action required.\n"
> +	    );

not sure if it wouldn't be better to check mon_host here? that's the
thing actually needed to contact the mon, everything else is optional..
and even that could be moved to DNS, but we don't support that for
PVE-managed clusters..

> +	    return;
> +	}
> +
> +	my $updated_keyring = PVE::Ceph::Tools::create_or_update_crash_keyring_file($rados);
> +
> +	if ($updated_keyring) {
> +	    print("Keyring file '$ceph_crash_key_path' was updated.\n");
> +	}
> +
> +	my $changed = try_adapt_cfg($cfg);
> +
> +	if ($changed) {
> +	    print("Committing updated configuration.\n");
> +	    PVE::Cluster::cfs_write_file($ceph_cfg_file, $cfg);
> +	    print("Successfully updated configuration for 'ceph-crash.service'.\n");
> +	} else {
> +	    print("Configuration does not need to be updated.\n");
> +	}
> +    });

then you can just check the return value (undef means cfs_lock set an
error, true means it worked, false means it failed without dieing, or
whatever you want to return)

> +    if ($@) {
> +	warn("Failed to configure keyring for 'ceph-crash.service'. Error:\n");

nit: I think I'd prefer the "Error: to be on the next line with the error message.."

> +	warn("$@");
> +	exit 1;
> +    }
> +}
> +
> +main();
> +
> +0;

why?

> diff --git a/debian/postinst b/debian/postinst
> index 61b50f97..c36afa13 100755
> --- a/debian/postinst
> +++ b/debian/postinst
> @@ -82,10 +82,24 @@ EOF
>  
>  update_ceph_conf() {
>      CEPH_CONF_DIR='/etc/pve/ceph'
> +    UNIT='ceph-crash.service'
> +
> +    echo ""
>  
>      if test ! -d "${CEPH_CONF_DIR}"; then
>          mkdir -p "${CEPH_CONF_DIR}"
>      fi
> +
> +    # Don't fail in case user has "exotic" configuration where RADOS
> +    # isn't available on all nodes for some reason
> +    /usr/share/pve-manager/helpers/pve-init-ceph-crash || true
> +
> +    if systemctl -q is-enabled "$UNIT"; then

like Friedrich said, please silence this (`2>/dev/null`)

> +        echo "Restarting '$UNIT'"
> +        deb-systemd-invoke restart "$UNIT"

|| true please!

> +    fi
> +
> +    echo ""
>  }
>  
>  migrate_apt_auth_conf() {
> -- 
> 2.39.2
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel at lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
>