[pve-devel] [PATCH pve-manager 8/8] fix #4759: debian/postinst: configure ceph-crash.service and its key

Max Carrara m.carrara at proxmox.com
Thu Feb 1 14:54:37 CET 2024


On 1/31/24 14:15, Fabian Grünbichler wrote:
> On January 30, 2024 7:40 pm, Max Carrara wrote:
>> This commit adds the `set_ceph_crash_conf` function, which dynamically
>> adapts the host's Ceph configuration in order to allow the Ceph crash
>> module's daemon to run without elevated privileges.
>>
>> This adaptation is only performed if:
>>  * Ceph is installed
>>  * Ceph is configured ('/etc/pve/ceph.conf' exists)
>>  * Connection to RADOS is successful
>>
>> If the above conditions are met, the function will ensure that:
>>  * Ceph possesses a key named 'client.crash'
>>  * The key is saved to '/etc/pve/ceph/ceph.client.crash.keyring'
>>  * A section for 'client.crash' exists in '/etc/pve/ceph.conf'
>>  * The 'client.crash' section has a key named 'keyring' which
>>    references '/etc/pve/ceph/ceph.client.crash.keyring'
>>
>> Furthermore, if a key named 'client.crash' already exists within the
>> cluster, it shall be reused and not regenerated. Also, the
>> configuration is not altered if the conditions above are already met.
>>
>> This way the keyring file is available as read-only in
>> '/etc/pve/ceph/' for the `www-data` group (due to how pmxcfs works).
>> Because the `ceph` user has been made part of said `www-data` group
>> [0], it may access the file without requiring any additional
>> privileges.
>>
>> Thus, the configuration for the Ceph crash daemon is safely adapted as
>> expected by PVE tooling and also shared via pmxcfs across one's
>> cluster.
>>
>> [0]: https://git.proxmox.com/?p=ceph.git;a=commitdiff;h=f72c698a55905d93e9a0b7b95674616547deba8a
>>
>> Signed-off-by: Max Carrara <m.carrara at proxmox.com>
>> ---
>>  debian/postinst | 109 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 109 insertions(+)
>>
>> diff --git a/debian/postinst b/debian/postinst
>> index 00d5f2cc..8d2a8c4b 100755
>> --- a/debian/postinst
>> +++ b/debian/postinst
>> @@ -110,6 +110,114 @@ migrate_apt_auth_conf() {
>>      fi
>>  }
>>  
>> +set_ceph_crash_conf() {
>> +    PVE_CEPH_CONFFILE='/etc/pve/ceph.conf'
>> +    PVE_CEPH_CONFDIR='/etc/pve/ceph'
>> +    PVE_CEPH_CRASH_KEY="${PVE_CEPH_CONFDIR}/ceph.client.crash.keyring"
>> +    PVE_CEPH_CRASH_KEY_REF="${PVE_CEPH_CONFDIR}/\$cluster.\$name.keyring"
>> +
>> +    # ceph isn't installed -> nothing to do
>> +    if ! which ceph > /dev/null 2>&1; then
>> +        return 0
>> +    fi
>> +
>> +    # ceph isn't configured -> nothing to do
>> +    if test ! -f "${PVE_CEPH_CONFFILE}"; then
>> +        return 0
>> +    fi
>> +
>> +    CEPH_AUTH_RES="$(ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' 2>&1 || true)"
>> +
>> +    # ceph is installed and possibly configured, but no connection to RADOS
>> +    # -> assume no monitor was created, nothing to do
>> +    if echo "${CEPH_AUTH_RES}" | grep -i -q 'RADOS object not found'; then
>> +        return 0
>> +    fi
> 
> the stuff after this point basically duplicates a lot of things from
> pveceph in shell.. wouldn't it be easier to have a pveceph reinit or
> similar command (or a parameter to an existing one) and call that here?

I wasn't really sure whether tinkering with ceph-crash would warrant a
separate subcommand or not, so I decided to stick with the postinst hook,
as the whole config is set up anyway for new MONs in the future.

Though, perhaps a subcommand that checks whether the entire config is in
place the way we expect it would be nice in general? E.g. whether
'/etc/pve/priv' exists and the '.keyring' files are there (and can be used
to authenticate), '/etc/pve/ceph.conf' contains what we expect, etc.

Places where the config or some files are missing / messed up could then
be spotted much more easily and quickly - maybe that would also relieve some
pressure on the support staff.

Something like `pveceph verify-config`, perhaps? Possibly with a `--repair`
flag later on, too.
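
To make the idea a bit more concrete, here is a minimal shell sketch of the
read-only part of such a check, limited to the two crash-related pieces this
patch touches. The function name `verify_ceph_crash_conf` and its arguments
are purely hypothetical (no such pveceph subcommand exists); the default paths
are the ones used in the patch. It only looks at files, does not contact the
cluster, and repairs nothing.

```shell
#!/bin/sh
# Hypothetical sketch, not an existing pveceph command.
# $1: path to ceph.conf         (default taken from the patch)
# $2: path to the crash keyring (default taken from the patch)
verify_ceph_crash_conf() {
    conffile="${1:-/etc/pve/ceph.conf}"
    keyring="${2:-/etc/pve/ceph/ceph.client.crash.keyring}"
    rc=0

    # Without a config file there is nothing else to check.
    if [ ! -f "$conffile" ]; then
        echo "missing: $conffile"
        return 1
    fi

    if [ ! -f "$keyring" ]; then
        echo "missing: $keyring"
        rc=1
    fi

    # -x matches the whole line, -F treats the pattern as a fixed string,
    # so no regex escaping of the brackets is needed.
    if ! grep -qxF '[client.crash]' "$conffile"; then
        echo "missing section: [client.crash] in $conffile"
        rc=1
    fi

    return "$rc"
}
```

Calling `verify_ceph_crash_conf` without arguments would check the default
paths and return non-zero if anything is missing; whether the keyring can
actually be used to authenticate would need a separate step.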

Implementing this would possibly also warrant a slight overhaul / refactor
of our Ceph-related Perl code, so certain strings, expected conf values, etc.
aren't scattered around as much anymore. It's not too bad IMO, mind you,
but there's no harm in cleaning some things up along the way.

I would gladly work on that, but probably in another patch series - so I would
prefer to keep the postinst hook around for now and fix any known (or yet to be
discovered) issues it has. That would at least fix #4759 sooner and reduce the
spam `ceph-crash` causes in the systemd journal every 10 minutes.

> 
> or, for even less coupling (and thus chance of things going wrong and
> interrupting the upgrade), include a check somewhere in the ceph status
> code path and just add a warning if the key is not configured, with a
> hint what command to run/button to click to do the setup?
> 
>> +    SECTION_RE='^\[\S+\]$'
>> +    CRASH_SECTION_RE='^\[client\.crash\]$'
>> +
> 
>> [..]
> 