[pve-devel] [PATCH master ceph, quincy-stable-8 ceph, pve-storage, pve-manager 0/8] Fix #4759: Configure Permissions for ceph-crash.service

Fabian Grünbichler f.gruenbichler at proxmox.com
Thu Feb 1 14:35:22 CET 2024


On January 31, 2024 3:22 pm, Friedrich Weber wrote:
> Also, looks like every time ceph-crash posts a report, the syslog reads:
> 
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: WARNING:ceph-crash:post
> /var/lib/ceph/crash/2024-01-31T13:53:16.419342Z_1b5a078a-f665-4fcd-abd5-9bf602048d1f
> as client.crash.ceph1 failed: 2024-01-31T15:02:30.105+0100 7f10bf7ae6c0
> -1 auth: unable to find a keyring on
> /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100
> 7f10bf7ae6c0 -1 auth: unable to find a keyring on
> /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100
> 7f10bf7ae6c0 -1 auth: unable to find a keyring on
> /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100
> 7f10bf7ae6c0 -1 auth: unable to find a keyring on
> /etc/pve/priv/ceph.client.crash.ceph1.keyring: (13) Permission denied
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: 2024-01-31T15:02:30.105+0100
> 7f10bf7ae6c0 -1 monclient: keyring not found
> Jan 31 15:02:30 ceph1 ceph-crash[110939]: [errno 13] RADOS permission
> denied (error connecting to the cluster)
> 
> I remember you mentioned this before. Do I remember correctly there is
> no easy way to prevent these messages? Having them appear only when a
> crash is posted is certainly better than every 10 minutes, but they are
> a bit misleading as they very much look like an error that needs attention.

so I did a few more experiments.

ceph-crash does two things:

A) it executes `ceph -s` without specifying a client name, which means
that part will always try to use the `client.admin` config/keyring
B) it tries to post crashes if they exist, using the keys
`client.crash.$HOST`, `client.crash`, `client.admin`

A happens at startup to "exercise the key", irrespective of crash files
existing or not. to avoid it, we'd need to patch ceph-crash once we have
settled on which client name to use.

B happens for every crash. once posting worked, the other keyrings are
not tried again for that particular crash, but they will be tried again
for the next one.

this means to avoid warnings altogether, we'd need to make the first
entry in auth_names work or patch the `auth_names` part of the
ceph-crash binary.
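
a minimal sketch of the fallback order described above. this is not the
actual ceph-crash code: `post_with_fallback` and the `try_post` callable
are hypothetical names used only for illustration, and the real script
shells out to `ceph` instead of calling a python function.

```python
def auth_names(hostname):
    # order as described above: the host-specific name is tried first
    return [f"client.crash.{hostname}", "client.crash", "client.admin"]


def post_with_fallback(names, try_post):
    """Try each client name in order, stop at the first one that works.

    try_post is a callable taking a client name and returning True on
    success. Returns (name_that_worked_or_None, names_that_failed).
    """
    failed = []
    for name in names:
        if try_post(name):
            return name, failed
        # every failed attempt corresponds to one warning in the log
        failed.append(name)
    return None, failed


# example: only the plain "client.crash" key is usable
worked, failed = post_with_fallback(
    auth_names("ceph1"), lambda name: name == "client.crash"
)
print(worked, failed)
```

with only `client.crash` usable, the first entry still fails once per
crash, which is why the warnings only go away if the first entry works
or the list is patched.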

I played around a bit and it seems we could do the following:
- change the [client] section in our config to only affect
  [client.admin] (simple renaming is enough, all `ceph` invocations
  without `-n` or `-i` should continue to work as before, since
  "client.admin" is the default `-n` value)
- generate (on each node) a `client.crash.$HOSTNAME` keyring with crash
  profile and store it in /etc/ceph/ceph.client.crash.$HOSTNAME
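
the per-node key generation could look roughly like the sketch below.
assumptions: the caps (`mon 'profile crash' mgr 'profile crash'`) are
the ones upstream documents for the crash module, the `.keyring` suffix
follows the `/etc/ceph/$cluster.$name.keyring` default pattern, and the
helper names are made up. it only builds the command line, it does not
run anything.

```python
import shlex
import socket


def crash_keyring_path(hostname, cluster="ceph"):
    # assumed file name, following the $cluster.$name.keyring pattern
    return f"/etc/ceph/{cluster}.client.crash.{hostname}.keyring"


def crash_key_command(hostname, cluster="ceph"):
    # `ceph auth get-or-create` returns the existing key or creates one;
    # -o writes the resulting keyring to the given file
    return [
        "ceph", "auth", "get-or-create", f"client.crash.{hostname}",
        "mon", "profile crash",
        "mgr", "profile crash",
        "-o", crash_keyring_path(hostname, cluster),
    ]


if __name__ == "__main__":
    # short host name of the local node; pass an explicit name if the
    # node name differs from what gethostname() returns
    host = socket.gethostname().split(".")[0]
    print(shlex.join(crash_key_command(host)))
```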

ceph-crash will then (at least for crash posting purposes) invoke `ceph
-n client.crash.$HOSTNAME` first, which will pick up that keyring since
`/etc/ceph/$cluster.$name.keyring` is part of the default value(s) for
the client keyring. this doesn't work without modifying our ceph.conf
since the current global "client.keyring" setting overrides the built-in
defaults for *all* invocations, even for `ceph -n XXX`.
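
to illustrate why the rename matters, here is a simplified model of the
keyring lookup. it is a sketch, not ceph's config code: it only models
the `keyring` option, section precedence ([client.X] over [client] over
[global]) and the first default path. the `/etc/pve/priv/...` template
is inferred from the paths in the log above.

```python
DEFAULT_KEYRING = "/etc/ceph/$cluster.$name.keyring"
PVE_KEYRING = "/etc/pve/priv/$cluster.$name.keyring"  # inferred from the log


def expand(template, name, cluster="ceph"):
    # substitute the two metavariables used in the templates above
    return template.replace("$cluster", cluster).replace("$name", name)


def keyring_for(name, conf, cluster="ceph"):
    """Return the keyring path a client called `name` would look for.

    conf maps section name -> {option: value}. The most specific section
    wins; without any explicit setting the built-in default is used.
    """
    client_type = name.split(".", 1)[0]
    for section in (name, client_type, "global"):
        template = conf.get(section, {}).get("keyring")
        if template:
            return expand(template, name, cluster)
    return expand(DEFAULT_KEYRING, name, cluster)


current = {"client": {"keyring": PVE_KEYRING}}
proposed = {"client.admin": {"keyring": PVE_KEYRING}}

print(keyring_for("client.crash.ceph1", current))
print(keyring_for("client.crash.ceph1", proposed))
print(keyring_for("client.admin", proposed))
```

with the current [client] section the crash client is pointed at
pmxcfs. after the rename only `client.admin` is, and the crash client
falls back to the default under /etc/ceph.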

using the current approach with "client.crash" and a key on pmxcfs also
works. to silence the warnings, we could then patch ceph-crash to use
that key (/client name) for `ceph -s` and remove
`client.crash.$HOSTNAME` from auth_names. but since that name comes
first, I assume upstream actually expects people to use that keyring and
the rest are just fallbacks, so we'd need to watch for regressions when
pulling in updates.



