[pve-devel] [PATCH v5 pve-storage, pve-manager 00/11] Fix #4759: Configure Permissions for ceph-crash.service

Max Carrara m.carrara at proxmox.com
Tue Apr 9 12:28:46 CEST 2024


On Tue Apr 9, 2024 at 11:48 AM CEST, Maximiliano Sandoval wrote:
>
> Max Carrara <m.carrara at proxmox.com> writes:
>
> > Fix #4759: Configure Permissions for ceph-crash.service - Version 5
> > ===================================================================
>
> I tested this patch series on a testing cluster updated to
> no-subscription with ceph-base 18.2.2-pve1. For the purposes of testing
> I removed the version check against 0.0.0.
>
> The following things were working as expected:
>
>  - There are no more ceph-crash errors in the journal
>  - /etc/pve/ceph.conf contains:
>    ```
>    [client.crash]
> 	keyring = /etc/pve/ceph/$cluster.$name.keyring
>    ```
>  - The new keyring is in the right place:
>    ```
>    # ls /etc/pve/ceph
>    ceph.client.crash.keyring
>    ```
>  - After a few minutes the crash reports at /var/lib/ceph/crash/ were
>    moved to /var/lib/ceph/crash/posted.

Thanks a lot for testing this, much appreciated!
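
For anyone following along: the keyring path in the config above uses
Ceph's `$cluster` and `$name` metavariables, which is why the file ends
up being called `ceph.client.crash.keyring`. A minimal sketch of that
expansion, assuming the default cluster name "ceph"
(`expand_metavariables` is a hypothetical helper for illustration only;
Ceph does this internally and supports more metavariables than these
two):

```python
def expand_metavariables(template: str, cluster: str, name: str) -> str:
    """Expand the $cluster and $name metavariables in a ceph.conf value.

    Hypothetical helper, not part of Ceph or of this patch series.
    """
    return template.replace("$cluster", cluster).replace("$name", name)


# Assumption: default cluster name "ceph"; the client is "client.crash".
path = expand_metavariables(
    "/etc/pve/ceph/$cluster.$name.keyring", "ceph", "client.crash"
)
print(path)  # /etc/pve/ceph/ceph.client.crash.keyring
```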

>
> One thing that was broken is running the ceph-crash binary directly:
>
> ```
> # ceph-crash
> INFO:ceph-crash:pinging cluster to exercise our key
> 2024-04-09T11:42:31.591+0200 7009fca926c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.admin.keyring: (13) Permission denied
> 2024-04-09T11:42:31.595+0200 7009fca926c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.admin.keyring: (13) Permission denied
> 2024-04-09T11:42:31.595+0200 7009fca926c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.admin.keyring: (13) Permission denied
> 2024-04-09T11:42:31.595+0200 7009fca926c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.admin.keyring: (13) Permission denied
> 2024-04-09T11:42:31.595+0200 7009fca926c0 -1 monclient: keyring not found
> [errno 13] RADOS permission denied (error connecting to the cluster)

That's not actually "broken" (even though it looks like it, tbh);
that's just how Ceph behaves in this case.

On startup, `ceph-crash` first checks whether the cluster is reachable
at all [0]. I'm not sure why it resorts to looking up the admin keyring
for that, though.

> INFO:ceph-crash:monitoring path /var/lib/ceph/crash, delay 600s

After that, it does start monitoring the crash dir as expected, so it
works just fine.

The errors that would otherwise appear every 10 minutes are silenced by
a patch on our side [1] (those were the most annoying kind of errors
anyway).

> ```

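As a rough illustration of why the order of client names matters for
[1]: this is only a sketch, not the actual ceph-crash code.
`first_usable_name`, `can_connect` and the set of readable keyrings are
made up for the example. It just shows that every name tried before the
first working one results in a failed attempt (and thus an error in the
log):

```python
from typing import Callable, List, Optional, Tuple


def first_usable_name(
    names: List[str], can_connect: Callable[[str], bool]
) -> Tuple[Optional[str], List[str]]:
    """Try each client name in order.

    Returns the first name that connects (or None) and the names that
    failed before it, i.e. the ones that would each log an error.
    """
    failed = []
    for name in names:
        if can_connect(name):
            return name, failed
        failed.append(name)
    return None, failed


# Assumption for this example: only the client.crash keyring is readable.
readable = {"client.crash"}


def can_connect(name: str) -> bool:
    return name in readable


admin_first = first_usable_name(["client.admin", "client.crash"], can_connect)
crash_first = first_usable_name(["client.crash", "client.admin"], can_connect)
print(admin_first)  # ('client.crash', ['client.admin'])
print(crash_first)  # ('client.crash', [])
```

With the crash client tried first, nothing fails before a usable name
is found, so there is nothing to complain about in the journal.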

[0]: https://git.proxmox.com/?p=ceph.git;a=blob;f=ceph/src/ceph-crash.in;h=0e02837fadd4dde8abd66985b485836402e10a37;hb=HEAD#l131
[1]: https://git.proxmox.com/?p=ceph.git;a=blob;f=patches/0017-ceph-crash-change-order-of-client-names.patch;h=8131fced55f3e4c757bd22c16539070f83480a19;hb=HEAD

>
> --
> Maximiliano
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel at lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel




