[pve-devel] [PATCH v3 ceph master, ceph quincy-stable 8, pve-storage, pve-manager 00/13] Fix #4759: Configure Permissions for ceph-crash.service

Max Carrara m.carrara at proxmox.com
Wed Feb 21 14:15:55 CET 2024


On 2/21/24 12:55, Friedrich Weber wrote:
> Quickly tested the patch series on my existing Ceph Quincy cluster, did
> not encounter major issues -- the keyring was created and the Ceph
> config was rewritten accordingly. After a restart of `ceph-crash`, it
> correctly posts crashes (produced with `kill -n11 $(pidof ceph-osd)`)
> again and does not write any potentially misleading messages to the
> journal. Nice!
> 
> Didn't have time yet to test these patches when setting up a new
> cluster, but I'll try to do so this week and report back.
> 

Thank you very much for the testing and the feedback!

> Two minor things I've noticed so far:
> 
> - the `ceph-crash` service does not restart after installing the patched
> ceph-base package, so the reordering done by patches 02+04 does not take
> effect immediately: ceph-crash posts crash logs just fine, but logs to
> the journal that it can't find a keyring. After a restart of ceph-crash,
> the patch takes effect, so only a tiny inconvenience, but still: Not
> sure if restarting the service is something we'd want to do in a
> postinst -- is this an acceptable thing to do in a postinst?

Initially the service was being restarted, but that's omitted in the new
hook, as Fabian and I had noticed that `ceph-crash` just checks for its
expected keys after its waiting period again anyways. I had unfortunately
forgotten to put that into the changelog of the postinst hook stuff -
mea culpa!

I think restarting the service would be necessary then in order to apply
the new sequence which keys are checked in, as that's hard-coded in
`ceph-crash`.

It certainly should be acceptable (as we already do it in some instances),
as long as we restart it only if the service is enabled. That was part
of the old BASH function anyway - I don't think there's any harm in adding
it back (either in BASH or Perl).

> 
> - Might there be issues in a mixed-version cluster scenario, so if some
> node A already has an updated pve-storage package (patches 05-10), but
> node B doesn't yet? One thing I noticed is that node A will add the
> [client.crash] section, but node B may remove it again when it needs to
> rewrite the Ceph config (e.g. when creating a monitor). I don't find
> this particular issue too concerning, as hopefully node B will be
> updated eventually as well and reinstate the [client.crash] section. But
> I wonder if there could be other more serious issues?

The scenario you mentioned might indeed happen somehow, but once all
nodes are updated - even if the config has been changed inbetween updates -
the '[client.crash]' section should definitely exist.

One issue that's been fixed by moving things to the Perl helper is that
simultaneous updates could potentially modify 'ceph.conf' at the same time
- the Perl helper now locks the file on pmxcfs, so that cannot happen anymore.

I cannot think of any other scenario at the moment.

In any case, even if *somehow* 'ceph.conf' ends up not containing the section
or the keyring file ends up missing, the helper script will be available
after the update has been performed, so it's possible to just run it again
manually to adapt the config.

That being said, this reminds me that the '[ceph.crash]' section, the location
of the keyring file, etc. should probably be in our docs as well, so I will
send in a follow-up series for that (unless this series ends up needing a v4,
then I'll include it there).

Thanks again for the feedback and the tests you ran!

> 
> On 16/02/2024 15:56, Max Carrara wrote:
>> This marks version 03 of the patch series "Fix #4759: Configure
>> Permissions for ceph-crash.service". Older versions can be found below.
>>
>> Notable changes since v2
>> ------------------------
>>
>>   * The 'ceph.conf' parser in pve-storage is now equivalent to Ceph's
>>     and even supports continued lines
>>   * The addition of the '/etc/pve/ceph' directory has been moved into a
>>     separate patch in order to preserve the context of its purpose in
>>     the git history
>>   * The debian `postinst` hook for pve-manager is now version-guarded
>>     and uses a separate Perl helper script instead of doing everything
>>     in BASH
>>
>>
>> Older Versions
>> --------------
>>
>> v1: https://lists.proxmox.com/pipermail/pve-devel/2024-January/061546.html
>> v2: https://lists.proxmox.com/pipermail/pve-devel/2024-February/061646.html
>>
>>
>>
>> ceph (master):
>>
>> Max Carrara (2):
>>   debian: add patch to fix ceph crash dir permissions in postinst hook
>>   patches: add patch that reorders clients used by ceph-crash
>>
>>  ...ly-adjust-permissions-of-var-lib-cep.patch | 54 +++++++++++++++++++
>>  ...h-crash-change-order-of-client-names.patch | 30 +++++++++++
>>  patches/series                                |  2 +
>>  3 files changed, 86 insertions(+)
>>  create mode 100644 patches/0016-debian-recursively-adjust-permissions-of-var-lib-cep.patch
>>  create mode 100644 patches/0017-ceph-crash-change-order-of-client-names.patch
>>
>>
>> ceph (quincy-stable-8):
>>
>> Max Carrara (2):
>>   debian: add patch to fix ceph crash dir permissions in postinst hook
>>   patches: add patch that reorders clients used by ceph-crash
>>
>>  ...ly-adjust-permissions-of-var-lib-cep.patch | 54 +++++++++++++++++++
>>  ...h-crash-change-order-of-client-names.patch | 30 +++++++++++
>>  patches/series                                |  2 +
>>  3 files changed, 86 insertions(+)
>>  create mode 100644 patches/0025-debian-recursively-adjust-permissions-of-var-lib-cep.patch
>>  create mode 100644 patches/0026-ceph-crash-change-order-of-client-names.patch
>>
>>
>> pve-storage:
>>
>> Max Carrara (6):
>>   cephconfig: align our parser more with Ceph's parser
>>   cephconfig: support line-continuations in parser
>>   cephconfig: allow writing arbitrary sections
>>   cephconfig: change code style inside config writer
>>   cephconfig: change order of written sections
>>   cephconfig: remove leading whitespace on write to Ceph config
>>
>>  src/PVE/CephConfig.pm | 80 +++++++++++++++++++++++++++++++------------
>>  1 file changed, 58 insertions(+), 22 deletions(-)
>>
>>
>> pve-manager:
>>
>> Max Carrara (3):
>>   ceph: introduce '/etc/pve/ceph'
>>   fix #4759: ceph: configure ceph-crash.service and its key
>>   bin/make: gather helper scripts in separate variable
>>
>>  PVE/API2/Ceph.pm        |   5 ++
>>  PVE/API2/Ceph/MON.pm    |   8 ++++
>>  PVE/Ceph/Tools.pm       |  47 +++++++++++++++++-
>>  bin/Makefile            |   6 ++-
>>  bin/pve-init-ceph-crash | 104 ++++++++++++++++++++++++++++++++++++++++
>>  debian/postinst         |  16 +++++++
>>  6 files changed, 183 insertions(+), 3 deletions(-)
>>  create mode 100755 bin/pve-init-ceph-crash
>>





More information about the pve-devel mailing list