[pve-devel] [PATCH ceph master v1] pybind/rbd: disable on_progress callbacks to prevent MGR segfaults

Fabian Grünbichler f.gruenbichler at proxmox.com
Wed Sep 10 09:00:03 CEST 2025


On September 9, 2025 7:05 pm, Max R. Carrara wrote:
> Currently, *all* MGRs collectively segfault on Ceph v19.2.3 running on
> Debian Trixie if a client requests the removal of an RBD image from
> the RBD trash (#6635 [0]).
> 
> After a lot of investigation, the cause of this still isn't clear to
> me; the most likely culprit is a set of internal changes to Python
> sub-interpreters that happened between Python versions 3.12 and 3.13.
> 
> What leads me to this conclusion is the following:
>  1. A user on our forum noted [0] that the issue disappeared as soon as
>     they set up a Ceph MGR inside a Debian Bookworm VM. Bookworm has
>     Python version 3.11, before any substantial changes to
>     sub-interpreters [1][2] were made.

did you try with stock Debian Trixie packages (the Ceph version is still
18.2 there, which might help narrow it down)?

in any case, it would be good to bring this issue to upstream's
attention as well!
 
>  2. There is an upstream issue [3] regarding another segfault during
>     MGR startup. The author concluded that this problem is related to
>     sub-interpreters and opened another issue [4] on Python's issue
>     tracker that goes into more detail.
> 
>     Even though this is for a completely different code path, it shows
>     that issues related to sub-interpreters are popping up elsewhere
>     at the very least.

did you try reproducing that one? it seems to require an optional
ceph-mgr plugin that we have packaged as well, so it should be fairly
straightforward.



