[pve-devel] applied: [PATCH cluster] pmxcfs: protect CPG operations with mutex

Thomas Lamprecht t.lamprecht at proxmox.com
Wed Sep 30 13:47:38 CEST 2020


On 30.09.20 13:21, Fabian Grünbichler wrote:
> cpg_mcast_joined (and transitively, cpg_join/leave) are not thread-safe.
> pmxcfs triggers such operations via FUSE and CPG dispatch callbacks,
> which are running in concurrent threads.
> 
> accordingly, we need to protect these operations with a mutex, otherwise
> they might return CS_OK without actually doing what they were supposed
> to do (which in turn can lead to the dfsm taking a wrong turn and
> getting stuck in a supposedly short-lived state, blocking access via
> FUSE and getting whole clusters fenced).
> 
> huge thanks to Alexandre Derumier for providing the initial bug report
> and quite a lot of test runs while debugging this issue.
> 
> Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
> ---
> 
> Notes:
>     we could recycle sync_mutex, but that makes it harder to reason
>     about securing all code paths. it also protects non-CPG operations
>     as part of the sync message queue handling, so mixing those up is
>     non-ideal.
> 
>     @Alexandre: this is a slightly different approach compared to the test
>     build from yesterday, so if you want to test this as well it would
>     be very welcome :)
> 
>  data/src/dfsm.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
>

applied, much thanks to all involved!
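
For readers of the archive, the locking pattern described in the quoted
commit message can be sketched roughly as follows. This is an illustration
only, not the applied diff to data/src/dfsm.c: fake_mcast() is a
hypothetical stand-in for a non-thread-safe call such as cpg_mcast_joined(),
and the names cpg_lock, locked_mcast() and run_two_senders() are assumptions
made for this example. It uses plain POSIX threads so that it builds without
corosync.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SENDS_PER_THREAD 100000

/* Hypothetical stand-in for a non-thread-safe library call such as
 * cpg_mcast_joined(); it only bumps an unprotected counter. */
static long sent_messages = 0;

static int fake_mcast(void)
{
    sent_messages++;    /* read-modify-write, racy without a lock */
    return 0;           /* "success", loosely analogous to CS_OK */
}

/* Dedicated mutex for the wrapped operations. The name is an assumption
 * for this sketch and is not taken from dfsm.c. */
static pthread_mutex_t cpg_lock = PTHREAD_MUTEX_INITIALIZER;

/* Every caller goes through this wrapper, so two threads can never be
 * inside fake_mcast() at the same time. */
static int locked_mcast(void)
{
    pthread_mutex_lock(&cpg_lock);
    int rc = fake_mcast();
    pthread_mutex_unlock(&cpg_lock);
    return rc;
}

/* Thread body: plays the role of a FUSE or CPG dispatch callback that
 * triggers sends concurrently with other threads. */
static void *sender(void *arg)
{
    (void)arg;
    for (int i = 0; i < SENDS_PER_THREAD; i++)
        locked_mcast();
    return NULL;
}

/* Runs two concurrent senders and returns the total number of sends. */
long run_two_senders(void)
{
    pthread_t a, b;
    int rc;

    sent_messages = 0;
    rc = pthread_create(&a, NULL, sender, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, sender, NULL);
    assert(rc == 0);
    (void)rc;           /* silence unused warning when NDEBUG is set */

    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return sent_messages;
}
```

With the wrapper in place, run_two_senders() counts every send from both
threads. Using a dedicated mutex instead of reusing an existing one mirrors
the reasoning in the patch notes: it keeps the set of protected code paths
small and easy to audit.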