[pve-devel] applied: [PATCH cluster] pmxcfs: protect CPG operations with mutex
Thomas Lamprecht
t.lamprecht at proxmox.com
Wed Sep 30 13:47:38 CEST 2020
On 30.09.20 13:21, Fabian Grünbichler wrote:
> cpg_mcast_joined (and transitively, cpg_join/leave) are not thread-safe.
> pmxcfs triggers such operations via FUSE and CPG dispatch callbacks,
> which are running in concurrent threads.
>
> accordingly, we need to protect these operations with a mutex, otherwise
> they might return CS_OK without actually doing what they were supposed
> to do (which in turn can lead to the dfsm taking a wrong turn and
> getting stuck in a supposedly short-lived state, blocking access via
> FUSE and getting whole clusters fenced).
>
> huge thanks to Alexandre Derumier for providing the initial bug report
> and quite a lot of test runs while debugging this issue.
>
> Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
> ---
>
> Notes:
> we could recycle sync_mutex, but that makes it harder to reason
> about securing all code paths. it also protects non-CPG operations
> as part of the sync message queue handling, so mixing those up is
> non-ideal.
>
> @Alexandre: this is a slightly different approach compared to the test
> build from yesterday, so if you want to test this as well it would
> be very welcome :)
>
> data/src/dfsm.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
>
applied, much thanks to all involved!
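
The locking pattern described in the commit message can be sketched roughly
as follows. This is only an illustration under stated assumptions, not the
actual change to data/src/dfsm.c: it uses plain pthreads, and fake_dfsm_t,
unsafe_send, locked_send and run_senders are hypothetical names standing in
for the real dfsm structure and for a non-thread-safe call such as
cpg_mcast_joined. The point is that every caller, regardless of which thread
it runs in, goes through one wrapper that holds a dedicated mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the dfsm state; not the real pmxcfs struct. */
typedef struct {
    pthread_mutex_t cpg_mutex; /* dedicated lock for "CPG" calls only */
    long sent;                 /* shared state mutated by the unsafe call */
} fake_dfsm_t;

/* Hypothetical stand-in for a non-thread-safe library call: an
 * unsynchronized read-modify-write that can lose updates when racing. */
static int unsafe_send(fake_dfsm_t *dfsm)
{
    long tmp = dfsm->sent;
    dfsm->sent = tmp + 1;
    return 0;
}

/* The only entry point callers use: serializes the unsafe call. */
static int locked_send(fake_dfsm_t *dfsm)
{
    pthread_mutex_lock(&dfsm->cpg_mutex);
    int rc = unsafe_send(dfsm);
    pthread_mutex_unlock(&dfsm->cpg_mutex);
    return rc;
}

typedef struct {
    fake_dfsm_t *dfsm;
    int iterations;
} worker_arg_t;

/* Simulates one of several concurrent callback threads. */
static void *worker(void *p)
{
    worker_arg_t *arg = p;
    for (int i = 0; i < arg->iterations; i++)
        locked_send(arg->dfsm);
    return NULL;
}

/* Runs nthreads concurrent senders and returns the number of sends
 * that were actually recorded. */
long run_senders(int nthreads, int iterations)
{
    enum { MAX_THREADS = 16 };
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    fake_dfsm_t dfsm = { .sent = 0 };
    pthread_mutex_init(&dfsm.cpg_mutex, NULL);

    pthread_t threads[MAX_THREADS];
    worker_arg_t arg = { &dfsm, iterations };

    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &arg);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&dfsm.cpg_mutex);
    return dfsm.sent;
}
```

With the wrapper in place, run_senders(4, 10000) records all 40000 sends.
If worker called unsafe_send directly, some updates could be lost silently
while every call still reported success, which loosely mirrors the
"CS_OK without actually doing what they were supposed to do" symptom.
Keeping a separate mutex for this purpose, rather than reusing one that
guards other state, follows the reasoning in the patch notes.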