[pbs-devel] [RFC PATCH 0/5] fix #3935: refactor datastore locking to use tmpfs
Wolfgang Bumiller
w.bumiller at proxmox.com
Tue Mar 22 10:38:42 CET 2022
On Fri, Mar 18, 2022 at 03:06:50PM +0100, Stefan Sterz wrote:
> This RFC overhauls the locking mechanism for snapshots and backup
> groups in pbs-datastore and wherever groups and snapshots are
> accessed. Sending this as an RFC in case I missed places where
> locking occurs on groups, snapshots or manifests.
>
> The first patch in this series moves most of the locking into new
> functions that are part of the `datastore` trait. These functions
> create a mirror of the directory structure under
> `/run/proxmox-backup/locks` that is used for locking. The second
> patch adds one more locking function that enables shared locks on
> groups. This functionality has not been needed so far, but might be
> useful in the future. I am unsure whether it should be included.
>
> The third patch moves the `/run/proxmox-backup/locks` path into a
> constant in order to be able to switch the lock location more easily
> latter. Sending this as its own patch as it also affects manifest
> locking in a minor way, unlike the previous patches.
>
> Lastly, patch four and five fix issues that are locking related and
> currently present in the codebase. For patch four, see my comment
> there. Patch five resolves a race condition when two clients try to
> change the owner of a data store via the API. Patch five depends on
> patch one, but patch four could be applied independently.
So I think we'll move ahead with this series with minor changes and
possibly extend it a little further.
We had an off-list discussion and basically came to the following
conclusion:
Since this series moves the locks into a tmpfs, adding `stat()` calls
on the locked fd and the path afterwards should be cheap enough. By
comparing the inodes we can make sure the file is still the same one and
we didn't race against a deletion. This way we don't need to leave the
files lying around.
>From what I can tell you currently use directories for locks. But since
the locks aren't actually the backup group directory anymore I think
they should be files instead.
Also, since we'd like to actually clean up the locks, the lock paths
shouldn't be nested directories as they are now, but flat files (still
based on the path but escaped, otherwise cleaning up would become even
more complex ;-) ).
Other ideas which floated around:
- A lock daemon (or simply a fuse implementation where 'unlink' cancels
pending flocks). (This could be extended in various ways for async
locking, timeouts, etc.)
- A "hashmap lock file": An in-memory file filled with robust-futex
where the group name simply gets hashed to get to a positioin in the
file. Hash clashes aren't an issue since all they'd do is cause
multiple groups to be guarded by the same lock, which wouldn't
technically be wrong.
But the tmpfs+stat approach seems good enough to - at least for now -
not really bother with these other options.
I also think patch 3 should be merged into 1 (the constant should be
there right away), and patch 2 (dead_code) can just be dropped for now.
As for the manifest lock (patch 4), we could probably switch to
doing a double-stat() there as well instead.
More information about the pbs-devel
mailing list