[pbs-devel] [PATCH proxmox-backup] chunk store: fix race window between chunk stat and gc cleanup
Fabian Grünbichler
f.gruenbichler at proxmox.com
Thu Nov 6 14:56:48 CET 2025
On November 6, 2025 1:54 pm, Christian Ebner wrote:
> Sweeping of unused chunks during garbage collection checks their
> atime to distinguish between chunks being in-use and chunks no
> longer being used. While garbage collection does lock the chunk
> store by guarding its mutex before reading file stats and deleting
> unused chunks, the conditional touch did not do this before updating
> the chunk's atime (thereby also checking its presence).
>
> Therefore there is a race window: a chunk can be touched after its
> metadata has been read by garbage collection, but before it is
> removed.
>
> The race is however rare, as for this to happen the chunk must be
> older than the cutoff time and not be referenced by any index file,
> otherwise the atime would be updated during phase 1 already.
>
> Fix by guarding the chunk store mutex before touching a chunk.
>
> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
> ---
> pbs-datastore/src/chunk_store.rs | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index ba7618e40..d21db4a71 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -217,6 +217,7 @@ impl ChunkStore {
>          assert!(self.locker.is_some());
>
>          let (chunk_path, _digest_str) = self.chunk_path(digest);
> +        let _lock = self.mutex.lock();
>          self.cond_touch_path(&chunk_path, assert_exists)

alas, it's not as simple as that: this helper is also called while
already holding the mutex, so we need to split it up further, otherwise
we deadlock immediately on chunk insertion.

1. make the existing cond_touch_chunk private and give it a _no_lock
suffix
2. make touch_chunk private and make it call the _no_lock variant
3. add a new cond_touch_chunk helper that obtains the lock and calls
_no_lock internally
4. analyze other callers to ensure nobody else calls us with the mutex
held already
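
To illustrate steps 1-3, here is a rough sketch of the lock/_no_lock
split. It is a toy model and not the actual pbs-datastore code:
ToyChunkStore, the in-memory map and the integer timestamps are made up
for illustration (the real store touches files on disk), and the
signatures are assumptions. Only the cond_touch_chunk and _no_lock names
follow the proposal above.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

type Digest = [u8; 32];

/// Hypothetical stand-in for the chunk store: maps digest -> atime.
/// Assumption: state lives inside the mutex here, purely to keep the
/// sketch self-contained and safe.
pub struct ToyChunkStore {
    chunks: Mutex<HashMap<Digest, u64>>,
}

impl ToyChunkStore {
    pub fn new() -> Self {
        Self { chunks: Mutex::new(HashMap::new()) }
    }

    /// Private variant for callers that already hold the lock. Taking
    /// the guarded map as an argument makes that requirement explicit.
    /// Returns true if the chunk exists and its atime was updated.
    fn cond_touch_chunk_no_lock(
        chunks: &mut HashMap<Digest, u64>,
        digest: &Digest,
        now: u64,
    ) -> bool {
        match chunks.get_mut(digest) {
            Some(atime) => {
                *atime = now;
                true
            }
            None => false,
        }
    }

    /// Public variant: obtains the lock, then calls the _no_lock helper.
    pub fn cond_touch_chunk(&self, digest: &Digest, now: u64) -> bool {
        let mut chunks = self.chunks.lock().unwrap();
        Self::cond_touch_chunk_no_lock(&mut chunks, digest, now)
    }

    /// Insertion already holds the lock, so it has to use the _no_lock
    /// variant. Calling cond_touch_chunk() here would lock the same
    /// std::sync::Mutex twice on one thread, which deadlocks or panics.
    /// Returns true if the chunk was newly inserted.
    pub fn insert_chunk(&self, digest: Digest, now: u64) -> bool {
        let mut chunks = self.chunks.lock().unwrap();
        if Self::cond_touch_chunk_no_lock(&mut chunks, &digest, now) {
            return false; // chunk already present, only touched
        }
        chunks.insert(digest, now);
        true
    }

    /// Sweep: atime check and removal happen under the same lock, so a
    /// touch can no longer slip in between the two.
    /// Returns the number of removed chunks.
    pub fn sweep(&self, cutoff: u64) -> usize {
        let mut chunks = self.chunks.lock().unwrap();
        let before = chunks.len();
        chunks.retain(|_, atime| *atime >= cutoff);
        before - chunks.len()
    }
}
```

With this shape, external callers can only reach the locking variant,
while code paths that already hold the guard pass it down explicitly.
Step 4 (auditing the remaining callers) is still needed in the real
code, since a Mutex<()> does not give this kind of compile-time help.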

and while looking at that, I realized that index_mark_used_chunks
creates a chunk marker without holding a lock. but that could (would)
then be solved by your chunk-flock series, since it only happens in the
S3 case.

>      }
>
> --
> 2.47.3