[pbs-devel] [PATCH proxmox-backup] chunk store: fix race window between chunk stat and gc cleanup

Fabian Grünbichler f.gruenbichler at proxmox.com
Thu Nov 6 14:56:48 CET 2025


On November 6, 2025 1:54 pm, Christian Ebner wrote:
> Sweeping of unused chunks during garbage collection checks their
> atime to distinguish between chunks being in-use and chunks no
> longer being used. While garbage collection does lock the chunk
> store by guarding its mutex before reading file stats and deleting
> unused chunks, the conditional touch did not do this before updating
> the chunk's atime (thereby also checking its presence).
> 
> Therefore there is a race window between the chunk's metadata being
> read and the chunk being removed, during which the chunk can still be
> touched.
> 
> The race is however rare, as for this to happen the chunk must be
> older than the cutoff time and not be referenced by any index file,
> otherwise the atime would be updated during phase 1 already.
> 
> Fix by guarding the chunk store mutex before touching a chunk.
> 
> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
> ---
>  pbs-datastore/src/chunk_store.rs | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index ba7618e40..d21db4a71 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -217,6 +217,7 @@ impl ChunkStore {
>          assert!(self.locker.is_some());
>  
>          let (chunk_path, _digest_str) = self.chunk_path(digest);
> +        let _lock = self.mutex.lock();
>          self.cond_touch_path(&chunk_path, assert_exists)

alas, it's not as simple as that: this helper is also called while
already holding the mutex, so we need to split it up further, else we
deadlock immediately on chunk insertion.
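
to illustrate why the one-line fix deadlocks: assuming the mutex is a plain, non-reentrant std::sync::Mutex, taking it a second time on the same thread can never succeed. the sketch below uses a hypothetical Store type with made-up method names (not the actual ChunkStore code), and try_lock instead of lock so that the conflict is reported rather than hanging:

```rust
use std::sync::{Mutex, TryLockError};

// Hypothetical stand-in for the chunk store; only the mutex matters here.
struct Store {
    mutex: Mutex<()>,
}

impl Store {
    // Mirrors the patched helper: it takes the mutex itself. try_lock is
    // used so the demo reports the conflict instead of blocking forever.
    fn touch_with_lock(&self) -> Result<(), &'static str> {
        match self.mutex.try_lock() {
            Ok(_guard) => Ok(()),
            Err(TryLockError::WouldBlock) => Err("mutex already held"),
            Err(TryLockError::Poisoned(_)) => Err("mutex poisoned"),
        }
    }

    // Mirrors an insert path that already holds the mutex when it touches.
    fn insert(&self) -> Result<(), &'static str> {
        let _guard = self.mutex.lock().unwrap();
        self.touch_with_lock()
    }
}

fn main() {
    let store = Store { mutex: Mutex::new(()) };
    // Standalone touch: the mutex is free, so taking it works.
    assert!(store.touch_with_lock().is_ok());
    // Touch from within insert: the mutex is already held by this thread.
    assert_eq!(store.insert(), Err("mutex already held"));
}
```

with a blocking lock() in place of try_lock, the second case would hang (or panic) instead of returning an error.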

1. make the existing cond_touch_chunk private and give it a _no_lock
suffix
2. make touch_chunk private and make it call the _no_lock variant
3. add a new cond_touch_chunk helper that obtains the lock and calls
_no_lock internally
4. analyze other callers to ensure nobody else calls us with the mutex
held already
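
the split from steps 1 and 3 could look roughly like the untested sketch below. it is not the real ChunkStore: an in-memory map from digest to a touch counter stands in for the on-disk chunks and their atime, and Store, Digest and insert_chunk are names made up for illustration. as an additional assumption, the _no_lock variant takes the guarded data as a parameter, so it can only be called while the lock is held:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

type Digest = [u8; 32];

// Hypothetical stand-in for the chunk store: the map replaces the chunk
// files, the counter replaces the atime update.
struct Store {
    chunks: Mutex<HashMap<Digest, u64>>,
}

impl Store {
    // Step 1: private variant, the caller must already hold the lock.
    // Returns true if the chunk exists and was touched.
    fn cond_touch_chunk_no_lock(chunks: &mut HashMap<Digest, u64>, digest: &Digest) -> bool {
        match chunks.get_mut(digest) {
            Some(touched) => {
                *touched += 1;
                true
            }
            None => false,
        }
    }

    // Step 3: public helper that obtains the lock, then delegates.
    pub fn cond_touch_chunk(&self, digest: &Digest) -> bool {
        let mut chunks = self.chunks.lock().unwrap();
        Self::cond_touch_chunk_no_lock(&mut chunks, digest)
    }

    // Insert path: holds the lock already, so it must use the _no_lock
    // variant. Returns true if the chunk was already present.
    pub fn insert_chunk(&self, digest: Digest) -> bool {
        let mut chunks = self.chunks.lock().unwrap();
        if Self::cond_touch_chunk_no_lock(&mut chunks, &digest) {
            return true;
        }
        chunks.insert(digest, 0);
        false
    }
}

fn main() {
    let store = Store { chunks: Mutex::new(HashMap::new()) };
    let digest = [0u8; 32];
    assert!(!store.cond_touch_chunk(&digest)); // unknown chunk, nothing touched
    assert!(!store.insert_chunk(digest)); // newly inserted
    assert!(store.insert_chunk(digest)); // already present, only touched
    assert!(store.cond_touch_chunk(&digest)); // touched under the lock
}
```

passing the guarded map into the _no_lock variant lets the compiler enforce step 4 in this toy version. with a Mutex<()> guarding files on disk that check stays manual, hence the need to audit the callers.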

and while looking at that, I realized that index_mark_used_chunks is
creating a chunk marker without holding a lock. alas, that could
(would) then be solved by your chunk-flock series, since it only
happens in the S3 case.

>      }
>  
> -- 
> 2.47.3