[pbs-devel] [PATCH proxmox-backup v3 18/23] GC: lock chunk marker before cleanup in phase 3 on s3 backends
Christian Ebner
c.ebner at proxmox.com
Thu Nov 6 12:49:08 CET 2025
On 11/6/25 10:37 AM, Fabian Grünbichler wrote:
> On November 5, 2025 1:22 pm, Christian Ebner wrote:
>> To make sure there is no race between the atime check and the
>> deletion of the chunk marker with a possible concurrent re-insertion.
>
> could you expand on this?
I might have mixed up phase 2 (which in the newest version no longer
holds the mutex lock) with phase 3 (which still does).
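
To spell out the ordering phase 3 needs for the s3 case, roughly the
following (only a sketch, with the surrounding iteration and error
handling elided; `stat_marker()` just stands in for the fstatat() call,
it is not an actual helper in the tree):

    fn sweep_marker(
        store: &ChunkStore,
        cache: &LocalDatastoreLruCache,
        digest: &[u8; 32],
        min_atime: i64,
    ) -> Result<(), Error> {
        // zero timeout: if a concurrent insert holds the chunk file
        // lock, skip the marker instead of blocking on it
        let _guard = match store.lock_chunk(digest, Duration::from_secs(0)) {
            Ok(guard) => guard,
            Err(_) => return Ok(()),
        };
        // stat again *under* the lock: a re-insertion between the first
        // atime check and the lock acquisition bumps the atime, in which
        // case the marker must be kept
        match stat_marker(store, digest) {
            Ok(stat) if stat.st_atime < min_atime => cache.remove(digest),
            _ => Ok(()),
        }
    }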
>
> sweep_unused_chunks holds the chunk store mutex for the full processing
> of each chunk it iterates over..
>
> so we have for each chunk:
>
> - obtain mutex
> - fstat
> - call cond_sweep_chunk
> - lock chunk
> - fstat again
> - drop mutex
> - remove from cache (which re-obtains mutex)
>
> shouldn't we protect this race by obtaining the mutex when conditionally
> touching the chunk in `insert_chunk_cached()`, or even in
> `datastore.cond_touch_chunk()`? then there is no way for a chunk
> insertion to update the atime here.. AFAICT the latter would also close
> a race with pull syncing, which touches chunks while they are referenced
> by tmp indices, which means those touches are not protected by the mutex
> or the rest of the GC logic?
Right, that case is not protected... this warrants a really close look
though, as it will have a significant performance impact if not done
right.
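
For reference, the guarded touch would be something along these lines
(just a sketch: `self.mutex` refers to the chunk store mutex discussed
above, and the `cond_touch_chunk()` signature is assumed here, not
taken from the tree):

    /// Sketch: touch a chunk while holding the chunk store mutex, so GC
    /// cannot stat and unlink the marker between the existence check and
    /// the atime update.
    fn cond_touch_chunk_locked(&self, digest: &[u8; 32]) -> Result<bool, Error> {
        let _lock = self.mutex.lock().unwrap();
        // assert_exists = false, as in the unguarded variant
        self.cond_touch_chunk(digest, false)
    }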
>> By only acquiring the file lock and double stat'ing if the chunk
>> marker would be removed, the file locking penalty is avoided for the
>> other cases.
>>
>> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
>> ---
>> changes since version 2:
>> - no changes
>>
>> pbs-datastore/src/chunk_store.rs | 28 +++++++++++++++++++++++++++-
>> pbs-datastore/src/datastore.rs | 2 ++
>> 2 files changed, 29 insertions(+), 1 deletion(-)
>>
>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>> index 49687b2fa..08519fe2b 100644
>> --- a/pbs-datastore/src/chunk_store.rs
>> +++ b/pbs-datastore/src/chunk_store.rs
>> @@ -5,6 +5,7 @@ use std::sync::{Arc, Mutex};
>> use std::time::Duration;
>>
>> use anyhow::{bail, format_err, Context, Error};
>> +use hex::FromHex;
>> use tracing::{info, warn};
>>
>> use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
>> @@ -22,7 +23,7 @@ use crate::data_blob::DataChunkBuilder;
>> use crate::file_formats::{
>> COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
>> };
>> -use crate::DataBlob;
>> +use crate::{DataBlob, LocalDatastoreLruCache};
>>
>> /// File system based chunk store
>> pub struct ChunkStore {
>> @@ -366,6 +367,7 @@ impl ChunkStore {
>> min_atime: i64,
>> status: &mut GarbageCollectionStatus,
>> worker: &dyn WorkerTaskContext,
>> + cache: Option<&LocalDatastoreLruCache>,
>> ) -> Result<(), Error> {
>> // unwrap: only `None` in unit tests
>> assert!(self.locker.is_some());
>> @@ -419,6 +421,30 @@ impl ChunkStore {
>> bad,
>> status,
>> || {
>
> basically we have two callbacks here:
> - remove chunk/chunk marker (called for all bad chunks or non S3 case)
> - remove chunk via cache (called for non-bad S3 chunks)
>
>> + if let Some(cache) = cache {
>> + // never lock bad chunks
>> + if filename.to_bytes().len() == 64 {
>
> this is already passed in as bool ;)
Right, that can be checked directly.
>
>> + let digest = <[u8; 32]>::from_hex(filename.to_bytes())?;
>> + match self.lock_chunk(&digest, Duration::from_secs(0)) {
>
> so this is basically a "skip chunk if currently being uploaded" check,
> right? might warrant a comment, else this reads like a deadlock waiting
> to happen..
Yes, will add a comment here. You are right, this *must* never be blocking.
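
Something along these lines (sketch, exact comment wording to be
decided):

    // Acquire the chunk file lock with zero timeout: failing to get it
    // means a concurrent insert currently holds the lock, so the chunk
    // is in use again and must not be swept. This must never block: an
    // inserter takes the chunk file lock first and then needs the chunk
    // store mutex held here, so waiting would deadlock.
    match self.lock_chunk(&digest, Duration::from_secs(0)) {
        // safe to double stat and remove via the cache
        Ok(_guard) => { /* ... */ }
        // skip, the chunk gets re-evaluated on the next GC run
        Err(_) => return Ok(()),
    }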
> in phase2 we obtain the chunk lock for all chunks on the S3 side outside
> of the remove callback, here we do it inside - is there a reason for that?
That will actually be adapted to only acquire the per-chunk file lock,
but the ordering of the patches no longer reflects this; I might
reorder this one to make that clearer.
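
I.e., the caller side would take the lock up-front, mirroring phase 2
(rough sketch only, subject to the reordering mentioned above):

    // per chunk, before invoking the remove callback
    let digest = <[u8; 32]>::from_hex(filename.to_bytes())?;
    let _guard = match self.lock_chunk(&digest, Duration::from_secs(0)) {
        Ok(guard) => guard,
        // lock held by a concurrent upload: chunk is live again, skip
        Err(_) => return Ok(()),
    };
    // the callback itself then only double stats and removes the chunk
    // via the cache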
> chunks ending up in this state here should be rather rare (compared to
> chunks existing on S3 being garbage collected), right?
Yes, this code path *should* never be taken, unless the chunk vanished
from the S3 object store because of some other interaction. Basically,
it just cleans up dangling chunk marker files.
> but even if we keep this, the whole callback could become (AFAICT):
>
>
> || {
> // non-bad S3 chunks need to be removed via cache
> if let Some(cache) = cache {
> if !bad {
> let digest = <[u8; 32]>::from_hex(filename.to_bytes())?;
>
> // unless there is a concurrent upload pending
> if let Ok(_guard) =
> self.lock_chunk(&digest, Duration::from_secs(0))
> {
> drop(lock);
> cache.remove(&digest)?;
> }
>
> return Ok(());
> }
> }
>
> // bad or local chunks
> unlinkat(Some(dirfd), filename, UnlinkatFlags::NoRemoveDir).map_err(
> |err| {
> format_err!(
> "unlinking chunk {filename:?} failed on store '{}' - {err}",
> self.name,
> )
> },
> )
> },
>
>> + Ok(_guard) => {
>> + // don't remove if changed since locking
>> + match fstatat(
>> + Some(dirfd),
>> + filename,
>> + nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
>> + ) {
>> + Ok(stat) if stat.st_atime < min_atime => {
>> + let _ = cache.remove(&digest);
>> + return Ok(());
>> + }
>> + _ => return Ok(()),
>> + }
>> + }
>> + Err(_) => return Ok(()),
>> + }
>> + }
>> + }
>> +
>> unlinkat(Some(dirfd), filename, UnlinkatFlags::NoRemoveDir).map_err(
>> |err| {
>> format_err!(
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 8b7333d80..df344974a 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -1761,6 +1761,7 @@ impl DataStore {
>> min_atime,
>> &mut tmp_gc_status,
>> worker,
>> + self.cache(),
>> )?;
>> } else {
>> self.inner.chunk_store.sweep_unused_chunks(
>> @@ -1768,6 +1769,7 @@ impl DataStore {
>> min_atime,
>> &mut gc_status,
>> worker,
>> + None,
>> )?;
>> }
>>
>> --
>> 2.47.3