[pbs-devel] [PATCH proxmox-backup v2] GC: chunk store: fix chunk using markers cleanup

Fabian Grünbichler f.gruenbichler at proxmox.com
Wed Nov 26 09:23:50 CET 2025


On November 25, 2025 3:27 pm, Christian Ebner wrote:
> On 11/25/25 3:19 PM, Fabian Grünbichler wrote:
>> On November 25, 2025 3:00 pm, Christian Ebner wrote:
>>> Since commit 9510ef1a ("GC: assure chunk exists on s3 store when
>>> creating missing chunk marker") chunks which are referenced by
>>> an index file but do not have a local marker file are marked by a
>>> file with the `using` extension, so they are not cleaned up during
>>> phase 2 if the chunk is still present on the backend.
>>>
>>> If the chunk is however not encountered, phase 3 will see the marker
>>> and try to clean it up. This currently fails because the cleanup
>>> first tries to remove the entry from the LRU cache, which involves
>>> converting the filename to the chunk digest.
>>>
>>> Therefore, clean up any using marker file encountered during phase 3
>>> before any regular or bad chunk, independent of the atime.
>>>
>>> Fixes: https://forum.proxmox.com/threads/176567/post-819437
>>> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
>>> ---
>>> Changes since version 1 (thanks a lot for offlist discussion Thomas):
>>> - Clean up using marker chunks independent of the atime cutoff
>>>
>>>   pbs-datastore/src/chunk_store.rs | 14 +++++++++++++-
>>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>>> index f53460664..7fe09b914 100644
>>> --- a/pbs-datastore/src/chunk_store.rs
>>> +++ b/pbs-datastore/src/chunk_store.rs
>>> @@ -25,6 +25,8 @@ use crate::file_formats::{
>>>   };
>>>   use crate::{DataBlob, LocalDatastoreLruCache};
>>>   
>>> +const USING_MARKER_FILENAME_EXT: &str = "using";
>>> +
>>>   /// File system based chunk store
>>>   pub struct ChunkStore {
>>>       name: String, // used for error reporting
>>> @@ -426,6 +428,16 @@ impl ChunkStore {
>>>                       drop(lock);
>>>                       continue;
>>>                   }
>>> +                if filename
>>> +                    .to_bytes()
>>> +                    .ends_with(USING_MARKER_FILENAME_EXT.as_bytes())
>>> +                {
>>> +                    unlinkat(Some(dirfd), filename, UnlinkatFlags::NoRemoveDir).map_err(|err| {
>>> +                        format_err!("unlinking chunk using marker {filename:?} failed - {err}")
>>> +                    })?;
>>> +                    drop(lock);
>>> +                    continue;
>>> +                }
>> 
>> this looks okay as a stop-gap, but isn't the actual problem that
>> 
>> .using
>> 
>> and
>> 
>> .0.bad
>> 
>> have the same length, so we end up taking a codepath using a weird "bad
>> but not bad" filename instead of skipping those markers in phase3?
> 
> but we need to clean them up at some point, otherwise the following 
> might happen:
> - chunk is in use by index file, phase 1 sets marker
> - chunk is not present on s3 object store (bad chunk), therefore not 
> seen in phase 2 and not replaced by regular marker file
> - chunk is uploaded
> - both index files are pruned
> - chunk is never cleaned up because using marker file persists.

yes, that's true. since their only purpose is to protect against cleanup
in phase 2, they don't need to outlive the GC run.

>> in get_chunk_iterator, we skip all files that are not 64 bytes or
>> 64+len(.0.bad) bytes long, but then set the "bad" flag based on the
>> extension..
> 
> this could report whether the entry is a using marker via an enum
> variant instead of the boolean bad flag, so the two cases can be
> clearly distinguished.

that seems cleaner, yes.
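
a rough sketch of that idea, just to make the suggestion concrete. all
names here (`ChunkFileKind`, `classify`, `DIGEST_HEX_LEN`) are made up for
illustration and are not the existing get_chunk_iterator API. it also
assumes the bad counter is a single digit, as the current length check
implies:

```rust
// Hypothetical names throughout; not the actual pbs-datastore API.

/// Length of a hex-encoded SHA-256 chunk digest.
const DIGEST_HEX_LEN: usize = 64;

/// What kind of entry a filename in the chunk directory denotes.
#[derive(Debug, PartialEq, Eq)]
pub enum ChunkFileKind {
    /// Regular chunk: 64 hex characters.
    Chunk,
    /// Bad chunk: 64 hex characters followed by `.<digit>.bad`.
    Bad,
    /// Transient GC marker: 64 hex characters followed by `.using`.
    UsingMarker,
}

/// Classify a filename by its extension instead of by its length, so that
/// `.using` and `.0.bad` (both 6 bytes long) can no longer be confused.
/// Returns `None` for anything that should be skipped.
pub fn classify(filename: &[u8]) -> Option<ChunkFileKind> {
    if filename.len() < DIGEST_HEX_LEN {
        return None;
    }
    let (digest, ext) = filename.split_at(DIGEST_HEX_LEN);
    if !digest.iter().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    match ext {
        b"" => Some(ChunkFileKind::Chunk),
        b".using" => Some(ChunkFileKind::UsingMarker),
        // Assumption: the counter in `.<n>.bad` is a single ASCII digit.
        [b'.', counter, b'.', b'b', b'a', b'd'] if counter.is_ascii_digit() => {
            Some(ChunkFileKind::Bad)
        }
        _ => None,
    }
}
```

phase 3 could then match on the variant: unlink `UsingMarker` entries
unconditionally, and only go through the atime and LRU cache handling for
`Chunk` and `Bad`.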



