[pbs-devel] [PATCH proxmox-backup 5/6] datastore: verify: evict corrupt chunks from in-memory LRU cache

Christian Ebner c.ebner at proxmox.com
Thu Oct 30 11:56:46 CET 2025


On 10/27/25 12:53 PM, Christian Ebner wrote:
> On 10/27/25 11:59 AM, Fabian Grünbichler wrote:
>> On October 16, 2025 3:18 pm, Christian Ebner wrote:
>>> Chunks detected as corrupt have been renamed on both the S3 backend
>>> and the local datastore cache, but not evicted from the in-memory
>>> cache containing the LRU chunk digests. This can lead to the chunks
>>> being considered as already present if their digest is still cached,
>>> and therefore not being re-inserted in the local store cache and S3
>>> backend on backup upload.
>>>
>>> Fix this by not only renaming the local datastore's chunk marker
>>> file, but also removing it from the in-memory cache while holding the
>>> chunk store mutex lock to exclude interference from concurrent chunk
>>> inserts.
>>>
>>> Reported-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
>>> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
>>> ---
>>>   pbs-datastore/src/datastore.rs | 10 +++++++++-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>>> index a7ea8fd96..c551679e3 100644
>>> --- a/pbs-datastore/src/datastore.rs
>>> +++ b/pbs-datastore/src/datastore.rs
>>> @@ -2467,10 +2467,18 @@ impl DataStore {
>>>          let _lock = self.inner.chunk_store.mutex().lock().unwrap();
>>> -        match std::fs::rename(&path, &new_path) {
>>> +        let result = match std::fs::rename(&path, &new_path) {
>>>              Ok(_) => Ok(Some(format!("corrupted chunk renamed to {new_path:?}"))),
>>>              Err(err) if err.kind() == std::io::ErrorKind::NotFound => Ok(None),
>>>              Err(err) => bail!("could not rename corrupted chunk {path:?} - {err}"),
>>> +        };
>>> +
>>> +        if let Some(cache) = self.cache() {
>>> +            // Requirements for calling the unsafe method are met, since snapshots referencing the
>>> +            // corrupt chunk are to be considered corrupt. Ignore the error due to the missing file.
>>> +            let _ = unsafe { cache.remove(digest) };
>>>          }
>>
>> not exactly related to this patch, but something I noticed while
>> reviewing it:
>>
>> how do we protect against concurrent actions on the same chunk between
>> - verification detects chunk as corrupt
>> - rename on S3 level
>> - lock + rename and remove on chunk store/cache level
>>
>> let's say two verification tasks and a backup writer race, all involving
>> the same chunk:
>>
>> - verification A and B detect chunk as corrupt
>> - verification A renames it on the S3 level
>> - verification A renames it on the chunk store level and removes it from
>>    the cache
>> - backup writer uploads it to S3
>> - verification B renames it on the S3 level
>>
>> and now either:
>> - backup writer inserts it into the cache
>> - verification B renames it on the chunk store level and removes it from
>>    the cache
>>
>> which means the backup writer creates a backup that is corrupt but not
>> detected as such
>>
>> or
>> - verification B attempts to rename it on the chunk store level
>>    (ENOTFOUND) and removes it from the cache
>> - backup writer inserts it into the cache
>>
>> which means we just broke cache coherency - the chunk exists in the
>> chunk store and cache, but not on S3 (anymore)
>>
>>
>> or another racey variant:
>> - verification A and B detect chunk as corrupt
>> - verification A renames it on the S3 level
>> - verification B renames it on the S3 level but fails because it doesn't
>>    exist there??
>>
>> I think we might need another synchronization mechanism for verification
>> involving S3..
> 
> The only option I see here is to make the rename (copyObject) conditional,
> e.g. via the x-amz-copy-source-if-unmodified-since header.
> 
> That might, however, not be available for all providers.
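
For illustration, such a conditional copy would roughly amount to the
request below. This is only a sketch using the `http` crate for raw request
construction rather than the actual S3 client wrapper; the host, key layout
and timestamp handling are assumptions. The idea is that a second actor
racing on the same chunk gets a 412 Precondition Failed instead of silently
renaming an object that was modified (e.g. re-uploaded) in the meantime.

// Sketch of a conditional S3 CopyObject request (the server-side "rename"),
// guarded by x-amz-copy-source-if-unmodified-since.
use http::Request;

fn conditional_copy_request(
    bucket_host: &str,       // e.g. "my-bucket.s3.example.com" (assumed)
    copy_source: &str,       // "/bucket/source-key" as expected by S3
    target_key: &str,
    unmodified_since: &str,  // HTTP-date, e.g. "Thu, 16 Oct 2025 13:18:00 GMT"
) -> Result<Request<()>, http::Error> {
    Request::builder()
        .method("PUT")
        .uri(format!("https://{bucket_host}/{target_key}"))
        .header("x-amz-copy-source", copy_source)
        .header("x-amz-copy-source-if-unmodified-since", unmodified_since)
        .body(())
}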

As discussed off-list, the best way forward here is to implement a 
per-chunk file locking mechanism for datastores with an S3 backend, which 
guarantees consistency between the S3 backend and the local store cache. 
That approach is also decoupled from provider-specific implementation 
details and future proof, as no further assumptions have to be met. This 
locking mechanism can then also be used instead of the marker file for 
concurrent upload races and upload <-> GC races in 
https://lore.proxmox.com/pbs-devel/20251015164008.975591-1-c.ebner@proxmox.com/T/
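
For reference, a rough sketch of what such a per-chunk lock could look like.
This is not the actual implementation; it assumes the fs2 and hex crates
purely for illustration, while the real code would presumably reuse the
existing proxmox locking helpers and the chunk store's directory layout.

// Illustrative per-chunk file lock for S3-backed datastores. An exclusive
// advisory lock derived from the chunk digest is held for the duration of
// the critical section (verify rename, upload, GC removal), so concurrent
// actors touching the same chunk are serialized.
use std::fs::{File, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

use fs2::FileExt; // advisory flock() wrapper, assumed here for illustration

fn chunk_lock_path(lock_base: &Path, digest: &[u8; 32]) -> PathBuf {
    // Group lock files by the first digest byte to keep directories small
    // (illustrative only, not the real chunk store layout).
    lock_base
        .join(format!("{:02x}", digest[0]))
        .join(format!("{}.lock", hex::encode(digest)))
}

fn lock_chunk(lock_base: &Path, digest: &[u8; 32]) -> io::Result<File> {
    let path = chunk_lock_path(lock_base, digest);
    if let Some(parent) = path.parent() {
        std::fs::create_dir_all(parent)?;
    }
    let file = OpenOptions::new().create(true).write(true).open(&path)?;
    // Blocks until the exclusive lock is acquired; the lock is released
    // when the returned file handle is dropped.
    file.lock_exclusive()?;
    Ok(file)
}

With such a lock held across both the S3 rename and the local rename plus
cache eviction, the interleavings described above could not occur, since the
backup writer's insert and the second verification task's rename would have
to wait for the first verification task to finish.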



