[pbs-devel] [PATCH proxmox-backup v4 13/14] GC: fix deadlock for cache eviction and garbage collection
Christian Ebner
c.ebner at proxmox.com
Mon Nov 10 12:56:26 CET 2025
When inserting a chunk via the local datastore cache, first the
chunk is inserted into the chunk store and then into the in-memory
AsyncLruCache. If the cache capacity is reached, the AsycLruCache
will execute a callback on the evicted cache node, which in case of
the local datastore cache performs a clear chunk call. For this
codepath, the AsyncLruCache is guarded by locking a mutex to get
exclusive access on the cache, and then the chunk store mutex guard
is acquired for safe clearing of the chunk.
Garbage collection however tries the opposite if a chunk is no longer
present and should be cleaned up. It first guards the chunk store
mutex, only to then try and remove the chunk from the local chunk
store and the AsyncLruCache, the latter trying to guarantee
exclusive access by guarding its own mutex.
This therefore can result in a deadlock, further locking the whole
chunk store.
Fix this by guarding the chunk store within the remove_chunk() helper
method on the ChunkStore, not acquiring the lock in the garbage
collection itself for this code path (still guarded by the per-chunk
file lock). By this the order of locking is the same as on cache
eviction.
Reported-by: https://forum.proxmox.com/threads/174878/
Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
---
pbs-datastore/src/chunk_store.rs | 4 +++-
pbs-datastore/src/datastore.rs | 1 -
pbs-datastore/src/local_datastore_lru_cache.rs | 3 ++-
3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 879ac4923..273b17d5f 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -430,6 +430,8 @@ impl ChunkStore {
if let Ok(_guard) =
self.lock_chunk(&digest, Duration::from_secs(0))
{
+ // still protected by per-chunk file lock
+ drop(lock);
cache.remove(&digest)?;
}
@@ -450,7 +452,6 @@ impl ChunkStore {
)?;
}
}
- drop(lock);
}
Ok(())
@@ -710,6 +711,7 @@ impl ChunkStore {
/// chunks by verifications and chunk inserts by backups.
pub(crate) fn remove_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
let (chunk_path, _digest_str) = self.chunk_path(digest);
+ let _lock = self.mutex.lock();
std::fs::remove_file(chunk_path).map_err(Error::from)
}
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 354caeae1..0a86aaede 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1710,7 +1710,6 @@ impl DataStore {
&mut gc_status,
|| {
if let Some(cache) = self.cache() {
- let _guard = self.inner.chunk_store.mutex().lock().unwrap();
cache.remove(&digest)?;
}
delete_list.push((content.key, _chunk_guard));
diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
index 7b9d8caae..74647cdb2 100644
--- a/pbs-datastore/src/local_datastore_lru_cache.rs
+++ b/pbs-datastore/src/local_datastore_lru_cache.rs
@@ -42,7 +42,8 @@ impl LocalDatastoreLruCache {
/// Remove a chunk from the local datastore cache.
///
/// Callers to this method must assure that:
- /// - no concurrent insert is being performed, the chunk store's mutex must be held.
+ /// - the chunk store's mutex is not being held.
+ /// - no concurrent insert is being performed, the per-chunk file lock must be held.
/// - the chunk to be removed is no longer referenced by an index file.
/// - the chunk to be removed has not been inserted by an active writer (atime newer than
/// writer start time).
--
2.47.3
More information about the pbs-devel
mailing list