[pbs-devel] [PATCH v5 proxmox-backup 5/5] fix #5331: garbage collection: avoid multiple chunk atime updates

Christian Ebner c.ebner at proxmox.com
Wed Mar 26 11:03:33 CET 2025


To reduce the number of atimes updates, keep track of the recently
marked chunks in phase 1 of garbage to avoid multiple atime updates
via expensive utimensat() calls.

Recently touched chunks are tracked by storing the chunk digests in
an LRU cache of fixed capacity. By inserting a digest, the chunk will
be the most recently touched one and if already present in the cache
before insert, the atime update can be skipped. The cache capacity of
1024 * 1024 was chosen as compromise between required memory usage
and the size of an index file referencing a 4 TiB fixed size chunked
image (with 4MiB chunk size).

The previous change to iterate over the datastore contents using the
datastore's iterator helps for increased cache hits, as subsequent
snapshots are most likely to share common chunks.

Basic benchmarking:

Number of utimensat calls shows significatn reduction:
unpatched: 31591944
patched:    1495136

Total GC runtime shows significatn reduction (average of 3 runs):
unpatched: 155.4 ± 3.5 s
patched:    22.8 ± 0.5 s

VmPeak measured via /proc/self/status before and after
`mark_used_chunks` (proxmox-backup-proxy was restarted in between
for normalization, average of 3 runs):
unpatched before: 1196028 ± 0 kB
unpatched after:  1196028 ± 0 kB

unpatched before: 1163337 ± 28317 kB
unpatched after:  1330906 ± 29280 kB
delta:             167569 kB

Dependence on the cache capacity:
     capacity runtime[s]  VmPeakDiff[kB]
       1*1024     66.221               0
      10*1024     36.164               0
     100*1024     23.141               0
    1024*1024     22.188          101060
 10*1024*1024     23.178          689660
100*1024*1024     25.135         5507292

Description of the PBS host and datastore:
CPU: Intel Xeon E5-2620
Datastore backing storage: ZFS RAID 10 with 3 mirrors of 2x
ST16000NM001G, mirror of 2x SAMSUNG_MZ1LB1T9HALS as special

Namespaces: 45
Groups: 182
Snapshots: 3184
Index files: 6875
Deduplication factor: 44.54

Original data usage: 120.742 TiB
On-Disk usage: 2.711 TiB (2.25%)
On-Disk chunks: 1494727
Average chunk size: 1.902 MiB

Distribution of snapshots (binned by month):
2023-11	11
2023-12	16
2024-01	30
2024-02	38
2024-03	17
2024-04	37
2024-05	17
2024-06	59
2024-07	99
2024-08	96
2024-09	115
2024-10	35
2024-11	42
2024-12	37
2025-01	162
2025-02	489
2025-03	1884

Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5331
Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
---
changes since version 4:
- extended commit message
- s/recently_touched_chunks/chunk_lru_cache/
- dropped to generic comment

 pbs-datastore/src/datastore.rs | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 8ce98f1b3..9cb78d741 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -7,6 +7,7 @@ use std::sync::{Arc, LazyLock, Mutex};
 
 use anyhow::{bail, format_err, Context, Error};
 use nix::unistd::{unlinkat, UnlinkatFlags};
+use pbs_tools::lru_cache::LruCache;
 use tracing::{info, warn};
 
 use proxmox_human_byte::HumanByte;
@@ -1076,6 +1077,7 @@ impl DataStore {
         &self,
         index: Box<dyn IndexFile>,
         file_name: &Path, // only used for error reporting
+        chunk_lru_cache: &mut LruCache<[u8; 32], ()>,
         status: &mut GarbageCollectionStatus,
         worker: &dyn WorkerTaskContext,
     ) -> Result<(), Error> {
@@ -1086,6 +1088,12 @@ impl DataStore {
             worker.check_abort()?;
             worker.fail_on_shutdown()?;
             let digest = index.index_digest(pos).unwrap();
+
+            // Avoid multiple expensive atime updates by utimensat
+            if chunk_lru_cache.insert(*digest, ()) {
+                continue;
+            }
+
             if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
                 let hex = hex::encode(digest);
                 warn!(
@@ -1126,6 +1134,7 @@ impl DataStore {
         let mut unprocessed_index_list = self.list_index_files()?;
         let index_count = unprocessed_index_list.len();
 
+        let mut chunk_lru_cache = LruCache::new(1024 * 1024);
         let mut processed_index_files = 0;
         let mut last_percentage: usize = 0;
 
@@ -1152,7 +1161,13 @@ impl DataStore {
                             Some(index) => index,
                             None => continue,
                         };
-                        self.index_mark_used_chunks(index, &path, status, worker)?;
+                        self.index_mark_used_chunks(
+                            index,
+                            &path,
+                            &mut chunk_lru_cache,
+                            status,
+                            worker,
+                        )?;
 
                         unprocessed_index_list.remove(&path);
 
@@ -1179,7 +1194,7 @@ impl DataStore {
                 Some(index) => index,
                 None => continue,
             };
-            self.index_mark_used_chunks(index, &path, status, worker)?;
+            self.index_mark_used_chunks(index, &path, &mut chunk_lru_cache, status, worker)?;
             warn!("Marked chunks for unexpected index file at '{path:?}'");
         }
 
-- 
2.39.5





More information about the pbs-devel mailing list