[pbs-devel] [PATCH proxmox-backup 3/5] garbage collection: add structure for optimized image iteration

Christian Ebner c.ebner at proxmox.com
Fri Feb 21 15:01:08 CET 2025


Implement the GroupedImageList struct and its methods, which group
index file (image) paths by a hierarchy for optimized iteration
during phase 1 of garbage collection.

Currently, phase 1 of garbage collection iterates over all folders in
the datastore, without considering any logical organization. This
ensures that image indices with unexpected paths are not missed, as
chunks still in use by these indices would otherwise be deleted in GC
phase 2.

The new structure helps to iterate over the index files in a more
logical way, without missing strange paths. The hierarchical
organization helps to avoid touching shared chunks of incremental
snapshot backups in a backup group multiple times, by allowing
tracking of these without excessive memory requirements.

Since deduplication happens on a per-image basis for subsequent
snapshots, the hierarchy is chosen as follows:
- ns/group
- image filename
- snapshot timestamp

This allows iterating over consecutive snapshots of the same image
in the same backup namespace and group.
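
As a rough illustration of the hierarchy above, a consumer could walk
the nested maps as sketched below. The `Groups` alias and the
`visit_in_order` helper are hypothetical names and not part of this
patch. The patch itself does not sort the snapshot list, so sorting by
timestamp before visiting is an assumption about how a caller might
get consecutive snapshots in order.

```rust
use std::collections::HashMap;
use std::ffi::OsString;
use std::path::PathBuf;

// ns/group -> image filename -> [(snapshot timestamp, index file path)]
// (hypothetical alias mirroring the `groups` field of GroupedImageList)
type Groups = HashMap<String, HashMap<OsString, Vec<(i64, PathBuf)>>>;

/// Hypothetical helper: visit every index file so that snapshots of the
/// same image in the same ns/group are seen back to back, oldest first.
fn visit_in_order(
    groups: &mut Groups,
    mut visit: impl FnMut(&str, &OsString, i64, &PathBuf),
) {
    for (group, images) in groups.iter_mut() {
        for (image, snapshots) in images.iter_mut() {
            // Assumption: insertion order follows directory traversal,
            // so sort by timestamp before iterating.
            snapshots.sort_unstable_by_key(|(time, _)| *time);
            for (time, path) in snapshots.iter() {
                visit(group, image, *time, path);
            }
        }
    }
}
```

The order in which groups and images are visited is unspecified, as
both levels are plain HashMaps. Only the snapshots of one image are
ordered.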

Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
---
 pbs-datastore/src/datastore.rs | 63 ++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)
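
Note: the nested get_mut()/insert() branches in insert() could
possibly be collapsed with the HashMap entry API. The following is
only a minimal, self-contained sketch of that idea. `insert_parsed` is
a hypothetical helper which takes the already parsed key parts instead
of calling pbs_api_types::parse_ns_and_snapshot(), and the
`Default` derive stands in for new().

```rust
use std::collections::HashMap;
use std::ffi::OsString;
use std::path::PathBuf;

#[derive(Default)]
struct GroupedImageList {
    groups: HashMap<String, HashMap<OsString, Vec<(i64, PathBuf)>>>,
    strange_path_images: Vec<PathBuf>,
}

impl GroupedImageList {
    /// Hypothetical helper, not part of the patch: file an index path
    /// under its ns/group key and image filename.
    fn insert_parsed(
        &mut self,
        group_key: String,
        filename: OsString,
        time: i64,
        img: PathBuf,
    ) {
        // Missing levels of the hierarchy are created on demand.
        self.groups
            .entry(group_key)
            .or_default()
            .entry(filename)
            .or_default()
            .push((time, img));
    }

    /// Total number of tracked index files, grouped and strange.
    fn len(&self) -> usize {
        let grouped: usize = self
            .groups
            .values()
            .flat_map(|images| images.values())
            .map(|snapshots| snapshots.len())
            .sum();
        grouped + self.strange_path_images.len()
    }
}
```

This keeps the same three-level layout and the same counting in len(),
only with fewer branches.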

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index eda78193d..520f54548 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1,4 +1,5 @@
 use std::collections::{HashMap, HashSet};
+use std::ffi::OsString;
 use std::io::{self, Write};
 use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::AsRawFd;
@@ -1573,3 +1574,65 @@ impl DataStore {
         Ok(())
     }
 }
+
+struct GroupedImageList {
+    groups: HashMap<String, HashMap<OsString, Vec<(i64, PathBuf)>>>,
+    strange_path_images: Vec<PathBuf>,
+}
+
+impl GroupedImageList {
+    fn new() -> Self {
+        Self {
+            groups: HashMap::new(),
+            strange_path_images: Vec::new(),
+        }
+    }
+
+    fn insert(&mut self, img: &Path, base_path: &Path) -> Result<(), Error> {
+        let img = img.to_path_buf();
+
+        if let Some(backup_dir_path) = img.parent() {
+            let backup_dir_path = backup_dir_path.strip_prefix(base_path)?;
+
+            if let Some(backup_dir_str) = backup_dir_path.to_str() {
+                if let Ok((namespace, backup_dir)) =
+                    pbs_api_types::parse_ns_and_snapshot(backup_dir_str)
+                {
+                    if let Some(filename) = img.file_name() {
+                        let filename = filename.to_os_string();
+                        let group_key = format!("{namespace}/{group}", group = backup_dir.group);
+
+                        if let Some(images) = self.groups.get_mut(&group_key) {
+                            if let Some(snapshots) = images.get_mut(&filename) {
+                                snapshots.push((backup_dir.time, img));
+                            } else {
+                                let snapshots = vec![(backup_dir.time, img)];
+                                images.insert(filename, snapshots);
+                            }
+                        } else {
+                            // ns/group not present, insert new
+                            let snapshots = vec![(backup_dir.time, img)];
+                            let mut images = HashMap::new();
+                            images.insert(filename, snapshots);
+                            self.groups.insert(group_key, images);
+                        }
+                        return Ok(());
+                    }
+                }
+            }
+        }
+
+        self.strange_path_images.push(img);
+        Ok(())
+    }
+
+    fn len(&self) -> usize {
+        let mut count = self.strange_path_images.len();
+        for (_group, images) in self.groups.iter() {
+            for (_image, snapshots) in images.iter() {
+                count += snapshots.len();
+            }
+        }
+        count
+    }
+}
-- 
2.39.5