[pbs-devel] [PATCH v4 proxmox-backup 5/5] fix #5331: garbage collection: avoid multiple chunk atime updates
Thomas Lamprecht
t.lamprecht at proxmox.com
Tue Mar 25 12:56:42 CET 2025
On 21.03.25 at 10:32, Christian Ebner wrote:
> To reduce the number of atime updates, keep track of the recently
> marked chunks in phase 1 of garbage collection to avoid multiple atime
> updates via expensive utimensat() calls.
>
> Recently touched chunks are tracked by storing the chunk digests in
> an LRU cache of fixed capacity. By inserting a digest, the chunk will
> be the most recently touched one and if already present in the cache
> before insert, the atime update can be skipped.
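For readers following along, the mechanism described above can be sketched
roughly as below. This is not the code from the patch: `RecentlyTouched` and
`count_atime_updates` are made-up names, the cache is a simple std-only
stand-in for the real LruCache, and the utimensat() call is only counted,
not performed.

```rust
use std::collections::{BTreeMap, HashMap};

type Digest = [u8; 32];

/// Hypothetical stand-in for the LRU cache used in the patch.
struct RecentlyTouched {
    capacity: usize,
    tick: u64,
    /// digest -> tick of its last touch
    by_digest: HashMap<Digest, u64>,
    /// tick -> digest, ordered from oldest to newest touch
    by_age: BTreeMap<u64, Digest>,
}

impl RecentlyTouched {
    fn new(capacity: usize) -> Self {
        Self {
            capacity,
            tick: 0,
            by_digest: HashMap::new(),
            by_age: BTreeMap::new(),
        }
    }

    /// Marks `digest` as the most recently touched one.
    /// Returns true if it was already present before the insert.
    fn insert(&mut self, digest: Digest) -> bool {
        self.tick += 1;
        let present = match self.by_digest.insert(digest, self.tick) {
            Some(old_tick) => {
                self.by_age.remove(&old_tick);
                true
            }
            None => false,
        };
        self.by_age.insert(self.tick, digest);
        // Evict the least recently touched digest once over capacity.
        if self.by_digest.len() > self.capacity {
            if let Some((_, oldest)) = self.by_age.pop_first() {
                self.by_digest.remove(&oldest);
            }
        }
        present
    }
}

/// Counts how many atime updates a marking pass over `referenced` would issue.
fn count_atime_updates(referenced: &[Digest], capacity: usize) -> usize {
    let mut cache = RecentlyTouched::new(capacity);
    let mut updates = 0;
    for digest in referenced {
        if !cache.insert(*digest) {
            // Cache miss: the real code would update the chunk's atime here.
            updates += 1;
        }
    }
    updates
}
```

With a capacity of zero every reference counts as a miss, so the sketch
degrades to one update per referenced chunk, which is the behaviour the
cache is meant to avoid.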
Code-wise this looks alright to me, although I did not look at it in depth.
What I'd be interested in is some documentation of how the size of the
cache was chosen. Even if it was mostly arbitrary, stating so helps a lot
when rethinking this in the future, as one then doesn't have to guess
whether there was more reasoning behind it.
Some basic benchmarks would also be great, even if from some random grown
setup, as long as one describes it: the overall pool data usage,
deduplication factor, number of backup groups, number of snapshots and
their rough age (distribution), plus basic system characteristics like the
CPU and basic parameters of the underlying storage, such as filesystem type
and the (block) device type that backs it. With that, one can classify the
change well enough.
> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5331
> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
> ---
> changes since version 3:
> - no changes
>
> pbs-datastore/src/datastore.rs | 26 ++++++++++++++++++++++++--
> 1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index ea7e5e9f3..4445944c0 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
...
> @@ -1128,6 +1136,8 @@ impl DataStore {
> let mut unprocessed_index_list = self.list_index_files()?;
> let index_count = unprocessed_index_list.len();
>
> + // Allow up to 32 MiB, as only storing the 32 digest as key
Above comment is IMO a bit hard to parse and does not really provide any
reasoning about the chosen size FWICT.
> + let mut recently_touched_chunks = LruCache::new(1024 * 1024);
It's quite a descriptive and good name. Something slightly shorter like
`chunk_lru_cache` would IMO be fine here too, but really no hard
feelings.
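To illustrate the kind of documentation meant for the size, a sketch of how
the arithmetic behind the "32 MiB" figure could be spelled out. The constant
and function names are hypothetical, and the estimate counts only the
32-byte digest keys; any per-entry overhead of the cache itself is not
included and would need to be measured.

```rust
/// Size in bytes of one chunk digest, as used for the cache key.
const DIGEST_SIZE: usize = 32;

/// Number of digests kept in the cache (value taken from the patch,
/// constant name is hypothetical).
const CHUNK_LRU_CACHE_CAPACITY: usize = 1024 * 1024;

/// Lower bound for the memory taken up by the cache keys alone.
/// Bookkeeping overhead per entry is deliberately left out.
const fn key_bytes(capacity: usize) -> usize {
    capacity * DIGEST_SIZE
}

/// Same lower bound expressed in MiB, rounded down.
const fn key_mib(capacity: usize) -> usize {
    key_bytes(capacity) / (1024 * 1024)
}
```

Here `key_mib(CHUNK_LRU_CACHE_CAPACITY)` evaluates to 32, which matches the
number in the code comment, but says nothing yet about why this capacity
was picked over a smaller or larger one.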