[pbs-devel] [RFC proxmox-backup 0/4] fix #5799: Gather per-namespace/group/snapshot storage usage stats
Christian Ebner
c.ebner at proxmox.com
Mon Jan 19 14:27:03 CET 2026
Disclaimer: These patches are still in a development state and are sent
as RFC to discuss implementation details, especially with respect to
the acceptable memory footprint and performance limitations.
As is, they are not intended for production use yet.
Issue #5799 requested gathering and caching information about the raw
storage size of uniquely referenced chunks and the deduplication factor
for backup groups, with the intent to provide better introspection
for storage optimization by allowing specific backup groups/snapshots
to be pruned based on this additional information.
These patches draft an approach to generate such statistics during
garbage collection, by collecting chunk to namespace/group/snapshot
relations and providing an in-memory reverse mapping from chunk
digests to the namespaces/groups/snapshots referencing a given chunk.
This reverse mapping would further allow e.g. marking snapshots as
invalid if referenced chunks are missing.
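To illustrate how such a reverse mapping could flag snapshots with missing chunks, here is a minimal Rust sketch. It assumes the map is a plain `HashMap` from 32-byte (SHA-256) chunk digests to the keys of the referencing snapshots, and that the set of digests actually present in the chunk store is known, e.g. from the phase 2 iteration. `snapshots_with_missing_chunks` is a hypothetical helper, not part of the patches.

```rust
use std::collections::{HashMap, HashSet};
use std::hash::Hash;

/// Hypothetical helper: returns every key (e.g. snapshot) that references
/// at least one chunk digest not present in the chunk store.
fn snapshots_with_missing_chunks<K: Copy + Eq + Hash>(
    refs: &HashMap<[u8; 32], Vec<K>>,
    present: &HashSet<[u8; 32]>,
) -> HashSet<K> {
    refs.iter()
        // Keep only digests whose chunk file was not found.
        .filter(|(digest, _)| !present.contains(*digest))
        // Collect all owners of those digests; the set removes duplicates.
        .flat_map(|(_, owners)| owners.iter().copied())
        .collect()
}
```

Since the lookup goes from digest to owners, a missing chunk costs one map entry visit instead of re-reading all index files.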
During phase 1, the snapshots referencing each chunk digest are stored
in a lookup table. The actual namespace, group and snapshot data is
stored in dedicated indexes and only referenced by the respective key
in the lookup table, with the intent to keep the slot size small and
predictable for better allocation.
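A minimal sketch of such a layout, assuming string names for namespaces, groups and snapshots and a `Vec` of fixed-size keys per digest. `NameIndex`, `SnapshotKey` and `ReverseDigestMap` are hypothetical names chosen for illustration and need not match the types in `reverse_digest_map.rs`.

```rust
use std::collections::HashMap;

/// Interns names so each distinct value is stored once and is
/// referenced by a small, fixed-size index.
#[derive(Default)]
struct NameIndex {
    names: Vec<String>,
    positions: HashMap<String, u32>,
}

impl NameIndex {
    fn intern(&mut self, name: &str) -> u32 {
        if let Some(&pos) = self.positions.get(name) {
            return pos;
        }
        let pos = self.names.len() as u32;
        self.names.push(name.to_string());
        self.positions.insert(name.to_string(), pos);
        pos
    }

    fn resolve(&self, pos: u32) -> Option<&str> {
        self.names.get(pos as usize).map(String::as_str)
    }
}

/// Fixed-size (12 byte) key identifying one snapshot; the lookup table
/// stores this per referencing snapshot instead of the full names.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct SnapshotKey {
    ns: u32,
    group: u32,
    snapshot: u32,
}

#[derive(Default)]
struct ReverseDigestMap {
    namespaces: NameIndex,
    groups: NameIndex,
    snapshots: NameIndex,
    /// Chunk digest -> snapshots referencing that chunk.
    chunks: HashMap<[u8; 32], Vec<SnapshotKey>>,
}

impl ReverseDigestMap {
    /// Records that `snapshot` in `group` of namespace `ns` references `digest`.
    fn insert(&mut self, digest: [u8; 32], ns: &str, group: &str, snapshot: &str) {
        let key = SnapshotKey {
            ns: self.namespaces.intern(ns),
            group: self.groups.intern(group),
            snapshot: self.snapshots.intern(snapshot),
        };
        let owners = self.chunks.entry(digest).or_default();
        // A snapshot may reference the same chunk several times; keep it once.
        if !owners.contains(&key) {
            owners.push(key);
        }
    }

    /// Returns all snapshots referencing `digest` (empty if unknown).
    fn owners(&self, digest: &[u8; 32]) -> &[SnapshotKey] {
        self.chunks.get(digest).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

The group index interns group names globally; since the key always carries the namespace index as well, equal group names in different namespaces still yield distinct keys.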
During phase 2, the raw chunk size is collected while iterating over
the chunk files.
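Phase 2 could be sketched as follows, assuming a chunk store layout of `<chunk dir>/<4 hex digit prefix>/<64 hex digit digest>` and taking the on-disk file size as the raw chunk size. `collect_chunk_sizes` and `parse_digest` are hypothetical helpers; in GC this would happen as part of the phase 2 sweep rather than as a separate directory walk.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Parses a 64 character hex file name into a 32 byte digest.
fn parse_digest(name: &str) -> Option<[u8; 32]> {
    if name.len() != 64 {
        return None;
    }
    let mut digest = [0u8; 32];
    for (i, byte) in digest.iter_mut().enumerate() {
        *byte = u8::from_str_radix(name.get(2 * i..2 * i + 2)?, 16).ok()?;
    }
    Some(digest)
}

/// Walks the (assumed) two-level chunk directory and records the on-disk
/// size of every file whose name is a valid digest.
fn collect_chunk_sizes(chunk_dir: &Path) -> io::Result<HashMap<[u8; 32], u64>> {
    let mut sizes = HashMap::new();
    for prefix_dir in fs::read_dir(chunk_dir)? {
        let prefix_dir = prefix_dir?;
        if !prefix_dir.file_type()?.is_dir() {
            continue;
        }
        for entry in fs::read_dir(prefix_dir.path())? {
            let entry = entry?;
            let metadata = entry.metadata()?;
            if !metadata.is_file() {
                continue;
            }
            // Skip anything that is not named like a chunk digest.
            if let Some(digest) = entry.file_name().to_str().and_then(parse_digest) {
                sizes.insert(digest, metadata.len());
            }
        }
    }
    Ok(sizes)
}
```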
Finally, the statistics are gathered by accumulating the counts of
each chunk digest for each namespace/group/snapshot, taking advantage
of the lookup map.
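The accumulation step might look like this minimal sketch, assuming the owners per digest are already deduplicated and the sizes come from phase 2. A chunk counts towards `unique_bytes` only if exactly one key references it. `UsageStats` and `accumulate` are hypothetical names.

```rust
use std::collections::HashMap;
use std::hash::Hash;

#[derive(Default, Debug, PartialEq)]
struct UsageStats {
    chunk_count: u64,
    referenced_bytes: u64,
    unique_bytes: u64,
}

/// Accumulates per-key statistics from the digest -> owners map and the
/// digest -> raw size map collected in phase 2.
fn accumulate<K: Copy + Eq + Hash>(
    refs: &HashMap<[u8; 32], Vec<K>>,
    sizes: &HashMap<[u8; 32], u64>,
) -> HashMap<K, UsageStats> {
    let mut stats: HashMap<K, UsageStats> = HashMap::new();
    for (digest, owners) in refs {
        // Digests without a size were not found in phase 2; skip them here.
        let Some(&size) = sizes.get(digest) else { continue };
        let unique = owners.len() == 1;
        for owner in owners {
            let entry = stats.entry(*owner).or_default();
            entry.chunk_count += 1;
            entry.referenced_bytes += size;
            if unique {
                entry.unique_bytes += size;
            }
        }
    }
    stats
}
```

For group or namespace level statistics, the snapshot keys would first be mapped to group (or namespace) keys and deduplicated, since a chunk shared only by snapshots of the same group is still unique to that group.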
Currently, the information is gathered unconditionally and logged to
the garbage collection task log, but it is planned to make this opt-in
and to store the gathered data on the namespace/group/snapshot level,
to e.g. be shown on the datastore content listings or a dedicated
content listing.
The following differences in maximum RSS were observed on 2 datastores,
measured via
`watch -n 1 "ps -p $(pidof proxmox-backup-proxy) -o rss | tail -n 1 | tee -a ps-rss.out"`
and compared to the initial RSS after a service restart (with the GC
LRU cache disabled by setting its size to 0):
| Delta RSS (MiB) | Index files | Chunk count | Deduplication factor |
|-----------------|-------------|-------------|----------------------|
| 412.355         | 1125        | 982236      | 14.69                |
| 168.414         | 213         | 598312      | 5.93                 |
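As a rough back-of-the-envelope figure, the RSS deltas above can be divided by the chunk counts. This treats the whole delta as lookup map overhead, so it is an upper bound: anything else allocated during GC is included too. `bytes_per_chunk` is a hypothetical helper.

```rust
/// Average RSS growth per chunk in bytes, given a delta in MiB.
fn bytes_per_chunk(delta_rss_mib: f64, chunk_count: u64) -> f64 {
    delta_rss_mib * 1024.0 * 1024.0 / chunk_count as f64
}
```

This gives roughly 440 bytes per chunk for the first datastore and roughly 295 bytes for the second, compared to 32 bytes for the SHA-256 digest itself, so most of the footprint presumably lies in hash map slots, per-digest reference lists and the indexes rather than in the digests.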
Open questions and ideas to discuss:
- Is the observed memory requirement acceptable if provided as an opt-in
feature? Are there other ideas to further reduce the memory
footprint? I was pondering an indirection mapping that groups
digests by common prefix and only stores the individual suffixes,
which, however, only pays off if the data does not need to be stored
in a hashmap, making it unsuitable due to the diminished lookup
performance.
- Conditionally replace the GC LRU cache with the lookup map if this
feature is enabled. The digests need to be stored anyway, so it
would make sense to use them to avoid repeated chunk atime updates
instead.
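A minimal sketch of how the lookup map could take over the LRU cache's role, assuming a hypothetical `touch_atime` callback that stands in for the actual atime update: the chunk is touched only when its digest is inserted for the first time, while any later reference only records the additional owner. `mark_chunks` is likewise a hypothetical name.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Records `owner` for every digest and calls `touch_atime` only for
/// digests not seen before, so each chunk's atime is updated once.
fn mark_chunks<K: Copy + PartialEq>(
    map: &mut HashMap<[u8; 32], Vec<K>>,
    owner: K,
    digests: &[[u8; 32]],
    mut touch_atime: impl FnMut(&[u8; 32]),
) {
    for digest in digests {
        match map.entry(*digest) {
            Entry::Vacant(slot) => {
                // First reference to this chunk: update its atime.
                touch_atime(digest);
                slot.insert(vec![owner]);
            }
            Entry::Occupied(mut slot) => {
                // Already touched: only record the new owner, if not present yet.
                if !slot.get().contains(&owner) {
                    slot.get_mut().push(owner);
                }
            }
        }
    }
}
```

Unlike a fixed-capacity LRU cache, the map never evicts entries, so no chunk would be touched twice during one GC run, at the cost of keeping all digests in memory.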
- Add a dedicated tab to show the contents independently of the
current datastore contents? This would reduce the risk of
misinterpretation, as this is not real-time data.
- Add this as a dedicated task instead of combining it with garbage
collection? This would allow gathering the information for
specific sub-namespaces, groups or selected snapshots only.
Link to the bugtracker issue:
https://bugzilla.proxmox.com/show_bug.cgi?id=5799
proxmox-backup:
Christian Ebner (4):
chunk store: restrict chunk sweep helper method to module parent
datastore: add namespace/group/snapshot indices for reverse lookups
datastore: introduce reverse chunk digest lookup table
fix #5799: GC: track chunk digests and accumulate statistics
pbs-datastore/src/chunk_store.rs | 11 +-
pbs-datastore/src/datastore.rs | 46 +++-
pbs-datastore/src/lib.rs | 1 +
pbs-datastore/src/reverse_digest_map.rs | 349 ++++++++++++++++++++++++
4 files changed, 404 insertions(+), 3 deletions(-)
create mode 100644 pbs-datastore/src/reverse_digest_map.rs
Summary over all repositories:
4 files changed, 404 insertions(+), 3 deletions(-)
--
Generated by git-murpp 0.8.1