[pbs-devel] [PATCH proxmox-backup 0/5] GC: avoid multiple atime updates

Roland devzero at web.de
Fri Feb 21 16:35:46 CET 2025


hello,

this looks like it relates to, or addresses, what was mentioned here:

https://forum.proxmox.com/threads/better-understand-proxmox-garbage-collection-lots-of-avoidable-utimensat-calls.101417/

it came to my mind when reading this.  not sure if i did an RFE for
this, guess i forgot...

 > for my curiosity, i see there is one chunk in both of my homeserver
 > repos which is accessed orders of magnitude more often than all
 > the other chunks (for one repo roughly 1000x more often) during a gc
 > run. anyhow, many chunks are being accessed repeatedly, but at a much
 > lower rate.

ah, now i understand why that file is touched 34742 times - it's that
"all zeroes" chunk, which does occur in the vm image much more often.

34742 proxmox-backup-(1428): R
/backup/pve-t620_backup/.chunks/bb9f/bb9f8df61474d25e71fa00722318cd387396ca1736605e1248821cc0de3d3af8

any clue what this one is, which is being touched an order of magnitude
more often than the zeroes chunk ?

940079 proxmox-backup-(1428): R
/backup/pve-maker-bonn_backup/.chunks/7b27/7b27b1eb4febae3273321255d8304e9b3e7938d9e254564bef859a4307a88638

regards
roland


On 21.02.25 at 15:01, Christian Ebner wrote:
> These patches implement the logic to greatly improve the performance
> of phase 1 garbage collection by avoiding multiple atime updates on
> the same chunk.
>
> Currently, phase 1 GC iterates over all folders in the datastore,
> looking for and collecting all image index files without making any
> logical assumptions (e.g. namespaces, groups, snapshots, ...). This
> is to avoid accidentally missing image index files located in
> unexpected paths and therefore not marking their chunks as in use,
> leading to potential data loss.
>
> These patches improve phase 1 by inserting encountered image index
> paths into a data structure which allows iterating over the index files
> in a more logical manner, following the same principle as for
> incremental backup snapshots. The index files for the same namespace
> and group as well as image filename can therefore be inspected
> consecutively.
>
> Further, by keeping track of already seen and therefore updated chunk
> atimes, updating the atime over and over again on the chunks shared
> by consecutive backup snapshots is now avoided.
>
> To give some ballpark figures, this reduced phase 1 garbage collection
> on a real world datastore containing some of my backups from around
> 2 minutes to about 16 seconds.
>
> Christian Ebner (5):
>    datastore: restrict datastores list_images method scope to module
>    garbage collection: refactor archive type based chunk marking logic
>    garbage collection: add structure for optimized image iteration
>    garbage collection: allow to keep track of already touched chunks
>    fix #5331: garbage collection: avoid multiple chunk atime updates
>
>   pbs-datastore/src/datastore.rs | 204 ++++++++++++++++++++++++++-------
>   1 file changed, 160 insertions(+), 44 deletions(-)
>
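
to illustrate the idea from the cover letter quoted above (remembering which
chunks were already touched, so that each one gets its atime updated only once
per gc run), here is a minimal sketch. it is not the code from the patch
series: the names `Digest`, `TouchedChunks`, `mark` and `mark_all` are
hypothetical, and a counter stands in for the actual atime update.

```rust
use std::collections::HashSet;

/// Hypothetical 32-byte chunk digest (SHA-256 sized); an assumption for this sketch.
type Digest = [u8; 32];

/// Remembers which chunks were already marked during one GC run.
struct TouchedChunks {
    seen: HashSet<Digest>,
    touches: u64,
}

impl TouchedChunks {
    fn new() -> Self {
        Self {
            seen: HashSet::new(),
            touches: 0,
        }
    }

    /// Marks a chunk as in use. Returns true only the first time a digest is seen.
    /// A real implementation would update the chunk file's atime at this point;
    /// here a counter stands in for that call.
    fn mark(&mut self, digest: &Digest) -> bool {
        if self.seen.insert(*digest) {
            self.touches += 1;
            true
        } else {
            false
        }
    }
}

/// Walks all index files (each modelled as a list of chunk digests) and returns
/// (number of chunk references, number of atime updates actually performed).
fn mark_all(indexes: &[Vec<Digest>]) -> (u64, u64) {
    let mut tracker = TouchedChunks::new();
    let mut references = 0;
    for index in indexes {
        for digest in index {
            references += 1;
            tracker.mark(digest);
        }
    }
    (references, tracker.touches)
}
```

with two index files that reference the chunks `[zero, a, zero, zero]` and
`[zero, a, b, zero]`, `mark_all` sees 8 chunk references but performs only 3
updates, one per distinct chunk. a heavily repeated chunk such as the
"all zeroes" one would then be touched once instead of tens of thousands of
times. the trade-off is memory: the set holds one digest per distinct chunk.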


