[pbs-devel] [PATCH proxmox-backup 3/5] garbage collection: add structure for optimized image iteration
Christian Ebner
c.ebner at proxmox.com
Fri Mar 7 09:59:33 CET 2025
On 3/7/25 09:53, Fabian Grünbichler wrote:
>
>> Christian Ebner <c.ebner at proxmox.com> hat am 07.03.2025 09:24 CET geschrieben:
>>
>>
>> On 3/5/25 14:47, Fabian Grünbichler wrote:
>>> On February 21, 2025 3:01 pm, Christian Ebner wrote:
>>>> Implement the GroupedImageList struct and its methods, which group
>>>> index file (image) paths by a hierarchy for optimized iteration
>>>> during phase 1 of garbage collection.
>>>>
>>>> Currently, phase 1 of garbage collection iterates over all folders
>>>> in the datastore without considering any logical organization. This
>>>> avoids missing image indices with unexpected paths, which would
>>>> otherwise lead to chunks still referenced by these indices being
>>>> deleted in GC phase 2.
>>>>
>>>> The new structure allows iterating over the index files in a more
>>>> logical order without missing strange paths. Its hierarchical
>>>> organization avoids touching the shared chunks of incremental
>>>> snapshot backups within a backup group multiple times, since these
>>>> chunks can be tracked without excessive memory requirements.
>>>>
>>>> Since deduplication happens on a per-image basis for subsequent
>>>> snapshots, the hierarchy is chosen as follows:
>>>> - ns/group
>>>> - image filename
>>>> - snapshot timestamp
>>>>
>>>> This allows iterating over consecutive snapshots of the same image
>>>> within the same backup namespace and group.
>>>>
>>>> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
>>>> ---
>>>> pbs-datastore/src/datastore.rs | 63 ++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 63 insertions(+)
>>>>
>>>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>>>> index eda78193d..520f54548 100644
>>>> --- a/pbs-datastore/src/datastore.rs
>>>> +++ b/pbs-datastore/src/datastore.rs
>>>> @@ -1,4 +1,5 @@
>>>> use std::collections::{HashMap, HashSet};
>>>> +use std::ffi::OsString;
>>>> use std::io::{self, Write};
>>>> use std::os::unix::ffi::OsStrExt;
>>>> use std::os::unix::io::AsRawFd;
>>>> @@ -1573,3 +1574,65 @@ impl DataStore {
>>>> Ok(())
>>>> }
>>>> }
>>>> +
>>>> +struct GroupedImageList {
>>>> + groups: HashMap<String, HashMap<OsString, Vec<(i64, PathBuf)>>>,
>>>
>>> this seems very complicated, couldn't we make it a lot simpler by doing
>>>
>>> key: NS + Group (as tuple, using the actual types)
>>> value: (snapshot => index paths)
>>>
>>> or even a nested mapping of
>>>
>>> NS -> Group -> Snapshot -> Images
>>>
>>> and then simply reset when proceeding from one snapshot to the next? the
>>> scope of "in-memory chunk digests" would still be bounded (in a way that
>>> is in line with how we do it in many other parts, like when doing backup
>>> + restore), and the structure of this helper entity and the way it is
>>> iterated over would feel much more natural.
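
Something along these lines would already read more naturally (untested
sketch on my side; it assumes the pbs_api_types BackupNamespace and
BackupGroup types implement Hash + Eq, and keys snapshots by their
timestamp):

    use std::collections::HashMap;
    use std::path::PathBuf;

    use pbs_api_types::{BackupGroup, BackupNamespace};

    /// Suggested nested mapping:
    /// namespace -> group -> snapshot timestamp -> image index paths
    type GroupedImages =
        HashMap<BackupNamespace, HashMap<BackupGroup, HashMap<i64, Vec<PathBuf>>>>;

    fn insert_image(
        images: &mut GroupedImages,
        ns: BackupNamespace,
        group: BackupGroup,
        backup_time: i64,
        path: PathBuf,
    ) {
        images
            .entry(ns)
            .or_default()
            .entry(group)
            .or_default()
            .entry(backup_time)
            .or_default()
            .push(path);
    }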
>>
>> The type signature looks more complex than it actually is, but I do
>> agree that this can be reduced/optimized.
>
> it's not so much the type itself, but the elements (too many String-like
> types where we could have proper ones) and the way of grouping things ;)
>
> in the end, it is either a nested map or one with a complex key, no way
> around it (well, other than the alternate approach below :))
>
>>>
>>> we could go one step further, but that might eat some of our performance
>>> gains here
>>>
>>> - list all images like we did before and store the result in a
>>> collection that allows fast removal
>>> - iterate normally over the datastore in a structured fashion using the
>>> existing helpers
>>> -- when proceeding from one snapshot to the next, use the new "don't
>>> retouch chunks" logic
>>> -- remove each visited image path from the list of images
>>> - if any images are left at the end in that list, mark those manually
>>> (strange or vanished paths, should hopefully be empty or irrelevant)
>>>
>>> the main downside is that we'd have to iterate twice (well, not quite
>>> twice, since the hierarchical iterators skip parts outside of the
>>> "known" hierarchy), but we would save all this custom parse-back logic
>>> here. if we keep the parse-back logic, then I think we want to have a
>>> logical structure as well that follows how we normally do things.
>>>
>>> I think that the list_images part of GC is by far the least expensive
>>> though (for my test datastore where the GC runtime goes down from 52 to
>>> 17s with your patch series, listing images is 12ms of that ;)), so
>>> effectively doing it twice might not hurt that much in practice..
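
To make sure I understood the suggested flow correctly, phase 1 would
then roughly become the following (untested sketch; `list_images`,
`iter_snapshots_hierarchic`, `mark_used_chunks` and
`mark_used_chunks_by_path` are placeholders here, not the actual helper
names):

    use std::collections::HashSet;
    use std::path::PathBuf;

    use anyhow::Error;

    fn phase1_mark_used_chunks(datastore: &DataStore) -> Result<(), Error> {
        // list all images as before, kept in a set that allows fast removal
        let mut unvisited: HashSet<PathBuf> =
            datastore.list_images()?.into_iter().collect();

        // iterate over the datastore in a structured fashion using the
        // existing hierarchic iterators
        for snapshot in datastore.iter_snapshots_hierarchic()? {
            // the "don't retouch chunks" state is reset when proceeding
            // from one snapshot to the next
            for index_path in snapshot.index_files()? {
                datastore.mark_used_chunks(&index_path)?;
                unvisited.remove(&index_path);
            }
        }

        // anything left over had a strange or vanished path, mark manually
        for path in unvisited {
            datastore.mark_used_chunks_by_path(&path)?;
        }

        Ok(())
    }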
>>
>> I do like this approach more, as it will also detect snapshots as
>> "strange paths" even when the helpers themselves fail to detect an
>> index file for some reason, not just when the snapshot path itself is
>> off.
>>
>> So I will rework the patches using this approach, thanks!
>
> perf numbers for bigger datastores and the usual pathological candidates
> (NFS/Samba, single spinning rust) would be nice in that case - I guess
> that with iterating twice we should still end up being faster, as the
> reduced chunk accesses should by far outweigh the metadata accesses for
> listing (unless you only have a single snapshot per group I guess), but
> hard numbers are better than gut feelings :) both duration and number of
> file operations might be interesting (maybe for stock, this version and
> the next one?)
Acked, I will also look into how to get more representative performance
statistics for this (maybe by setting up a datastore in tmpfs and using
perf counters), as I noticed that just looking at the runtime on a
regular datastore is not really telling. There are too many moving parts
involved, leading to a huge spread in the observed values.
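
For the file operation counts, sampling /proc/self/io before and after
each GC phase might already give usable numbers (the syscr/syscw fields
count read/write syscalls); rough sketch, noting that this covers the
whole process, so it is best measured on an otherwise idle instance:

    use std::fs;

    /// Read the accumulated I/O counters of the current process from
    /// /proc/self/io, e.g. the line "syscr: 42" becomes ("syscr", 42).
    fn proc_self_io() -> std::io::Result<Vec<(String, u64)>> {
        let content = fs::read_to_string("/proc/self/io")?;
        Ok(content
            .lines()
            .filter_map(|line| {
                let (key, value) = line.split_once(": ")?;
                Some((key.to_string(), value.parse().ok()?))
            })
            .collect())
    }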