[pbs-devel] [PATCH v5 proxmox-backup 5/5] fix #5331: garbage collection: avoid multiple chunk atime updates
Christian Ebner
c.ebner at proxmox.com
Wed Apr 2 21:50:57 CEST 2025
On 4/2/25 17:57, Thomas Lamprecht wrote:
> On 26.03.25 at 11:03, Christian Ebner wrote:
>> Basic benchmarking:
>>
>> Number of utimensat calls shows a significant reduction:
>> unpatched: 31591944
>> patched: 1495136
>>
>> Total GC runtime shows a significant reduction (average of 3 runs):
>> unpatched: 155.4 ± 3.5 s
>> patched: 22.8 ± 0.5 s
>
> Thanks a lot for providing these numbers, and what a nice runtime
> improvement!
>
>>
>> VmPeak measured via /proc/self/status before and after
>> `mark_used_chunks` (proxmox-backup-proxy was restarted in between
>> for normalization, average of 3 runs):
>> unpatched before: 1196028 ± 0 kB
>> unpatched after: 1196028 ± 0 kB
>>
>> patched before: 1163337 ± 28317 kB
>> patched after: 1330906 ± 29280 kB
>> delta: 167569 kB
>
> VmPeak is virtual memory though, not something like resident set size,
> or better proportional set size – but yeah that's harder to get.
> Simplest way might be polling something like `ps -o pid,rss,pss -u backup`
> in a shell alongside the GC run a few times per second, e.g.:
>
> while :; do printf '%s ' "$(date '+%T.%3N')"; ps -o pid,rss,pss -u backup --no-headers; sleep 0.5; done | tee gc-stats
>
> And then get the highest PSS values via:
>
> sort -nk4,4 gc-stats | tail
>
> I do not think this needs to be redone, or that a new revision needs to be
> sent because of it. But it might be nice to do a quick test just for a rough
> comparison to the VmPeak delta.
Ah, thanks for the explanation and suggestion, I was already a bit unsure
whether VmPeak is informative enough. I will re-check this with the
suggested metrics.
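
For context on where the utimensat reduction comes from: the idea is to only
update a chunk's atime the first time its digest is seen during the GC mark
phase, gated by a bounded cache of recently touched digests. Below is a
simplified, self-contained sketch, illustrative only and not the actual patch
code; the real implementation uses an LRU cache, here a plain FIFO-evicted
set stands in for it:

use std::collections::{HashSet, VecDeque};

// Illustrative only: a bounded "recently touched" set of chunk digests.
// The actual patch uses an LRU cache; this FIFO-evicted set just shows
// the gating idea.
struct RecentlyTouched {
    capacity: usize,
    set: HashSet<[u8; 32]>,
    order: VecDeque<[u8; 32]>,
}

impl RecentlyTouched {
    fn new(capacity: usize) -> Self {
        Self { capacity, set: HashSet::new(), order: VecDeque::new() }
    }

    // Returns true if the digest was not in the cache, i.e. the chunk's
    // atime still needs to be updated in this GC run.
    fn insert(&mut self, digest: [u8; 32]) -> bool {
        if self.set.contains(&digest) {
            return false; // already touched recently, skip the utimensat call
        }
        if self.order.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.set.remove(&oldest);
            }
        }
        self.order.push_back(digest);
        self.set.insert(digest);
        true
    }
}

fn main() {
    let mut cache = RecentlyTouched::new(1024 * 1024);
    let digest = [0u8; 32]; // stand-in for a chunk's SHA-256 digest
    assert!(cache.insert(digest));  // first reference: update atime
    assert!(!cache.insert(digest)); // repeated reference: skipped
}

That gating is what brings the utimensat count down from ~31.6 million to
~1.5 million in the numbers quoted above, since most chunks are referenced
by many indices.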
>
>>
>> Dependence on the cache capacity:
>> capacity        runtime[s]  VmPeakDiff[kB]
>> 1*1024          66.221      0
>> 10*1024         36.164      0
>> 100*1024        23.141      0
>> 1024*1024       22.188      101060
>
> Hmm, seems like we could lower the cache size to something like 128*1024
> or 256*1024 and already get most of the benefit for this workload.
>
> What do you think about applying this as is and, after doing a quick RSS
> and/or PSS benchmark, deciding whether it's worth starting out a bit smaller,
> as a 167 MiB delta is a bit much for my taste if a quarter of that is enough
> to get most of the benefit. If the actual memory used (not just virtual
> memory mappings) is rather closer to the cache size without overhead
> (32 MiB), I'd be fine with keeping this as is.
Okay, yes, I will ask Stoiko for access to the PBS instance once more, so I
can test against a similar datastore and gather some more data on this.
>
> tuning option in MiB (i.e. 32 MiB / 32 B == 1024*1024 capacity) where the
> admin can better control this themselves.
This I do not fully understand; I assume the sentence got cut off? But I
can send a patch to expose this as a datastore tuning option as well.
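
For such a tuning option, a rough sketch of how a value in MiB could map onto
the cache capacity. The option name, plumbing and default are made up for
illustration; the 32 B per entry is the SHA-256 digest size, not counting
per-entry cache overhead, and the assumed default of 1024*1024 entries
matches the 32 MiB figure above:

// Hypothetical mapping from a datastore tuning value in MiB to the
// chunk-digest cache capacity; name and default are illustrative only.
const DIGEST_SIZE: usize = 32; // SHA-256 digest, 32 bytes per entry

fn gc_cache_capacity(tuning_mib: Option<usize>) -> usize {
    // default of 32 MiB corresponds to 1024*1024 entries
    tuning_mib.unwrap_or(32) * 1024 * 1024 / DIGEST_SIZE
}

fn main() {
    assert_eq!(gc_cache_capacity(None), 1024 * 1024);   // 32 MiB default
    assert_eq!(gc_cache_capacity(Some(8)), 256 * 1024); // 8 MiB
    assert_eq!(gc_cache_capacity(Some(4)), 128 * 1024); // 4 MiB, i.e. 128*1024
}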
>
>> 10*1024*1024    23.178      689660
>> 100*1024*1024   25.135      5507292
>