[pbs-devel] [PATCH v2 proxmox-backup] garbage collection: fix rare race in chunk marking phase

Thomas Lamprecht t.lamprecht at proxmox.com
Tue Apr 15 17:40:14 CEST 2025


On 15/04/2025 15:14, Fabian Grünbichler wrote:
>>> this should check the result? this would also fail if a backup is
>>> currently going on (very likely if we end up here?) and abort the GC
>>> then, but we don't have a way to lock a group with a timeout at the
>>> moment.. but maybe we can wait and see if users actually run into that,
>>> we can always extend the locking interface then..
>> True, but since this is very unlikely to happen, I would opt to fail and 
>> add an error context here so this can easily be traced back to this code 
>> path.
> yes, for now I'd say aborting GC with a clear error here is best. we
> cannot safely continue..

I did not check v3, but note that users often run GC infrequently
because of the load it generates and the time it needs, yet they still
depend on it finishing so that space gets freed.

So if there is any way we can go, or anything we can add, to avoid
aborting completely, it would IMO be well worth evaluating more closely.

FWIW, a completely different alternative might be to not change GC
but rather pruning while a GC job runs, e.g. (spitballing/hand waving)
move the index to a trash folder and notify GC about that.
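
To make the trash folder idea a bit more concrete, here is a rough
sketch using only the Rust standard library. Everything in it is an
assumption for illustration: the names prune_index and trashed_indices,
the gc_running flag and the trash directory layout are hypothetical and
not existing proxmox-backup code. "Notifying GC" is reduced here to GC
listing the trash directory.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical prune helper: while a GC job runs, move the index file into
/// a trash directory instead of deleting it, so GC can still find it.
/// Returns the new location if the index was trashed, `None` if it was removed.
fn prune_index(
    index: &Path,
    trash_dir: &Path,
    gc_running: bool,
) -> io::Result<Option<PathBuf>> {
    if !gc_running {
        // No GC in progress: plain removal, as pruning would normally do.
        fs::remove_file(index)?;
        return Ok(None);
    }
    fs::create_dir_all(trash_dir)?;
    let name = index.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "index path has no file name")
    })?;
    // Limitation of this sketch: indices with the same file name collide.
    let target = trash_dir.join(name);
    // rename() requires source and target to be on the same filesystem.
    fs::rename(index, &target)?;
    Ok(Some(target))
}

/// Hypothetical GC side: list the trashed index files so their chunks can
/// still be marked. A missing trash directory simply means nothing was trashed.
fn trashed_indices(trash_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut out = Vec::new();
    match fs::read_dir(trash_dir) {
        Ok(entries) => {
            for entry in entries {
                out.push(entry?.path());
            }
        }
        Err(err) if err.kind() == io::ErrorKind::NotFound => {}
        Err(err) => return Err(err),
    }
    out.sort();
    Ok(out)
}
```

In this sketch GC would call trashed_indices() at the end of its marking
phase and empty the trash once it is done. How the "GC is running" state
is detected and locked is left open on purpose.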



