[pbs-devel] [RFC proxmox-backup 0/4] implement trash can for snapshots

Fabian Grünbichler f.gruenbichler at proxmox.com
Thu Apr 17 11:29:33 CEST 2025


On April 16, 2025 4:17 pm, Christian Ebner wrote:
> In an effort to simplify the GC phase 1 logic introduced by commit
> cb9814e3 ("garbage collection: fix rare race in chunk marking phase")
> this patch series implements trash can functionality for snapshots.

that was fast ;)

> The main intention is to allow the index files of snapshots pruned
> while phase 1 of garbage collection is ongoing to still be read and
> their chunks marked as in use. This makes it possible to get rid of
> the currently implemented and rather complex retry looping logic,
> which could in theory lead to failing GC or backups when trying to
> lock the whole group exclusively after the 10th retry.

I think the other, no less important intention is to allow undoing an
accidental/premature deletion/pruning. So we need to consider this use
case as well when designing the trash can semantics, and ideally
introduce it at the same time so we can properly rule out problems.
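
For the undo path, something as simple as the inverse rename should
suffice. A minimal sketch, assuming trashed snapshots keep their
relative path below `.trash` (helper name is made up):

use std::path::Path;

/// Undo an accidental prune by moving the snapshot back out of the
/// trash. Hypothetical sketch; assumes the relative layout is kept.
fn restore_snapshot_from_trash(base: &Path, snapshot_rel: &Path) -> std::io::Result<()> {
    std::fs::rename(
        base.join(".trash").join(snapshot_rel),
        base.join(snapshot_rel),
    )
}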

> To achieve this, pruning a snapshot does not remove it immediately,
> but rather moves it to a `.trash` subfolder in the datastore's base
> directory. This directory is then cleared before GC phase 1 starts,
> meaning that any index file can be restored until the next GC run.

see my comment on patch #3
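
for reference, how I picture the move-to-trash on destroy; a minimal
sketch, assuming the snapshot's relative path is preserved below
`.trash` (not the actual patch code):

use std::fs;
use std::path::{Path, PathBuf};

/// Move a pruned snapshot into the datastore's `.trash` instead of
/// deleting it. Hypothetical sketch, names are made up.
fn move_snapshot_to_trash(base: &Path, snapshot_rel: &Path) -> std::io::Result<PathBuf> {
    let trash_path = base.join(".trash").join(snapshot_rel);
    if let Some(parent) = trash_path.parent() {
        fs::create_dir_all(parent)?;
    }
    // rename() is atomic within one filesystem, so GC sees the snapshot
    // either in its group or in the trash, never in neither place.
    fs::rename(base.join(snapshot_rel), &trash_path)?;
    Ok(trash_path)
}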

> This however comes with its own set of issues, therefore I am sending
> these patches as RFC for now. Open questions and known limitations
> are:
> - Pruning does not clean up any space; on the contrary, it might
>   require additional space on COW filesystems. Should there be a flag
>   to bypass the trash, given that sometimes users truly want to
>   remove a snapshot immediately? Although that would re-introduce the
>   issue with new snapshot creation and concurrent GC on a last
>   snapshot.

I think it might make sense, though I am not sure how we could avoid
the GC issue (we could design the trash can feature in a way that keeps
the retry logic in GC, but have it only ever trigger in case such a
skip-trash prune took place in a group; rough sketch below).
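
what I mean, with made-up names: a skip-trash prune drops a marker file
in the group directory, and GC only arms its retry loop for groups
carrying that marker:

use std::path::Path;

/// Marker a hypothetical `--skip-trash` prune would leave behind.
const GC_RETRY_MARKER: &str = ".gc-retry-needed";

/// Called by prune when the trash was bypassed for this group.
fn mark_group_for_gc_retry(group_dir: &Path) -> std::io::Result<()> {
    // An empty file is enough; GC only checks for its existence.
    std::fs::File::create(group_dir.join(GC_RETRY_MARKER))?;
    Ok(())
}

/// Checked by GC phase 1 to decide whether the retry loop is needed.
fn group_needs_gc_retry(group_dir: &Path) -> bool {
    group_dir.join(GC_RETRY_MARKER).exists()
}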

> - Prune + sync + prune might lead to the same snapshot being pruned
>   multiple times; currently, any second prune on a snapshot would
>   fail. Should this overwrite the trashed snapshot?

this depends on how the trash feature is implemented:
- if it's a mark on the snapshot, then attempting to write the snapshot
  again could either fail or overwrite the trashed snapshot
- if the snapshot is moved to a trash can, then we could keep multiple
  copies there (sketched below)
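
e.g. with a unique suffix per trashed copy; one possible naming scheme
(purely hypothetical):

use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Pick a collision-free trash path so pruning the same snapshot a
/// second time keeps both copies. Hypothetical naming scheme.
fn unique_trash_path(trash_dir: &Path, snapshot_name: &str) -> PathBuf {
    let epoch = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    let mut candidate = trash_dir.join(format!("{snapshot_name}.{epoch}"));
    let mut counter = 0u32;
    while candidate.exists() {
        counter += 1;
        candidate = trash_dir.join(format!("{snapshot_name}.{epoch}.{counter}"));
    }
    candidate
}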

> - GC might now read the same index twice: once before it is pruned
>   (with the prune happening while phase 1 is still ongoing) and a
>   second time when it is read from the trash. Not really an issue,
>   but rather a limitation.

reading twice is a lot better than never reading ;) I don't think this
should be particularly problematic.
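
marking chunks as in use is idempotent anyway; conceptually phase 1
boils down to something like the following (simplified model, the real
implementation updates atimes on the chunk files):

use std::collections::HashSet;

/// Simplified model of GC phase 1 marking: inserting the same digest
/// twice is a no-op, so reading an index once live and once from the
/// trash only costs the extra read, nothing more.
fn mark_chunks_in_use(in_use: &mut HashSet<[u8; 32]>, index_digests: &[[u8; 32]]) {
    for digest in index_digests {
        in_use.insert(*digest);
    }
}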

> - Further issues I might currently be overlooking
> 
> Christian Ebner (4):
>   datastore: always skip over base directory when listing index files
>   datastore: allow to specify sub-directory for index file listing
>   datastore: move snapshots to trash folder on destroy
>   garbage collection: read pruned snapshot index files from trash
> 
>  pbs-datastore/src/backup_info.rs |  14 ++-
>  pbs-datastore/src/datastore.rs   | 158 +++++++++++++++----------------
>  2 files changed, 89 insertions(+), 83 deletions(-)
> 
> -- 
> 2.39.5