[pbs-devel] [RFC v2 proxmox-backup 13/21] datastore: recreate trashed backup groups if requested
Fabian Grünbichler
f.gruenbichler at proxmox.com
Mon May 12 12:02:54 CEST 2025
On May 12, 2025 10:05 am, Christian Ebner wrote:
> On 5/9/25 14:27, Fabian Grünbichler wrote:
>> On May 8, 2025 3:05 pm, Christian Ebner wrote:
>>> A whole backup group might have been marked as trashed, including all
>>> of the contained snapshots.
>>>
>>> Since a new backup to that group (even as different user/owner)
>>> should still work, permanently clear the whole trashed group before
>>> recreation. This will limit the trash lifetime as now the group is
>>> not recoverable until next garbage collection.
>>
>> IMHO this is phrased in a way that makes it hard to parse, and in any
>> case, such things should go into the docs..
>
> Acked, will add a section to the docs covering the handling and
> implications of trashed items.
>
>>
>>>
>>> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
>>> ---
>>> pbs-datastore/src/datastore.rs | 26 ++++++++++++++++++++++++++
>>> 1 file changed, 26 insertions(+)
>>>
>>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>>> index 4f7766c36..ca05e1bea 100644
>>> --- a/pbs-datastore/src/datastore.rs
>>> +++ b/pbs-datastore/src/datastore.rs
>>> @@ -934,6 +934,32 @@ impl DataStore {
>>> let guard = backup_group.lock().with_context(|| {
>>> format!("while creating locked backup group '{backup_group:?}'")
>>> })?;
>>> + if backup_group.is_trashed() {
>>> + info!("clear trashed backup group {full_path:?}");
>>
>> I think we should only do this if the new and old owner are not
>> identical..
>
> Hmm, not sure whether that wouldn't introduce other possible issues/confusion?
> E.g. a PVE host creates snapshots for a VM/CT with a given ID to the
> corresponding backup group. The group gets pruned as no longer required,
> the VM/CT is destroyed. A new VM/CT is created on the PVE host and
> backups are created to the (trashed) group...
I think the downside of too aggressively clearing trashed snapshots
(which might still be valuable) is far bigger than the downside of this
potential footgun. Especially since the gist of how trashing works is
"trash will be cleared on next GC run", having "trash group; scheduled
backup runs" clear all trashed snapshots would be potentially
disastrous - e.g., if I don't have GC configured at the moment and use
"trash group" as a sort of bulk action before recovering a few
individual snapshots..
if I give a user access to a VMID on the PVE side, then I need to ensure
all traces of old usage of that VMID are gone if I don't want that user
to see those traces. this doesn't change with the trash feature at all
(there are also things like old task logs, RRD data etc. that are hard to
clear, so you *should never reuse a VMID between users* anyway).
as long as the owner stays the same, a user with access that allows
creating a new snapshot could have already recovered and read all those
trashed snapshots before creating a new one, so nothing is leaked that
is not already accessible..
I am not even 100% sure if clearing the trash on owner change is
sensible/expected behaviour (the alternative being to block new
snapshots until the trash is cleared).
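to make that concrete, something along these lines is what I had in
mind - just an untested, standalone sketch, the helper names and the
plain string owners are made up for illustration and are not the actual
PBS types:

    use std::path::Path;

    /// hypothetical helper: decide what happens when a new backup targets
    /// a trashed group, depending on whether the owner changes
    fn handle_trashed_group(
        stored_owner: &str,
        new_owner: &str,
        group_path: &Path,
    ) -> std::io::Result<()> {
        if stored_owner == new_owner {
            // same owner: that user could already recover/read those
            // snapshots anyway, so keep them until GC or an explicit clear
            return Ok(());
        }
        // owner changed: either clear the trash here (what the patch
        // currently does for every trashed group), or refuse the new backup
        // until the trash has been cleared explicitly
        clear_trashed_contents(group_path)
    }

    /// remove all entries below the group directory, like the patch hunk does
    fn clear_trashed_contents(path: &Path) -> std::io::Result<()> {
        for entry in std::fs::read_dir(path)? {
            let entry = entry?;
            if entry.file_type()?.is_dir() {
                std::fs::remove_dir_all(entry.path())?;
            } else {
                std::fs::remove_file(entry.path())?;
            }
        }
        Ok(())
    }
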
>>
>>> + let dir_entries = std::fs::read_dir(&full_path).context(
>>> + "failed to read directory contents during cleanup of trashed group",
>>> + )?;
>>> + for entry in dir_entries {
>>> + let entry = entry.context(
>>> + "failed to read directory entry during clenup of trashed group",
>>> + )?;
>>> + let file_type = entry.file_type().context(
>>> + "failed to get entry file type during clenup of trashed group",
>>> + )?;
>>> + if file_type.is_dir() {
>>> + std::fs::remove_dir_all(entry.path())
>>> + .context("failed to remove directory entry during clenup of trashed snapshot")?;
>>> + } else {
>>> + std::fs::remove_file(entry.path())
>>> + .context("failed to remove directory entry during clenup of trashed snapshot")?;
>>> + }
>>> + }
>>> + self.set_owner(ns, backup_group.group(), auth_id, false)?;
>>> + let owner = self.get_owner(ns, backup_group.group())?; // just to be sure
>>
>> sure about that? we are holding a lock here, nobody is allowed to change
>> the owner but us..
>
> Not really, I opted for staying on the safe side here, because the
> pre-existing code does it as well, but without mentioning why exactly.
AFAICT that pre-dates locking of groups and snapshots, and I think it
hasn't made sense there either since those were introduced - all calls
to set_owner are guarded by the group lock now..
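with the lock held the re-read can simply be dropped, something like
this (untested sketch only, reusing the names from the hunk above, not
checked against the actual code):

    // the group lock (`guard`) is held for the whole call, so the owner we
    // just wrote cannot change underneath us - no need to read it back
    self.set_owner(ns, backup_group.group(), auth_id, false)?;
    self.untrash_namespace(ns)?;
    return Ok((auth_id.clone(), guard));
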
>>> + self.untrash_namespace(ns)?;
>>> + return Ok((owner, guard));
>>> + }
>>> +
>>> let owner = self.get_owner(ns, backup_group.group())?; // just to be sure
>>> Ok((owner, guard))
>>> }
>>> --
>>> 2.39.5