[pbs-devel] [PATCH proxmox-backup v3 1/3] fix #6195: api: datastore: add endpoint for moving namespaces
Hannes Laimer
h.laimer at proxmox.com
Mon Sep 15 11:19:10 CEST 2025
On 15.09.25 10:56, Christian Ebner wrote:
> On 9/15/25 10:27 AM, Hannes Laimer wrote:
>> On 15.09.25 10:15, Christian Ebner wrote:
>>> Thanks for having a go at this issue, I did not yet have an in depth
>>> look at this but unfortunately I'm afraid the current implementation
>>> approach will not work for the S3 backend (and might also have issues
>>> for local datastores).
>>>
>>> Copying the S3 objects is not an atomic operation and will take some
>>> time, so leaves you open for race conditions. E.g. while you copy
>>> contents, a new backup snapshot might be created in one of the
>>> already copied backup groups, which will then however be deleted
>>> afterwards. Same is true for pruning, and other metadata editing
>>> operations such as
>>> adding notes, backup task logs, etc.
>>>
>>
>> Yes, but not really. We lock the `active_operations` tracking file, so
>> no new read/write operations can be started after we start the moving
>> process. There's a short comment in the API endpoint function.
>
> Ah yes, I did miss that part. But by doing that you will basically block
> any datastore operation, not just the ones to the source or target
> namespace. This is not ideal IMO. Further you cannot move a NS if any
> other operation is ongoing on the datastore, which might be completely
> unrelated to the source and target namespace, e.g. a backup to another
> namespace?
Yes. But I don't think this is something we can (easily) check for;
maybe there is a good way, but I can't think of a feasible one.
We could lock all affected groups in advance, but I'm not super sure we
can just move a locked dir, at least with the old locking.
Given that this is, I'd argue, a rather fast operation for both local
and S3 datastores, just saying 'nobody does anything while we move
stuff' seems reasonable.
What we could think about adding is maybe a checkbox for updating jobs
referencing the NS, but I'm not sure if we want that.
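(For illustration only: the 'lock all affected groups in advance' alternative
could be sketched roughly like below. The helper name and the lock-file scheme
are hypothetical stand-ins, not the actual proxmox-backup flock-based group
locking; `create_new` just gives us an atomic "busy" check.)

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::{Path, PathBuf};

/// Try to take an exclusive lock on every group before moving anything.
/// If any group is busy, roll back and abort up front instead of leaving
/// a half-done move. (Sketch only: real code would use the datastore's
/// group locks, not ad-hoc lock files.)
fn try_lock_all_groups(lock_dir: &Path, groups: &[&str]) -> io::Result<Vec<PathBuf>> {
    let mut held = Vec::new();
    for group in groups {
        let lock_path = lock_dir.join(format!("{group}.lock"));
        // `create_new` fails if the lock file already exists -> group is busy.
        match OpenOptions::new().write(true).create_new(true).open(&lock_path) {
            Ok(_) => held.push(lock_path),
            Err(e) => {
                // Release the locks we already took, then bail out.
                for p in &held {
                    let _ = std::fs::remove_file(p);
                }
                return Err(e);
            }
        }
    }
    Ok(held)
}
```

With something like this the move would only start once every group lock is
held, and releasing just means removing the lock files again.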
>
>> I'm not sure there is much value in more granular locking; I mean, is
>> half a successful move worth much? Unless we add some kind of rollback,
>> but tbh, I feel like that would not be worth the effort.
>
> Well, it could be just like we do for the sync jobs, skipping the move
> for the ones where the backup group could not be locked or fails for
> some other reason?
>
Hmm, but then we'd have it in two places, and moving again later won't
work because we can't distinguish between a same-named NS already
existing and a new attempt to complete an earlier move. And we also
can't allow that in general, because what happens if the same VMID
exists twice?
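(The VMID-collision concern could at least be checked cheaply up front; a
hedged sketch, where the group lists are placeholders for whatever a real
namespace listing would return:)

```rust
use std::collections::HashSet;

/// Return the backup group IDs (e.g. "vm/100") that exist in both the
/// source and the target namespace; a move would have to refuse to run
/// if this is non-empty. (Sketch only, not the actual API.)
fn conflicting_groups<'a>(source: &[&'a str], target: &[&str]) -> Vec<&'a str> {
    let target_set: HashSet<&str> = target.iter().copied().collect();
    source.iter().copied().filter(|g| target_set.contains(g)).collect()
}
```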
> I think having a more granular backup group unit instead of namespace
> makes this more flexible: what if I only want to move one backup group
> from one namespace to another one, as the initial request in the bug
> report?
>
That is not possible currently, and, at least with this series, not
intended. We could support that eventually, but that should be rather
orthogonal to this one, I think.
> For example, I had a VM which has been backed up to a given namespace,
> has however since been destroyed, but I want to keep the backups by
> moving the group with all the snapshots to a different namespace,
> freeing the backup type and ID for the current namespace?
>
I see the use-case for this, but I think these are two separate things:
moving a NS and moving a single group.
>>
>>> So IMO this must be tackled on a group level, making sure to get an
>>> exclusive lock for each group (on the source as well as target of the
>>> move operation) before doing any manipulation. Only then it is okay
>>> to do any non-atomic operations.
>>>
>>> The moving of the namespace must then be implemented as batch
>>> operations on the groups and sub-namespaces.
>>>
>>> This should be handled the same also for regular datastores, to avoid
>>> any races there too.
>>
>