[pbs-devel] [RFC proxmox-backup 00/24] fix #3044: push datastore to remote

Christian Ebner c.ebner at proxmox.com
Thu Jul 18 09:36:45 CEST 2024


On 7/17/24 17:48, Thomas Lamprecht wrote:
> Am 15/07/2024 um 12:15 schrieb Christian Ebner:
>> While being mostly implemented, there are still some implementation
>> details to be clarified, therefore requesting comments on the current
>> state of the patch series.
>>
>> This patch series implements the functionality to extend the current
>> sync jobs in pull direction by an additional push direction, allowing
>> to push contents of a local source datastore to a remote target.
> 
> nice!
>   
>> The series implements this by using the REST API of the remote target
>> for fetching, creating and/or deleting namespaces, groups and backups,
>> and reuses the clients backup writer functionality to create snapshots
>> by writing a manifest on the remote target and sync the fixed index,
>> dynamic index or blobs contained in the source manifest to the remote,
>> preserving also encryption information.
>>
>> The patch series is structured as follows:
>> - patches 1 to 5 are cleanup patches
>> - patches 6 to 11 are patches restructuring the current code so that
>>    functionality of the current pull implementation can be reused for
>>    the push implementation as well
>> - patches 12 and 13 extend the backup writers functionality to be able
>>    to push snapshots to the target
>> - patches 14 to 16 are once again preparatory patches for shared
>>    implementation of sync jobs in pull and push direction
>> - patch 17 defines the required permission acls and roles
>> - patch 18 implements almost all of the logic required for the push,
>>    including pushing of the datastore, namespace, groups and snapshots,
>>    taking into account also filters and additional sync flags
>> - patch 19 extends the current sync job configuration by a flag
>>    allowing to set the direction the sync job should operate, defaulting
>>    to pull.
>> - patches 20 to 24 finally expose the new sync job direction via the
>>    API, CLI and WebUI.
>>
>> While most of the functionality is already in place, some open
>> questions remain:
>> - Remove vanished stats would require to expose additional information
>>    to the REST API endpoints for deletion of namespaces and groups
> 
> might be fine to add, but that could be also done later.

Okay, then I will see how to add this as well, but make sure to keep
these patches independent so this can also be left out until later.
After all, this would break the API's return values.

> 
>> - Performance for push: The current implementation only allows to
>>    re-upload known chunks which have been read from the local datastore,
>>    there is no download of previous snapshots to optimize this.
> 
> not 100% sure what you mean here. Is it that you always send all chunks
> of affected backup snapshots as you cannot know which chunks are already
> in the pool on the target side?
>
> And with "no download of previous snapshots to optimize this" you mean
> that you do not get the index of the previously synced snapshot to check
> that for which chunks you can skip for now, but that could be done in
> the future?


Sorry if I created confusion here, this is more of an open point rather 
than a question (also my wording was not fully correct, should have been 
re-index, not re-upload). But your understanding is correct.

Let me try to explain in a bit more detail: in the current 
implementation, the sync job in push direction keeps track of all the 
chunks uploaded by the backup writer upload streams for the various 
snapshots. This is to avoid double upload of chunks, while still
re-indexing the chunks in the corresponding index files (the bug
reported by Gabriel still lingers here, however).
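
A rough sketch of that bookkeeping, to make the idea concrete. This is
not the code from the series: `KnownChunks`, `ChunkAction` and `classify`
are hypothetical names, and the only assumption taken from PBS is that a
chunk is addressed by its 32-byte digest.

```rust
use std::collections::HashSet;

/// Chunk digest used as the lookup key (assumed 32 bytes).
type Digest = [u8; 32];

/// What to do with a chunk of the snapshot currently being pushed.
#[derive(Debug, PartialEq, Eq)]
enum ChunkAction {
    /// Not seen before in this sync job: send the data, then index it.
    Upload,
    /// Already sent earlier in this sync job: only add it to the index.
    ReindexOnly,
}

/// Hypothetical tracker shared by all snapshots pushed in one sync job.
#[derive(Default)]
struct KnownChunks {
    digests: HashSet<Digest>,
}

impl KnownChunks {
    /// Decide how to handle `digest` and remember it for later snapshots.
    fn classify(&mut self, digest: Digest) -> ChunkAction {
        // `HashSet::insert` returns true only if the value was not present.
        if self.digests.insert(digest) {
            ChunkAction::Upload
        } else {
            ChunkAction::ReindexOnly
        }
    }
}
```

Either way the chunk ends up in the index file of the snapshot being
pushed, only the data transfer is skipped for the second case.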

The next optimization would be to also fetch the previous snapshot for
each backup group already present on the sync target, and use its index
file to avoid uploading chunks already known to the target's
datastore as well, only re-indexing them for the snapshot to be pushed.

This implementation is currently still lacking, my intention is however 
to add this as well.
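
As an illustration of the planned optimization only (nothing like this
exists in the series yet): seed the set of known digests from the index
of the previous snapshot on the target, then split the chunks of the
snapshot to push. `plan_push` and its two slice parameters are
hypothetical, and the 32-byte digest type is an assumption.

```rust
use std::collections::HashSet;

/// Chunk digest used as the lookup key (assumed 32 bytes).
type Digest = [u8; 32];

/// Split the chunks of the snapshot to push into (upload, re-index only).
///
/// `previous_index` stands for the digests listed in the index file of the
/// previous snapshot already present on the target (hypothetical input).
fn plan_push(previous_index: &[Digest], to_push: &[Digest]) -> (Vec<Digest>, Vec<Digest>) {
    // Everything referenced by the previous snapshot is already on the target.
    let mut known: HashSet<Digest> = previous_index.iter().copied().collect();
    let mut upload = Vec::new();
    let mut reindex_only = Vec::new();

    for digest in to_push {
        // `insert` returns true only for digests not seen so far, so a chunk
        // repeated within the same snapshot is uploaded at most once.
        if known.insert(*digest) {
            upload.push(*digest);
        } else {
            reindex_only.push(*digest);
        }
    }
    (upload, reindex_only)
}
```

Chunks present on the target but not referenced by the previous snapshot
of the group would still be uploaded with this approach.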

> 
>> - Permissions and roles: Currently, a dedicated role is implemented for
>>    a push operator. It remains to be clarified if the given permissions
>>    are too open, or if additional subsets of permissions might be
>>    warranted, e.g. to allow for remove vanished.
> 
> I mean, in the end the remote can already reduce the privileges to lock
> usage to specific NS or forbid creating backups completely, so I think
> that we'd be fine w.r.t. not being too open. That might be something to
> mention in the docs though, as users might wonder why they get a
> permission error when a sync job runs even though they had enough rights
> to create it.
> 
> Besides that, while I did not think this through all too closely, the new
> privileges seem OK to have.
> They allow admins to give users access to a remote that uses rather
> powerful credentials while still controlling roughly what a user can do
> with it. While in untrusted environments one probably wants to avoid
> that situation, in (semi-)trusted environments this can be nice to avoid
> error potential of some automation while not requiring an admin to
> configure many remotes for the same PBS, each using separate credentials
> with a minimal set of privs granted.

Yes, true. In the end the privs on the target are the ones that matter and
win, so constraints configured on the source are less critical. Will also
add a draft section to the documentation in the next version of the series.

> 
>>
>> Christian Ebner (24):
>>    datastore: data blob: fix typos in comments
>>    server: pull: be more specific in module comment
>>    server: pull: silence clippy to many arguments warning
>>    www: sync edit: indetation style fix
>>    server: pull: fix sync info message for root namespace
> 
> applied the above 5 clean-up commits already, thanks!

Nice, thanks!

> 
> btw. I agree with Gabriel w.r.t. having this separated a bit more
> explicitly in the user interface; while for the implementation it might
> not be that different, there's a big difference for the user. Making an
> error and getting source and target swapped by mistake might lead to
> some problematic results, like pruning some important snapshots as the
> (wrongly) chosen source is empty.

Yes, both of your arguments convinced me that clearly separating the UI 
makes more sense.

> 
> So I'd not only show the jobs in separated grids (can be the same
> panel though) but also use different add, edit, and remove dialogues and
> buttons.

Okay, will go that route then.

> 
> It might even make sense to evaluate using a different sync job section
> type, and thus config, for these; not saying that's a must, but it maybe
> could additionally help to avoid mistakes.

Will have a look at what this would imply as well.

Thanks a lot for the feedback!



