[pve-devel] [RFC storage 10/23] plugin: introduce new_backup_provider() method

Max Carrara m.carrara at proxmox.com
Fri Jul 26 14:02:26 CEST 2024


On Fri Jul 26, 2024 at 11:52 AM CEST, Fiona Ebner wrote:
> Am 25.07.24 um 17:32 schrieb Max Carrara:
> > On Thu Jul 25, 2024 at 3:11 PM CEST, Fiona Ebner wrote:
> >> Am 25.07.24 um 11:48 schrieb Max Carrara:
> >>>     The same goes for backup provider plugins - IMO namespacing them
> >>>     like e.g. `PVE::Backup::Provider::Plugin::Foo` where `Foo` is a
> >>>     (concrete) plugin.
> >>>
> >>
> >> The BackupProvider namespace is already intended for the plugins, adding
> >> an extra level with "Plugin" would just bloat the module names,
> >> especially if we decide to go the same route as for storage plugins and
> >> have a "Custom" sub-namespace.
> > 
> > I understand what you mean, yeah. Would perhaps something like
> > `PVE::BackupProvider::Plugin::*` be better?
> > 
> > The reason why I'm suggesting this is because in `PVE::Storage::*`,
> > every plugin lives alongside `Plugin.pm`, even though the extra
> > directory wouldn't really hurt IMO. E.g. `PVE::Storage::DirPlugin` would
> > then be `PVE::Storage::Plugin::Dir`.
> > 
>
> I think it's fine to live alongside the base plugin (I'd prefer
> Plugin::Base if going for a dedicated directory). I agree, if we ever
> want to add something other than plugins to the top namespace, it is
> much nicer to have the dedicated directory. And it is made more explicit
> that things in there are plugins (and not having to name each one
> FooPlugin). However, I still feel like
> PVE::BackupProvider::Plugin::Custom::Bar is rather lengthy (should we go
> with the Custom directory again), but I'm not really opposed to doing it
> like this.
>
> >>
> >>> The above two methods - `backup_nbd` and `backup_directory` - is there
> >>> perhaps a way to merge them? I'm not sure if what I'm having in mind
> >>> here is actually feasible, but what I mean is "making the method
> >>> agnostic to the type of backup". As in, perhaps pass a hash that
> >>> contains a `type` key for the type of backup being made, and instead of
> >>> having long method signatures, include the remaining parameters as the
> >>> remaining keys. For example:
> >>>
> >>> {
> >>>     'type' => 'lxc-dir',  # type names are just examples here
> >>>     'directory' => '/foo/bar/baz',
> >>>     'bandwidth_limit' => 42,
> >>>     ...
> >>> }
> >>>
> >>> {
> >>>     'type' => 'vm-nbd',
> >>>     'device_name' => '...',
> >>>     'nbd_path' => '...',
> >>>     ...
> >>> }
> >>>
> >>> You get the point :P
> >>>
> >>> IMO it would make it easier to extend later, and also make it more
> >>> straightforward to introduce new parameters / deprecate old ones, while
> >>> the method signature stays stable otherwise.
> >>>
> >>> The same goes for the different cleanup methods further down below;
> >>> instead of having a separate method for each "type of cleanup being
> >>> performed", let the implementor handle it according to the data the
> >>> method receives.
> >>>
> >>> IMHO I think it's best to be completely agnostic over VM / LXC backups
> >>> (and their specific types) wherever possible and let the data describe
> >>> what's going on instead.
> >>>
> >>
> >> The point about extensibility is a good one. The API wouldn't need to
> >> change even if we implement new mechanisms. But thinking about it some
> >> more, is there anything really gained? Because we will not force plugins
> >> to implement the methods for new mechanisms of course, they can just
> >> continue supporting what they support. Each mechanism will have its own
> >> specific set of parameters, so throwing everything into a catch-all
> >> method and hash might make it too generic.
> > 
> > The main point is indeed extensibility, but it also makes maintaining
> > the API a bit easier. Should we (in the future) decide to add or remove
> > any parameters, we don't need to touch the signature - and in turn, we
> > don't need to tediously `grep` for every call site to ensure that
> > they're updated accordingly.
> > 
> > With hashes one could instead always just check if the required
> > arguments have been provided.
> > 
>
> I'm all for going with a similar API age + version mechanism like for
> the storage plugins, so removing parameters should not be done except
> for major releases and adding will be backwards-compatible.
>
> I don't quite get your point about not needing to update the call sites.
> If you change the structure of the passed-in hash you still need to do that.

Pardon me, I was a bit imprecise there. You're completely right that the
passed hash has to be changed as well, of course.

What I meant in particular was that, because the signature (most likely)
won't change, we won't have to take special care to update the
individual arguments passed to a method whenever a parameter is
deprecated, removed, or added.

For example, optional arguments are often just left out (because that's
possible in Perl), so if we introduced another parameter ...

* after the optional one, we'd have to pass something like `0` or
  `undef` for the optional one first before passing the new one:

    # before
    $plugin->foo($first, $second);

    # after
    $plugin->foo($first, $second, undef, $new); 

* before the optional one, we'd have to make sure the order of arguments
  is correct:

    # before
    $plugin->foo($first, $second, 1);

    # after
    $plugin->foo($first, $second, $new, 1);

If we were to use a hash representing the parameters instead, the cases
would look like this respectively:

    $plugin->foo({ first => $first, second => $second, new => $new });

    $plugin->foo({ first => $first, second => $second, optional => 1, new => $new });

More examples of that pattern would be `PVE::Tools::run_command` and
`PVE::RADOS::mon_command`.
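
To illustrate the "check whether the required arguments have been
provided" part: a rough sketch of what the receiving side could look
like (the names here are of course just made up):

    sub foo {
        my ($self, $params) = @_;

        # required keys are checked explicitly, independent of their order
        for my $key (qw(first second)) {
            die "missing required parameter '$key'\n"
                if !defined($params->{$key});
        }

        # optional keys simply fall back to a default if left out
        my $optional = $params->{optional} // 0;

        # ... actual implementation ...
    }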

These changes can (and IMO should) still be guarded by an API age +
version mechanism; it's just that the *surrounding maintenance work*
becomes easier, even if the initial cost for us and for implementors
is a bit higher.

Of course, this is all a suggestion and I really don't want to seem like
I'm telling you what to do here! If you decide not to have a parameter
hash etc. then that's also completely fine by me, of course. :)

>
> I do see some benefit in not needing to add new methods for every
> mechanism, and there could also be a single backup_hook() method instead
> of a dedicated one for each phase while we're at it. But changes to the
> passed-in hashes will still fall under the same restrictions as
> changing a signature API-wise, so users will be informed via API age +
> version.

That's fair, yeah! I agree with you here as well - I think *not* having
an API age + version mechanism would be rather detrimental.
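
Just to sketch what I have in mind there - purely illustrative and
modelled on how pve-storage treats its plugins' api() method, so the
names and numbers below are placeholders:

    # version of the backup provider API that PVE currently implements
    my $apiver = 1;
    # how many older API versions are still supported on top of that
    my $apiage = 0;

    # a plugin reports the version it was written against ...
    my $plugin_apiver = $plugin->api();

    # ... and is rejected if that falls outside the supported range
    die "plugin implements an unsupported API version\n"
        if $plugin_apiver > $apiver || $plugin_apiver < ($apiver - $apiage);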

>
> >>
> >> Or think about the documentation for the single backup method: it would
> >> become super lengthy and describe all backup mechanisms, while a plugin
> >> most likely only cares about a single one and would have an easier time
> >> with a method that captures that mechanism's parameters explicitly.
> >> Won't the end result be making the implementor's life slightly harder,
> >> because they first need to extract the parameters for the specific mechanism?
> > 
> > Yes, I agree - this does create a bit of a double-edged sword -
> > implementors are responsible for handling the hash correctly; of course
> > they could lob it all into one generic method and call it a day, or they
> > could introduce a separate helper function for each `type`, for example.
> > 
>
> I would like to keep methods for containers and VMs separate in any
> case, because they require different things and I don't see any benefit
> in merging them. For containers, you back up the whole filesystem
> structure in one go; for VMs, you get each disk separately. There are
> certain parameters that are better fixed, i.e. are passed for every
> mechanism, e.g. info about idmap for containers, drive ID for VMs. So
> having them in the signature rather than inside a hash is better and
> merging won't work if you have different fixed ones. But I'm fine with
> having a mechanism-agnostic signature, one for VMs and one for containers.

Okay, that also seems reasonable to me :)
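
Just to make sure we're on the same page, that would roughly come down
to something like the following? (Completely made-up names, only meant
to illustrate the shape - fixed per-type parameters in the signature,
mechanism-specific ones in a hash.)

    # one method per guest type, agnostic to the concrete backup mechanism
    sub backup_vm {
        my ($self, $drive_id, $mechanism, $params) = @_;

        if ($mechanism eq 'nbd') {
            my $nbd_path = $params->{nbd_path};
            # ...
        }
    }

    sub backup_container {
        my ($self, $idmap, $mechanism, $params) = @_;
        # ...
    }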

>
> > The up- and downside of a generic method would be that it's up to the
> > implementor how to deal with it.
> > 
> > At the same time, it would allow their plugin to handle different API
> > versions a bit easier as well, because the method signature wouldn't
> > change - only the data changes. If they wanted their plugin to support
> > multiple API versions all at once, they could certainly do it that way
> > and aren't restricted by a fixed set of parameters.
> > 
> > Now that I've given it some more thought, there are quite a few ups
> > and downs, though I'm personally still in favour of the more generic
> > method, as it would reduce maintenance cost in the long run, IMO for
> > both us and implementors. The initial cost of adding the parameter
> > extraction / handling would be higher, I agree, but I feel like it's
> > more worthwhile in the long run.
> > 
> > Also, IMO lengthy documentation is better than having a rigid API ;P
> > 
>
> I do prefer a rigid API. After all you don't want to make life for
> implementers hard by changing too much between versions. And it can
> still be a rigid API, even if there's a mechanism-agnostic signature,
> just keep the data backwards-compatible.

I also agree; I don't have anything to add, you hit the nail on the
head.

Also, thanks for going through this discussion here with me - even if
you choose to not incorporate my suggestions, I'm glad we have these
little exchanges, because they periodically update my perspective(s) on
Perl code as well :P




