[pve-devel] [PATCH storage] fix #4289: pbs: wait for backup verification to finish before updating volume attribute
Christoph Heiss
c.heiss at proxmox.com
Tue Jan 10 12:11:41 CET 2023
Thanks for the review!
On Wed, Jan 04, 2023 at 11:50:38AM +0100, Fiona Ebner wrote:
> Am 02.01.23 um 13:36 schrieb Christoph Heiss:
> > diff --git a/PVE/Storage/PBSPlugin.pm b/PVE/Storage/PBSPlugin.pm
> > index 4320974..1cdbc11 100644
> > --- a/PVE/Storage/PBSPlugin.pm
> > +++ b/PVE/Storage/PBSPlugin.pm
> > @@ -906,8 +906,30 @@ sub get_volume_attribute {
> > return;
> > }
> >
> > +sub wait_for_verify_finish {
> > + my ($conn, $node, $datastore, $attrs) = @_;
> > +
> > + my $param = {
> > + running => 'true',
> > + since => $attrs->{'backup-time'},
> > + store => $datastore,
> > + typefilter => 'verify',
> > + };
> > +
> > + my $taskname = sprintf('%s:%s/%s/%X',
> > + $datastore,
> > + @{$attrs}{qw(backup-type backup-id backup-time)},
> > + );
>
> I don't think it's likely that the task name format here will change
> often, but as you already mentioned in the cover letter, it's not ideal
> to have it hard-coded here.
>
> > +
> > + while (1) {
> > + my $res = eval { $conn->get("/api2/json/nodes/$node/tasks", $param); };
> > + last if !grep { $_->{worker_id} eq $taskname } @$res;
> > + sleep(1);
> > + }
> > +}
> > +
> > @@ -921,6 +943,9 @@ sub update_volume_attribute {
> > my $conn = pbs_api_connect($scfg, $password);
> > my $datastore = $scfg->{datastore};
> >
> > + $logfunc->('info', 'waiting for server to finish backup verification...') if $logfunc;
>
> Should only be printed if there is actually a verification we need to
> wait for.
Makes sense.
>
> > + wait_for_verify_finish($conn, $scfg->{server}, $datastore, $param);
>
> To me, it feels out of place to be concerned with waiting on
> verification in (the rather low-level) update_volume_attribute(), which
> is a rather specific thing to do. I'd say it's fine to fail there when
> the snapshot is locked by verification or some other operation.
>
> Waiting for verification also can increase the backup duration/time
> holding the vzdump lock on the PVE side quite a bit.
That was one of my concerns too. Especially for very big VMs this can
probably delay the task quite a bit.
> It might not seem that big of a deal, because usually only manual
> backups use 'protected'. But by doing it in
> update_volume_attribute(), you also do it for 'notes', where it's not
> needed and which is relevant to backup jobs where the increased wait
> might be very noticeable. So at least, it should only be done for
> 'protected' if doing it in update_volume_attribute().
That is actually the case now - updating notes takes a different path
through update_volume_notes().
>
> It would be better if the protected flag could be specified upon
> creation already. Would also fix the following race I guess:
It definitely would be a lot cleaner. I'll see what I can do and rework
the whole series.
Probably involves adding a new parameter to the `proxmox-backup-client
backup` command and API(?) AFAICS. But this would not be all that bad
of a feature for the backup client in general, I think.
And I guess I need to figure out a way how to detect whether the new
parameter is supported or not?
In case this it not supported, just keeping the current behavior (i.e.
best-effort via the API and maybe failing) is probably the sensible way.
> 1. backup finishes
> 2. prune running on PBS
> 3. protected status set from PVE
>
> If going for the waiting approach after all, I think it should rather be
> done in vzdump, before calling update_volume_attribute(). And the helper
> to wait on verification should likely be part of PBSClient.pm (would
> need to teach it to use an API connection first).
More information about the pve-devel
mailing list