[pve-devel] [PATCH storage] fix #4289: pbs: wait for backup verification to finish before updating volume attribute

Fiona Ebner f.ebner at proxmox.com
Wed Jan 4 11:50:38 CET 2023


Am 02.01.23 um 13:36 schrieb Christoph Heiss:
> diff --git a/PVE/Storage/PBSPlugin.pm b/PVE/Storage/PBSPlugin.pm
> index 4320974..1cdbc11 100644
> --- a/PVE/Storage/PBSPlugin.pm
> +++ b/PVE/Storage/PBSPlugin.pm
> @@ -906,8 +906,30 @@ sub get_volume_attribute {
>      return;
>  }
> 
> +sub wait_for_verify_finish {
> +    my ($conn, $node, $datastore, $attrs) = @_;
> +
> +    my $param = {
> +	running => 'true',
> +	since => $attrs->{'backup-time'},
> +	store => $datastore,
> +	typefilter => 'verify',
> +    };
> +
> +    my $taskname = sprintf('%s:%s/%s/%X',
> +	$datastore,
> +        @{$attrs}{qw(backup-type backup-id backup-time)},
> +    );

I don't think it's likely that the task name format here will change
often, but as you already mentioned in the cover letter, it's not ideal
to have it hard-coded here.

> +
> +    while (1) {
> +	my $res = eval { $conn->get("/api2/json/nodes/$node/tasks", $param); };
> +	last if !grep { $_->{worker_id} eq $taskname } @$res;
> +	sleep(1);
> +    }
> +}
> +
> @@ -921,6 +943,9 @@ sub update_volume_attribute {
>  	my $conn = pbs_api_connect($scfg, $password);
>  	my $datastore = $scfg->{datastore};
> 
> +	$logfunc->('info', 'waiting for server to finish backup verification...') if $logfunc;

Should only be printed if there is actually a verification we need to
wait for.

> +	wait_for_verify_finish($conn, $scfg->{server}, $datastore, $param);

To me, it feels out of place to be concerned with waiting on
verification in (the rather low-level) update_volume_attribute(), which
is a rather specific thing to do. I'd say it's fine to fail there when
the snapshot is locked by verification or some other operation.

Waiting for verification also can increase the backup duration/time
holding the vzdump lock on the PVE side quite a bit. It might not seem
that big of a deal, because usually only manual backups use 'protected'.
But by doing it in update_volume_attribute(), you also do it for
'notes', where it's not needed and which is relevant to backup jobs
where the increased wait might be very noticeable. So at least, it
should only be done for 'protected' if doing it in
update_volume_attribute().

It would be better if the protected flag could be specified upon
creation already. Would also fix the following race I guess:
1. backup finishes
2. prune running on PBS
3. protected status set from PVE

If going for the waiting approach after all, I think it should rather be
done in vzdump, before calling update_volume_attribute(). And the helper
to wait on verification should likely be part of PBSClient.pm (would
need to teach it to use an API connection first).





More information about the pve-devel mailing list