[pmg-devel] [PATCH pmg-api 5/5] backup: pbs: prevent race in concurrent backups

Thomas Lamprecht t.lamprecht at proxmox.com
Thu Feb 25 11:14:29 CET 2021


On 24.02.21 19:31, Stoiko Ivanov wrote:
> If two pbs backup-creation calls happen simultaenously, it is possible

s/simultaenously/simultaneously/

> that the first removes the backup dir before the other is done
> creating or sending it to the pbs remote.
> 
> non-PBS backups are not affected, since they create the files for
> tar in a tempdir (indexed by PID and current time).

That approach seems to have a proven track record and avoids the issues this
one has, see below.

> 
> Noticed while having 2 schedules to different PBS instances with the
> same interval and w/o random delay.
> 
> Signed-off-by: Stoiko Ivanov <s.ivanov at proxmox.com>
> ---
>  src/PMG/API2/PBS/Job.pm | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/src/PMG/API2/PBS/Job.pm b/src/PMG/API2/PBS/Job.pm
> index 279afbc..e5dcb9c 100644
> --- a/src/PMG/API2/PBS/Job.pm
> +++ b/src/PMG/API2/PBS/Job.pm
> @@ -303,13 +303,14 @@ __PACKAGE__->register_method ({
>  
>  	my $pbs = PVE::PBSClient->new($remote_config, $remote, $conf->{secret_dir});
>  	my $backup_dir = "/var/lib/pmg/backup/current";
> +	my $lockfile = "/var/lock/pmg-pbs-backup.lck";
>  
>  	my $worker = sub {
>  	    my $upid = shift;
>  
>  	    my $log = "starting update of current backup state\n";
>  
> -	    eval {
> +	    my $create_backup = sub {
>  		-d $backup_dir || mkdir $backup_dir;
>  		PMG::Backup::pmg_backup($backup_dir, $param->{statistic});
>  
> @@ -317,6 +318,10 @@ __PACKAGE__->register_method ({
>  
>  		rmtree $backup_dir;
>  	    };
> +
> +	    eval {
> +		PVE::Tools::lock_file($lockfile, undef, $create_backup);

lock_file times out after 10s when no timeout is passed. As we have multiple
people running into a 20s timeout in PBS, I guess this does not solve the
problem at all: the backup that comes second to the lock acquire can still
always fail if a backup always needs more than 10s (maybe unlikely in your fast
local setup, not so unlikely if PBS is external and both are slow and/or highly
loaded).

Instead of bumping that timeout to dice-roll-times-100 I'd rather use different
target backups as mentioned yesterday in our lighthearted off-list lunch talk
about this.
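
For illustration, a rough and untested sketch of the per-run directory idea in
plain Perl, using only core modules. The helper names make_backup_dir and
with_backup_dir are made up for this mail and do not exist in pmg-api; the
sketch also assumes the remote name is safe to use as part of a file name:

```perl
use strict;
use warnings;

use File::Path qw(make_path remove_tree);
use File::Temp qw(tempdir);

# Hypothetical helper, not part of pmg-api: create a private working
# directory per backup run, so concurrent runs never share (or remove)
# each other's files.
sub make_backup_dir {
    my ($base_dir, $remote) = @_;

    make_path($base_dir);

    # tempdir() replaces the trailing X's with random characters and
    # creates the directory itself, so two runs for the same remote that
    # start in the same second still get distinct paths.
    # Assumption: $remote contains no '/' or other path-unsafe characters.
    my $template = sprintf("%s-%d-%d-XXXXXX", $remote, $$, time());
    return tempdir($template, DIR => $base_dir, CLEANUP => 0);
}

# Hypothetical wrapper: run $code with a fresh directory and remove that
# directory afterwards, even if $code dies.
sub with_backup_dir {
    my ($base_dir, $remote, $code) = @_;

    my $dir = make_backup_dir($base_dir, $remote);
    my $res = eval { $code->($dir) };
    my $err = $@;
    remove_tree($dir);
    die $err if $err;
    return $res;
}
```

The worker could then hand the directory it gets passed to
PMG::Backup::pmg_backup() instead of the shared "current" one, so no run can
remove a directory another run is still using.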

Locking between runs of the same backup job could be an idea, but I'm not too
sure how many people plan to have jobs that require minutes while setting up
minutely schedules.

That could be something one could warn about at the end of the backup task log,
if wanted (there we know the duration and could check the time until the next
run).
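
Such a check could look roughly like the following untested sketch. The
function name overrun_warning is hypothetical, and how the interval of the
schedule gets determined is left open here; it is simply passed in as seconds:

```perl
use strict;
use warnings;

# Hypothetical helper, not existing pmg-api code: return a warning line
# for the task log if a run took at least as long as the interval between
# two scheduled runs, otherwise return nothing. All values are in seconds.
sub overrun_warning {
    my ($start_time, $end_time, $interval) = @_;

    my $duration = $end_time - $start_time;
    return if $duration < $interval;

    return "WARNING: backup took ${duration}s, but the schedule interval is"
        . " only ${interval}s, runs may overlap\n";
}

# Usage example: a run that took 90s with a minutely schedule.
my $log = "starting update of current backup state\n";
my $warning = overrun_warning(0, 90, 60);
$log .= $warning if defined($warning);
```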

> +	    };
>  	    my $err = $@;
>  	    $log .= $err if $err;
>  
> 




