[pve-devel] [PATCH pve-zsync] Allow detecting a syncing instance of a job

Fabian Grünbichler f.gruenbichler at proxmox.com
Tue Oct 1 10:29:00 CEST 2019


On September 30, 2019 12:55 pm, Fabian Ebner wrote:
> Before, the check whether a syncing instance of the same job is already present
> was inside the locked section. This caused cron to continuously spawn new
> instances of pve-zsync on syncs (or rather groups of syncs) taking longer
> than 15 minutes, see [0] in the forum. This patch introduces a new locked
> section for checking the current status and a new 'waiting' status.
> The 'waiting' status is needed to mark jobs which are currently waiting
> for the lock for syncing. So if job A is syncing and job B is waiting for
> the lock then all new instances of job B will see that one instance is
> already scheduled to sync.
> 
> [0]: https://forum.proxmox.com/threads/pve-zsync-bug-spawns-endless-cron-processes.58087/
> 
> Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
> ---
>  pve-zsync | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/pve-zsync b/pve-zsync
> index 425ffa2..90c1bb3 100755
> --- a/pve-zsync
> +++ b/pve-zsync
> @@ -19,6 +19,7 @@ my $PVE_DIR = "/etc/pve/local";
>  my $QEMU_CONF = "${PVE_DIR}/qemu-server";
>  my $LXC_CONF = "${PVE_DIR}/lxc";
>  my $LOCKFILE = "$CONFIG_PATH/${PROGNAME}.lock";
> +my $LOCKFILE_STATUS_CHECK = "$CONFIG_PATH/${PROGNAME}_status_check.lock";
>  my $PROG_PATH = "$PATH/${PROGNAME}";
>  my $INTERVAL = 15;
>  my $DEBUG;
> @@ -578,20 +579,34 @@ sub destroy_job {
>  sub sync {
>      my ($param) = @_;
>  
> +    my $lock_status_check_fh = IO::File->new("> $LOCKFILE_STATUS_CHECK");
> +    die "Can't open Lock File: $LOCKFILE_STATUS_CHECK $!\n" if !$lock_status_check_fh;
> +    lock($lock_status_check_fh);
> +
> +    my $job;
> +    eval {
> +	$job = get_job($param);
> +    };
> +
> +    if ($job && defined($job->{state}) && ($job->{state} eq "syncing" || $job->{state} eq "waiting")) {
> +	unlock($lock_status_check_fh);
> +	die "Job --source $param->{source} --name $param->{name} is already scheduled to sync\n";
> +    }
> +
> +    $job->{state} = "waiting";
> +    update_state($job);
> +    unlock($lock_status_check_fh);

I don't think we need a (new) lock here - it would only protect against 
other processes entering sync(), but they could at worst do the same 
change (again)? update_state() already has locking to prevent races 
around the modification itself, and this new lock does not protect 
against other read-modify-write cycles (like destroy_job, enable_job, 
disable_job, but also init which just does write ;)).

if we introduce a new lock, it should be to guard all state r-m-w cycles 
  against races with eachother (e.g., by implementing Thomas locked() 
  suggestion, and pulling the lock from update_state there and updating 
  all call sites). the other locks (sync, update_cron) can also re-use 
  this locked mechanism, by passing in some lock identifier/filename 
  (you can also have wrappers around locked, e.g. lock_state, lock_cron 
  and lock_sync).

> +
>      my $lock_fh = IO::File->new("> $LOCKFILE");
>      die "Can't open Lock File: $LOCKFILE $!\n" if !$lock_fh;
>      lock($lock_fh);
>  
> +    #job might've changed while we waited for the lock, but we can be sure it's not syncing
>      my $date = get_date();
> -    my $job;
>      eval {
>  	$job = get_job($param);
>      };
>  
> -    if ($job && defined($job->{state}) && $job->{state} eq "syncing") {
> -	die "Job --source $param->{source} --name $param->{name} is syncing at the moment";
> -    }
> -
>      my $dest = parse_target($param->{dest});
>      my $source = parse_target($param->{source});
>  
> -- 
> 2.20.1
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel at pve.proxmox.com
> https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 




More information about the pve-devel mailing list