[pve-devel] [PATCH manager 1/3] Hold the guest migration lock when changing the replication config

Fabian Grünbichler f.gruenbichler at proxmox.com
Mon Aug 3 09:49:41 CEST 2020


On July 30, 2020 1:29 pm, Fabian Ebner wrote:
> The guest migration lock is already held when running replications,
> but it also makes sense to hold it when updating the replication
> config itself. Otherwise, it can happen that the migration does
> not know the de-facto state of replication.
> 
> For example:
> 1. migration starts
> 2. replication job is deleted
> 3. migration reads the replication config
> 4. migration runs the replication which causes the
>    replicated disks to be removed, because the job
>    is marked for removal
> 5. migration will continue without replication
> 
> Note that the migration doesn't actually fail, but it's probably
> not the desired behavior either.
> 
> Suggested-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
> Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
> ---
>  PVE/API2/ReplicationConfig.pm | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/PVE/API2/ReplicationConfig.pm b/PVE/API2/ReplicationConfig.pm
> index 2b4ecd10..e5262068 100644
> --- a/PVE/API2/ReplicationConfig.pm
> +++ b/PVE/API2/ReplicationConfig.pm
> @@ -9,6 +9,7 @@ use PVE::JSONSchema qw(get_standard_option);
>  use PVE::RPCEnvironment;
>  use PVE::ReplicationConfig;
>  use PVE::Cluster;
> +use PVE::GuestHelpers;
>  
>  use PVE::RESTHandler;
>  
> @@ -144,7 +145,9 @@ __PACKAGE__->register_method ({
>  	    $cfg->write();
>  	};
>  
> -	PVE::ReplicationConfig::lock($code);
> +	PVE::GuestHelpers::guest_migration_lock($guest, 10, sub {
> +	    PVE::ReplicationConfig::lock($code);
> +	});

it might make sense to have a single wrapper for this, or to add the 
guest ID as a parameter to ReplicationConfig::lock (so callers can't 
forget it or get the lock order wrong).
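
something along these lines, as an untested sketch (name and placement 
of the helper are just a suggestion):

```perl
# untested sketch - a combined helper in PVE::ReplicationConfig, so
# callers can't forget the guest lock or get the nesting order wrong
package PVE::ReplicationConfig;

use PVE::GuestHelpers;

sub lock_guest {
    my ($guest_id, $code) = @_;

    # always take the guest migration lock first, then the replication
    # config lock, to keep a fixed lock order everywhere
    PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
	lock($code);
    });
}
```

the API handlers would then just call 
PVE::ReplicationConfig::lock_guest($guest_id, $code);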

what about the calls to lock within ReplicationConfig? they are all 
job/guest ID specific, and should also get this additional protection, 
right?

from a quick glance, there seems to be only a single call to 
ReplicationConfig::lock that spans more than one job (job_status in 
ReplicationState), but that one immediately iterates over the jobs 
anyway, so we could either move the lock into the loop (expensive, 
since every iteration would then involve a cfs_lock), or split the cfs 
lock and the flock just for this one instance?
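
the first option would look roughly like this (untested sketch, the 
surrounding job_status code is paraphrased from memory):

```perl
# untested sketch - taking the per-guest migration lock inside the job
# iteration of job_status; expensive, since each job now pays for its
# own cfs_lock instead of one lock around the whole loop
foreach my $jobid (sort keys %$jobs) {
    my ($guest_id) = PVE::ReplicationConfig::parse_replication_job_id($jobid);

    PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
	PVE::ReplicationConfig::lock(sub {
	    # ... existing per-job status handling ...
	});
    });
}
```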

(side note: that code, and possibly other places in ReplicationConfig, 
is buggy since it does not re-read the config after taking the lock)
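
the usual pattern would be to construct the config object inside the 
locked section, like the patch above already does in its $code subs 
(untested sketch):

```perl
# untested sketch - the config must be (re-)read after the lock is
# obtained, otherwise we modify and write back a stale snapshot
PVE::ReplicationConfig::lock(sub {
    my $cfg = PVE::ReplicationConfig->new(); # fresh read under the lock
    # ... modify $cfg ...
    $cfg->write();
});
```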

>  
>  	return undef;
>      }});
> @@ -167,6 +170,7 @@ __PACKAGE__->register_method ({
>  	my $id = extract_param($param, 'id');
>  	my $digest = extract_param($param, 'digest');
>  	my $delete = extract_param($param, 'delete');
> +	my ($guest_id) = PVE::ReplicationConfig::parse_replication_job_id($id);
>  
>  	my $code = sub {
>  	    my $cfg = PVE::ReplicationConfig->new();
> @@ -199,7 +203,9 @@ __PACKAGE__->register_method ({
>  	    $cfg->write();
>  	};
>  
> -	PVE::ReplicationConfig::lock($code);
> +	PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
> +	    PVE::ReplicationConfig::lock($code);
> +	});
>  
>  	return undef;
>      }});
> @@ -237,10 +243,12 @@ __PACKAGE__->register_method ({
>  
>  	my $rpcenv = PVE::RPCEnvironment::get();
>  
> +	my $id = extract_param($param, 'id');
> +	my ($guest_id) = PVE::ReplicationConfig::parse_replication_job_id($id);
> +
>  	my $code = sub {
>  	    my $cfg = PVE::ReplicationConfig->new();
>  
> -	    my $id = $param->{id};
>  	    if ($param->{force}) {
>  		raise_param_exc({ 'keep' => "conflicts with parameter 'force'" }) if $param->{keep};
>  		delete $cfg->{ids}->{$id};
> @@ -262,7 +270,9 @@ __PACKAGE__->register_method ({
>  	    $cfg->write();
>  	};
>  
> -	PVE::ReplicationConfig::lock($code);
> +	PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
> +	    PVE::ReplicationConfig::lock($code);
> +	});
>  
>  	return undef;
>      }});
> -- 
> 2.20.1
> 
> 



