[pve-devel] [PATCH manager 1/3] Hold the guest migration lock when changing the replication config
Fabian Grünbichler
f.gruenbichler at proxmox.com
Mon Aug 3 09:49:41 CEST 2020
On July 30, 2020 1:29 pm, Fabian Ebner wrote:
> The guest migration lock is already held when running replications,
> but it also makes sense to hold it when updating the replication
> config itself. Otherwise, it can happen that the migration does
> not know the de-facto state of replication.
>
> For example:
> 1. migration starts
> 2. replication job is deleted
> 3. migration reads the replication config
> 4. migration runs the replication which causes the
> replicated disks to be removed, because the job
> is marked for removal
> 5. migration will continue without replication
>
> Note that the migration doesn't actually fail, but it's probably
> not the desired behavior either.
>
> Suggested-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
> Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
> ---
> PVE/API2/ReplicationConfig.pm | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/PVE/API2/ReplicationConfig.pm b/PVE/API2/ReplicationConfig.pm
> index 2b4ecd10..e5262068 100644
> --- a/PVE/API2/ReplicationConfig.pm
> +++ b/PVE/API2/ReplicationConfig.pm
> @@ -9,6 +9,7 @@ use PVE::JSONSchema qw(get_standard_option);
> use PVE::RPCEnvironment;
> use PVE::ReplicationConfig;
> use PVE::Cluster;
> +use PVE::GuestHelpers;
>
> use PVE::RESTHandler;
>
> @@ -144,7 +145,9 @@ __PACKAGE__->register_method ({
> $cfg->write();
> };
>
> - PVE::ReplicationConfig::lock($code);
> + PVE::GuestHelpers::guest_migration_lock($guest, 10, sub {
> + PVE::ReplicationConfig::lock($code);
> + });
It might make sense to have a single wrapper for this, or to add the guest
ID as a parameter to ReplicationConfig::lock, so that callers cannot forget
it or get the lock order wrong.
What about the calls to lock within ReplicationConfig itself? They are all
job/guest ID specific, and should also get this additional protection,
right?
From a quick glance, there seems to be only a single call to
ReplicationConfig::lock that spans more than one job (job_status in
ReplicationState), but that one immediately iterates over the jobs anyway.
So we could either move the lock into the loop (expensive, since each
iteration would then involve a cfs_lock), or split the cfs lock and the
flock just for this one instance?
(Side note: that code, and possibly other code in ReplicationConfig, is
buggy since it does not re-read the config after acquiring the lock.)
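For the splitting variant, roughly something like this (untested, just to
illustrate the idea - and the lock order relative to the API handlers above,
which take the flock outside and the cfs lock inside, would need a closer
look). Constructing the config inside the locked sub would also address the
re-read issue from the side note:

    # take the replication config (cfs) lock once for the whole run, and
    # only the cheap per-guest flock inside the loop
    PVE::ReplicationConfig::lock(sub {
	# (re-)read the config after acquiring the lock
	my $cfg = PVE::ReplicationConfig->new();

	foreach my $jobid (sort keys %{$cfg->{ids}}) {
	    my ($guest_id) = PVE::ReplicationConfig::parse_replication_job_id($jobid);

	    PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
		# ... existing per-job status handling ...
	    });
	}
    });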
>
> return undef;
> }});
> @@ -167,6 +170,7 @@ __PACKAGE__->register_method ({
> my $id = extract_param($param, 'id');
> my $digest = extract_param($param, 'digest');
> my $delete = extract_param($param, 'delete');
> + my ($guest_id) = PVE::ReplicationConfig::parse_replication_job_id($id);
>
> my $code = sub {
> my $cfg = PVE::ReplicationConfig->new();
> @@ -199,7 +203,9 @@ __PACKAGE__->register_method ({
> $cfg->write();
> };
>
> - PVE::ReplicationConfig::lock($code);
> + PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
> + PVE::ReplicationConfig::lock($code);
> + });
>
> return undef;
> }});
> @@ -237,10 +243,12 @@ __PACKAGE__->register_method ({
>
> my $rpcenv = PVE::RPCEnvironment::get();
>
> + my $id = extract_param($param, 'id');
> + my ($guest_id) = PVE::ReplicationConfig::parse_replication_job_id($id);
> +
> my $code = sub {
> my $cfg = PVE::ReplicationConfig->new();
>
> - my $id = $param->{id};
> if ($param->{force}) {
> raise_param_exc({ 'keep' => "conflicts with parameter 'force'" }) if $param->{keep};
> delete $cfg->{ids}->{$id};
> @@ -262,7 +270,9 @@ __PACKAGE__->register_method ({
> $cfg->write();
> };
>
> - PVE::ReplicationConfig::lock($code);
> + PVE::GuestHelpers::guest_migration_lock($guest_id, 10, sub {
> + PVE::ReplicationConfig::lock($code);
> + });
>
> return undef;
> }});
> --
> 2.20.1
>
>