[pve-devel] [PATCH v6 qemu-server 2/5] disk reassign: add API endpoint

Aaron Lauterer a.lauterer at proxmox.com
Mon Apr 19 11:25:20 CEST 2021



On 4/18/21 5:24 PM, Thomas Lamprecht wrote:
> On 02.04.21 12:19, Aaron Lauterer wrote:
>> The goal of this new API endpoint is to provide an easy way to move a
>> disk between VMs, as this was only possible with manual intervention
>> until now: either by renaming the VM disk or by manually adding the
>> disk's volid to the config of the other VM.
>>
>> The latter can easily cause unexpected behavior, for example a disk
>> attached to VM B being deleted when VM A is destroyed if it used to
>> belong to VM A. This happens because PVE assumes that the VMID in the
>> volname always matches the VM the disk is attached to and thus removes
>> any disk with VMID A in its volname when VM A is deleted.
>>
>> The term `reassign` was chosen as it is not yet used
>> for VM disks.
>>
>> Signed-off-by: Aaron Lauterer <a.lauterer at proxmox.com>
>> ---
>> v5 -> v6:
>> * guard Replication snapshot cleanup
>>      in addition to the eval, that code is now only run if the volume is
>>      on a storage with the 'replicate' feature
>> * add permission check for target vmid
>> * changed regex to match unused keys better
>>
>> thx @Fabian for these suggestions/catching problems
>>
>> v4 -> v5:
>> * implemented suggestions from Fabian [1]
>>      * logging before action
>>      * improving description
>>      * improving error messages
>>      * using Replication::prepare to remove replication snapshots
>>      * check if disk is physical disk using /dev/...
>>
>> v3 -> v4: nothing
>>
>> v2 -> v3:
>> * reordered the locking as discussed with fabian [0] to:
>>      run checks
>>      fork worker
>>          lock source config
>>              lock target config
>>                  run checks
>>                  ...
>>
>> * added more checks
>>      * will not reassign to or from templates
>>      * will not reassign if VM has snapshots present
>> * cleanup if disk used to be replicated
>> * made task log slightly more verbose
>> * integrated general recommendations regarding code
>> * renamed `disk` to `drive_key`
>> * prepended some vars with `source_` for easier distinction
>>
>> v1 -> v2: print config key and volid info at the end of the job so it
>> shows up on the CLI and task log
>>
>> rfc -> v1:
>> * add support to reassign unused disks
>> * add support to provide a config digest for the target vm
>> * add additional check if disk key is present in config
>> * reorder checks a bit
>>
>> In order to support unused disks, I had to extend
>> PVE::QemuServer::Drive::valid_drive_names for the API parameter
>> validation.
>>
>> Checks are ordered so that cheap tests run first, giving the call the
>> earliest possible chance to fail.
>>
>> The check whether both VMs are present on the node is a bit redundant,
>> because locking the config files will already fail if a VM is not present.
>> But with the additional check we can provide a useful error message to the
>> user instead of a "Configuration file xyz does not exist" error.
>>
>> [0] https://lists.proxmox.com/pipermail/pve-devel/2020-September/044930.html
>> [1] https://lists.proxmox.com/pipermail/pve-devel/2020-November/046030.html
>>
>>   PVE/API2/Qemu.pm        | 172 ++++++++++++++++++++++++++++++++++++++++
>>   PVE/QemuServer/Drive.pm |   4 +
>>   2 files changed, 176 insertions(+)
>>
>> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
>> index e95ab13..9642b9b 100644
>> --- a/PVE/API2/Qemu.pm
>> +++ b/PVE/API2/Qemu.pm
>> @@ -35,6 +35,7 @@ use PVE::API2::Qemu::Agent;
>>   use PVE::VZDump::Plugin;
>>   use PVE::DataCenterConfig;
>>   use PVE::SSHInfo;
>> +use PVE::Replication;
>>   
>>   BEGIN {
>>       if (!$ENV{PVE_GENERATING_DOCS}) {
>> @@ -4377,4 +4378,175 @@ __PACKAGE__->register_method({
>>   	return PVE::QemuServer::Cloudinit::dump_cloudinit_config($conf, $param->{vmid}, $param->{type});
>>       }});
>>   
>> +__PACKAGE__->register_method({
>> +    name => 'reassign_vm_disk',
>> +    path => '{vmid}/reassign_disk',
>> +    method => 'POST',
>> +    protected => 1,
>> +    proxyto => 'node',
>> +    description => "Reassign a disk to another VM",
>> +    permissions => {
>> +	description => "You need 'VM.Config.Disk' permissions on /vms/{vmid} and /vms/{target vmid}, and 'Datastore.Allocate' permissions on the storage.",
>> +	check => [ 'and',
>> +		   ['perm', '/vms/{vmid}', [ 'VM.Config.Disk' ]],
>> +		   ['perm', '/storage/{storage}', [ 'Datastore.Allocate' ]],
>> +	    ],
>> +    },
>> +    parameters => {
>> +        additionalProperties => 0,
>> +	properties => {
>> +	    node => get_standard_option('pve-node'),
>> +	    vmid => get_standard_option('pve-vmid', { completion => \&PVE::QemuServer::complete_vmid }),
>> +	    target_vmid => get_standard_option('pve-vmid', { completion => \&PVE::QemuServer::complete_vmid }),
> 
> s/target_vmid/'target-vmid'/
> 
>> +	    drive_name => {
> 
> A bit late in the review cycle, but why not both source and target drive parameters?
> IMO it's not a good idea to just find a free disk name at the target; we already control
> the digest, so we can let the user choose. This allows them to actually know which drive
> the new disk is assigned to, otherwise they cannot really know.

Sounds like a good idea, and it will also help further below by reducing the chance that the disk ends up in limbo.
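
Something along these lines then, I guess? Just a rough, untested sketch to make sure I
understood you correctly; the hyphenated parameter names are placeholders, not final:

    'target-vmid' => get_standard_option('pve-vmid', {
        completion => \&PVE::QemuServer::complete_vmid,
    }),
    'source-drive' => {
        type => 'string',
        description => "The config key of the disk to reassign on the source VM (for example, ide0 or scsi1).",
        enum => [PVE::QemuServer::Drive::valid_drive_names_with_unused()],
    },
    'target-drive' => {
        type => 'string',
        description => "The config key the disk will be assigned to on the target VM.",
        enum => [PVE::QemuServer::Drive::valid_drive_names_with_unused()],
    },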

> 
> For consistency and easier grasping of the API the following parameters could then be
> used:
> 
> source-vmid
> target-vmid
> source-drive
> target-drive
> source-digest
> target-digest
> 
> 
>> +	        type => 'string',
>> +		description => "The config key of the disk to reassign (for example, ide0 or scsi1).",
>> +		enum => [PVE::QemuServer::Drive::valid_drive_names_with_unused()],
>> +	    },
>> +	    digest => {
>> +		type => 'string',
>> +		description => 'Prevent changes if the current configuration file of the source VM has a different SHA1 digest. This can be used to prevent concurrent modifications.',
>> +		maxLength => 40,
>> +		optional => 1,
>> +	    },
>> +	    target_digest => {
>> +		type => 'string',
>> +		description => 'Prevent changes if the current configuration file of the target VM has a different SHA1 digest. This can be used to prevent concurrent modifications.',
>> +		maxLength => 40,
>> +		optional => 1,
>> +	    },
>> +	},
>> +    },
>> +    returns => {
>> +	type => 'string',
>> +	description => "the task ID.",
>> +    },
>> +    code => sub {
>> +	my ($param) = @_;
>> +
>> +	my $rpcenv = PVE::RPCEnvironment::get();
>> +	my $authuser = $rpcenv->get_user();
>> +
>> +	my $node = extract_param($param, 'node');
>> +	my $source_vmid = extract_param($param, 'vmid');
>> +	my $target_vmid = extract_param($param, 'target_vmid');
>> +	my $source_digest = extract_param($param, 'digest');
>> +	my $target_digest = extract_param($param, 'target_digest');
>> +	my $drive_name = extract_param($param, 'drive_name');
>> +
>> +	my $storecfg = PVE::Storage::config();
>> +	my $vmlist;
>> +	my $drive;
>> +	my $source_volid;
>> +
>> +	$rpcenv->check_vm_perm($authuser, $target_vmid, undef, ['VM.Config.Disk'])
>> +	    if $authuser ne 'root at pam';
>> +
>> +	die "Reassigning a disk to the same VM is not possible. Did you mean to move the disk?\n"
>> +	    if $source_vmid eq $target_vmid;
>> +
>> +	my $load_and_check_configs = sub {
>> +	    $vmlist = PVE::Cluster::get_vmlist()->{ids};
>> +	    die "Both VMs need to be on the same node ($vmlist->{$source_vmid}->{node}) but target VM is on $vmlist->{$target_vmid}->{node}.\n"
>> +		if $vmlist->{$source_vmid}->{node} ne $vmlist->{$target_vmid}->{node};
>> +
>> +	    my $source_conf = PVE::QemuConfig->load_config($source_vmid);
>> +	    PVE::QemuConfig->check_lock($source_conf);
>> +	    my $target_conf = PVE::QemuConfig->load_config($target_vmid);
>> +	    PVE::QemuConfig->check_lock($target_conf);
>> +
>> +	    die "Can't reassign disks from or to templates\n"
>> +		if ($source_conf->{template} || $target_conf->{template});
>> +
>> +	    if ($source_digest) {
>> +		eval { PVE::Tools::assert_if_modified($source_digest, $source_conf->{digest}) };
>> +		if (my $err = $@) {
>> +		    die "VM ${source_vmid}: ${err}";
>> +		}
>> +	    }
>> +
>> +	    if ($target_digest) {
>> +		eval { PVE::Tools::assert_if_modified($target_digest, $target_conf->{digest}) };
>> +		if (my $err = $@) {
>> +		    die "VM ${target_vmid}: ${err}";
>> +		}
>> +	    }
>> +
>> +	    die "Disk '${drive_name}' does not exist\n"
>> +		if !defined($source_conf->{$drive_name});
>> +
>> +	    $drive = PVE::QemuServer::parse_drive($drive_name, $source_conf->{$drive_name});
>> +	    $source_volid = $drive->{file};
>> +	    die "disk '${drive_name}' has no associated volume\n" if !$source_volid;
>> +	    die "CD drive contents can't be reassigned\n" if PVE::QemuServer::drive_is_cdrom($drive, 1);
>> +	    die "Can't reassign physical disk\n" if $drive->{file} =~ m|^/dev/|;
>> +	    die "Can't reassign disk used by a snapshot\n"
>> +		if PVE::QemuServer::Drive::is_volume_in_use($storecfg, $source_conf, $drive_name, $source_volid);
>> +
>> +	    die "Storage does not support the reassignment of this disk\n"
>> +		if !PVE::Storage::volume_has_feature($storecfg, 'reassign', $source_volid);
>> +
>> +	    die "Cannot reassign disk while the source VM is running\n"
>> +		if PVE::QemuServer::check_running($source_vmid) && $drive_name !~ m/^unused\d+$/;
>> +
>> +	    return ($source_conf, $target_conf);
>> +	};
>> +
>> +	my $logfunc = sub {
>> +	    my ($msg) = @_;
>> +	    print STDERR "$msg\n";
>> +	};
>> +
>> +	my $reassign_func = sub {
>> +	    return PVE::QemuConfig->lock_config($source_vmid, sub {
>> +		return PVE::QemuConfig->lock_config($target_vmid, sub {
>> +		    my ($source_conf, $target_conf) = &$load_and_check_configs();
>> +
>> +		    PVE::Cluster::log_msg('info', $authuser, "reassign disk VM $source_vmid: reassign --disk ${drive_name} --target_vmid $target_vmid");
> 
> why a cluster log? This is not a cluster-wide action and it already runs in a task log?
> 
>> +
>> +		    my $new_volid = PVE::Storage::reassign_volume(
>> +			$storecfg,
>> +			$source_volid,
>> +			$target_vmid,
>> +		    );
>> +
>> +		    delete $source_conf->{$drive_name};
>> +		    print "removing disk '${drive_name}' from VM '${source_vmid}'\n";
>> +		    PVE::QemuConfig->write_config($source_vmid, $source_conf);
>> +
> 
> if the task/node/... dies here, the disk is lost in limbo?

hmm, I guess if we also make the target drive key a mandatory parameter, we could write the disk to the new VM's config immediately and do the snapshot cleanup afterwards. That would also avoid the possible error a few lines further down when no unused slot is available. If the snapshot cleanup fails, it needs to be handled manually, but at least the disk is assigned to the right VM.
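
Roughly what I have in mind (untested sketch; it assumes a mandatory $target_drive
parameter and that an earlier check already made sure that config key is free on the
target VM):

    my $new_volid = PVE::Storage::reassign_volume($storecfg, $source_volid, $target_vmid);

    # add the volume to the target config first, so a crash at worst leaves the disk
    # referenced by both VMs instead of by none
    $drive->{file} = $new_volid;
    $target_conf->{$target_drive} = PVE::QemuServer::print_drive($drive);
    PVE::QemuConfig->write_config($target_vmid, $target_conf);

    delete $source_conf->{$drive_name};
    PVE::QemuConfig->write_config($source_vmid, $source_conf);

    # best-effort cleanup afterwards; if this fails, only the replication snapshots
    # need manual cleanup, the disk is already assigned to the right VM
    if (PVE::Storage::volume_has_feature($storecfg, 'replicate', $source_volid)) {
        eval { PVE::Replication::prepare($storecfg, [$new_volid], undef, undef, undef, $logfunc) };
        print "Failed to remove replication snapshots. Manual cleanup could be necessary.\n" if $@;
    }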

> 
>> +		    # remove possible replication snapshots
>> +		    if (PVE::Storage::volume_has_feature($storecfg, 'replicate', $source_volid)) {
>> +			eval {
>> +			    PVE::Replication::prepare(
>> +				$storecfg,
>> +				[$new_volid],
>> +				undef,
>> +				undef,
>> +				undef,
>> +				$logfunc,
>> +			    )
>> +			};
>> +			if (my $err = $@) {
>> +			    print "Failed to remove replication snapshots on reassigned disk. Manual cleanup could be necessary.\n";
>> +			}
>> +		    }
>> +
>> +		    my $key;
>> +		    eval { $key = PVE::QemuConfig->add_unused_volume($target_conf, $new_volid) };
>> +		    if (my $err = $@) {
>> +			print "failed to add reassigned disk '${new_volid}' to VM '${target_vmid}'. Try to free an 'unused' disk slot and run 'qm rescan ${target_vmid}'.\n";
>> +			return 0;
>> +		    }
>> +
>> +		    print "adding disk to VM '${target_vmid}' as '${key}: ${new_volid}'\n";
>> +		    PVE::QemuConfig->write_config($target_vmid, $target_conf);
> 
> This can violate the underlying assumption that a VMID is owned by its node, and
> thus only that node will modify its configuration...
> 
> We could restrict this to the case where both VMs (hmm, where's the CT implementation?)

CTs are something that can be done in the future as well ;).

> are on the same node, which would be the safe and easy way for now. For cross-node we
> would probably need to make it two-phase and proxy the add over.
> 
> But after re-checking the load_config($target_vmid) usage above, this whole thing can
> currently only work if both VMIDs are on the same node - I do not see that limitation
> described anywhere. It would be good to have it prominently in the API description and
> also improve UX by checking explicitly whether the $target_vmid is located on this node.
> 
> Or were you not aware of that?

The check that both VMs are located on the same node is one of the first checks at the beginning of the API code. I forgot to mention this in the description; I'll rephrase it a bit to include that fact.
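
Maybe something along these lines for the description:

    description => "Reassign a disk to another VM. Both VMs need to reside on the same node.",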

> 
> 
>> +		});
>> +	    });
>> +	};
>> +
>> +	&$load_and_check_configs();
>> +
>> +	return $rpcenv->fork_worker('qmreassign', $source_vmid, $authuser, $reassign_func);
> 
> the worker ID could be something like: "$source_vmid:$source_drive -> $target_vmid:$target_drive".
> 
>> +    }});
>> +
>>   1;
>> diff --git a/PVE/QemuServer/Drive.pm b/PVE/QemuServer/Drive.pm
>> index 01ea8d7..e938b9b 100644
>> --- a/PVE/QemuServer/Drive.pm
>> +++ b/PVE/QemuServer/Drive.pm
>> @@ -392,6 +392,10 @@ sub valid_drive_names {
>>               'efidisk0');
>>   }
>>   
>> +sub valid_drive_names_with_unused {
>> +    return (valid_drive_names(), map {"unused$_"} (0 .. ($MAX_UNUSED_DISKS -1)));
>> +}
>> +
>>   sub is_valid_drivename {
>>       my $dev = shift;
>>   
>>
> 




