[pve-devel] [PATCH 1/3] add live storage migration with vm migration
Alexandre DERUMIER
aderumier at odiso.com
Wed Oct 12 14:50:33 CEST 2016
> + #if storage migration is already done, but vm migration crash, we need to move the vm config
>>Here the VM is not always crashed, just the migration failed or?
>>
>>If yes, would it not be better to let the VM continue to run where it
>>was (if it can even run here) and free the migrated volume on the target
>>storage?
>>Stopping the VM here seems not obvious.
The problem is that, if storage migration has been done and vm migration crash,
the vm is still running, but with nbd attached (and target vm still running).
(and new write are already done on target storage).
That's why I force stop of source vm and migrate config file on target.
----- Mail original -----
De: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
À: "pve-devel" <pve-devel at pve.proxmox.com>
Envoyé: Mercredi 12 Octobre 2016 09:44:25
Objet: Re: [pve-devel] [PATCH 1/3] add live storage migration with vm migration
comments inline
On 10/11/2016 04:45 PM, Alexandre Derumier wrote:
> This allow to migrate a local storage (only 1 for now) to a remote node storage.
>
> When the target node start, a new volume is created and exposed through qemu embedded nbd server.
>
> qemu drive-mirror is done on source vm with nbd server as target.
>
> when drive-mirror is done, the source vm is running the disk though nbd.
>
> Then we live migration the vm to destination node.
>
> Signed-off-by: Alexandre Derumier <aderumier at odiso.com>
> ---
> PVE/API2/Qemu.pm | 7 +++++
> PVE/QemuMigrate.pm | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++---
> 2 files changed, 93 insertions(+), 5 deletions(-)
>
> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
> index 21fbebb..acb1412 100644
> --- a/PVE/API2/Qemu.pm
> +++ b/PVE/API2/Qemu.pm
> @@ -2648,6 +2648,10 @@ __PACKAGE__->register_method({
> description => "Allow to migrate VMs which use local devices. Only root may use this option.",
> optional => 1,
> },
> + targetstorage => get_standard_option('pve-storage-id', {
> + description => "Target storage.",
> + optional => 1,
> + }),
> },
> },
> returns => {
> @@ -2674,6 +2678,9 @@ __PACKAGE__->register_method({
>
> my $vmid = extract_param($param, 'vmid');
>
> + raise_param_exc({ targetstorage => "Live Storage migration can only be done online" })
> + if !$param->{online} && $param->{targetstorage};
> +
> raise_param_exc({ force => "Only root may use this option." })
> if $param->{force} && $authuser ne 'root at pam';
>
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index 22a49ef..6e90296 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -170,9 +170,11 @@ sub prepare {
> $self->{forcemachine} = PVE::QemuServer::qemu_machine_pxe($vmid, $conf);
>
> }
> -
style nitpick: if you do a v2 let this line stay
> if (my $loc_res = PVE::QemuServer::check_local_resources($conf, 1)) {
> - if ($self->{running} || !$self->{opts}->{force}) {
> + if ($self->{running} && $self->{opts}->{targetstorage}){
> + $self->log('info', "migrating VM with online storage migration");
> + }
> + elsif ($self->{running} || !$self->{opts}->{force} ) {
This here is wrong. The method "check_local_resources" checks only for
usb/hostpci/serial/parallel devices, not for local storages.
So you shouldn't need this code part at all?
> die "can't migrate VM which uses local devices\n";
> } else {
> $self->log('info', "migrating VM which uses local devices");
> @@ -182,12 +184,16 @@ sub prepare {
> my $vollist = PVE::QemuServer::get_vm_volumes($conf);
>
> my $need_activate = [];
> + my $unsharedcount = 0;
> foreach my $volid (@$vollist) {
> my ($sid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
>
> # check if storage is available on both nodes
> my $scfg = PVE::Storage::storage_check_node($self->{storecfg}, $sid);
> - PVE::Storage::storage_check_node($self->{storecfg}, $sid, $self->{node});
> + my $targetsid = $sid;
> + $targetsid = $self->{opts}->{targetstorage} if $self->{opts}->{targetstorage};
> +
we often use
my $targetsid = $self->{opts}->{targetstorage} || $sid;
but just nitpicking here :)
> + PVE::Storage::storage_check_node($self->{storecfg}, $targetsid, $self->{node});
>
> if ($scfg->{shared}) {
> # PVE::Storage::activate_storage checks this for non-shared storages
> @@ -197,9 +203,12 @@ sub prepare {
> } else {
> # only activate if not shared
> push @$need_activate, $volid;
> + $unsharedcount++;
> }
> }
>
> + die "online storage migration don't support more than 1 local disk currently" if $unsharedcount > 1;
> +
> # activate volumes
> PVE::Storage::activate_volumes($self->{storecfg}, $need_activate);
>
> @@ -407,7 +416,7 @@ sub phase1 {
> $conf->{lock} = 'migrate';
> PVE::QemuConfig->write_config($vmid, $conf);
>
> - sync_disks($self, $vmid);
> + sync_disks($self, $vmid) if !$self->{opts}->{targetstorage};
>
> };
>
> @@ -452,7 +461,7 @@ sub phase2 {
> $spice_ticket = $res->{ticket};
> }
>
> - push @$cmd , 'qm', 'start', $vmid, '--skiplock', '--migratedfrom', $nodename;
> + push @$cmd , 'qm', 'start', $vmid, '--skiplock', '--migratedfrom', $nodename, '--targetstorage', $self->{opts}->{targetstorage};
>
> # we use TCP only for unsecure migrations as TCP ssh forward tunnels often
> # did appeared to late (they are hard, if not impossible, to check for)
> @@ -472,6 +481,7 @@ sub phase2 {
> }
>
> my $spice_port;
> + my $nbd_uri;
>
> # Note: We try to keep $spice_ticket secret (do not pass via command line parameter)
> # instead we pipe it through STDIN
> @@ -496,6 +506,13 @@ sub phase2 {
> elsif ($line =~ m/^spice listens on port (\d+)$/) {
> $spice_port = int($1);
> }
> + elsif ($line =~ m/^storage migration listens on nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+):exportname=(\S+) volume:(\S+)$/) {
we have a regex for valid IP v4/v6 addresses, maybe use that here
> + $nbd_uri = "nbd:$1:$2:exportname=$3";
> + $self->{target_volid} = $4;
> + $self->{target_drive} = $3;
> + $self->{target_drive} =~ s/drive-//g;
> +
> + }
> }, errfunc => sub {
> my $line = shift;
> $self->log('info', $line);
> @@ -540,6 +557,18 @@ sub phase2 {
> }
>
> my $start = time();
> +
> + if ($self->{opts}->{targetstorage}) {
> + $self->log('info', "starting storage migration on $nbd_uri");
> + $self->{storage_migration} = 'running';
> + PVE::QemuServer::qemu_drive_mirror($vmid, $self->{target_drive}, $nbd_uri, $vmid);
> + #update config
> + $self->{storage_migration} = 'finish';
> + my $drive = PVE::QemuServer::parse_drive($self->{target_drive}, $self->{target_volid});
> + $conf->{$self->{target_drive}} = PVE::QemuServer::print_drive($vmid, $drive);
> + PVE::QemuConfig->write_config($vmid, $conf);
> + }
> +
> $self->log('info', "starting online/live migration on $ruri");
> $self->{livemigration} = 1;
>
> @@ -748,6 +777,48 @@ sub phase2_cleanup {
> $self->{errors} = 1;
> }
>
> + if($self->{storage_migration} && $self->{storage_migration} eq 'running' && $self->{target_volid}) {
> +
> + my $drive = PVE::QemuServer::parse_drive($self->{target_drive}, $self->{target_volid});
> + my ($storeid, $volname) = PVE::Storage::parse_volume_id($drive->{file});
> +
> + my $cmd = [@{$self->{rem_ssh}}, 'pvesm', 'free', "$storeid:$volname"];
> +
> + eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
> + if (my $err = $@) {
> + $self->log('err', $err);
> + $self->{errors} = 1;
> + }
> + }
> +
> + #if storage migration is already done, but vm migration crash, we need to move the vm config
Here the VM is not always crashed, just the migration failed or?
If yes, would it not be better to let the VM continue to run where it
was (if it can even run here) and free the migrated volume on the target
storage?
Stopping the VM here seems not obvious.
> + if($self->{storage_migration} && $self->{storage_migration} eq 'finish' && $self->{target_volid}) {
> +
> + # stop local VM
> + eval { PVE::QemuServer::vm_stop($self->{storecfg}, $vmid, 1, 1); };
> + if (my $err = $@) {
> + $self->log('err', "stopping vm failed - $err");
> + $self->{errors} = 1;
> + }
> +
> + # always deactivate volumes - avoid lvm LVs to be active on several nodes
> + eval {
> + my $vollist = PVE::QemuServer::get_vm_volumes($conf);
> + PVE::Storage::deactivate_volumes($self->{storecfg}, $vollist);
> + };
> + if (my $err = $@) {
> + $self->log('err', $err);
> + $self->{errors} = 1;
> + }
> +
> + # move config to remote node
> + my $conffile = PVE::QemuConfig->config_file($vmid);
> + my $newconffile = PVE::QemuConfig->config_file($vmid, $self->{node});
> +
> + die "Failed to move config to node '$self->{node}' - rename failed: $!\n"
> + if !rename($conffile, $newconffile);
> + }
> +
> if ($self->{tunnel}) {
> eval { finish_tunnel($self, $self->{tunnel}); };
> if (my $err = $@) {
> @@ -755,6 +826,9 @@ sub phase2_cleanup {
> $self->{errors} = 1;
> }
> }
> +
> +
> +
> }
>
> sub phase3 {
> @@ -834,6 +908,13 @@ sub phase3_cleanup {
> $self->{errors} = 1;
> }
>
> + #improve me
> + if($self->{storage_migration}) {
> + #delete source volid ?
> +
> + #stop nbd server to remote vm
> + }
> +
> # clear migrate lock
> my $cmd = [ @{$self->{rem_ssh}}, 'qm', 'unlock', $vmid ];
> $self->cmd_logerr($cmd, errmsg => "failed to clear migrate lock");
_______________________________________________
pve-devel mailing list
pve-devel at pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
More information about the pve-devel
mailing list