[pve-devel] applied: [v2 qemu-server 00/10] improve live-migration downtime

Wolfgang Bumiller w.bumiller at proxmox.com
Mon Aug 7 09:47:05 CEST 2017


Applied whole series.

On Fri, Aug 04, 2017 at 02:53:57PM +0200, Fabian Grünbichler wrote:
> this patch series attempts to reduce the downtime occurring during
> live-migration of VMs to sane levels by
> - conditionalizing potentially unneeded SSH connections
> - replacing commands over SSH with new 'qm mtunnel' commands
> - reducing the polling interval to notice a completed migration faster
> 
> attempts to monitor downtime via ping produced rather unreliable results,
> probably because of ARP; even so, old-to-old was reliably the slowest there too.
> 
> following are the durations spent in the 'paused' state (between 'paused
> inmigrate' and 'running'), measured by polling the QMP status with a 0.1s
> sleep in between; each test was repeated 5 times on a network-rate-limited
> virtual cluster.
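The measurement described above can be sketched as follows. This is a hypothetical helper, not part of the patch series: it assumes the QMP status strings were already collected into `[timestamp, status]` samples taken roughly every 0.1s, and that 'paused' and 'inmigrate' are the statuses to count as paused. It reports the time from the first paused sample to the first 'running' sample after it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of qemu-server): given samples of
# [timestamp_in_seconds, status] taken roughly every 0.1s, return the
# time between the first sample reporting a paused state and the first
# later sample reporting 'running'. Returns undef if no such pair exists.
sub paused_duration {
    my (@samples) = @_;
    my $paused_since;
    for my $s (@samples) {
        my ($ts, $status) = @$s;
        if (!defined($paused_since) && $status =~ /^(?:paused|inmigrate)$/) {
            $paused_since = $ts;    # first paused sample
        } elsif (defined($paused_since) && $status eq 'running') {
            return $ts - $paused_since;    # first running sample after the pause
        }
    }
    return;
}

# Example: paused at t=10.0, running again at t=10.4
my $d = paused_duration(
    [ 9.9,  'running' ],
    [ 10.0, 'paused' ],
    [ 10.1, 'paused' ],
    [ 10.2, 'inmigrate' ],
    [ 10.3, 'inmigrate' ],
    [ 10.4, 'running' ],
);
printf "paused for %.1fs\n", $d;    # prints "paused for 0.4s"
```

Because samples are about 0.1s apart, each figure is only accurate to roughly one sampling interval, and a pause shorter than that interval can go unobserved.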
> 
> with old polling, 2G RAM (the actual RAM transfer takes <2s, so the polling
> interval is never automatically reduced):
> 
> old code: average 3.2s
> new to old: average 1.6s (skips pvesr set-state)
> new to new: average 1.2s
> 
> with old polling, 8G RAM (the automatic reduction of the polling interval kicks in, giving slightly better results):
> 
> old code: average 2.7s
> new to old: 1s
> new to new: 0.7s
> 
> with reduced polling interval (last patch applied), 2G and 8G RAM:
> new to old: 0.4s
> new to new: only a single 'paused' state was logged across all 5 migrations!
> 
> with reduced polling interval, 8G RAM, old code but with last patch applied:
> 2s
> 
> so it seems like this is the right combination of changes to get downtime back
> to acceptable levels without sacrificing consistency.
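The reasoning behind the last patch can be illustrated with a simple model. This is a hypothetical sketch, not the qemu-server code: the migration loop is assumed to poll the migration status every `$interval_us` microseconds, so a migration that completes between two polls is only noticed at the next poll, and the VM stays paused for that extra time. The adaptive rule (switch to a shorter interval once the remaining RAM drops below the average amount transferred per poll) and all interval values are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Extra paused time caused by polling: polls happen at $interval_us,
# 2*$interval_us, ... and a migration completing at $done_us is noticed
# at the first poll at or after $done_us.
sub detection_delay_us {
    my ($done_us, $interval_us) = @_;
    my $polls = int(($done_us + $interval_us - 1) / $interval_us);    # integer ceil
    return $polls * $interval_us - $done_us;
}

# Hypothetical adaptive rule: poll faster once the remaining RAM is
# smaller than the average amount transferred per poll.
sub next_interval_us {
    my ($remaining, $avg_per_poll, $long_us, $short_us) = @_;
    return ($avg_per_poll && $remaining < $avg_per_poll) ? $short_us : $long_us;
}

# A migration finishing 2.15s after the loop started (example values):
printf "1s polling:   +%dms\n", detection_delay_us(2_150_000, 1_000_000) / 1000;    # +850ms
printf "0.1s polling: +%dms\n", detection_delay_us(2_150_000, 100_000) / 1000;      # +50ms
```

In this model the worst-case extra pause equals one polling interval, so shortening the interval matters most when the RAM transfer finishes before any adaptive reduction would kick in, as in the 2G case above.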
> 
> commands which might be integrated into mtunnel as well in the future:
> - pvesr set-state
> - qm nbdstop
> - qm unlock
> 
> changes from v1, based on Thomas' feedback:
> 
> ------8<------8<------8<------8<------8<------8<------
> 
> diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
> index 1792cb0..5dce10f 100755
> --- a/PVE/CLI/qm.pm
> +++ b/PVE/CLI/qm.pm
> @@ -273,7 +273,7 @@ __PACKAGE__->register_method ({
>  	};
>  
>  	$tunnel_write->("tunnel online");
> -	$tunnel_write->("ver 1.0");
> +	$tunnel_write->("ver 1");
>  
>  	while (my $line = <>) {
>  	    chomp $line;
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index ac9ac22..fc847cc 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -124,7 +124,7 @@ sub write_tunnel {
>      };
>      die "writing to tunnel failed: $@\n" if $@;
>  
> -    if ($tunnel->{version} && $tunnel->{version} >= 1.0) {
> +    if ($tunnel->{version} && $tunnel->{version} >= 1) {
>  	my $res = eval { $self->read_tunnel($tunnel, 10); };
>  	die "no reply to command '$command': $@\n" if $@;
>  
> @@ -156,9 +156,12 @@ sub fork_tunnel {
>  
>      eval {
>  	my $ver = $self->read_tunnel($tunnel, 10);
> -	$ver =~ /^ver (\d+\.\d+)$/;
> -	$tunnel->{version} = $1 if $1;
> -	$self->log('info', "ssh tunnel version: $tunnel->{version}\n");
> +	if ($ver =~ /^ver (\d+)$/) {
> +	    $tunnel->{version} = $1;
> +	    $self->log('info', "ssh tunnel $ver\n");
> +	} else {
> +	    $err = "received invalid tunnel version string '$ver'\n" if !$err;
> +	}
>      };
>  
>      if ($err) {
> @@ -923,7 +926,7 @@ sub phase3_cleanup {
>      die "Failed to move config to node '$self->{node}' - rename failed: $!\n"
>          if !rename($conffile, $newconffile);
>  
> -    $self->switch_replication_job_target() if $self->{replicated_volumes};;
> +    $self->switch_replication_job_target() if $self->{replicated_volumes};
>  
>      if ($self->{livemigration}) {
>  	if ($self->{storage_migration}) {
> @@ -943,7 +946,7 @@ sub phase3_cleanup {
>  	}
>  
>  	# config moved and nbd server stopped - now we can resume vm on target
> -	if ($tunnel && $tunnel->{version} && $tunnel->{version} >= 1.0) {
> +	if ($tunnel && $tunnel->{version} && $tunnel->{version} >= 1) {
>  	    eval {
>  		$self->write_tunnel($tunnel, 30, "resume $vmid");
>  	    };
> @@ -953,13 +956,11 @@ sub phase3_cleanup {
>  	    }
>  	} else {
>  	    my $cmd = [@{$self->{rem_ssh}}, 'qm', 'resume', $vmid, '--skiplock', '--nocheck'];
> -	    eval {
> -		my $logf = sub {
> -			my $line = shift;
> -			$self->log('err', $line);
> -		};
> -		PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => $logf);
> +	    my $logf = sub {
> +		my $line = shift;
> +		$self->log('err', $line);
>  	    };
> +	    eval { PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => $logf); };
>  	    if (my $err = $@) {
>  		$self->log('err', $err);
>  		$self->{errors} = 1;
> 
> ------>8------>8------>8------>8------>8------>8------
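The version handshake from the diff above can be exercised on its own. This is a standalone sketch with hypothetical helper names (`parse_tunnel_version`, `expects_reply`); it copies the `/^ver (\d+)$/` pattern from `fork_tunnel` and the `version >= 1` check from `write_tunnel`, and shows that after the v2 change a v1-style 'ver 1.0' line is treated as an error rather than parsed.

```perl
use strict;
use warnings;

# Sketch of the fork_tunnel() check (hypothetical helper name): the
# target must announce "ver <integer>", anything else is an error.
sub parse_tunnel_version {
    my ($line) = @_;
    return $1 if defined($line) && $line =~ /^ver (\d+)$/;
    die "received invalid tunnel version string '" . ($line // '') . "'\n";
}

# Sketch of the write_tunnel() condition: only tunnels with version >= 1
# send a reply after each command.
sub expects_reply {
    my ($version) = @_;
    return $version && $version >= 1;
}

print parse_tunnel_version("ver 1"), "\n";    # prints "1"
eval { parse_tunnel_version("ver 1.0") };
print "rejected: $@";    # prints "rejected: received invalid tunnel version string 'ver 1.0'"
```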
> 
> Fabian Grünbichler (10):
>   migrate: switch back to qm mtunnel
>   migrate: refactor mtunnel read/write
>   qm mtunnel: add tunnel version
>   migrate: read mtunnel version
>   qm mtunnel: add write helper
>   mtunnel: add and handle OK/ERR replies
>   qm mtunnel/migrate: add resume VMID command
>   migrate: finish tunnel in phase 3
>   migrate: keep track of replication
>   migrate: reduce polling intervals
> 
>  PVE/CLI/qm.pm      |  28 ++++++++++--
>  PVE/QemuMigrate.pm | 127 ++++++++++++++++++++++++++++++++++++++---------------
>  2 files changed, 117 insertions(+), 38 deletions(-)
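The OK/ERR reply scheme named in the commit list can be sketched as below. This is a hypothetical illustration, not the code from `PVE/CLI/qm.pm`: the function names, the `resume` callback, the exact `ERR:` message format and the reply to unknown commands are assumptions; only the `resume VMID` command and the OK/ERR replies come from the series. The target answers every command with one line, and in this sketch the source side treats any reply other than 'OK' as an error.

```perl
use strict;
use warnings;

# Hypothetical target-side dispatcher: handles one tunnel command line
# and returns the reply line to send back. $resume is a callback that
# resumes the given VMID and dies on failure.
sub handle_tunnel_command {
    my ($line, $resume) = @_;
    chomp $line;
    if ($line =~ /^resume (\d+)$/) {
        my $vmid = $1;
        eval { $resume->($vmid) };
        if (my $err = $@) {
            chomp $err;
            return "ERR: resume $vmid failed - $err";
        }
        return "OK";
    }
    return "ERR: unknown command '$line'";
}

# Hypothetical source-side check: a missing reply or anything other
# than "OK" is fatal.
sub check_tunnel_reply {
    my ($command, $reply) = @_;
    die "no reply to command '$command'\n" if !defined($reply);
    die "tunnel replied '$reply' to command '$command'\n" if $reply ne 'OK';
    return 1;
}

# Usage with a stubbed resume callback that only knows VM 100
my $resume = sub {
    my ($vmid) = @_;
    die "VM $vmid not running\n" if $vmid != 100;
};
print handle_tunnel_command("resume 100\n", $resume), "\n";    # prints "OK"
print handle_tunnel_command("resume 101\n", $resume), "\n";    # prints "ERR: resume 101 failed - VM 101 not running"
```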
> 
> -- 
> 2.11.0




More information about the pve-devel mailing list