[pve-devel] applied: [v2 qemu-server 00/10] improve live-migration downtime
Wolfgang Bumiller
w.bumiller at proxmox.com
Mon Aug 7 09:47:05 CEST 2017
Applied whole series.
On Fri, Aug 04, 2017 at 02:53:57PM +0200, Fabian Grünbichler wrote:
> this patch series attempts to reduce the downtime occurring during
> live-migration of VMs to sane levels by
> - conditionalizing potentially unneeded SSH connections
> - replacing commands over SSH with new 'qm mtunnel' commands
> - reducing the polling interval to notice a completed migration faster
>
> attempts to monitor downtime via ping produced rather unreliable results,
> probably because of ARP? but old to old is reliably the slowest there too..
>
> following are durations in 'paused' state, between 'paused inmigrate' and
> 'running', measured with qmp status with 0.1 sleep in between, tests repeated 5
> times each on a network-rate-limited virtual cluster.
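
The measurement described above can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the helper name paused_duration, the millisecond timestamps and the exact status strings ('inmigrate', 'paused', 'running') are assumptions, and the samples would in practice come from polling the VM status roughly every 0.1s.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of qemu-server): takes a list of samples,
# each an array ref [timestamp_ms, status], in chronological order.
# Returns the time in ms between the first 'paused' sample and the first
# 'running' sample that follows it.
# Returns 0 if no 'paused' sample was ever seen (the pause was shorter than
# the sampling interval), and undef if the VM paused but never resumed.
sub paused_duration {
    my (@samples) = @_;

    my $paused_at;
    for my $sample (@samples) {
        my ($ts, $status) = @$sample;
        if ($status eq 'paused') {
            # remember only the first time we saw the paused state
            $paused_at //= $ts;
        } elsif ($status eq 'running' && defined($paused_at)) {
            return $ts - $paused_at;
        }
    }

    return defined($paused_at) ? undef : 0;
}
```

Because the state is only sampled, the result is accurate to about one sampling interval at best, which is also why a very short pause may not be logged at all.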
>
> with old polling, 2G RAM (actual RAM transfer in <2s, so no auto-reduction of
> polling interval happens):
>
> old code: average 3.2s
> new to old: average 1.6s (skips pvesr set-state)
> new to new: average 1.2s
>
> with old polling, 8G RAM (auto-reduction of polling interval kicks in, slightly better results):
>
> old code: average 2.7s
> new to old: 1s
> new to new: 0.7s
>
> with reduced polling interval (last patch applied), 2G and 8G RAM:
> new to old: 0.4s
> new to new: one single instance of logged paused state over 5 migrations!
>
> with reduced polling interval, 8G RAM, old code but with last patch applied:
> 2s
>
> so it seems like this is the right combination of changes to get downtime back
> to acceptable levels without sacrificing consistency.
>
> commands which might be integrated into mtunnel as well in the future:
> - pvesr set-state
> - qm nbdstop
> - qm unlock
>
> changes from v1, based on Thomas' feedback:
>
> ------8<------8<------8<------8<------8<------8<------
>
> diff --git a/PVE/CLI/qm.pm b/PVE/CLI/qm.pm
> index 1792cb0..5dce10f 100755
> --- a/PVE/CLI/qm.pm
> +++ b/PVE/CLI/qm.pm
> @@ -273,7 +273,7 @@ __PACKAGE__->register_method ({
> };
>
> $tunnel_write->("tunnel online");
> - $tunnel_write->("ver 1.0");
> + $tunnel_write->("ver 1");
>
> while (my $line = <>) {
> chomp $line;
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index ac9ac22..fc847cc 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -124,7 +124,7 @@ sub write_tunnel {
> };
> die "writing to tunnel failed: $@\n" if $@;
>
> - if ($tunnel->{version} && $tunnel->{version} >= 1.0) {
> + if ($tunnel->{version} && $tunnel->{version} >= 1) {
> my $res = eval { $self->read_tunnel($tunnel, 10); };
> die "no reply to command '$command': $@\n" if $@;
>
> @@ -156,9 +156,12 @@ sub fork_tunnel {
>
> eval {
> my $ver = $self->read_tunnel($tunnel, 10);
> - $ver =~ /^ver (\d+\.\d+)$/;
> - $tunnel->{version} = $1 if $1;
> - $self->log('info', "ssh tunnel version: $tunnel->{version}\n");
> + if ($ver =~ /^ver (\d+)$/) {
> + $tunnel->{version} = $1;
> + $self->log('info', "ssh tunnel $ver\n");
> + } else {
> + $err = "received invalid tunnel version string '$ver'\n" if !$err;
> + }
> };
>
> if ($err) {
> @@ -923,7 +926,7 @@ sub phase3_cleanup {
> die "Failed to move config to node '$self->{node}' - rename failed: $!\n"
> if !rename($conffile, $newconffile);
>
> - $self->switch_replication_job_target() if $self->{replicated_volumes};;
> + $self->switch_replication_job_target() if $self->{replicated_volumes};
>
> if ($self->{livemigration}) {
> if ($self->{storage_migration}) {
> @@ -943,7 +946,7 @@ sub phase3_cleanup {
> }
>
> # config moved and nbd server stopped - now we can resume vm on target
> - if ($tunnel && $tunnel->{version} && $tunnel->{version} >= 1.0) {
> + if ($tunnel && $tunnel->{version} && $tunnel->{version} >= 1) {
> eval {
> $self->write_tunnel($tunnel, 30, "resume $vmid");
> };
> @@ -953,13 +956,11 @@ sub phase3_cleanup {
> }
> } else {
> my $cmd = [@{$self->{rem_ssh}}, 'qm', 'resume', $vmid, '--skiplock', '--nocheck'];
> - eval {
> - my $logf = sub {
> - my $line = shift;
> - $self->log('err', $line);
> - };
> - PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => $logf);
> + my $logf = sub {
> + my $line = shift;
> + $self->log('err', $line);
> };
> + eval { PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => $logf); };
> if (my $err = $@) {
> $self->log('err', $err);
> $self->{errors} = 1;
>
> ------>8------>8------>8------>8------>8------>8------
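
For readers skimming the diff: the version handling now only accepts an integer version and checks the match result before using the capture. In Perl, $1 keeps its value from the last successful match when a later match fails, so using $1 after an unchecked match can pick up stale data. A minimal standalone sketch of the same check, assuming a hypothetical helper name parse_tunnel_version (the real code sets $err inside fork_tunnel instead of dying):

```perl
use strict;
use warnings;

# Hypothetical standalone helper mirroring the check in the fork_tunnel hunk:
# accept "ver <integer>" and reject anything else, including the old "ver 1.0".
sub parse_tunnel_version {
    my ($line) = @_;

    # Only use $1 inside the branch where the match is known to have succeeded.
    if (defined($line) && $line =~ /^ver (\d+)$/) {
        return int($1);
    }

    die "received invalid tunnel version string '" . ($line // '') . "'\n";
}
```

A caller would then compare the returned value numerically, as in the `>= 1` checks in the diff.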
>
> Fabian Grünbichler (10):
> migrate: switch back to qm mtunnel
> migrate: refactor mtunnel read/write
> qm mtunnel: add tunnel version
> migrate: read mtunnel version
> qm mtunnel: add write helper
> mtunnel: add and handle OK/ERR replies
> qm mtunnel/migrate: add resume VMID command
> migrate: finish tunnel in phase 3
> migrate: keep track of replication
> migrate: reduce polling intervals
>
> PVE/CLI/qm.pm | 28 ++++++++++--
> PVE/QemuMigrate.pm | 127 ++++++++++++++++++++++++++++++++++++++---------------
> 2 files changed, 117 insertions(+), 38 deletions(-)
>
> --
> 2.11.0