[pve-devel] qemu live migration: bigger downtime recently
aderumier at odiso.com
Fri Jan 22 19:55:19 CET 2021
After some debugging, it seems that it's hanging on:
$stat = mon_cmd($vmid, "query-migrate");
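
One way to confirm where the time goes is to time each query call on its own. The following is a minimal Python sketch of that idea, not the Perl code from qemu-server; `timed_call` and `fake_query_migrate` are hypothetical names, and the fake query simply sleeps to stand in for a monitor command that blocks.

```python
import time

def timed_call(func, *args, clock=time.monotonic):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = clock()
    result = func(*args)
    return result, clock() - start

def fake_query_migrate(delay):
    """Hypothetical stand-in for a monitor query that blocks for `delay` seconds."""
    time.sleep(delay)
    return {"status": "completed"}

# Time a single poll; a real check would wrap the actual monitor call instead.
stat, elapsed = timed_call(fake_query_migrate, 0.05)
print(f"status={stat['status']} query took {elapsed * 1000:.0f} ms")
```

If a single query takes about as long as the reported downtime, the polling loop never gets a chance to observe an intermediate state.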
Result of info migrate after the end of the migration:
# info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 9671 ms
downtime: 9595 ms
setup: 74 ms
transferred ram: 10445790 kbytes
throughput: 8916.93 mbps
remaining ram: 0 kbytes
total ram: 12600392 kbytes
duplicate: 544936 pages
skipped: 0 pages
normal: 2605162 pages
normal bytes: 10420648 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 296540
cache size: 2147483648 bytes
xbzrle transferred: 0 kbytes
xbzrle pages: 0 pages
xbzrle cache miss: 0 pages
xbzrle cache miss rate: 0.00
xbzrle encoding rate: 0.00
xbzrle overflow: 0
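
To put the numbers above in proportion, here is a small Python sketch that parses a few of the fields and computes how much of the total migration time was downtime. The function names are made up for illustration, and the parser only handles lines shaped like "name: integer unit".

```python
import re

# Values copied from the "info migrate" output above.
INFO_MIGRATE = """\
Migration status: completed
total time: 9671 ms
downtime: 9595 ms
setup: 74 ms
transferred ram: 10445790 kbytes
"""

def parse_info_migrate(text):
    """Return {field name: integer value} for lines shaped 'name: <int> <unit>'."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([^:]+):\s*(\d+)\s+\w+\s*$", line)
        if match:
            fields[match.group(1).strip()] = int(match.group(2))
    return fields

def downtime_share(fields):
    """Fraction of the total migration time during which the guest was paused."""
    return fields["downtime"] / fields["total time"]

stats = parse_info_migrate(INFO_MIGRATE)
share = downtime_share(stats)
print(f"downtime share: {share:.1%}")  # prints "downtime share: 99.2%"
```

Assuming the total time includes the setup phase, downtime plus setup account for 9669 of the 9671 ms, which would leave almost no time in which a poll could have seen an "active" status.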
On Friday 22 January 2021 at 16:06 +0100, aderumier at odiso.com wrote:
> I tried adding a log line to display the current migration status,
> and it never catches an "active" state; it reports "completed" directly.
>
> Here is another sample with a bigger downtime of 14 s (real downtime; I
> checked with a ping to be sure):
>
>
>
> 2021-01-22 16:02:53 starting migration of VM 391 to node 'kvm13' (10.3.94.70)
> 2021-01-22 16:02:53 starting VM 391 on remote node 'kvm13'
> 2021-01-22 16:02:55 start remote tunnel
> 2021-01-22 16:02:56 ssh tunnel ver 1
> 2021-01-22 16:02:56 starting online/live migration on tcp:10.3.94.70:60000
> 2021-01-22 16:02:56 set migration_caps
> 2021-01-22 16:02:56 migration speed limit: 8589934592 B/s
> 2021-01-22 16:02:56 migration downtime limit: 100 ms
> 2021-01-22 16:02:56 migration cachesize: 2147483648 B
> 2021-01-22 16:02:56 set migration parameters
> 2021-01-22 16:02:56 start migrate command to tcp:10.3.94.70:60000
>
> 2021-01-22 16:03:11 status: completed ---> added log
> 2021-01-22 16:03:11 migration speed: 1092.27 MB/s - downtime 14424 ms
> 2021-01-22 16:03:11 migration status: completed
> 2021-01-22 16:03:14 migration finished successfully (duration 00:00:21)
> TASK OK
>
>
>
> my $merr = $@;
> $self->log('info', "migrate uri => $ruri failed: $merr") if $merr;
>
> my $lstat = 0;
> my $usleep = 1000000;
> my $i = 0;
> my $err_count = 0;
> my $lastrem = undef;
> my $downtimecounter = 0;
> while (1) {
>     $i++;
>     my $avglstat = $lstat ? $lstat / $i : 0;
>
>     usleep($usleep);
>     my $stat;
>     eval {
>         $stat = mon_cmd($vmid, "query-migrate");
>     };
>     if (my $err = $@) {
>         $err_count++;
>         warn "query migrate failed: $err\n";
>         $self->log('info', "query migrate failed: $err");
>         if ($err_count <= 5) {
>             usleep(1000000);
>             next;
>         }
>         die "too many query migrate failures - aborting\n";
>     }
>
>     $self->log('info', "status: $stat->{status}"); # <--- added log
>
>
> On Friday 22 January 2021 at 15:34 +0100, aderumier at odiso.com wrote:
> > Hi,
> >
> > I have recently noticed bigger downtime on qemu live migration.
> > (I'm not sure whether it started after the qemu update or the
> > qemu-server update.)
> >
> > migration: type=insecure
> >
> > qemu-server 6.3-2
> > pve-qemu-kvm 5.1.0-7
> >
> > (I'm not sure which qemu version the running machine uses)
> >
> >
> >
> > Here is a sample:
> >
> >
> >
> > 2021-01-22 15:28:38 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
> > 2021-01-22 15:28:42 starting VM 226 on remote node 'kvm13'
> > 2021-01-22 15:28:44 start remote tunnel
> > 2021-01-22 15:28:45 ssh tunnel ver 1
> > 2021-01-22 15:28:45 starting online/live migration on tcp:10.3.94.70:60000
> > 2021-01-22 15:28:45 set migration_caps
> > 2021-01-22 15:28:45 migration speed limit: 8589934592 B/s
> > 2021-01-22 15:28:45 migration downtime limit: 100 ms
> > 2021-01-22 15:28:45 migration cachesize: 268435456 B
> > 2021-01-22 15:28:45 set migration parameters
> > 2021-01-22 15:28:45 start migrate command to tcp:10.3.94.70:60000
> > 2021-01-22 15:28:47 migration speed: 1024.00 MB/s - downtime 2117 ms
> > 2021-01-22 15:28:47 migration status: completed
> > 2021-01-22 15:28:51 migration finished successfully (duration 00:00:13)
> > TASK OK
> >
> > That's strange, because I don't see the memory transfer loop logs.
> >
> >
> >
> > Migrating back to the original host works:
> >
> > 2021-01-22 15:29:34 starting migration of VM 226 to node 'kvm2' (::ffff:10.3.94.50)
> > 2021-01-22 15:29:36 starting VM 226 on remote node 'kvm2'
> > 2021-01-22 15:29:39 start remote tunnel
> > 2021-01-22 15:29:40 ssh tunnel ver 1
> > 2021-01-22 15:29:40 starting online/live migration on tcp:[::ffff:10.3.94.50]:60000
> > 2021-01-22 15:29:40 set migration_caps
> > 2021-01-22 15:29:40 migration speed limit: 8589934592 B/s
> > 2021-01-22 15:29:40 migration downtime limit: 100 ms
> > 2021-01-22 15:29:40 migration cachesize: 268435456 B
> > 2021-01-22 15:29:40 set migration parameters
> > 2021-01-22 15:29:40 start migrate command to tcp:[::ffff:10.3.94.50]:60000
> > 2021-01-22 15:29:41 migration status: active (transferred 396107554, remaining 1732018176), total 2165383168)
> > 2021-01-22 15:29:41 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > 2021-01-22 15:29:42 migration status: active (transferred 973010921, remaining 1089216512), total 2165383168)
> > 2021-01-22 15:29:42 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > 2021-01-22 15:29:43 migration status: active (transferred 1511925476, remaining 483463168), total 2165383168)
> > 2021-01-22 15:29:43 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > 2021-01-22 15:29:44 migration speed: 512.00 MB/s - downtime 148 ms
> > 2021-01-22 15:29:44 migration status: completed
> > 2021-01-22 15:29:47 migration finished successfully (duration 00:00:13)
> > TASK OK
> >
> >
> > Then migrating it again, in the same direction as the first migration, works too:
> >
> >
> > 2021-01-22 15:31:07 starting migration of VM 226 to node 'kvm13' (10.3.94.70)
> > 2021-01-22 15:31:10 starting VM 226 on remote node 'kvm13'
> > 2021-01-22 15:31:12 start remote tunnel
> > 2021-01-22 15:31:13 ssh tunnel ver 1
> > 2021-01-22 15:31:13 starting online/live migration on tcp:10.3.94.70:60000
> > 2021-01-22 15:31:13 set migration_caps
> > 2021-01-22 15:31:13 migration speed limit: 8589934592 B/s
> > 2021-01-22 15:31:13 migration downtime limit: 100 ms
> > 2021-01-22 15:31:13 migration cachesize: 268435456 B
> > 2021-01-22 15:31:13 set migration parameters
> > 2021-01-22 15:31:13 start migrate command to tcp:10.3.94.70:60000
> > 2021-01-22 15:31:14 migration status: active (transferred 1092088188, remaining 944365568), total 2165383168)
> > 2021-01-22 15:31:14 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
> > 2021-01-22 15:31:15 migration speed: 1024.00 MB/s - downtime 55 ms
> > 2021-01-22 15:31:15 migration status: completed
> > 2021-01-22 15:31:19 migration finished successfully (duration 00:00:12)
> > TASK OK
> >
> >
> > Any ideas? Maybe a bug in a specific qemu version?
> >
> >
> >
> >
>
>