[pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)
Alexandre DERUMIER
aderumier at odiso.com
Fri Jul 28 16:06:41 CEST 2017
>>without pvesr:
>>[1501248903.534722] 64 bytes from 10.3.95.200: icmp_seq=275 ttl=64 time=0.512 ms
>>[1501248906.444269] 64 bytes from 10.3.95.200: icmp_seq=302 ttl=64 time=0.499 ms
>>27 pings lost: 2.7 s
>>2017-07-28 15:42:42,998 -- VM status: paused (inmigrate)
>>2017-07-28 15:42:43,041 -- VM status: paused
>>....
>>2017-07-28 15:42:46,013 -- VM status: running
Same test again, without pvesr, but replacing the remote "qm resume" with ssh + a simple socat:
echo -e '{ "execute": "qmp_capabilities" } \n {"execute":"human-monitor-command","arguments":{"command-line":"cont"}}' | socat - UNIX-CONNECT:/var/run/qemu-server/244.qmp
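(for reference, run from the source node this amounts to roughly the following - a sketch, not the exact invocation; the target IP is the test host from further down in the thread, adjust VMID/host to your setup:)

echo -e '{ "execute": "qmp_capabilities" } \n {"execute":"human-monitor-command","arguments":{"command-line":"cont"}}' \
  | ssh -o 'BatchMode=yes' root@10.3.94.47 'socat - UNIX-CONNECT:/var/run/qemu-server/244.qmp'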
qemu-reported downtime: 55 ms
vm-status
----------
2017-07-28 16:00:10,280 -- VM status: paused (inmigrate)
2017-07-28 16:00:10,305 -- VM status: paused
....
2017-07-28 16:00:10,540 -- VM status: running
around 130 ms (so the remaining overhead is in phase3_cleanup)
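(the vm-status log above comes from an external script polling the QMP status in a loop; a minimal sketch of such a poller - assuming the same VMID and socket path, and GNU date/sleep - could look like this:)

while true; do
    # open a fresh QMP connection, negotiate capabilities, then ask for the run state
    state=$(echo -e '{ "execute": "qmp_capabilities" } \n { "execute": "query-status" }' \
        | socat - UNIX-CONNECT:/var/run/qemu-server/244.qmp \
        | grep -o '"status": "[a-z-]*"' | tail -n1)
    # timestamp with millisecond resolution, like the log above
    echo "$(date '+%Y-%m-%d %H:%M:%S,%3N') -- VM status: $state"
    sleep 0.01
done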
ping -D -i 0.1
--------------
[1501250471.649694] 64 bytes from 10.3.95.200: icmp_seq=82 ttl=64 time=2.35 ms
[1501250472.072153] 64 bytes from 10.3.95.200: icmp_seq=86 ttl=64 time=0.414 ms
around 4 packets lost, so ~400 ms (but that should be due to the ARP update, not actual qemu downtime)
----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Friday, July 28, 2017 15:53:39
Subject: Re: [pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)
>>in the 0.25-0.3 range here for all our commands (just for forking and
>>printing the usage)..
Here is a profiling of pvesr, on an old Xeon 5110 1.6 GHz:
http://odisoweb1.odiso.net/nytprof/
(note that I'm around 0.3 s on a recent Xeon E5 3.1 GHz, vs 1.2 s on this old box)
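(the report was generated with Devel::NYTProf, roughly the standard workflow:)

# profile the command, writes ./nytprof.out
perl -d:NYTProf /usr/bin/pvesr
# render the HTML report (the one linked above)
nytprofhtml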
>>well, you say that calling "qm" takes about a second on your system, and
>>we need to call "qm resume" over SSH for the VM to be continued. so how
>>can that happen in <100 ms?
mmm, indeed....
ping -D -i 0.1 10.3.95.200
---------------------------
with pvesr:
[1501248711.774209] 64 bytes from 10.3.95.200: icmp_seq=79 ttl=64 time=0.669 ms
[1501248716.088902] 64 bytes from 10.3.95.200: icmp_seq=119 ttl=64 time=1.17 ms
40 pings lost: 4 s
without pvesr:
[1501248903.534722] 64 bytes from 10.3.95.200: icmp_seq=275 ttl=64 time=0.512 ms
[1501248906.444269] 64 bytes from 10.3.95.200: icmp_seq=302 ttl=64 time=0.499 ms
27 pings lost: 2.7 s
2017-07-28 15:42:42,998 -- VM status: paused (inmigrate)
2017-07-28 15:42:43,041 -- VM status: paused
....
2017-07-28 15:42:46,013 -- VM status: running
I think I did something wrong in my previous test.
These results seem to be expected, since qm resume is also slow on my old box.
----- Original Message -----
From: "Fabian Grünbichler" <f.gruenbichler at proxmox.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Friday, July 28, 2017 14:55:01
Subject: Re: [pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)
On Fri, Jul 28, 2017 at 01:22:31PM +0200, Alexandre DERUMIER wrote:
> pvesr through ssh
> -----------------
> root at kvmtest1 ~ # time /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=kvmtest2' root at 10.3.94.47 pvesr set-state 244 \''{}'\'
>
> real 0m1.399s
I just realized SSH was probably slowed down in my test case by other
external factors, so here's your command repeated on a test cluster:
real 0m0.407s
user 0m0.004s
sys 0m0.000s
>
>
> locally
> --------
> root at kvmtest2:~# time pvesr set-state 244 {}
> real 0m1.137s
>
real 0m0.268s
user 0m0.240s
sys 0m0.024s
>
> so 40 ms for ssh, and 1.137 s for pvesr itself.
see above - I wonder where the difference comes from? Is it possible that
you have a big state file from testing? Lots of guest configs?
>
> (I think we could simply skip the call if the state is empty, but reusing the ssh connection could also help a little bit)
>
>
> also , a simple
>
> #time pvesr
> real 0m1.098s
>
> (same for qm or any other command)
in the 0.25-0.3 range here for all our commands (just for forking and
printing the usage)..
> >>that does not make sense - are you sure you haven't removed anything
> >>else? qemu does not know or care about pvesr, so why should it resume
> >>automatically?
>
> no, it's not resumed automatically. This is the log of an external script, calling QMP status in a loop
> to see how long it's really paused.
> removing pvesr in phase3 reduces the pause time (between the end of phase2 and qm resume).
>
well, you say that calling "qm" takes about a second on your system, and
we need to call "qm resume" over SSH for the VM to be continued. so how
can that happen in <100 ms?
I cannot reproduce your results at all. The only way I can achieve
downtime in the two-digit ms range is by reverting commit
b37ecfe6ae7f7b557db7712ee6988cb0397306e9
observed from outside using ping -D -i 0.1 :
stock PVE 5:
[1501245897.627138] 64 bytes from 10.0.0.213: icmp_seq=70 ttl=64 time=0.273 ms
[1501245897.731102] 64 bytes from 10.0.0.213: icmp_seq=71 ttl=64 time=0.255 ms
[1501245897.835237] 64 bytes from 10.0.0.213: icmp_seq=72 ttl=64 time=0.352 ms
[1501245900.955324] 64 bytes from 10.0.0.213: icmp_seq=102 ttl=64 time=0.419 ms
[1501245901.059196] 64 bytes from 10.0.0.213: icmp_seq=103 ttl=64 time=0.298 ms
[1501245901.163360] 64 bytes from 10.0.0.213: icmp_seq=104 ttl=64 time=0.440 ms
no call to pvesr set-state over SSH:
[1501245952.119454] 64 bytes from 10.0.0.213: icmp_seq=63 ttl=64 time=0.586 ms
[1501245952.226278] 64 bytes from 10.0.0.213: icmp_seq=64 ttl=64 time=3.41 ms
[1501245955.027289] 64 bytes from 10.0.0.213: icmp_seq=91 ttl=64 time=0.414 ms
[1501245955.131317] 64 bytes from 10.0.0.213: icmp_seq=92 ttl=64 time=0.447 ms
stock PVE 5 with b37ecfe6ae7f7b557db7712ee6988cb0397306e9 reverted:
no downtime visible via ping, it's too small.
_______________________________________________
pve-devel mailing list
pve-devel at pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel