[pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)

Alexandre DERUMIER aderumier at odiso.com
Fri Jul 28 16:06:41 CEST 2017


>>without pvesr: 
>>[1501248903.534722] 64 bytes from 10.3.95.200: icmp_seq=275 ttl=64 time=0.512 ms 
>>[1501248906.444269] 64 bytes from 10.3.95.200: icmp_seq=302 ttl=64 time=0.499 ms 

>>27 pings lost: ~2.7 s 


>>2017-07-28 15:42:42,998 -- VM status: paused (inmigrate) 
>>2017-07-28 15:42:43,041 -- VM status: paused 
>>.... 
>>2017-07-28 15:42:46,013 -- VM status: running 



same test again, without pvesr, but replacing the remote "qm resume" with ssh + a simple socat:
echo -e  '{ "execute": "qmp_capabilities" } \n {"execute":"human-monitor-command","arguments":{"command-line":"cont"}}' | socat - UNIX-CONNECT:/var/run/qemu-server/244.qmp
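
for reference, the same can be pushed through a non-interactive ssh from the source node in one go -- just a sketch, assuming the target node 10.3.94.47 from the ssh timing test quoted further down and the same QMP socket path (not what qemu-server itself does):

echo -e '{ "execute": "qmp_capabilities" }\n{"execute":"human-monitor-command","arguments":{"command-line":"cont"}}' \
  | /usr/bin/ssh -o 'BatchMode=yes' root@10.3.94.47 'socat - UNIX-CONNECT:/var/run/qemu-server/244.qmp'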



qemu : downtime result: 55 ms

vm-status
----------
2017-07-28 16:00:10,280 -- VM status: paused (inmigrate)
2017-07-28 16:00:10,305 -- VM status: paused
....
2017-07-28 16:00:10,540 -- VM status: running

around 130 ms (so the overhead is in phase3_cleanup)
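
(for reference, the "VM status" log above comes from an external script polling the vm in a loop; a minimal sketch of such a poller, here via QMP query-status on the same socket -- the real script's output format differs:)

# print a timestamped status line in a tight loop; output is the raw
# query-status JSON, the real script parses it into "VM status: ..." lines
while true; do
    status=$(echo -e '{ "execute": "qmp_capabilities" }\n{ "execute": "query-status" }' \
        | socat - UNIX-CONNECT:/var/run/qemu-server/244.qmp | tail -n1)
    echo "$(date '+%Y-%m-%d %H:%M:%S,%3N') -- $status"
done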


ping -D -i 0.1
--------------
[1501250471.649694] 64 bytes from 10.3.95.200: icmp_seq=82 ttl=64 time=2.35 ms
[1501250472.072153] 64 bytes from 10.3.95.200: icmp_seq=86 ttl=64 time=0.414 ms

around 4 packets lost, so ~400 ms (but that should just be the arp update)
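
(if the remaining loss really is just the arp/rarp announce, it should be visible on the target node's bridge right after the switchover -- a quick check, assuming the bridge is named vmbr0:)

tcpdump -eni vmbr0 'arp or rarp'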




----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Friday, 28 July 2017 15:53:39
Subject: Re: [pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)

>>in the 0.25-0.3 range here for all our commands (just for forking and 
>>printing the usage).. 

here is a profiling of pvesr, on an old Xeon 5110 @ 1.6 GHz: 
http://odisoweb1.odiso.net/nytprof/ 


(note that I'm around 0.3 s on a recent Xeon E5 @ 3.1 GHz, vs 1.2 s on this old box) 
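
(for anyone wanting to reproduce: such a profile can be generated with Devel::NYTProf, roughly:)

perl -d:NYTProf /usr/bin/pvesr
nytprofhtml    # writes the html report to ./nytprof/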


>>well, you say that calling "qm" takes about a second on your system, and 
>>we need to call "qm resume" over SSH for the VM to be continued. so how 
>>can that happen in <100 ms? 

mmm, indeed.... 






ping -D -i 0.1 10.3.95.200 
--------------------------- 
with pvesr: 

[1501248711.774209] 64 bytes from 10.3.95.200: icmp_seq=79 ttl=64 time=0.669 ms 
[1501248716.088902] 64 bytes from 10.3.95.200: icmp_seq=119 ttl=64 time=1.17 ms 
40 pings lost: ~4 s 

without pvesr: 
[1501248903.534722] 64 bytes from 10.3.95.200: icmp_seq=275 ttl=64 time=0.512 ms 
[1501248906.444269] 64 bytes from 10.3.95.200: icmp_seq=302 ttl=64 time=0.499 ms 

27 pings lost: ~2.7 s 


2017-07-28 15:42:42,998 -- VM status: paused (inmigrate) 
2017-07-28 15:42:43,041 -- VM status: paused 
.... 
2017-07-28 15:42:46,013 -- VM status: running 


I think I did something wrong in my previous test. 
These results seem to be expected, if qm resume is also slow on my old box. 




----- Original Message ----- 
From: "Fabian Grünbichler" <f.gruenbichler at proxmox.com> 
To: "pve-devel" <pve-devel at pve.proxmox.com> 
Sent: Friday, 28 July 2017 14:55:01 
Subject: Re: [pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4) 

On Fri, Jul 28, 2017 at 01:22:31PM +0200, Alexandre DERUMIER wrote: 
> pvesr through ssh 
> ----------------- 
> root@kvmtest1 ~ # time /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=kvmtest2' root@10.3.94.47 pvesr set-state 244 \''{}'\' 
> 
> real 0m1.399s 

I just realized SSH was probably slowed down in my test case by other 
external factors, so here's your command repeated on a test cluster: 

real 0m0.407s 
user 0m0.004s 
sys 0m0.000s 

> 
> 
> locally 
> -------- 
> root@kvmtest2:~# time pvesr set-state 244 {} 
> real 0m1.137s 
> 

real 0m0.268s 
user 0m0.240s 
sys 0m0.024s 


> 
> so 40 ms for ssh, and 1.137 s for pvesr itself. 

see above - I wonder where the difference comes from? Is it possible that 
you have a big state file from testing? Lots of guest configs? 

> 
> (I think we could simply skip the call if the state is empty, but reusing the ssh connection could also help a little bit) 
> 
> 
> also , a simple 
> 
> #time pvesr 
> real 0m1.098s 
> 
> (same for qm or other command) 

in the 0.25-0.3 range here for all our commands (just for forking and 
printing the usage).. 
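
(on the ssh-reuse idea quoted above: OpenSSH connection multiplexing would avoid paying the handshake on every call -- just a sketch of the mechanism, with a made-up ControlPath, not something the migration code does today:)

# first call establishes a persistent master connection
ssh -o ControlMaster=auto -o ControlPath=/run/pve-mig-%r@%h:%p -o ControlPersist=60 root@10.3.94.47 true
# later calls reuse it and skip the handshake
ssh -o ControlPath=/run/pve-mig-%r@%h:%p root@10.3.94.47 pvesr set-state 244 '{}'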

> >>that does not make sense - are you sure you haven't removed anything 
> >>else? qemu does not know or care about pvesr, so why should it resume 
> >>automatically? 
> 
> no, it's not resumed automatically. This is the log of an external script, calling qmp status in a loop 
> to see how long it's really paused. 
> Removing pvesr in phase3 reduces the pause time (between the end of phase2 and qm resume). 
> 

well, you say that calling "qm" takes about a second on your system, and 
we need to call "qm resume" over SSH for the VM to be continued. so how 
can that happen in <100 ms? 

I cannot reproduce your results at all. The only way I can achieve 
downtime in the two-digit ms range is by reverting commit 
b37ecfe6ae7f7b557db7712ee6988cb0397306e9 
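
for reference, in a qemu-server git checkout that is simply: 

git revert b37ecfe6ae7f7b557db7712ee6988cb0397306e9 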

observed from outside using ping -D -i 0.1 : 

stock PVE 5: 
[1501245897.627138] 64 bytes from 10.0.0.213: icmp_seq=70 ttl=64 time=0.273 ms 
[1501245897.731102] 64 bytes from 10.0.0.213: icmp_seq=71 ttl=64 time=0.255 ms 
[1501245897.835237] 64 bytes from 10.0.0.213: icmp_seq=72 ttl=64 time=0.352 ms 
[1501245900.955324] 64 bytes from 10.0.0.213: icmp_seq=102 ttl=64 time=0.419 ms 
[1501245901.059196] 64 bytes from 10.0.0.213: icmp_seq=103 ttl=64 time=0.298 ms 
[1501245901.163360] 64 bytes from 10.0.0.213: icmp_seq=104 ttl=64 time=0.440 ms 

no call to pvesr set-state over SSH: 
[1501245952.119454] 64 bytes from 10.0.0.213: icmp_seq=63 ttl=64 time=0.586 ms 
[1501245952.226278] 64 bytes from 10.0.0.213: icmp_seq=64 ttl=64 time=3.41 ms 
[1501245955.027289] 64 bytes from 10.0.0.213: icmp_seq=91 ttl=64 time=0.414 ms 
[1501245955.131317] 64 bytes from 10.0.0.213: icmp_seq=92 ttl=64 time=0.447 ms 

stock PVE 5 with b37ecfe6ae7f7b557db7712ee6988cb0397306e9 reverted: 
no downtime visible via ping, it's too small. 



_______________________________________________ 
pve-devel mailing list 
pve-devel at pve.proxmox.com 
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
