[pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)
Fabian Grünbichler
f.gruenbichler at proxmox.com
Fri Jul 28 14:55:01 CEST 2017
On Fri, Jul 28, 2017 at 01:22:31PM +0200, Alexandre DERUMIER wrote:
> pvesr through ssh
> -----------------
> root@kvmtest1 ~ # time /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=kvmtest2' root@10.3.94.47 pvesr set-state 244 \''{}'\'
>
> real 0m1.399s
I just realized SSH was probably slowed down in my test case by other
external factors, so here's your command repeated on a test cluster:
real 0m0.407s
user 0m0.004s
sys 0m0.000s
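(if you want to average out the jitter, repeating the call a few times
is quick - a rough sketch, reusing the vmid and target from your
example above:)

# repeat the SSH round trip ten times, printing elapsed seconds per run (GNU time)
for i in $(seq 1 10); do
    /usr/bin/time -f '%e' ssh -o 'BatchMode=yes' -o 'HostKeyAlias=kvmtest2' \
        root@10.3.94.47 pvesr set-state 244 \''{}'\' 2>&1 | tail -n 1
done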
>
>
> locally
> --------
> root@kvmtest2:~# time pvesr set-state 244 {}
> real 0m1.137s
>
real 0m0.268s
user 0m0.240s
sys 0m0.024s
>
> so 40ms for ssh, and 1.137s for pvesr itself.
see above - I wonder where the difference comes from. is it possible
that you have a big state file from testing? or lots of guest configs?
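(to check - assuming the default locations, with the node-local
replication state in /var/lib/pve-manager/pve-replication-state.json:)

# how big did the replication state file get?
ls -lh /var/lib/pve-manager/pve-replication-state.json
# and how many guest configs does the node carry?
ls /etc/pve/qemu-server/ /etc/pve/lxc/ | wc -l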
>
> (I think we could simply skip the call if the state is empty, but reusing ssh could also help a little bit)
>
>
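regarding reusing ssh: OpenSSH connection multiplexing would drop the
per-call handshake cost - a sketch of the mechanism, not what the
migration code does today:

# first call sets up a persistent master connection
ssh -o ControlMaster=auto -o ControlPath=/run/ssh-mux-%r@%h:%p \
    -o ControlPersist=60 root@10.3.94.47 true
# later calls reuse it and skip key exchange and authentication
time ssh -o ControlPath=/run/ssh-mux-%r@%h:%p \
    root@10.3.94.47 pvesr set-state 244 \''{}'\'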
> also, a simple
>
> #time pvesr
> real 0m1.098s
>
> (same for qm or other command)
in the 0.25-0.3s range here for all of our commands (just for forking
and printing the usage).
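that is presumably mostly perl compile time, which you can split up
roughly like this (assuming the usual module layout, where
PVE::CLI::pvesr backs the pvesr binary):

time perl -e 1                     # bare interpreter startup
time perl -MPVE::CLI::pvesr -e 1   # plus compiling the module tree
time pvesr help                    # plus the actual CLI dispatch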
> >>that does not make sense - are you sure you haven't removed anything
> >>else? qemu does not know or care about pvesr, so why should it resume
> >>automatically?
>
> no, it does not resume automatically. This is the log of an external script calling qmp status in a loop
> to see how long the VM is really paused.
> removing pvesr in phase3 reduces the pause time (between the end of phase2 and qm resume).
>
well, you say that calling "qm" takes about a second on your system, and
we need to call "qm resume" over SSH for the VM to continue. so how
can that happen in <100 ms?
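(side note: to measure the pause window without qm's startup overhead
skewing the numbers, you can poll the QMP socket directly - a rough
sketch, vmid 244 assumed; the socket only accepts one client at a time,
so this can race with pvedaemon:)

# poll the run state straight from the QMP socket, ~20 times per second
while true; do
    printf '%s ' "$(date +%s.%N)"
    echo '{"execute":"qmp_capabilities"}{"execute":"query-status"}' \
        | socat - UNIX-CONNECT:/var/run/qemu-server/244.qmp \
        | grep -o '"status": "[a-z-]*"'
    sleep 0.05
done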
I cannot reproduce your results at all. the only way I can achieve
downtime in the two-digit ms range is by reverting commit
b37ecfe6ae7f7b557db7712ee6988cb0397306e9
observed from outside using ping -D -i 0.1:
stock PVE 5:
[1501245897.627138] 64 bytes from 10.0.0.213: icmp_seq=70 ttl=64 time=0.273 ms
[1501245897.731102] 64 bytes from 10.0.0.213: icmp_seq=71 ttl=64 time=0.255 ms
[1501245897.835237] 64 bytes from 10.0.0.213: icmp_seq=72 ttl=64 time=0.352 ms
[1501245900.955324] 64 bytes from 10.0.0.213: icmp_seq=102 ttl=64 time=0.419 ms
[1501245901.059196] 64 bytes from 10.0.0.213: icmp_seq=103 ttl=64 time=0.298 ms
[1501245901.163360] 64 bytes from 10.0.0.213: icmp_seq=104 ttl=64 time=0.440 ms
no call to pvesr set-state over SSH:
[1501245952.119454] 64 bytes from 10.0.0.213: icmp_seq=63 ttl=64 time=0.586 ms
[1501245952.226278] 64 bytes from 10.0.0.213: icmp_seq=64 ttl=64 time=3.41 ms
[1501245955.027289] 64 bytes from 10.0.0.213: icmp_seq=91 ttl=64 time=0.414 ms
[1501245955.131317] 64 bytes from 10.0.0.213: icmp_seq=92 ttl=64 time=0.447 ms
stock PVE 5 with b37ecfe6ae7f7b557db7712ee6988cb0397306e9 reverted:
no downtime visible via ping, it's too small.
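(resolving it would need a finer probe interval - note that intervals
below 0.2s require root:)

# 10 ms probes can resolve downtime well under 100 ms
ping -D -i 0.01 10.0.0.213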