[pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)

Alexandre DERUMIER aderumier at odiso.com
Fri Jul 28 14:43:51 CEST 2017


PVE/Tools.pm
---------------
>> use PVE::Syscall -> 500ms (require("syscall.ph");)

Sorry, this one is incorrect.
The time seems to be taken by
   use Net::DBus qw(dbus_uint32 dbus_uint64);

Note that this is on an old Xeon.
On a recent, fast server it's 3x faster.
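
For anyone who wants to reproduce these per-module numbers, something like the following might do. It is only a sketch: `time_ms` is a made-up helper name, it assumes GNU `date` (for `%N`), and the result includes perl's own startup time, so it should be compared against an empty `perl -e 1` run.

```shell
#!/bin/sh
# Hypothetical helper: print the wall-clock time, in milliseconds,
# that a command takes to run. Needs GNU date for nanoseconds (%N).
time_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Usage (on a PVE node):
#   time_ms perl -e 1
#   time_ms perl -e 'use Net::DBus qw(dbus_uint32 dbus_uint64);'
#   time_ms perl -e 'use PVE::Tools;'
```

The difference between the first run and the others approximates the load time of the module and everything it pulls in.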



----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Friday, 28 July 2017 14:03:41
Subject: Re: [pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)

>>#time pvesr 
>>real 0m1.098s 
>> 
>>(same for qm or other command) 

poor man's profiling
--------------------

PVE/CLI/pvesr.pm 
---------------- 

use PVE::RPCEnvironment; --> takes 800ms

PVE/RPCEnvironment.pm
---------------------
use PVE::AccessControl ->> 300ms 

PVE/AccessControl.pm 
-------------------- 
use PVE::OTP; -> 300ms 

PVE/OTP.pm 
---------- 
use PVE::Tools -> 300ms 


use base qw(PVE::RESTEnvironment); ->500ms 

PVE/RESTEnvironment.pm 
----------------------- 
use PVE::ProcFSTools; -> 500ms 



PVE/Tools.pm 
--------------- 
use PVE::Syscall -> 500ms (require("syscall.ph");)


Not sure if we can optimise that.

But I think making the remote request through pveproxy should be faster, as everything is already loaded.


----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Friday, 28 July 2017 13:22:31
Subject: Re: [pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)

pvesr through ssh 
----------------- 
root@kvmtest1 ~ # time /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=kvmtest2' root@10.3.94.47 pvesr set-state 244 \''{}'\'

real 0m1.399s 


locally 
-------- 
root@kvmtest2:~# time pvesr set-state 244 {}
real 0m1.137s 


So 40ms for ssh, and 1.137s for pvesr itself.

(I think we could simply skip the call if the state is empty, but reusing the ssh connection could help a little too.)
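
As a rough way to test how much connection reuse would save, OpenSSH connection multiplexing could be tried from the shell. This is not how the PVE migration tunnel works, just a generic sketch; `ssh_mux` is a made-up wrapper name, and the socket path and 60 second persistence are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical wrapper: run ssh over a shared (multiplexed) connection.
# ControlMaster, ControlPath and ControlPersist are standard OpenSSH client
# options; the socket path and the persistence time are arbitrary here.
ssh_mux() {
    ssh -o BatchMode=yes \
        -o ControlMaster=auto \
        -o ControlPath="${HOME}/.ssh/mux-%r@%h-%p" \
        -o ControlPersist=60 \
        "$@"
}

# Usage (host and command taken from the test above):
#   time ssh_mux root@10.3.94.47 pvesr set-state 244 \''{}'\'
```

The first call pays the full connection setup; later calls within the persistence window reuse the master connection, so timing the second call shows the remaining overhead.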


Also, a simple

#time pvesr 
real 0m1.098s 

(same for qm or other command) 




>>that does not make sense - are you sure you haven't removed anything 
>>else? qemu does not know or care about pvesr, so why should it resume 
>>automatically? 

No, it does not resume automatically. This is the log of an external script calling qmp status in a loop
to see how long the VM is really paused.
Removing the pvesr call in phase3 reduces the pause time (between the end of phase2 and qm resume).
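
To turn such a polling log into a single number, something like the following could be used. It is a sketch that assumes log lines in the format quoted further down (`YYYY-MM-DD HH:MM:SS,mmm -- VM status: ...`), without the leading `> ` quoting, and a log that does not cross midnight; `pause_ms` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: read a status log on stdin and print the number of
# milliseconds between the first "paused" line and the first "running"
# line that follows it.
pause_ms() {
    awk '
    {
        # Field 2 is the time of day as HH:MM:SS,mmm
        split($2, t, /[:,]/)
        ms = ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000 + t[4]
    }
    /VM status: paused/ && !seen { first = ms; seen = 1 }
    /VM status: running/ && seen { print ms - first; exit }
    '
}

# Usage:
#   pause_ms < status.log
```

Fed the first and last timestamps of the two logs quoted below, it prints 24 for the run without pvesr and 5201 for the run with pvesr.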





----- Original Message -----
From: "Fabian Grünbichler" <f.gruenbichler at proxmox.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Friday, 28 July 2017 12:33:19
Subject: Re: [pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)

On Fri, Jul 28, 2017 at 11:21:29AM +0200, Alexandre DERUMIER wrote: 
> >>I wonder whether reusing (/extending) the existing SSH tunnel for the
> >>commands we run on the target node might reduce the overhead as well? 
> >>for cleanup in error cases opening a new connection is probably still 
> >>advisable. 
> 
> Yes, maybe. I don't know whether the time goes into forking the qm process, establishing the ssh tunnel, or getting the response. I'll try to add a timer for this.

establishing an SSH connection takes about 1s here, so that would be 2s 
for both commands over SSH 

qm resume takes ~0.3s, so less than that until the VM is active again

> 
> Another idea: why not use an HTTPS API call through pveproxy directly?

that would take a while as well, reusing the already open SSH connection 
would be faster for sure. 

> 
> I have verified with qmp status, 
> 
> without the pvesr call, around 20ms
> 
> 2017-07-28 10:24:45,184 -- VM status: paused (inmigrate) 
> 2017-07-28 10:24:45,208 -- VM status: running 
> 
> 
> with the pvesr call, around 4s
> 
> 2017-07-28 10:38:28,711 -- VM status: paused (inmigrate) 
> 2017-07-28 10:38:28,745 -- VM status: paused 
> 2017-07-28 10:38:28,799 -- VM status: paused 
> 2017-07-28 10:38:28,818 -- VM status: paused 
> 2017-07-28 10:38:28,837 -- VM status: paused 
> .... 
> 2017-07-28 10:38:33,912 -- VM status: running 

that does not make sense - are you sure you haven't removed anything 
else? qemu does not know or care about pvesr, so why should it resume 
automatically? 

_______________________________________________ 
pve-devel mailing list 
pve-devel at pve.proxmox.com 
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
