[pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)

Fabian Grünbichler f.gruenbichler at proxmox.com
Fri Jul 28 12:33:19 CEST 2017


On Fri, Jul 28, 2017 at 11:21:29AM +0200, Alexandre DERUMIER wrote:
> >>I wonder whether reusing (/extending) the existing SSH tunnel for the 
> >>commands we run on the target node might reduce the overhead as well? 
> >>for cleanup in error cases opening a new connection is probably still 
> >>advisable. 
> 
> yes, maybe. I don't know whether the time goes into forking the qm process, establishing the ssh tunnel, or waiting for the response. I'll try to add timers for this.

establishing an SSH connection takes about 1s here, so that would be 2s
for both commands over SSH

qm resume takes ~0.3s, so it takes less than that until the VM is active again
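
for reference, those two numbers could be reproduced with something like the
following sketch, which separates the SSH connection setup from the command
itself ("target-node" and VMID 100 are placeholder assumptions, not values
from this thread):

    # sketch only: separate the SSH connection setup cost from the command itself;
    # "target-node" and VMID 100 are placeholders, not values from this thread
    import subprocess, time

    HOST = "target-node"

    def timed(argv):
        t0 = time.monotonic()
        subprocess.run(argv, check=True, capture_output=True)
        return time.monotonic() - t0

    ssh_only = timed(["ssh", HOST, "true"])                  # connection + auth only (~1s above)
    full     = timed(["ssh", HOST, "qm", "resume", "100"])   # connection + qm fork + resume
    print("ssh setup: %.2fs, qm resume itself: %.2fs" % (ssh_only, full - ssh_only))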

> 
> another idea: why not use an HTTPS API call through pveproxy directly?

that would take a while as well; reusing the already open SSH connection
would be faster for sure.
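
to illustrate the idea of reusing one connection, here is a sketch using
OpenSSH connection multiplexing rather than the migration tunnel itself
(host name, control-socket path and VMID are assumptions):

    # sketch: run the target-node commands over one persistent OpenSSH master
    # connection, so only the first call pays the ~1s TCP/key-exchange/auth cost
    import subprocess

    HOST = "target-node"                  # assumption: migration target
    CTL  = "/run/pve-migrate-%r@%h:%p"    # assumption: control socket path

    # open a background master connection once, kept alive 60s after last use
    subprocess.run(["ssh", "-o", "ControlMaster=yes", "-o", "ControlPath=" + CTL,
                    "-o", "ControlPersist=60", "-fN", HOST], check=True)

    # later commands multiplex over the same session and skip the handshake;
    # the pvesr call and error-path cleanup could be sent the same way
    mux = ["ssh", "-o", "ControlPath=" + CTL, HOST]
    subprocess.run(mux + ["qm", "resume", "100", "--skiplock"], check=True)  # placeholder VMID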

> 
> I have verified with qmp status (a sketch of such a probe follows the log below):
> 
> without the pvesr call, around 20ms:
> 
> 2017-07-28 10:24:45,184 -- VM status: paused (inmigrate)
> 2017-07-28 10:24:45,208 -- VM status: running
> 
> 
> with the pvesr call, around 4s:
> 
> 2017-07-28 10:38:28,711 -- VM status: paused (inmigrate)
> 2017-07-28 10:38:28,745 -- VM status: paused
> 2017-07-28 10:38:28,799 -- VM status: paused
> 2017-07-28 10:38:28,818 -- VM status: paused
> 2017-07-28 10:38:28,837 -- VM status: paused
> ....
> 2017-07-28 10:38:33,912 -- VM status: running
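
a probe producing this kind of log could look like the sketch below; the QMP
socket path (VMID 100) and the poll interval are assumptions, the actual
script is not part of this thread:

    # sketch: poll the VM status via its QMP socket, printing a timestamp per answer
    # (note: QEMU allows only one QMP client at a time on the socket)
    import datetime, json, socket, time

    QMP_SOCK = "/var/run/qemu-server/100.qmp"   # assumption: VMID 100

    def qmp_command(f, execute, arguments=None):
        msg = {"execute": execute}
        if arguments:
            msg["arguments"] = arguments
        f.write(json.dumps(msg) + "\n")
        f.flush()
        while True:                              # skip asynchronous QMP events
            reply = json.loads(f.readline())
            if "return" in reply:
                return reply["return"]

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(QMP_SOCK)
    f = sock.makefile("rw")
    f.readline()                                 # QMP greeting
    qmp_command(f, "qmp_capabilities")           # leave capabilities negotiation mode

    while True:
        # "info status" answers e.g. "VM status: paused (inmigrate)" or "VM status: running"
        status = qmp_command(f, "human-monitor-command",
                             {"command-line": "info status"}).strip()
        print(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S,%f")[:-3], "--", status)
        if status.endswith("running"):
            break
        time.sleep(0.02)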

that does not make sense - are you sure you haven't removed anything
else? qemu does not know or care about pvesr, so why should it resume
automatically?



