[pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)
Alexandre DERUMIER
aderumier at odiso.com
Fri Jul 28 11:21:29 CEST 2017
>>I wonder whether reusing (/extending) the existing SSH tunnel for the
>>commands we run on the target node might reduce the overhead as well?
>>for cleanup in error cases opening a new connection is probably still
>>advisable.
Yes, maybe. I don't know whether the time is spent forking the qm process, establishing the SSH tunnel, or waiting for the response. I'll try to add a timer for this.
Another idea: why not use an HTTPS API call through pveproxy directly?
I have verified the downtime by polling the VM status over QMP.
Without the pvesr call, around 20ms:
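On the timer idea, a minimal sketch of how the cost of one external command could be measured from the calling side. This is not PVE code: the helper name timed_run, the host root@target and the VM ID 100 are only placeholders for illustration.

```python
import subprocess
import sys
import time


def timed_run(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    proc = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, proc.returncode


if __name__ == "__main__":
    # Local baseline: cost of spawning a process that does nothing.
    elapsed, rc = timed_run([sys.executable, "-c", "pass"])
    print(f"local no-op: {elapsed:.6f}s (exit {rc})")
    # On a real cluster one could compare, for example (placeholder host/VMID):
    #   timed_run(["ssh", "root@target", "true"])                  # SSH setup only
    #   timed_run(["ssh", "root@target", "qm", "status", "100"])   # SSH + qm startup
```

Comparing the "ssh ... true" case with the "ssh ... qm ..." case should give a rough idea of how much of the delay is connection setup and how much is the remote command itself.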
2017-07-28 10:24:45,184 -- VM status: paused (inmigrate)
2017-07-28 10:24:45,208 -- VM status: running
With the pvesr call, around 4s:
2017-07-28 10:38:28,711 -- VM status: paused (inmigrate)
2017-07-28 10:38:28,745 -- VM status: paused
2017-07-28 10:38:28,799 -- VM status: paused
2017-07-28 10:38:28,818 -- VM status: paused
2017-07-28 10:38:28,837 -- VM status: paused
....
2017-07-28 10:38:33,912 -- VM status: running
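For reference, a rough sketch of how the downtime can be read off a status log in the format shown above. It assumes the "timestamp -- VM status: state" layout of these traces; the function name downtime_seconds is made up for this example.

```python
from datetime import datetime

SEPARATOR = " -- VM status: "


def downtime_seconds(lines):
    """Seconds from the first 'paused (inmigrate)' line to the first 'running' line."""
    start = end = None
    for line in lines:
        stamp, sep, status = line.partition(SEPARATOR)
        if not sep:
            continue  # skip elided or unrelated lines such as "...."
        when = datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S,%f")
        status = status.strip()
        if status == "paused (inmigrate)" and start is None:
            start = when
        elif status == "running" and start is not None:
            end = when
            break
    if start is None or end is None:
        raise ValueError("log has no complete paused -> running transition")
    return (end - start).total_seconds()
```

Applied to the two traces above, this gives 0.024s for the first one and 5.201s between the first and last lines shown for the second one (the elided lines do not change the endpoints).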
----- Original message -----
From: "Fabian Grünbichler" <f.gruenbichler at proxmox.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Friday 28 July 2017 10:46:55
Subject: Re: [pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)
On Fri, Jul 28, 2017 at 10:09:55AM +0200, Alexandre DERUMIER wrote:
>
> I have added some timers and done a migration without storage replication.
>
> -> main migration loop: 150ms increase (it's lower if I put a usleep of 1ms).
>
> 2017-07-28 10:00:10 transfer_replication_state: 1.436832
> 2017-07-28 10:00:10 move config: 0.001174
> 2017-07-28 10:00:10 switch_replication_job_target: 0.003125
> 2017-07-28 10:00:12 qm resume: 1.634583 -> (this is the time measured on the source until the response arrives; I'm not sure how much time it takes exactly on the remote)
I guess only marginally less on the target until the VM is actually
resumed.
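To put the quoted timers in proportion, a small worked sum. The split into "remote" steps is an assumption based on this thread (transfer_replication_state calls pvesr over SSH, and qm resume is run on the target); the helper name summarize is invented for the example.

```python
# Timings in seconds, copied from the quoted log lines.
PHASES = {
    "transfer_replication_state": 1.436832,
    "move config": 0.001174,
    "switch_replication_job_target": 0.003125,
    "qm resume": 1.634583,
}

# Assumption from the thread: these two steps run a command on the target node.
REMOTE_STEPS = {"transfer_replication_state", "qm resume"}


def summarize(phases, remote_steps):
    """Return (total, remote_total, remote_share) for the given phase timings."""
    total = sum(phases.values())
    remote_total = sum(t for name, t in phases.items() if name in remote_steps)
    return total, remote_total, remote_total / total


if __name__ == "__main__":
    total, remote_total, share = summarize(PHASES, REMOTE_STEPS)
    print(f"total {total:.6f}s, remote {remote_total:.6f}s ({share:.1%})")
```

The four steps add up to about 3.08s, of which about 3.07s falls on the two remote calls, so the local steps are negligible in comparison.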
>
> It seems to be transfer_replication_state, which calls:
> my $cmd = [ @{$self->{rem_ssh}}, 'pvesr', 'set-state', $self->{vmid}, $state];
>
>
> I think calling a remote qm command takes some time to get a response.
> Note that I don't use pvesr, so I think we should bypass these commands if they are not needed.
>
yes, checking whether a state / job exists earlier on, and only
transferring state and switching the direction conditionally if needed
would be an improvement for sure.
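The control flow of that conditional approach could look roughly like the sketch below. It is only an illustration in Python, not the actual migration code: the function name finish_migration, the has_replication_job flag and the run_step callback are hypothetical, and the step names are simply the labels from the timer output quoted above.

```python
from typing import Callable, List


def finish_migration(
    has_replication_job: bool, run_step: Callable[[str], None]
) -> List[str]:
    """Run the final migration steps, skipping replication work when unused.

    has_replication_job is assumed to have been determined earlier,
    before the VM is paused, so the check itself costs no downtime.
    """
    steps = []
    if has_replication_job:
        steps.append("transfer_replication_state")
    steps.append("move config")
    if has_replication_job:
        steps.append("switch_replication_job_target")
    steps.append("qm resume")

    for step in steps:
        run_step(step)
    return steps


if __name__ == "__main__":
    # Without a replication job only two steps remain.
    print(finish_migration(False, lambda step: None))
```

For a VM without replication this leaves only the config move and the resume, which removes one of the two slow remote calls from the timings above.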
I wonder whether reusing (/extending) the existing SSH tunnel for the
commands we run on the target node might reduce the overhead as well?
for cleanup in error cases opening a new connection is probably still
advisable.
those two improvements might get us into the <1s range again, without
sacrificing consistency on the way.
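As a side note on connection reuse: the sketch below does not extend the migration tunnel discussed above. It shows a different, generic mechanism, OpenSSH connection sharing via the ControlMaster, ControlPath and ControlPersist client options, which also avoids a fresh handshake per command. The helper name ssh_argv, the host root@target and the VM ID 100 are placeholders.

```python
def ssh_argv(host, remote_cmd, control_path="~/.ssh/cm-%r@%h:%p", persist="60"):
    """Build an ssh argument list that shares one connection across calls.

    ControlMaster=auto reuses an existing master connection if one is
    available at ControlPath and otherwise creates it; ControlPersist
    keeps the master open for the given number of seconds after use.
    """
    return [
        "ssh",
        "-o", "ControlMaster=auto",
        "-o", f"ControlPath={control_path}",
        "-o", f"ControlPersist={persist}",
        host,
        *remote_cmd,
    ]


if __name__ == "__main__":
    # Placeholder host and VM ID, for illustration only.
    print(" ".join(ssh_argv("root@target", ["qm", "resume", "100"])))
```

With such options only the first command pays for the full SSH setup. Whether this fits into the migration code, or whether extending the existing tunnel is the better route, would still need to be measured.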