[pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)
Fabian Grünbichler
f.gruenbichler at proxmox.com
Fri Jul 28 10:46:55 CEST 2017
On Fri, Jul 28, 2017 at 10:09:55AM +0200, Alexandre DERUMIER wrote:
>
> I have added some timer and done a migration without storage replication
>
> ->main migration loop : 150ms increase. (it's lower if I put a usleep of 1ms)
>
> 2017-07-28 10:00:10 transfer_replication_state: 1.436832
> 2017-07-28 10:00:10 move config: 0.001174
> 2017-07-28 10:00:10 switch_replication_job_target: 0.003125
> 2017-07-28 10:00:12 qm resume: 1.634583 -> (this is the time measured on the source until it gets the response; not sure how long it actually takes on the remote side)
I guess only marginally less on the target until the VM is actually
resumed.
>
> it seems to be transfer_replication_state, which calls
> my $cmd = [ @{$self->{rem_ssh}}, 'pvesr', 'set-state', $self->{vmid}, $state];
>
>
> I think calling remote qm commands takes some time to get a response.
> Note that I don't use pvesr, so I think we should bypass these commands when they are not needed.
>
yes, checking earlier on whether a replication state / job exists, and
only transferring the state and switching the job direction if actually
needed, would be an improvement for sure.
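a rough sketch of that conditional skip (not the actual PVE code; the
helper names get_local_replication_jobs, run_command and
switch_replication_job_target are hypothetical here, only the
'pvesr set-state' invocation is taken from the quoted snippet):

```perl
# hypothetical lookup: which replication jobs exist for this VM?
my $jobs = get_local_replication_jobs($self->{vmid});

if ($jobs && scalar(keys %$jobs)) {
    # only pay the remote SSH round-trip when a replication job exists
    my $cmd = [ @{$self->{rem_ssh}}, 'pvesr', 'set-state', $self->{vmid}, $state ];
    run_command($cmd);                               # hypothetical runner
    switch_replication_job_target($self->{vmid});    # hypothetical helper
} else {
    # no replication configured: skip both remote calls entirely,
    # avoiding roughly the 1.4s measured above
}
```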
I wonder whether reusing (or extending) the existing SSH tunnel for the
commands we run on the target node might reduce the overhead as well?
For cleanup in error cases, opening a new connection is probably still
advisable.
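one way to reuse a single connection would be OpenSSH connection
multiplexing; a sketch (the control-socket path and command layout are
illustrative, not what PVE actually configures):

```perl
# open one multiplexed master connection per migration and reuse it for
# every remote command, so only the first invocation pays the TCP/auth
# handshake; ControlMaster/ControlPath/ControlPersist are standard
# OpenSSH options
my $ctrl = "/run/migrate-$self->{vmid}.sock";    # illustrative socket path
my @mux  = ('-o', 'ControlMaster=auto',
            '-o', "ControlPath=$ctrl",
            '-o', 'ControlPersist=60');          # keep master open for 60s

# later commands with the same ControlPath reuse the established session
my $cmd = [ 'ssh', @mux, "root\@$target",
            'pvesr', 'set-state', $self->{vmid}, $state ];
```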
those two improvements might get us into the <1s range again, without
sacrificing consistency along the way.