[pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)

Thu Jul 27 16:30:37 CEST 2017

Looking at the user's migration log:

Jul 24 18:12:37 start migrate command to unix:/run/qemu-server/100.migrate
Jul 24 18:12:39 migration speed: 256.00 MB/s - downtime 39 ms

It seems the VM has very little memory, as the migration takes only 2 seconds from start to end,
so maybe the usleep lowering is not kicking in here.

----- Original Message -----
From: "aderumier" <aderumier at odiso.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Thursday, 27 July 2017 16:08:35
Subject: Re: [pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4)

Thanks for the explanation, Fabian. (I always use insecure migration, so I didn't notice this bug.)

>>when live-migrating over a unix socket, PVE 5 takes up to a few seconds 
>>between completing the RAM transfer and pausing the source VM, and 
>>resuming the target VM. in PVE 4, the same migration has a downtime of 
>>almost 0. 

A few seconds seems huge... (the user talks about 4s).

>>AFAICT, the reason for this is a bug fix in PVE 5's qemu-server which 
>>was required to support storage live migration in Qemu 2.9. 

Any commit reference?

>>originally in PVE 4, the target VM in a live migration was started in 
>>incoming migration mode and NOT continued on startup (whereas VMs rolled 
>>back to a RAM snapshot were started in the same mode, but immediately 
>>continued). 

>>in June 2016[3], migration over ssh-forwarded unix sockets was 
>>implemented. the check for skipping the continue command on startup of 
>>the target VM was overlooked, so now VMs migrated over unix sockets were 
>>started in incoming migration mode, but continued on startup. 

But this seems to be a bug, fixed later here?

https://git.proxmox.com/?p=qemu-server.git;a=commit;h=b37ecfe6ae7f7b557db7712ee6988cb0397306e9 

>>I wonder whether going the "immediately cont" route for live migrations 
>>without local storage can cause any issues besides the obvious "moving 
>>the conf file failed and VM is now active on the wrong node" one? 

I wonder whether it would be good to have some kind of temporary conf file on every node where a kvm process is running.
(That way we could see, for example, the VM on the source host with state "running" and the VM on the target host with state "migrating".)
If something bad happens at the end of the migration, the user could then still stop the target kvm process from the GUI.

But maybe it's too complex to implement, I don't know...

>>if not, I propose doing just that. otherwise, we could think about lowering 
>>the polling interval when waiting for RAM migration to complete (in 
>>phase2) - that should shave off a bit of the downtime as well. 

I wonder where exactly it takes so much time.
$downtime seems to be low, but as it comes from the migration status, maybe we are missing some query-migrate call.
Also, I think we already try to lower usleep at the end:
#reduce sleep if remaining memory is lower than the average transfer
$usleep = 300000 if $avglstat && $rem < $avglstat; 

Maybe this doesn't work correctly?
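
To make the suspicion concrete, here is a minimal sketch of the sleep selection in Python (qemu-server itself is Perl). It assumes that `$avglstat` is an average of bytes transferred per polling round and that it is still unset during the first rounds; the function name and the 1 s default interval are assumptions for illustration, only the 300000 µs value comes from the line quoted above.

```python
def next_poll_interval_us(remaining_bytes, avg_bytes_per_poll,
                          default_us=1_000_000, reduced_us=300_000):
    """Return how long to sleep before the next status query, in microseconds."""
    # Same condition as the quoted one-liner: shorten the sleep only when an
    # average exists (non-zero) AND the remaining RAM is below that average.
    if avg_bytes_per_poll and remaining_bytes < avg_bytes_per_poll:
        return reduced_us
    return default_us


# Long migration: an average is known and little RAM is left -> short sleep.
print(next_poll_interval_us(remaining_bytes=10, avg_bytes_per_poll=100))
# Very short migration: no average computed yet -> the default sleep is kept,
# even though almost nothing remains to be transferred.
print(next_poll_interval_us(remaining_bytes=10, avg_bytes_per_poll=0))
```

If that assumption holds, a VM that finishes its RAM transfer within one or two polling rounds would never reach the reduced interval.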

I think the proper way would be to catch qemu events instead of polling the status (but that may require a lot of work).
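
A rough sketch of the event-based idea, in Python rather than Perl. As far as I know, QMP can emit a MIGRATION event on every migration state change once the "events" migration capability is enabled. The sketch only parses messages that were already read from the monitor socket; treating the stream as one JSON object per line is a simplification, and the function name and sample messages are hypothetical.

```python
import json


def wait_for_migration_status(lines, wanted="completed"):
    """Scan QMP messages (one JSON object per line) for a MIGRATION event.

    Returns the matching event, or None if the stream ends without one.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        # Command replies and unrelated events are skipped.
        if msg.get("event") != "MIGRATION":
            continue
        status = msg.get("data", {}).get("status")
        if status == wanted:
            return msg
        if status == "failed":
            raise RuntimeError("migration failed")
    return None


# Hypothetical sample of what the monitor could send during a migration.
sample_stream = [
    '{"return": {}}',
    '{"event": "MIGRATION", "data": {"status": "setup"}}',
    '{"event": "MIGRATION", "data": {"status": "active"}}',
    '{"event": "MIGRATION", "data": {"status": "completed"}}',
]
event = wait_for_migration_status(sample_stream)
print(event["data"]["status"])
```

With something like this, the source side would react as soon as the event arrives instead of waiting for the next polling round.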

----- Original Message ----- 
From: "Fabian Grünbichler" <f.gruenbichler at proxmox.com> 
To: "pve-devel" <pve-devel at pve.proxmox.com> 
Sent: Thursday, 27 July 2017 14:45:43 
Subject: [pve-devel] Bug 1458 - PVE 5 live migration downtime degraded to several seconds (compared to PVE 4) 

the following issue was reported on the forum[1] and as bug #1458[2], 
moving this here for further discussion of potential fixes. 

when live-migrating over a unix socket, PVE 5 takes up to a few seconds 
between completing the RAM transfer and pausing the source VM, and 
resuming the target VM. in PVE 4, the same migration has a downtime of 
almost 0. 

AFAICT, the reason for this is a bug fix in PVE 5's qemu-server which 
was required to support storage live migration in Qemu 2.9. 

originally in PVE 4, the target VM in a live migration was started in 
incoming migration mode and NOT continued on startup (whereas VMs rolled 
back to a RAM snapshot were started in the same mode, but immediately 
continued). 

in June 2016[3], migration over ssh-forwarded unix sockets was 
implemented. the check for skipping the continue command on startup of 
the target VM was overlooked, so now VMs migrated over unix sockets were 
started in incoming migration mode, but continued on startup. this does 
not change the behaviour on startup, as a VM in incoming migration mode 
is not actually running until a migration has happened. this does mean 
that the downtime is vastly reduced for such migrations, as Qemu will 
continue the target VM automatically as soon as the migration job is 
completed. 

the only things that happen after this automatic resume are 
- finish tunnel 
- moving the conf file logically between nodes 
- resuming on the target side (which is a no-op in this case) 

so the risk for inconsistencies seems pretty small. 

later on, we introduced live-storage migration. in those cases, we now 
have the following scenario: 
- start storage migration jobs 
- start RAM migration 
- wait for RAM to be completed 
- finish tunnel 
- finish block jobs 
- update conf file 
- move the conf file logically between the nodes 
- resume on target node 

so depending on whether the migration goes over tcp (OK) or over unix 
(not so much) we have very different behaviour and risk for 
inconsistencies. 
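
to illustrate the difference, a small Python model of the storage live-migration sequence listed above. it only models ordering: the step names are taken from the list, the function names are made up for this sketch, and the extra delay from polling for RAM completion is not modelled.

```python
# Steps that run after "wait for RAM to be completed", in the order listed above.
STEPS_AFTER_RAM_DONE = [
    "finish tunnel",
    "finish block jobs",
    "update conf file",
    "move conf file between nodes",
    "resume on target node",
]


def steps_during_downtime(steps, auto_continue):
    """Steps that run while the guest is not executing on either node."""
    if auto_continue:
        # Qemu continues the target as soon as the migration job completes,
        # so none of the later steps add to the downtime.
        return []
    # Manual resume: the guest stays paused until the explicit resume step.
    return steps[:steps.index("resume on target node")]


def steps_with_target_already_running(steps, auto_continue):
    """Steps during which the target VM runs before it is explicitly resumed."""
    if not auto_continue:
        return []
    return steps[:steps.index("resume on target node")]


print(steps_during_downtime(STEPS_AFTER_RAM_DONE, auto_continue=False))
print(steps_with_target_already_running(STEPS_AFTER_RAM_DONE, auto_continue=True))
```

with manual resume, all four steps before the resume count towards the downtime. with automatic continue, the same four steps (including "finish block jobs") run while the target is already active, which is where the risk for inconsistencies comes from.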

with the introduction of PVE 5, this different behaviour was fixed / 
made consistent, by adopting the "manual resume" stance. this was needed 
because Qemu 2.9 does not allow the storage migration over NBD and the 
target VM itself to have write access to the same disks at the same 
time. this fix was not backported to PVE 4, which means that storage 
live-migration is potentially buggy there, but live-migration over unix 
sockets is faster. 

I wonder whether going the "immediately cont" route for live migrations 
without local storage can cause any issues besides the obvious "moving 
the conf file failed and VM is now active on the wrong node" one? if 
not, I propose doing just that. otherwise, we could think about lowering 
the polling interval when waiting for RAM migration to complete (in 
phase2) - that should shave off a bit of the downtime as well. 
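
as a rough illustration of how much the polling interval can contribute, a small Python calculation. it assumes status polls happen at fixed multiples of the interval counted from the start of the migration, which is a simplification; the function name and the example numbers are hypothetical.

```python
def detection_delay_ms(completed_at_ms, poll_interval_ms):
    """Time between RAM migration completing and the next poll noticing it."""
    # Round completed_at_ms up to the next multiple of the interval
    # (integer ceiling division avoids floating point surprises).
    next_poll_ms = -(-completed_at_ms // poll_interval_ms) * poll_interval_ms
    return next_poll_ms - completed_at_ms


# Migration completes 1250 ms after start.
print(detection_delay_ms(1250, 1000))  # polling every 1000 ms
print(detection_delay_ms(1250, 300))   # polling every 300 ms
```

in the worst case the delay approaches one full interval, so a shorter interval bounds this part of the downtime but does not remove it.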

in any case, I think we need to backport the "manual resume in case of 
local storage live migration" fix to PVE 4. 

1: https://forum.proxmox.com/threads/pve-5-live-migration-downtime-degradation-2-4-sec.35890 
2: https://bugzilla.proxmox.com/show_bug.cgi?id=1458 
3: 1c9d54bfd05e0d017a6e2ac5524d75466b1a4455 

_______________________________________________ 
pve-devel mailing list 
pve-devel at pve.proxmox.com 
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 