[pve-devel] new migration patches in qemu.git

Alexandre DERUMIER aderumier at odiso.com
Thu Dec 27 04:14:07 CET 2012


>>I see that the latest migration code from qemu git (1.4) seems to improve the downtime a lot (from ~500 ms to ~30 ms) under high memory-change workloads. 

It's this commit:

http://git.qemu.org/?p=qemu.git;a=commit;h=bb5801f551ee8591d576d87a9290af297998e322

The changes are huge, but maybe we can include them in the pve-qemu-kvm package? (Dietmar, any opinion?)

I'll do benchmarks today with/without them, with a vm running video.


from the qemu mailing list (check the downtime):

"Hi

Changes since yesterday:
- Paolo acked the series
- Rebased on top of today's git (the only conflicts were due to the header re-shuffle)

Please pull.

[20121219]

This is my queue for the migration thread and associated patches.  It
integrates review comments & code from Paolo.  This is the subset from
both approaches that we agreed on; the rest of the patches need more
review and are not here.

Migrating an idle guest with upstream:

(qemu) info migrate
capabilities: xbzrle: off
Migration status: completed
total time: 34251 milliseconds
downtime: 492 milliseconds
transferred ram: 762458 kbytes
remaining ram: 0 kbytes
total ram: 14688768 kbytes
duplicate: 3492606 pages
normal: 189762 pages
normal bytes: 759048 kbytes

With this series of patches:

(qemu) info migrate
capabilities: xbzrle: off
Migration status: completed
total time: 30712 milliseconds
downtime: 29 milliseconds
transferred ram: 738857 kbytes
remaining ram: 0 kbytes
total ram: 14688768 kbytes
duplicate: 3503423 pages
normal: 176671 pages
normal bytes: 706684 kbytes

Notice the big difference in downtime.  It is also visible inside the
guest, with a program that just runs an idle loop and measures how
"long" it takes to wait for 10 ms.

with upstream:

[root at d1 ~]# ./timer
delay of 452 ms
delay of 114 ms
delay of 136 ms
delay of 135 ms
delay of 136 ms
delay of 131 ms
delay of 134 ms

With this series of patches, the wait never takes 100 ms, so nothing is printed.

Please pull.

Thanks, Juan."
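The source of the `./timer` probe isn't included in the mail; below is a minimal sketch in Python of such a guest-side latency probe. The 10 ms wait target and the 100 ms reporting threshold come from Juan's description; everything else (names, the bounded iteration count) is made up for illustration.

```python
import time

def measure_delay_ms(target_ms=10):
    """Sleep for target_ms and return how long the sleep actually took, in ms."""
    start = time.monotonic()
    time.sleep(target_ms / 1000.0)
    return (time.monotonic() - start) * 1000.0

if __name__ == "__main__":
    # Report only stalls well beyond the requested 10 ms, like the
    # "delay of ... ms" lines quoted above; a clean run prints nothing.
    # Bounded here for illustration; the real probe presumably loops forever.
    for _ in range(50):
        elapsed = measure_delay_ms(10)
        if elapsed > 100:
            print("delay of %d ms" % (elapsed - 10))
```

During a migration with long stop-and-copy phases, the sleep occasionally takes far longer than requested, which is exactly what the quoted output shows.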

----- Mail original ----- 

From: "Alexandre DERUMIER" <aderumier at odiso.com> 
To: "Stefan Priebe" <s.priebe at profihost.ag> 
Cc: pve-devel at pve.proxmox.com 
Sent: Thursday 27 December 2012 02:48:17 
Subject: Re: [pve-devel] Balloon Device is the problem! Re: migration problems since qemu 1.3 

>>It also isn't accepted; you get the answer back that 1 isn't a number. 
>>I don't know what format a number needs. 

the default migrate_downtime is 30 ms (if we don't send the qmp command). 
I think we set 1 sec by default because of infinite migrations (30 ms was too short in the past with high memory-change workloads). 
I see that the latest migration code from qemu git (1.4) seems to improve the downtime a lot (from ~500 ms to ~30 ms) under high memory-change workloads. 
I don't know if qemu 1.3 works fine without setting the downtime to 1 sec. 

I think we need to cast the value to an int for the JSON encoding: 

vm_mon_cmd($vmid, "migrate_set_downtime", value => $migrate_downtime); 

-> 

vm_mon_cmd($vmid, "migrate_set_downtime", value => int($migrate_downtime)); 


I remember same problem with qemu_block_set_io_throttle() 
vm_mon_cmd($vmid, "block_set_io_throttle", device => $deviceid, bps => int($bps), bps_rd => int($bps_rd), bps_wr => int($bps_wr), iops => int($iops), iops_rd => int($iops_rd), iops_wr => int($iops_wr)); 

So maybe it sends garbage if the value is not cast? 


Also, the value should not be an int but a float; the QMP documentation says we can use values like 0.5 or 0.30. 
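The error Stefan sees ("1 isn't a number") is consistent with the value being serialized as a JSON string rather than a JSON number, which is what the int()/float() cast would fix on the Perl side. A minimal Python sketch of the difference (the migrate_set_downtime command name is real QMP; the transport is elided):

```python
import json

# A value that arrives as a string serializes as a JSON string,
# which QMP rejects because it expects a json-number:
bad = json.dumps({"execute": "migrate_set_downtime",
                  "arguments": {"value": "1"}})

# Casting to a numeric type first yields a json-number, and a float
# keeps sub-second downtimes like 0.03 representable:
good = json.dumps({"execute": "migrate_set_downtime",
                   "arguments": {"value": float("1")}})
```

Here `bad` contains `"value": "1"` (a string) while `good` contains `"value": 1.0` (a number); the wire bytes differ even though both came from the same scalar.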



Also, query-migrate returns two cool new values about downtime; I think we should display them in the migration log: 

- "downtime": only present when migration has finished correctly 
total amount in ms for downtime that happened (json-int) 
- "expected-downtime": only present while migration is active 
total amount in ms for downtime that was calculated on 
the last bitmap round (json-int) 
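Building such a log line from the query-migrate result could look roughly like this (a sketch: `info` stands for the decoded dict of a QMP query-migrate reply, and the helper name is made up):

```python
def format_migrate_log(info):
    """Build a log line from a decoded query-migrate result (a dict)."""
    status = info.get("status", "unknown")
    # "downtime" is only present once migration has finished correctly:
    if status == "completed" and "downtime" in info:
        return "migration completed, downtime %d ms" % info["downtime"]
    # "expected-downtime" is only present while migration is active:
    if status == "active" and "expected-downtime" in info:
        return "migration active, expected downtime %d ms" % info["expected-downtime"]
    return "migration status: %s" % status
```

The two branches mirror the "only present when/while" conditions from the QMP documentation quoted above, so the formatter never assumes a field that the reply may not carry.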

----- Mail original ----- 

From: "Stefan Priebe" <s.priebe at profihost.ag> 
To: "Alexandre DERUMIER" <aderumier at odiso.com> 
Cc: pve-devel at pve.proxmox.com 
Sent: Wednesday 26 December 2012 20:52:56 
Subject: Re: [pve-devel] Balloon Device is the problem! Re: migration problems since qemu 1.3 

Hi, 

On 26.12.2012 17:40, Alexandre DERUMIER wrote: 
> I don't know if we really need a default value, because it's always setting migrate_downtime to 1. 
It also isn't accepted; you get the answer back that 1 isn't a number. 
I don't know what format a number needs. 

> Now, I don't know what really happens for you, because recent changes can set migrate_downtime on the target vm (vm_mon_cmd_nocheck) 
> But I don't think it does anything, because migrate_set_downtime should be done on the source vm. 
You get the error message that 1 isn't a number. When I get this message, 
the migration fails afterwards. 

> Can you try replacing vm_mon_cmd_nocheck with vm_mon_cmd? (Then it should only run at vm_start, not when a live migration occurs on the target vm) 
Done - it works, see my other post. 

Stefan 
_______________________________________________ 
pve-devel mailing list 
pve-devel at pve.proxmox.com 
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
