[pve-devel] migration problems since qemu 1.3

Alexandre DERUMIER aderumier at odiso.com
Fri Dec 21 17:57:45 CET 2012


Hi Stefan,

I'll try to reproduce it. Maybe qemu-devel can help too?

I'll be offline until 26/12 (Christmas).

Merry Xmas to all.

Alexandre

----- Original Message ----- 

From: "Stefan Priebe - Profihost AG" <s.priebe at profihost.ag> 
To: "Alexandre DERUMIER" <aderumier at odiso.com> 
Cc: pve-devel at pve.proxmox.com 
Sent: Friday, 21 December 2012 14:51:54 
Subject: Re: [pve-devel] migration problems since qemu 1.3 

Hi, 

Even more news: the kvm is responsive again after cancelling the 
migration and waiting around 1-2 minutes. 

During these two minutes, the kvm process on the source host is 
running at 100% CPU. 

Greets, 
Stefan 
On 21.12.2012 14:46, Stefan Priebe - Profihost AG wrote: 
> 
> This time it hangs at the first query-migrate: 
> ------------------------------------------ 
> Dec 21 14:44:43 starting migration of VM 100 to node 'cloud1-1203' 
> (10.255.0.22) 
> Dec 21 14:44:43 copying disk images 
> Dec 21 14:44:43 starting VM 100 on remote node 'cloud1-1203' 
> Dec 21 14:44:46 starting migration tunnel 
> Dec 21 14:44:46 starting online/live migration on port 60000 
> Dec 21 14:44:46 migrate-set-capabilities, capabilities => [HASH(0x3933ed0)] 
> Dec 21 14:44:46 migrate-set-cache-size, value => 429496729 
> Dec 21 14:44:46 start migrate tcp:localhost:60000 
> Dec 21 14:44:48 query-migrate 
> ------------------------------------------- 
> 
> I can reproduce this by assigning at least 4 GB of memory to a machine 
> and then filling the buffers and cache with: 
> 
> find / -type f -print |xargs cat >/dev/null 
> 
> And then starting a migration. 
> 
> Stefan 
> On 21.12.2012 11:43, Stefan Priebe - Profihost AG wrote: 
>> Hi Alexandre, 
>> 
>> I've added some debugging / logging code. 
>> 
>> The output stops / hangs at query-migrate. See here: 
>> 
>> Dec 21 11:41:59 starting migration of VM 100 to node 'cloud1-1203' 
>> (10.255.0.22) 
>> Dec 21 11:41:59 copying disk images 
>> Dec 21 11:41:59 starting VM 100 on remote node 'cloud1-1203' 
>> Dec 21 11:42:02 starting migration tunnel 
>> Dec 21 11:42:03 starting online/live migration on port 60000 
>> Dec 21 11:42:03 migrate-set-capabilities, capabilities => 
>> [HASH(0x39a9fb0)] 
>> Dec 21 11:42:03 migrate-set-cache-size, value => 429496729 
>> Dec 21 11:42:03 start migrate tcp:localhost:60000 
>> Dec 21 11:42:05 query-migrate 
>> Dec 21 11:42:05 migration status: active (transferred 468063329, 
>> remaining 3764068352), total 4303814656) 
>> Dec 21 11:42:07 query-migrate 
>> 
>> I can't even ping the VM anymore. 
>> 
>> Stefan 
>> 
>> On 21.12.2012 08:58, Alexandre DERUMIER wrote: 
>>> Hi Stefan, any news? 
>>> 
>>> I'm trying to reproduce your problem, but it works fine for me, no 
>>> crash... 
>>> 
>>> ----- Original Message ----- 
>>> 
>>> From: "Stefan Priebe - Profihost AG" <s.priebe at profihost.ag> 
>>> To: "Alexandre DERUMIER" <aderumier at odiso.com> 
>>> Cc: pve-devel at pve.proxmox.com 
>>> Sent: Thursday, 20 December 2012 16:09:42 
>>> Subject: Re: [pve-devel] migration problems since qemu 1.3 
>>> 
>>> Hi, 
>>> On 20.12.2012 15:57, Alexandre DERUMIER wrote: 
>>>> Just an idea (not sure it's the problem): can you try commenting out 
>>>> 
>>>> $qmpclient->queue_cmd($vmid, $ballooncb, 'query-balloon'); 
>>>> 
>>>> in QemuServer.pm, line 2081, 
>>>> 
>>>> and restarting pvedaemon && pvestatd? 
>>> 
>>> This doesn't change anything. 
>>> 
>>> Right now the kvm process is running on both the old and the new machine. 
>>> 
>>> An strace of the PID on the new machine shows a loop of: 
>>> 
>>> ---------------- 
>>> [pid 28351] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed 
>>> out) 
>>> [pid 28351] futex(0x7ff8b8025388, FUTEX_WAKE_PRIVATE, 1) = 0 
>>> [pid 28351] futex(0x7ff8b8026024, 
>>> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 11801, {1356016143, 
>>> 843092000}, ffffffff <unfinished ...> 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160378880, 160411648, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160411648, 160448512, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160448512, 160481280, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160481280, 160514048, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160514048, 160546816, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160546816, 160583680, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160583680, 160616448, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160616448, 160649216, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160649216, 160681984, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160681984, 160718848, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160718848, 160751616, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160751616, 160784384, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160784384, 160817152, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160817152, 160854016, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160854016, 160886784, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160886784, 160919552, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160919552, 160952320, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160952320, 160989184, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 160989184, 161021952, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161021952, 161054720, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161054720, 161087488, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161087488, 161124352, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161124352, 161157120, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161157120, 161189888, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161189888, 161222656, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161222656, 161259520, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161259520, 161292288, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161292288, 161325056, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28351] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed 
>>> out) 
>>> [pid 28351] futex(0x7ff8b8025388, FUTEX_WAKE_PRIVATE, 1) = 0 
>>> [pid 28351] futex(0x7ff8b8026024, 
>>> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 11803, {1356016144, 
>>> 843283000}, ffffffff <unfinished ...> 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161325056, 161357824, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161357824, 161394688, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161394688, 161427456, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161427456, 161460224, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28345] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection 
>>> timed out) 
>>> [pid 28345] futex(0x7ff8caa2e274, FUTEX_CMP_REQUEUE_PRIVATE, 1, 
>>> 2147483647, 0x7ff8caa2e1b0, 872) = 1 
>>> [pid 28347] <... futex resumed> ) = 0 
>>> [pid 28345] futex(0x7ff8caa241a8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...> 
>>> [pid 28347] futex(0x7ff8caa2e1b0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...> 
>>> [pid 28345] <... futex resumed> ) = 0 
>>> [pid 28347] <... futex resumed> ) = 0 
>>> [pid 28345] futex(0x7ff8caa2420c, 
>>> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 799, {1356016153, 
>>> 954319000}, ffffffff <unfinished ...> 
>>> [pid 28347] sendmsg(19, {msg_name(0)=NULL, msg_iov(1)=[{"\t", 1}], 
>>> msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 1 
>>> [pid 28347] futex(0x7ff8caa2e274, FUTEX_WAIT_PRIVATE, 873, NULL 
>>> <unfinished ...> 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161460224, 161492992, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161492992, 161529856, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161529856, 161562624, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161562624, 161595392, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161595392, 161628160, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161628160, 161665024, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161665024, 161697792, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161697792, 161730560, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161730560, 161763328, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161763328, 161800192, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161800192, 161832960, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161832960, 161865728, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161865728, 161898496, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> [pid 28285] mremap(0x7ff77bfe4000, 161898496, 161935360, MREMAP_MAYMOVE) 
>>> = 0x7ff77bfe4000 
>>> ----------------------- 
>>> 
>>> 
>>> Stefan 
>>> 
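
For anyone reading the strace output above: pid 28285 keeps calling mremap on the same address with a slightly larger size each time. The following is only a minimal sketch, not part of the original mails, that extracts the old and new sizes from such lines so the growth per call can be checked. The names MREMAP_RE and mremap_growth are made up for illustration, and the sample is just the first three mremap calls copied from the trace.

```python
import re

# Matches strace lines such as:
#   [pid 28285] mremap(0x7ff77bfe4000, 160378880, 160411648, MREMAP_MAYMOVE)
# (hypothetical helper, written only for this trace format)
MREMAP_RE = re.compile(r"mremap\((0x[0-9a-f]+), (\d+), (\d+), MREMAP_MAYMOVE\)")


def mremap_growth(strace_text):
    """Return (old_size, new_size, delta) in bytes for every mremap call found."""
    calls = []
    for match in MREMAP_RE.finditer(strace_text):
        old_size = int(match.group(2))
        new_size = int(match.group(3))
        calls.append((old_size, new_size, new_size - old_size))
    return calls


# First three mremap calls from the trace quoted above.
sample = """\
[pid 28285] mremap(0x7ff77bfe4000, 160378880, 160411648, MREMAP_MAYMOVE)
= 0x7ff77bfe4000
[pid 28285] mremap(0x7ff77bfe4000, 160411648, 160448512, MREMAP_MAYMOVE)
= 0x7ff77bfe4000
[pid 28285] mremap(0x7ff77bfe4000, 160448512, 160481280, MREMAP_MAYMOVE)
= 0x7ff77bfe4000
"""

calls = mremap_growth(sample)
for old_size, new_size, delta in calls:
    print(f"{old_size} -> {new_size} (+{delta} bytes)")
```

On these three lines the mapping grows by 32768 or 36864 bytes per call and never moves (the returned address stays 0x7ff77bfe4000). What the buffer is used for is not visible from the trace alone.
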
> _______________________________________________ 
> pve-devel mailing list 
> pve-devel at pve.proxmox.com 
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
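
The "migration status" line in the 11:42:05 log entry quoted earlier in this thread carries three byte counters. As a rough sketch (not from the original mails), the snippet below parses that line and turns it into a progress percentage. STATUS_RE and parse_status are hypothetical names, and the regex assumes exactly the format printed by the debug output in that log, including its unbalanced parenthesis.

```python
import re

# Format assumed from the quoted log line:
#   migration status: active (transferred 468063329, remaining 3764068352), total 4303814656)
STATUS_RE = re.compile(
    r"migration status: (\w+) \(transferred (\d+),\s+remaining (\d+)\),\s+total (\d+)\)"
)


def parse_status(line):
    """Parse one status line; return a dict, or None if the line does not match."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    transferred, remaining, total = (int(g) for g in match.group(2, 3, 4))
    return {
        "status": match.group(1),
        "transferred": transferred,
        "remaining": remaining,
        "total": total,
        # Share of the total already sent, in percent.
        "percent": 100.0 * transferred / total,
    }


line = ("Dec 21 11:42:05 migration status: active (transferred 468063329, "
        "remaining 3764068352), total 4303814656)")
info = parse_status(line)
print(f"{info['status']}: {info['percent']:.1f}% of {info['total']} bytes transferred")
```

In that log the migration had sent roughly a tenth of the total before the next query-migrate stopped answering.
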


