[pve-devel] migration problems since qemu 1.3

Stefan Priebe - Profihost AG s.priebe at profihost.ag
Fri Dec 21 14:48:25 CET 2012


The kvm process on the source host is then running at 100% CPU in an 
endless loop.

Stefan
On 21.12.2012 14:46, Stefan Priebe - Profihost AG wrote:
>
> This time it hangs at the first query-migrate:
> ------------------------------------------
> Dec 21 14:44:43 starting migration of VM 100 to node 'cloud1-1203'
> (10.255.0.22)
> Dec 21 14:44:43 copying disk images
> Dec 21 14:44:43 starting VM 100 on remote node 'cloud1-1203'
> Dec 21 14:44:46 starting migration tunnel
> Dec 21 14:44:46 starting online/live migration on port 60000
> Dec 21 14:44:46 migrate-set-capabilities, capabilities => [HASH(0x3933ed0)]
> Dec 21 14:44:46 migrate-set-cache-size, value => 429496729
> Dec 21 14:44:46 start migrate tcp:localhost:60000
> Dec 21 14:44:48 query-migrate
> -------------------------------------------
>
> I can reproduce this by assigning at least 4 GB of memory to a machine and
> then filling the buffers and cache with:
>
> find / -type f -print |xargs cat >/dev/null
>
> And then start a migration.
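
For guests where a scripted reproduction is handier, the find/xargs pipeline above can be approximated in Python. This is only a sketch of the same idea (read every regular file and throw the data away so the guest's page cache fills up); the function name warm_page_cache and the chunk size are assumptions, not anything taken from Proxmox or QEMU.

```python
import os


def warm_page_cache(root, chunk_size=1 << 20):
    """Read every regular file under root and discard the data.

    Returns the number of bytes read. Files that cannot be opened or
    read are skipped, much like cat printing an error and moving on.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only regular files, like "find -type f": skip symlinks,
            # sockets, FIFOs and device nodes.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    while True:
                        chunk = fh.read(chunk_size)
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError:
                continue
    return total
```

Calling warm_page_cache("/") inside the guest before starting the migration should have roughly the same effect as the shell one-liner.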
>
> Stefan
> On 21.12.2012 11:43, Stefan Priebe - Profihost AG wrote:
>> Hi Alexandre,
>>
>> I've added some debugging / logging code.
>>
>> The output stops / hangs at query-migrate. See here:
>>
>> Dec 21 11:41:59 starting migration of VM 100 to node 'cloud1-1203'
>> (10.255.0.22)
>> Dec 21 11:41:59 copying disk images
>> Dec 21 11:41:59 starting VM 100 on remote node 'cloud1-1203'
>> Dec 21 11:42:02 starting migration tunnel
>> Dec 21 11:42:03 starting online/live migration on port 60000
>> Dec 21 11:42:03 migrate-set-capabilities, capabilities =>
>> [HASH(0x39a9fb0)]
>> Dec 21 11:42:03 migrate-set-cache-size, value => 429496729
>> Dec 21 11:42:03 start migrate tcp:localhost:60000
>> Dec 21 11:42:05 query-migrate
>> Dec 21 11:42:05 migration status: active (transferred 468063329,
>> remaining 3764068352), total 4303814656)
>> Dec 21 11:42:07 query-migrate
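
To put the one status line in the log above into perspective, its counters can be turned into a rough progress figure. The sketch below assumes the line has been joined back into a single string (the mail client wrapped it) and that "remaining" and "total" are byte counts of guest RAM. The helper names are hypothetical, and the percentage is only an estimate, because pages can be dirtied again during a live migration.

```python
import re


def parse_migration_status(line):
    """Extract the transferred/remaining/total counters from a status line."""
    fields = {}
    for key in ("transferred", "remaining", "total"):
        match = re.search(key + r"\s+(\d+)", line)
        if match is None:
            raise ValueError("missing field: " + key)
        fields[key] = int(match.group(1))
    return fields


def percent_done(fields):
    """Rough share of RAM that no longer needs to be sent, in percent."""
    if fields["total"] == 0:
        return 0.0
    return 100.0 * (1 - fields["remaining"] / fields["total"])


if __name__ == "__main__":
    status = ("migration status: active (transferred 468063329, "
              "remaining 3764068352), total 4303814656)")
    print(round(percent_done(parse_migration_status(status)), 1))
```

For the values logged at 11:42:05 this comes out at a bit over 12 percent, so the migration stalled very early.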
>>
>> I can't even ping the VM anymore.
>>
>> Stefan
>>
>> On 21.12.2012 08:58, Alexandre DERUMIER wrote:
>>> Hi Stefan, any news?
>>>
>>> I'm trying to reproduce your problem, but it works fine for me, no
>>> crash...
>>>
>>> ----- Original message -----
>>>
>>> From: "Stefan Priebe - Profihost AG" <s.priebe at profihost.ag>
>>> To: "Alexandre DERUMIER" <aderumier at odiso.com>
>>> Cc: pve-devel at pve.proxmox.com
>>> Sent: Thursday, 20 December 2012 16:09:42
>>> Subject: Re: [pve-devel] migration problems since qemu 1.3
>>>
>>> Hi,
>>> On 20.12.2012 15:57, Alexandre DERUMIER wrote:
>>>> Just an idea (not sure it's the problem), can you try commenting out
>>>>
>>>> $qmpclient->queue_cmd($vmid, $ballooncb, 'query-balloon');
>>>>
>>>> in QemuServer.pm, line 2081.
>>>>
>>>> and then restart pvedaemon && pvestatd?
>>>
>>> This doesn't change anything.
>>>
>>> Right now the kvm process is running on both the old and the new machine.
>>>
>>> An strace of the PID on the new machine shows a loop of:
>>>
>>> ----------------
>>> [pid 28351] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed
>>> out)
>>> [pid 28351] futex(0x7ff8b8025388, FUTEX_WAKE_PRIVATE, 1) = 0
>>> [pid 28351] futex(0x7ff8b8026024,
>>> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 11801, {1356016143,
>>> 843092000}, ffffffff <unfinished ...>
>>> [pid 28285] mremap(0x7ff77bfe4000, 160378880, 160411648, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160411648, 160448512, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160448512, 160481280, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160481280, 160514048, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160514048, 160546816, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160546816, 160583680, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160583680, 160616448, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160616448, 160649216, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160649216, 160681984, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160681984, 160718848, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160718848, 160751616, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160751616, 160784384, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160784384, 160817152, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160817152, 160854016, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160854016, 160886784, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160886784, 160919552, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160919552, 160952320, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160952320, 160989184, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 160989184, 161021952, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161021952, 161054720, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161054720, 161087488, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161087488, 161124352, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161124352, 161157120, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161157120, 161189888, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161189888, 161222656, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161222656, 161259520, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161259520, 161292288, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161292288, 161325056, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28351] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed
>>> out)
>>> [pid 28351] futex(0x7ff8b8025388, FUTEX_WAKE_PRIVATE, 1) = 0
>>> [pid 28351] futex(0x7ff8b8026024,
>>> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 11803, {1356016144,
>>> 843283000}, ffffffff <unfinished ...>
>>> [pid 28285] mremap(0x7ff77bfe4000, 161325056, 161357824, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161357824, 161394688, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161394688, 161427456, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161427456, 161460224, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28345] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection
>>> timed out)
>>> [pid 28345] futex(0x7ff8caa2e274, FUTEX_CMP_REQUEUE_PRIVATE, 1,
>>> 2147483647, 0x7ff8caa2e1b0, 872) = 1
>>> [pid 28347] <... futex resumed> ) = 0
>>> [pid 28345] futex(0x7ff8caa241a8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>> [pid 28347] futex(0x7ff8caa2e1b0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>>> [pid 28345] <... futex resumed> ) = 0
>>> [pid 28347] <... futex resumed> ) = 0
>>> [pid 28345] futex(0x7ff8caa2420c,
>>> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 799, {1356016153,
>>> 954319000}, ffffffff <unfinished ...>
>>> [pid 28347] sendmsg(19, {msg_name(0)=NULL, msg_iov(1)=[{"\t", 1}],
>>> msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 1
>>> [pid 28347] futex(0x7ff8caa2e274, FUTEX_WAIT_PRIVATE, 873, NULL
>>> <unfinished ...>
>>> [pid 28285] mremap(0x7ff77bfe4000, 161460224, 161492992, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161492992, 161529856, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161529856, 161562624, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161562624, 161595392, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161595392, 161628160, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161628160, 161665024, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161665024, 161697792, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161697792, 161730560, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161730560, 161763328, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161763328, 161800192, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161800192, 161832960, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161832960, 161865728, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161865728, 161898496, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> [pid 28285] mremap(0x7ff77bfe4000, 161898496, 161935360, MREMAP_MAYMOVE)
>>> = 0x7ff77bfe4000
>>> -----------------------
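
The mremap calls in the trace above grow one mapping in small steps (for example from 160378880 to 160411648 bytes, i.e. 32768 bytes). A short script can summarise such a trace instead of reading it by eye. This is a sketch that assumes the strace output was saved as text; the function name summarize_mremap is hypothetical. It only reports what the trace contains and says nothing about why the buffer keeps growing.

```python
import re

# Matches "mremap(0x7ff77bfe4000, 160378880, 160411648," and captures
# the old and new sizes. Quote prefixes and wrapped "= 0x..." result
# lines are ignored because only the call itself is matched.
MREMAP_RE = re.compile(r"mremap\(0x[0-9a-f]+, (\d+), (\d+),")


def summarize_mremap(trace_text):
    """Return call count, size range and step sizes of mremap calls."""
    steps = [(int(old), int(new)) for old, new in MREMAP_RE.findall(trace_text)]
    if not steps:
        return None
    deltas = [new - old for old, new in steps]
    return {
        "calls": len(steps),
        "first_size": steps[0][0],
        "last_size": steps[-1][1],
        "min_step": min(deltas),
        "max_step": max(deltas),
    }
```

Feeding it the output of strace on the destination kvm process shows how fast the mapping grows between two samples.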
>>>
>>>
>>> Stefan
>>>