[pve-devel] [PATCH qemu-server 1/2] Fix #1441: Do not unplug controllers when the mirroring is finished

Thomas Lamprecht t.lamprecht at proxmox.com
Thu Aug 24 08:44:06 CEST 2017


On 08/24/2017 08:30 AM, Fabian Grünbichler wrote:
> On Thu, Aug 24, 2017 at 07:05:35AM +0200, Thomas Lamprecht wrote:
>> On 08/23/2017 07:15 PM, Alexandre DERUMIER wrote:
>>> for me, this patch is ok.
>>>
>>> if the job is complete, we don't need to unplug. (and unplugging would mean that the vm needs to support unplug too; we can't be sure that the guest supports this).
>>>
>>> Is it possible that the problem was the previous bug with the unix socket, where the target vm was not paused after migration ?
>>>
>>
>> Could be possible.
>>
>> Looking at the qemu-server and qemu source Emmanuel's patch seems, in fact, valid.
>>
>> All mirror block jobs are finished at this point. We do an "nbd-server-stop" on
>> the target, which disconnects all NBD clients. As everything from the source side
>> using NBD is mirrored, flushed and happy, this should not run into a problem there.
>>
>> Same for qemu's do_vm_stop, which drains and flushes all remaining blockjobs (of
>> which there shouldn't be any at this point, else we would have mirrored/migrated an
>> unclean state?!)
>>
>> I'd still wait for Wolfgang B.'s opinion on this, he knows far more on the topic.
> 
> I originally added the device_del because I experienced long delays in
> stopping the source VM, but I did not debug further (and unfortunately
> cannot remember more details). if such delays are not reproducible
> (anymore) when taking out that foreach, I am OK with it.
> 
> I think my tests were with -rc5, so it is possible something changed on
> the Qemu side as well..
> 

Could it be -rc4? Because rc5 is equivalent to the final release.

And between rc4 and rc5 this was fixed:

commit 91af091f92358c2ff828fa1def1a7bea9b701cdf
Author: Fam Zheng <famz at redhat.com>
Date:   Tue Apr 18 22:30:44 2017 +0800

     block: Drain BH in bdrv_drained_begin
     
     [...]
     As a side effect this fixes a hang in block_job_detach_aio_context
     during system_reset when a block job is ready.

Could be a candidate which fixed this regression...



