[pve-devel] [PATCH qemu] add fix for crash during live migration in combination with block flush
Fiona Ebner
f.ebner at proxmox.com
Thu Jan 16 11:30:03 CET 2025
On 15.01.25 at 17:28, Thomas Lamprecht wrote:
> On 08.01.25 at 14:03, Fiona Ebner wrote:
>> Setting blk->root is a graph change operation and thus needs to be
>> protected by the block graph write lock in blk_remove_bs(). The
>> assignment to blk->root in blk_insert_bs() is already protected by
>> the block graph write lock.
>>
>> In particular, the graph read lock in blk_co_do_flush() could
>> previously not ensure that blk_bs(blk) would always return the same
>> value during the locked section, which could lead to a segfault [0] in
>> combination with migration [1].
>>
>> From the user-provided backtraces in the forum thread [1], it seems
>> like blk_co_do_flush() managed to get past the
>> blk_co_is_available(blk) check, meaning that blk_bs(blk) returned a
>> non-NULL value during the check, but then, when calling
>> bdrv_co_flush(), blk_bs(blk) returned NULL.
>>
>> [0]:
>>
>>> 0 bdrv_primary_child (bs=bs@entry=0x0) at ../block.c:8287
>>> 1 bdrv_co_flush (bs=0x0) at ../block/io.c:2948
>>> 2 bdrv_co_flush_entry (opaque=0x7a610affae90) at block/block-gen.c:901
>>
>> [1]: https://forum.proxmox.com/threads/158072
>>
>> Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
>> ---
>>
>> Upstream submission of the same patch:
>> https://lore.kernel.org/qemu-devel/20250108124649.333668-1-f.ebner@proxmox.com/T/
>
> I only skimmed the upstream discussion, but seems that there is still some
> issue left; so should I wait this version out?
Yes, we should at least also put the "root = blk->root;" assignment into
the write lock section like the upstream maintainer suggested.
That more complete change is in the package provided to the forum user.
The change should still be an improvement over the status quo. However,
the user reported that it didn't help with the specific crash, and I don't
see any other code paths that would fit the provided backtraces right now :/
I'll ask the user to try again with a more complete GDB script in the
hope of discovering something I missed.