[pve-devel] [PATCH-SERIES qemu 0/6] async snapshot improvements
Fiona Ebner
f.ebner at proxmox.com
Mon Mar 31 16:55:01 CEST 2025
Most importantly, start using a dedicated IO thread for the state
file when doing a live snapshot.
Having the state file be in the iohandler context means that a
blk_drain_all() call in the main thread or vCPU thread that happens
while the snapshot is running will result in a deadlock.
This change should also help in general to reduce load on the main
thread and to avoid it getting stuck on IO, i.e. the same benefits as
using a dedicated IO thread for regular drives. This is particularly
interesting when the VM state storage is a network storage like NFS.
With some luck, it could also help with bug #6262 [0]. The failure
there happens while issuing, or right after, the savevm-start QMP
command, so the most likely culprit is the process_savevm_co()
coroutine, which was previously scheduled to the iohandler context.
Presumably, something polls the iohandler context and tries to enter
the already scheduled coroutine, leading to the abort():
> qemu_aio_coroutine_enter: Co-routine was already scheduled in 'aio_co_schedule'
With a dedicated iothread, there hopefully is no such race.
Additionally, fix up some edge cases in error handling and setting the
state of the snapshot operation.
[0]: https://bugzilla.proxmox.com/show_bug.cgi?id=6262
Fiona Ebner (6):
savevm-async: improve setting state of snapshot operation in
savevm-end handler
savevm-async: rename saved_vm_running to vm_needs_start
savevm-async: improve runstate preservation
savevm-async: cleanup error handling in savevm_start
savevm-async: use dedicated iothread for state file
savevm-async: treat failure to set iothread context as a hard failure
migration/savevm-async.c | 119 +++++++++++++++++++++++----------------
1 file changed, 69 insertions(+), 50 deletions(-)
--
2.39.5