[pve-devel] [PATCH qemu-server] migrate: keep VM paused after migration if it was before

Fabian Ebner f.ebner at proxmox.com
Thu Apr 21 09:44:42 CEST 2022


On 20.04.22 at 14:43, Fabian Grünbichler wrote:
> On March 18, 2022 8:51 am, Fabian Ebner wrote:
>> Also cannot issue a guest agent command in that case.
>>
>> Reported in the community forum:
>> https://forum.proxmox.com/threads/106618
>>
>> Signed-off-by: Fabian Ebner <f.ebner at proxmox.com>
>> ---
>>
>> Best viewed with -w.
>>
>>  PVE/QemuMigrate.pm | 54 ++++++++++++++++++++++++++--------------------
>>  1 file changed, 31 insertions(+), 23 deletions(-)
> 
> patch looks good to me - it might make sense to restructure the 
> conditionals a bit to log that resuming/fstrim was skipped though to 
> reduce confusion (user that paused VM and user doing the migration might 
> not be the same entity after all)?
> 
> one other thing I noticed (pre-existing, but the changes here made me 
> look and my search came up short), inside phase2:
> 
> - start block job(s) without autocompletion and wait for them to 
>   converge
> - start RAM/state migration without autocompletion and wait for it to 
>   converge
> X both source and target VMs are paused now with "identical" state, 
> irrespective of the source being paused or not initially
> - cancel block job(s) (to close NBD writer(s) so that switchover can 
>   proceed in phase3_cleanup)
> 
> if something happens after X in phase2, we enter phase2_cleanup, and 
> attempt to cancel the migration

If migrate_cancel actually cancels the migration, the VM will be running
on the source node again :)

If migrate_cancel fails, resume might also fail?

There is an edge case, however:
If the migration actually finished, but we aborted anyway, e.g. because
of too many query-migrate failures, then migrate_cancel will succeed
(because there is no active migration) and the VM will be left in
post-migrate state on the source node. Here, a resume would help.
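
To make the cases above concrete, here is a minimal sketch of the
decision in Python. It is not the qemu-server code: the function name
and its parameters are hypothetical, and it assumes the cleanup path
can read the source VM's run state via QMP query-status (which reports
"postmigrate" for the post-migrate state) and resume the VM with the
QMP "cont" command.

```python
from typing import Optional


def source_cleanup_command(run_state: str, was_paused_before: bool) -> Optional[str]:
    """Pick the QMP command to send to the source VM after migrate_cancel.

    run_state is the 'status' value from query-status on the source VM.
    was_paused_before records whether the VM was paused before migration
    started (hypothetical flag, mirroring the guard added by the patch).
    Returns "cont" if the VM should be resumed, otherwise None.
    """
    if was_paused_before:
        # The user paused the VM on purpose, so keep it paused.
        return None
    if run_state == "postmigrate":
        # Migration had already finished, so migrate_cancel was a no-op
        # and the source VM is stuck until it is explicitly resumed.
        return "cont"
    # migrate_cancel really cancelled the migration: the VM is running
    # again on the source node and nothing needs to be done.
    return None
```

For example, source_cleanup_command("postmigrate", False) yields "cont",
while the same run state with was_paused_before set yields None.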

> , remove the lock, cancel the block jobs 
> again, clean up bitmaps, stop the target VM, clean up remote disks, tear 
> down the tunnel, and effectively exit the migration at that point BUT - 
> we don't handle the paused state? is there a resume source (with this 
> patch, guarded by source was not paused) missing or am I missing 
> something?
> 




