[pve-devel] [PATCH v3 qemu-server] Fix ACPI-suspended VMs resuming after migration
t.lamprecht at proxmox.com
Tue Oct 10 13:45:44 CEST 2023
On 09/10/2023 at 15:25, Filip Schauer wrote:
> Add checks for "suspended" and "prelaunch" runstates when checking
> whether a VM is paused.
> This fixes the following issues:
> * ACPI-suspended VMs automatically resuming after migration
> * Shutdown and reboot commands timing out instead of failing
> immediately on suspended VMs
I checked the call-sites, and what I'm wondering is: can a VM in one of those
new states be woken up without QMP intervention, say, an ACPI-suspended VM
being resumed by some (virtual) RTC alarm or via the network (like wake-on-LAN)?
If so, we should add a prominent comment to this method so that new users of
it are made aware of that possibility.
Also, with that change we might have introduced a race for suspend-mode backups,
at least if VMs really can wake up without a QMP command (which I find likely).
That is, between the time we check the state and set vm_was_paused and the time
we actually suspend: if the VM woke up in between, we might end up with
inconsistent data and skip steps like fs-freeze.
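The check-then-act window described above can be illustrated with a toy model. This is a hypothetical sketch in Python, not the actual vzdump/qemu-server code: `ToyVM`, `backup_suspend_mode` and the `wakes_up_in_window` flag are invented for illustration, and the assumed flow (skip fs-freeze and "stop" when the VM was already paused) only mirrors the concern raised in the paragraph above.

```python
class ToyVM:
    """Hypothetical stand-in for a guest; `state` mimics a QEMU RunState."""

    def __init__(self, state: str):
        self.state = state
        self.actions = []

    def fs_freeze(self):
        # Stands in for freezing guest filesystems via the guest agent.
        self.actions.append("fs-freeze")

    def stop(self):
        # Like our vm_suspend: a QMP "stop", halting all vCPUs.
        self.actions.append("stop")
        self.state = "paused"


def backup_suspend_mode(vm, counts_as_paused, wakes_up_in_window=False):
    # 1. Check: remember whether the VM already counts as "paused".
    vm_was_paused = counts_as_paused(vm.state)
    # 2. Race window: an ACPI-suspended guest might resume on its own here
    #    (e.g. RTC alarm or wake-on-LAN); simulated by the flag.
    if wakes_up_in_window and vm.state == "suspended":
        vm.state = "running"
    # 3. Act on the possibly stale answer from step 1: freeze and stop
    #    only if the VM was not considered paused.
    if not vm_was_paused:
        vm.fs_freeze()
        vm.stop()
    return vm.actions


if __name__ == "__main__":
    vm = ToyVM("suspended")
    treat_suspended_as_paused = lambda s: s in {"paused", "prelaunch", "suspended"}
    actions = backup_suspend_mode(vm, treat_suspended_as_paused, wakes_up_in_window=True)
    # Prints "[] running": neither freeze nor stop, yet the guest is running.
    print(actions, vm.state)
```

In this model, once "suspended" counts as paused, a wake-up inside the window leaves the backup running against an unfrozen, running guest. That is the inconsistency the review is worried about.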
While we recommend the stop and snapshot modes over suspend mode, the latter is
still exposed via the UI and API and should work well enough.
Note that ACPI S3 suspend and our vm_suspend are very different things: our
vm_suspend issues a "stop" monitor command, resulting in all vCPUs being
stopped, while S3 is a (virtual) firmware/machine feature. I mean, I guess
there's a reason those are reported as different states.
It doesn't help that our vm_suspend method doesn't actually perform an (ACPI
S3-like) suspension, but a (vCPU) "stop".
The "prelaunch" state, OTOH, seems pretty much like "paused", the difference
being that the vCPUs never ran in the former, so handling both the same seems
fine. But for "suspended", I'm not sure; we'd probably only want to detect it
as paused where conflating both is OK for a call-site, so maybe via an opt-in
parameter. Alternatively, we could return the status in the "true" case, so
that call-sites can decide for themselves how to handle such special cases
without an extra parameter.
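To make the two alternatives concrete, here is a minimal sketch in Python rather than qemu-server's Perl. The function names `vm_is_paused`/`vm_paused_state` and the `include_suspended` parameter are hypothetical illustrations, not the actual qemu-server API. The sketch assumes the caller passes in the decoded reply of the QMP `query-status` command, whose `status` field carries the QEMU RunState (for example "running", "paused", "prelaunch" or "suspended").

```python
from typing import Optional

# RunStates this sketch treats as "paused" unconditionally: the vCPUs are
# not executing and only start again on an explicit command like "cont".
ALWAYS_PAUSED = {"paused", "prelaunch"}


def vm_is_paused(status: dict, include_suspended: bool = False) -> bool:
    """Variant 1: opt-in parameter.

    `status` is assumed to be the decoded QMP "query-status" reply,
    e.g. {"running": False, "singlestep": False, "status": "paused"}.
    """
    state = status.get("status")
    if state in ALWAYS_PAUSED:
        return True
    # An ACPI-suspended guest may wake up on its own, so only call-sites
    # that explicitly opt in get it reported as paused.
    return include_suspended and state == "suspended"


def vm_paused_state(status: dict) -> Optional[str]:
    """Variant 2: return the matching state instead of a plain true value.

    The result is truthy for all paused-like states, so boolean call-sites
    keep working, while others can single out "suspended".
    """
    state = status.get("status")
    if state in ALWAYS_PAUSED or state == "suspended":
        return state
    return None


if __name__ == "__main__":
    reply = {"running": False, "singlestep": False, "status": "suspended"}
    print(vm_is_paused(reply))                          # False
    print(vm_is_paused(reply, include_suspended=True))  # True
    print(vm_paused_state(reply))                       # suspended
```

Variant 2 avoids growing the signature but relies on every caller remembering that a truthy result may mean "suspended". Variant 1 makes the conflation an explicit, per-call-site decision.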