[pve-devel] [PATCH qemu-server 1/1] snapshot: prohibit snapshot with ram if vm has a passthrough pci device
Fiona Ebner
f.ebner at proxmox.com
Tue May 14 15:03:51 CEST 2024
On 12.04.24 at 11:32, Fabian Grünbichler wrote:
> On March 19, 2024 4:08 pm, Hannes Duerr wrote:
>> When a snapshot is created with RAM, qemu attempts to save not only the
>> RAM content, but also the internal state of the PCI devices.
>>
>> However, as not all drivers support this, this can lead to the device
>> drivers in the VM not being able to handle the saved state during the
>> restore/rollback and in conclusion the VM might crash. For this reason,
>> we now generally prohibit snapshots with RAM for VMs with passthrough
>> devices.
>>
>> In the future, this prohibition can of course be relaxed for individual
>> drivers that we know support it, such as the vfio driver
>>
We're already using vfio-pci, see [0], so I'm not sure what that relaxation
would look like. It would probably need to be a flag for the hostpci
property, similar to what's done for mapped devices in Dominik's "implement
experimental vgpu live migration" series.

That said, while looking into this and wondering why QEMU doesn't check it
itself, I found that our savevm-async code does not properly check for all
migration blockers (only some of them)! I'll work out a patch for that. If
we can be sure the code below won't break any existing users, we can still
apply it too, of course.

>> Signed-off-by: Hannes Duerr <h.duerr at proxmox.com>
>> ---
>> PVE/API2/Qemu.pm | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
>> index 40b6c30..0acd1c7 100644
>> --- a/PVE/API2/Qemu.pm
>> +++ b/PVE/API2/Qemu.pm
>> @@ -5101,6 +5101,16 @@ __PACKAGE__->register_method({
>>      die "unable to use snapshot name 'pending' (reserved name)\n"
>>          if lc($snapname) eq 'pending';
>>
>> +    if ($param->{vmstate}) {
>> +        my $conf = PVE::QemuConfig->load_config($vmid);
>> +
>> +        for my $key (keys %$conf) {
>> +            next if $key !~ /^hostpci\d+/;
>> +            die "cannot snapshot VM with RAM due to passed-through PCI device(s), which lack"
>> +                ." the possibility to save/restore their internal state\n";
>> +        }
>> +    }
>
> isn't the same also true of other local resources (e.g., passed-through
> USB?)?
>
> maybe we could find a way to unify the checks we do for live migration
> (PVE::QemuServer::check_local_resources), since that is almost the same
> code inside Qemu as a stateful snapshot+rollback?
>
> (not opposed to applying this before that happens though, just a
> question in general..)
>
Similarly, there is the suspend API endpoint that could benefit from
having a single helper. I assume this code was copied from there.
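
To make the "single helper" idea concrete, here is a rough, standalone sketch
of what such a shared check could look like. The names
find_passthrough_keys and assert_no_passthrough_for_vmstate are hypothetical
and do not exist in qemu-server. Treating usbN entries that start with
"spice" as harmless is an assumption, and the sketch does not try to mirror
what PVE::QemuServer::check_local_resources actually covers.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of qemu-server): return the sorted config
# keys that refer to passed-through host devices.
sub find_passthrough_keys {
    my ($conf) = @_;

    my @found;
    for my $key (sort keys %$conf) {
        if ($key =~ /^hostpci\d+$/) {
            push @found, $key;
        } elsif ($key =~ /^usb\d+$/) {
            # Assumption: SPICE USB redirection is not host passthrough.
            push @found, $key if $conf->{$key} !~ /^spice\b/;
        }
    }
    return @found;
}

# Hypothetical wrapper that the snapshot and suspend endpoints could both
# call before saving VM state. Dies with a message naming the devices.
sub assert_no_passthrough_for_vmstate {
    my ($conf, $operation) = @_;

    my @keys = find_passthrough_keys($conf);
    return if !@keys;

    die "cannot $operation with RAM: passed-through device(s) "
        . join(', ', @keys)
        . " may not support saving/restoring their internal state\n";
}
```

A caller would load the config once and then call something like
assert_no_passthrough_for_vmstate($conf, 'snapshot') when $param->{vmstate}
is set. Keeping the device scan separate from the die makes it possible to
reuse the scan where only a warning is wanted.
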
[0]:
https://git.proxmox.com/?p=qemu-server.git;a=blob;f=PVE/QemuServer/PCI.pm;h=1673041bbe7a5d638a0ee9c56ea6bbb31027023b;hb=HEAD#l625