[pve-devel] [PATCH qemu-server 1/1] snapshot: prohibit snapshot with ram if vm has a passthrough pci device

Fiona Ebner f.ebner at proxmox.com
Tue May 14 15:03:51 CEST 2024


On 12.04.24 at 11:32, Fabian Grünbichler wrote:
> On March 19, 2024 4:08 pm, Hannes Duerr wrote:
>> When a snapshot is created with RAM, qemu attempts to save not only the
>> RAM content, but also the internal state of the PCI devices.
>>
>> However, as not all drivers support this, this can lead to the device
>> drivers in the VM not being able to handle the saved state during the
>> restore/rollback and in conclusion the VM might crash. For this reason,
>> we now generally prohibit snapshots with RAM for VMs with passthrough
>> devices.
>>
>> In the future, this prohibition can of course be relaxed for individual
>> drivers that we know support it, such as the vfio driver
>>

We're already using vfio-pci, see [0], so I'm not sure what that
relaxation would look like. It would probably need to be a flag for the
hostpci property, similar to what's done for mapped devices in Dominik's
"implement experimental vgpu live migration" series.

That said, while looking into why QEMU doesn't check this itself, I
found that our savevm-async code does not properly check for all
migration blockers (only some of them)! I'll work out a patch for that.
If we can be sure that the code below doesn't break any existing users,
we can of course still apply it too.

>> Signed-off-by: Hannes Duerr <h.duerr at proxmox.com>
>> ---
>>  PVE/API2/Qemu.pm | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
>> index 40b6c30..0acd1c7 100644
>> --- a/PVE/API2/Qemu.pm
>> +++ b/PVE/API2/Qemu.pm
>> @@ -5101,6 +5101,16 @@ __PACKAGE__->register_method({
>>  	die "unable to use snapshot name 'pending' (reserved name)\n"
>>  	    if lc($snapname) eq 'pending';
>>  
>> +	if ($param->{vmstate}) {
>> +	    my $conf = PVE::QemuConfig->load_config($vmid);
>> +
>> +	    for my $key (keys %$conf) {
>> +		next if $key !~ /^hostpci\d+/;
>> +		die "cannot snapshot VM with RAM due to passed-through PCI device(s), which lack"
>> +		    ." the possibility to save/restore their internal state\n";
>> +	    }
>> +	}
> 
> isn't the same also true of other local resources (e.g., passed-through
> USB?)?
> 
> maybe we could find a way to unify the checks we do for live migration
> (PVE::QemuServer::check_local_resources), since that is almost the same
> code inside Qemu as a stateful snapshot+rollback?
> 
> (not opposed to applying this before that happens though, just a
> question in general..)
> 

Similarly, there is the suspend API endpoint that could benefit from
having a single helper. I assume this code was copied from there.
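
To illustrate the idea of a single helper shared by the snapshot and
suspend endpoints, here is a minimal sketch. The function names
find_passthrough_keys and assert_no_passthrough are hypothetical and do
not exist in qemu-server; the sketch only looks at config key names and
is not a replacement for PVE::QemuServer::check_local_resources. It also
treats every usbN entry as a local resource, which is a simplifying
assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of qemu-server): return the sorted config
# keys that look like passed-through host devices. Assumption: every
# hostpciN and usbN entry counts as a local resource.
sub find_passthrough_keys {
    my ($conf) = @_;
    my @keys = sort grep { /^(?:hostpci|usb)\d+$/ } keys %$conf;
    return @keys;
}

# Hypothetical wrapper: die with a uniform message if the config contains
# any passed-through device. $operation is only used in the message.
sub assert_no_passthrough {
    my ($conf, $operation) = @_;
    my @keys = find_passthrough_keys($conf);
    die "cannot $operation VM with state due to passed-through device(s): "
        . join(', ', @keys) . "\n"
        if @keys;
    return;
}

# Usage with a made-up config hash.
my $example_conf = {
    memory   => 2048,
    hostpci0 => '0000:01:00.0',
    usb0     => 'host=1234:5678',
};
eval { assert_no_passthrough($example_conf, 'snapshot') };
print $@ if $@;
```

Both endpoints could then call the same function with a different
$operation string, so the check and the error message stay consistent.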

[0]:
https://git.proxmox.com/?p=qemu-server.git;a=blob;f=PVE/QemuServer/PCI.pm;h=1673041bbe7a5d638a0ee9c56ea6bbb31027023b;hb=HEAD#l625



