[pve-devel] [PATCH guest-common/qemu-server/manager/docs v5 0/3] implement experimental vgpu live migration
Eneko Lacunza
elacunza at binovo.es
Wed Mar 5 11:34:55 CET 2025
Hi Dominik,
It is very likely we'll have access to a suitable cluster to test this
before summer, provided these patches are in published packages.
I can test and report back if that's helpful.
Regards
On 20/1/25 at 15:51, Dominik Csapak wrote:
> and some useful cleanups
>
> This is implemented for mapped resources. This requires driver and
> hardware support, but aside from nvidia vgpus there don't seem to be
> many drivers (if any) that do support that.
>
> qemu already supports that for vfio-pci devices, so nothing to be
> done there besides actively enabling it.
>
> Since we currently can't properly test it here and it very much depends on
> hardware/driver support, mark it as experimental everywhere (docs/api/gui).
> (though i tested the live-migration part manually here by using
> "exec:cat > /tmp/test" for the migration target, and
> "exec: cat /tmp/test" as the 'incoming' parameter for a new vm start,
> which worked ;) )
>
> i opted for marking them migratable at the mapping level, but we could
> theoretically also put it in the hostpciX config instead.
> (though imho it fits better in the cluster-wide resource mapping config)
>
> also the naming/texts could probably be improved, but i think
> 'live-migration-capable' is very descriptive and i didn't want to
> use an overly short name for it (which can be confusing, see the
> 'shared' flag for storages)
>
> should mostly be the same as v4 functionality/code-wise but still a bit
> changed due to the recent nvidia changes from our side, so probably
> warrants a bit of a closer look in any case
>
> changes from v4:
> * rebased on master (some work due to the recent nvidia changes)
> * incorporated Thomas's and Alexander's feedback from v4
>
> changes from v3:
> * rebased on master
> * split first guest-common patch into 3
> * instead of merging keys, just write all expected keys into expected_props
> * made $cfg optional so it does not break callers that don't call it
> * added patch to fix the cfg2cmd tests for mdev check
> * added patch to show vfio state transferred for migration
> * incorporated Fiona's feedback (mostly minor stuff)
>
> for more details see the individual patches
>
> changes from v2:
> * rebased on master
> * rework the rework of the properties check (pve-guest-common 1/4)
> * properly check mdev in the gui (pve-manager 1/5)
>
> manager patches depend on pve-guest-common/qemu-server patches
> qemu-server depends on pve-guest-common patches
>
> guest-common 3/3 breaks older qemu-server versions until qemu-server
> patches 1 & 2 are applied
>
> pve-guest-common:
>
> Dominik Csapak (3):
> mapping: pci: check the mdev configuration on the device too
> mapping: pci: add 'live-migration-capable' flag to mappings
> mapping: remove find_on_current_node
>
> src/PVE/Mapping/PCI.pm | 27 +++++++++++++++------------
> src/PVE/Mapping/USB.pm | 10 ----------
> 2 files changed, 15 insertions(+), 22 deletions(-)
>
> qemu-server:
>
> Dominik Csapak (11):
> usb: mapping: move implementation of find_on_current_node here
> pci: mapping: move implementation of find_on_current_node here
> pci: mapping: check mdev config against hardware
> vm stop-cleanup: allow callers to decide error behavior
> migrate: call vm_stop_cleanup after stopping in phase3_cleanup
> pci: set 'enable-migration' to on for live-migration marked mapped devices
> check_local_resources: add more info per mapped device and return as hash
> api: enable live migration for marked mapped pci devices
> api: include not mapped resources for running vms in migrate preconditions
> tests: cfg2cmd: fix mdev tests
> migration: show vfio state transferred too
>
> PVE/API2/Qemu.pm | 55 ++++++++++++++++++++------------
> PVE/CLI/qm.pm | 2 +-
> PVE/QemuMigrate.pm | 44 +++++++++++++++++--------
> PVE/QemuServer.pm | 30 ++++++++++-------
> PVE/QemuServer/PCI.pm | 24 ++++++++++++--
> PVE/QemuServer/USB.pm | 17 ++++++++--
> test/MigrationTest/Shared.pm | 3 ++
> test/run_config2command_tests.pl | 2 +-
> 8 files changed, 123 insertions(+), 54 deletions(-)
>
> pve-manager:
>
> Dominik Csapak (5):
> mapping: pci: include mdev in config checks
> bulk migrate: improve precondition checks
> bulk migrate: include checks for live-migratable local resources
> ui: adapt migration window to precondition api change
> fix #5175: ui: allow configuring and live migration of mapped pci resources
>
> PVE/API2/Cluster/Mapping/PCI.pm | 2 +-
> PVE/API2/Nodes.pm | 27 ++++++++++++++--
> www/manager6/dc/PCIMapView.js | 6 ++++
> www/manager6/window/Migrate.js | 51 ++++++++++++++++++++-----------
> www/manager6/window/PCIMapEdit.js | 12 ++++++++
> 5 files changed, 76 insertions(+), 22 deletions(-)
>
> pve-docs:
>
> Dominik Csapak (2):
> qm: resource mapping: add description for `mdev` option
> qm: resource mapping: document `live-migration-capable` setting
>
> qm.adoc | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/