[pve-devel] [PATCH pve-installer v9] update the PCI(e) docs
Noel Ullreich
n.ullreich at proxmox.com
Thu Jul 20 11:32:48 CEST 2023
A little update to the PCI(e) docs. The PCI wiki article has been
reworked as well, in line with changes from this patch.
Along with some minor grammar fixes, this adds:
* how to check if kernel modules are being loaded
* how to check which drivers to blacklist
* how to add softdeps for module loading
* where to find kernel params
Signed-off-by: Noel Ullreich <n.ullreich at proxmox.com>
---
changes from v1:
* fixed spelling mistakes
* reduced code snippets of how to check iommu groupings to one
* moved where to find kernel params to kernel cmdline section
* removed wrong info on display output. will add correct info to
Examples-Wiki
* changed module names to variable-names, so that people can't
blindly copy-paste.
* restructured commit message ;)
changes from v2:
* while moving where to find the kernel params to the kernel
cmdline section, I forgot to remove it from the pci(e) section
* fixed typo in the link to the kernel param section
changes from v3:
* Some restructuring of the layout as well as moving parts of the
PCI examples wiki to the docs here. This should lead to well-
structured, concise docs that are independent from the PCI wiki.
* found some more minor grammar errors
* found a spelling mistake in qm.adoc
changes from v4:
* formatted the git message wrong again :/
changes from v5:
* fixed links to wiki
* moved where to find kernel params to end of its chapter
* the `vfio_virqfd` does not need to be loaded anymore with kernel 6.2
changes from v6: the public wiki was updated -> fixed the links
changes from v7:
Forum user Leesteken noted that for Intel cpus IOMMU is not automatically
activated anymore.
https://forum.proxmox.com/threads/fix-pci-passthrough-documentation.122521/post-566926
Thanks Leesteken :)
changes from v8:
Amended per Dominik's notes:
* added a note to remove `vfio_virqfd` with pve 7 eol
* fixed formatting of a note-block
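
For anyone who wants to cross-check the IOMMU grouping without the API call,
here is a minimal sketch. It assumes the usual sysfs layout
`/sys/kernel/iommu_groups/<group>/devices/<pci-address>`; the function name
`list_iommu_groups` and its optional root argument are hypothetical helpers
for illustration and are not part of this patch.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): print one line per device,
# "group <n>: <pci-address>", by walking the sysfs IOMMU group directories.
# The optional first argument overrides the sysfs root, e.g. for testing.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded glob when no groups exist (IOMMU disabled).
        [ -e "$dev" ] || [ -L "$dev" ] || continue
        group="${dev#"$root"/}"   # strip the root prefix
        group="${group%%/*}"      # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}
```

Calling `list_iommu_groups` with no argument on the host reads the real sysfs
tree. No output at all usually means the IOMMU is not enabled.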
qm-pci-passthrough.adoc | 165 +++++++++++++++++++++++++++++++---------
qm.adoc | 2 +-
system-booting.adoc | 8 ++
3 files changed, 137 insertions(+), 38 deletions(-)
diff --git a/qm-pci-passthrough.adoc b/qm-pci-passthrough.adoc
index b90a0b9..693deb7 100644
--- a/qm-pci-passthrough.adoc
+++ b/qm-pci-passthrough.adoc
@@ -13,19 +13,27 @@ features (e.g., offloading).
But, if you pass through a device to a virtual machine, you cannot use that
device anymore on the host or in any other VM.
+Note that, while PCI passthrough is available for i440fx and q35 machines, PCIe
+passthrough is only available on q35 machines. This does not mean that
+PCIe capable devices that are passed through as PCI devices will only run at
+PCI speeds. Passing through devices as PCIe just sets a flag for the guest to
+tell it that the device is a PCIe device instead of a "really fast legacy PCI
+device". Some guest applications benefit from this.
+
General Requirements
~~~~~~~~~~~~~~~~~~~~
-Since passthrough is a feature which also needs hardware support, there are
-some requirements to check and preparations to be done to make it work.
-
+Since passthrough is performed on real hardware, the hardware needs to fulfill
+some requirements. A brief overview of these requirements is given below. For
+more information on specific devices, see
+https://pve.proxmox.com/wiki/PCI_Passthrough[PCI Passthrough Examples].
Hardware
^^^^^^^^
Your hardware needs to support `IOMMU` (*I*/*O* **M**emory **M**anagement
**U**nit) interrupt remapping, this includes the CPU and the mainboard.
-Generally, Intel systems with VT-d, and AMD systems with AMD-Vi support this.
+Generally, Intel systems with VT-d and AMD systems with AMD-Vi support this.
But it is not guaranteed that everything will work out of the box, due
to bad hardware implementation and missing or low quality drivers.
@@ -35,6 +43,17 @@ hardware, but even then, many modern system can support this.
Please refer to your hardware vendor to check if they support this feature
under Linux for your specific setup.
+Determining PCI Card Address
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The easiest way is to use the GUI to add a device of type "Host PCI" in the VM's
+hardware tab. Alternatively, you can use the command line.
+
+You can locate your card using
+
+----
+ lspci
+----
Configuration
^^^^^^^^^^^^^
@@ -44,13 +63,12 @@ some configuration to enable PCI(e) passthrough.
.IOMMU
-First, you have to enable IOMMU support in your BIOS/UEFI. Usually the
-corresponding setting is called `IOMMU` or `VT-d`,but you should find the exact
+First, you will have to enable IOMMU support in your BIOS/UEFI. Usually the
+corresponding setting is called `IOMMU` or `VT-d`, but you should find the exact
option name in the manual of your motherboard.
-For Intel CPUs, you may also need to enable the IOMMU on the
-xref:sysboot_edit_kernel_cmdline[kernel command line] for older (pre-5.15)
-kernels by adding:
+For Intel CPUs, you also need to enable the IOMMU on the
+xref:sysboot_edit_kernel_cmdline[kernel command line] by adding:
----
intel_iommu=on
@@ -74,14 +92,17 @@ to the xref:sysboot_edit_kernel_cmdline[kernel commandline].
.Kernel Modules
+//TODO: remove `vfio_virqfd` stuff with eol of pve 7
You have to make sure the following modules are loaded. This can be achieved by
-adding them to `'/etc/modules''
+adding them to `'/etc/modules''. In kernel 6.2 and newer ({pve} 8 and onward)
+the 'vfio_virqfd' module is part of the 'vfio' module, therefore loading
+'vfio_virqfd' in {pve} 8 and newer is not necessary.
----
vfio
vfio_iommu_type1
vfio_pci
- vfio_virqfd
+ vfio_virqfd #not needed if on kernel 6.2 or newer
----
[[qm_pci_passthrough_update_initramfs]]
@@ -92,6 +113,14 @@ After changing anything modules related, you need to refresh your
# update-initramfs -u -k all
----
+To check if the modules are being loaded, the output of
+
+----
+# lsmod | grep vfio
+----
+
+should include the modules from above (without 'vfio_virqfd' on kernel 6.2 or newer).
+
.Finish Configuration
Finally reboot to bring the changes into effect and check that it is indeed
@@ -104,11 +133,16 @@ enabled.
should display that `IOMMU`, `Directed I/O` or `Interrupt Remapping` is
enabled, depending on hardware and kernel the exact message can vary.
+For notes on how to troubleshoot or verify if IOMMU is working as intended, please
+see the https://pve.proxmox.com/wiki/PCI_Passthrough#Verifying_IOMMU_parameters[Verifying IOMMU Parameters]
+section in our wiki.
+
It is also important that the device(s) you want to pass through
-are in a *separate* `IOMMU` group. This can be checked with:
+are in a *separate* `IOMMU` group. This can be checked with a call to the {pve}
+API:
----
-# find /sys/kernel/iommu_groups/ -type l
+# pvesh get /nodes/{nodename}/hardware/pci --pci-class-blacklist ""
----
It is okay if the device is in an `IOMMU` group together with its functions
@@ -159,8 +193,8 @@ PCI(e) card, for example a GPU or a network card.
Host Configuration
^^^^^^^^^^^^^^^^^^
-In this case, the host must not use the card. There are two methods to achieve
-this:
+{pve} tries to automatically make the PCI(e) device unavailable for the host.
+However, if this doesn't work, there are two things that can be done:
* pass the device IDs to the options of the 'vfio-pci' modules by adding
+
@@ -175,7 +209,7 @@ the vendor and device IDs obtained by:
# lspci -nn
----
-* blacklist the driver completely on the host, ensuring that it is free to bind
+* blacklist the driver on the host completely, ensuring that it is free to bind
for passthrough, with
+
----
@@ -183,11 +217,49 @@ for passthrough, with
----
+
in a .conf file in */etc/modprobe.d/*.
++
+To find the driver name, execute
++
+----
+# lspci -k
+----
++
+for example:
++
+----
+# lspci -k | grep -A 3 "VGA"
+----
++
+will output something similar to
++
+----
+01:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1)
+ Subsystem: Micro-Star International Co., Ltd. [MSI] GP108 [GeForce GT 1030]
+ Kernel driver in use: <some-module>
+ Kernel modules: <some-module>
+----
++
+Now we can blacklist the drivers by writing them into a .conf file:
++
+----
+echo "blacklist <some-module>" >> /etc/modprobe.d/blacklist.conf
+----
For both methods you need to
xref:qm_pci_passthrough_update_initramfs[update the `initramfs`] again and
reboot after that.
+Should this not work, you might need to set a soft dependency to load
+'vfio-pci' before the GPU modules. This can be done with a 'softdep' entry; see
+also the 'modprobe.d' man page for more information.
+
+For example, if you are using drivers named <some-module>:
+
+----
+# echo "softdep <some-module> pre: vfio-pci" >> /etc/modprobe.d/<some-module>.conf
+----
+
+
.Verify Configuration
To check if your changes were successful, you can use
@@ -208,13 +280,42 @@ passthrough.
[[qm_pci_passthrough_vm_config]]
VM Configuration
^^^^^^^^^^^^^^^^
-To pass through the device you need to set the *hostpciX* option in the VM
+When passing through a GPU, the best compatibility is reached when using
+'q35' as machine type, 'OVMF' ('UEFI' for VMs) instead of SeaBIOS and PCIe
+instead of PCI. Note that if you want to use 'OVMF' for GPU passthrough, the
+GPU needs to have a UEFI-capable ROM, otherwise use SeaBIOS instead. To check if
+the ROM is UEFI capable, see the
+https://pve.proxmox.com/wiki/PCI_Passthrough#How_to_know_if_a_graphics_card_is_UEFI_.28OVMF.29_compatible[PCI Passthrough Examples]
+wiki.
+
+Furthermore, when using OVMF, it may be possible to disable VGA arbitration,
+reducing the amount of legacy code that needs to run during boot. To disable VGA arbitration:
+
+----
+# echo "options vfio-pci ids=<vendor-id>:<device-id> disable_vga=1" > /etc/modprobe.d/vfio.conf
+----
+
+replacing the <vendor-id> and <device-id> with the ones obtained from
+
+----
+# lspci -nn
+----
+
+PCI devices can be added in the web interface in the hardware section of the VM.
+Alternatively, you can use the command line; set the *hostpciX* option in the VM
configuration, for example by executing:
----
# qm set VMID -hostpci0 00:02.0
----
+or by adding a line to the VM configuration file:
+
+----
+ hostpci0: 00:02.0
+----
+
+
If your device has multiple functions (e.g., ``00:02.0`' and ``00:02.1`' ),
you can pass them through all together with the shortened syntax ``00:02`'.
This is equivalent with checking the ``All Functions`' checkbox in the
@@ -262,21 +363,21 @@ For example:
# qm set VMID -hostpci0 02:00,device-id=0x10f6,sub-vendor-id=0x0000
----
-
-Other considerations
-^^^^^^^^^^^^^^^^^^^^
-
-When passing through a GPU, the best compatibility is reached when using
-'q35' as machine type, 'OVMF' ('EFI' for VMs) instead of SeaBIOS and PCIe
-instead of PCI. Note that if you want to use 'OVMF' for GPU passthrough, the
-GPU needs to have an EFI capable ROM, otherwise use SeaBIOS instead.
-
SR-IOV
~~~~~~
-Another variant for passing through PCI(e) devices, is to use the hardware
+Another variant for passing through PCI(e) devices is to use the hardware
virtualization features of your devices, if available.
+.Enabling SR-IOV
+[NOTE]
+====
+To use SR-IOV, platform support is especially important. It may be necessary
+to enable this feature in the BIOS/UEFI first, or to use a specific PCI(e) port
+for it to work. When in doubt, consult the manual of the platform or contact its
+vendor.
+====
+
'SR-IOV' (**S**ingle-**R**oot **I**nput/**O**utput **V**irtualization) enables
a single device to provide multiple 'VF' (**V**irtual **F**unctions) to the
system. Each of those 'VF' can be used in a different VM, with full hardware
@@ -288,7 +389,6 @@ Currently, the most common use case for this are NICs (**N**etwork
physical port. This allows using features such as checksum offloading, etc. to
be used inside a VM, reducing the (host) CPU overhead.
-
Host Configuration
^^^^^^^^^^^^^^^^^^
@@ -326,14 +426,6 @@ After creating VFs, you should see them as separate PCI(e) devices when
outputting them with `lspci`. Get their ID and pass them through like a
xref:qm_pci_passthrough_vm_config[normal PCI(e) device].
-Other considerations
-^^^^^^^^^^^^^^^^^^^^
-
-For this feature, platform support is especially important. It may be necessary
-to enable this feature in the BIOS/EFI first, or to use a specific PCI(e) port
-for it to work. In doubt, consult the manual of the platform or contact its
-vendor.
-
Mediated Devices (vGPU, GVT-g)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -346,7 +438,6 @@ With this, a physical Card is able to create virtual cards, similar to SR-IOV.
The difference is that mediated devices do not appear as PCI(e) devices in the
host, and are such only suited for using in virtual machines.
-
Host Configuration
^^^^^^^^^^^^^^^^^^
diff --git a/qm.adoc b/qm.adoc
index b3c3034..ed804e2 100644
--- a/qm.adoc
+++ b/qm.adoc
@@ -139,7 +139,7 @@ snapshots) more intelligently.
{pve} allows to boot VMs with different firmware and machine types, namely
xref:qm_bios_and_uefi[SeaBIOS and OVMF]. In most cases you want to switch from
the default SeaBIOS to OVMF only if you plan to use
-xref:qm_pci_passthrough[PCIe pass through]. A VMs 'Machine Type' defines the
+xref:qm_pci_passthrough[PCIe passthrough]. A VMs 'Machine Type' defines the
hardware layout of the VM's virtual motherboard. You can choose between the
default https://en.wikipedia.org/wiki/Intel_440FX[Intel 440FX] or the
https://ark.intel.com/content/www/us/en/ark/products/31918/intel-82q35-graphics-and-memory-controller.html[Q35]
diff --git a/system-booting.adoc b/system-booting.adoc
index 0e61e3d..71603b0 100644
--- a/system-booting.adoc
+++ b/system-booting.adoc
@@ -288,6 +288,14 @@ The kernel commandline needs to be placed as one line in `/etc/kernel/cmdline`.
To apply your changes, run `proxmox-boot-tool refresh`, which sets it as the
`option` line for all config files in `loader/entries/proxmox-*.conf`.
+A complete list of kernel parameters can be found at
+'https://www.kernel.org/doc/html/v<YOUR-KERNEL-VERSION>/admin-guide/kernel-parameters.html'.
+Replace <YOUR-KERNEL-VERSION> with the major.minor version (e.g. 5.15). You can
+find your kernel version by running
+
+----
+# uname -r
+----
[[sysboot_kernel_pin]]
Override the Kernel-Version for next Boot
--
2.39.2