[pve-devel] [PATCH docs] add documentation for pci passthrough and sr-iov
Dominik Csapak
d.csapak at proxmox.com
Mon Nov 12 16:00:46 CET 2018
explain what it is and how to use it, especially the steps necessary
on the host and the various options under one chapter
most of this is also found on the wiki in the Pci_passthrough
article
we may want to condense the information there and link it as
'notes and examples'
Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
---
qm-pci-passthrough.adoc | 237 ++++++++++++++++++++++++++++++++++++++++++++++++
qm.adoc | 3 +
2 files changed, 240 insertions(+)
create mode 100644 qm-pci-passthrough.adoc
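
Note (not part of the commit): the IOMMU group check in the Configuration
section below prints one flat list of symlinks. The following is a minimal
sketch that groups the same information per IOMMU group. The function name
`list_iommu_groups` and its optional root-directory argument are hypothetical
and exist only so the sketch can be tried against a test directory; the
default path is the one used in the chapter.

```shell
#!/bin/sh
# Print every IOMMU group followed by the PCI addresses of its devices.
# $1 (optional, an assumption for testing): directory to scan instead of
# the real sysfs location.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for group in "$root"/*; do
        # Skip the unexpanded glob if the directory is empty or missing.
        [ -d "$group/devices" ] || continue
        echo "IOMMU group ${group##*/}:"
        for dev in "$group"/devices/*; do
            # Entries are symlinks; accept them even if the target is absent.
            [ -e "$dev" ] || [ -L "$dev" ] || continue
            echo "  ${dev##*/}"
        done
    done
}

list_iommu_groups "$@"
```

A device that shares its group only with its own functions, its root port or
its PCI(e) bridge should be fine to pass through, as described in the chapter.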
diff --git a/qm-pci-passthrough.adoc b/qm-pci-passthrough.adoc
new file mode 100644
index 0000000..95e4ae1
--- /dev/null
+++ b/qm-pci-passthrough.adoc
@@ -0,0 +1,237 @@
+[[qm_pci_passthrough]]
+PCI(e) Passthrough
+------------------
+
+PCI(e) passthrough is a mechanism to give a virtual machine control over
+a PCI device usually only available to the host. This can have some
+advantages over using virtualized hardware, for example lower latency,
+higher performance, or more features (e.g., offloading).
+
+If you pass through a device to a virtual machine, you cannot use that
+device anymore on the host or in any other VM.
+
+General Requirements
+~~~~~~~~~~~~~~~~~~~~
+
+Since passthrough is a feature which also needs hardware support, there are
+some requirements and steps before it can work.
+
+Hardware
+^^^^^^^^
+
+Your hardware has to support IOMMU interrupt remapping; this includes the CPU
+and the mainboard.
+
+Generally, Intel systems with VT-d and AMD systems with AMD-Vi support this,
+but it is not guaranteed that everything will work out of the box, due
+to bad hardware implementations or missing/low quality drivers.
+
+In most cases, server grade hardware has better support than consumer grade
+hardware, but even then, many modern systems can support this.
+
+Please refer to your hardware vendor to check whether this feature is
+supported under Linux.
+
+Configuration
+^^^^^^^^^^^^^
+
+To enable PCI(e) passthrough, some configuration is needed.
+
+First, the IOMMU has to be activated on the kernel command line.
+The easiest way is to enable it in */etc/default/grub*. Just add
+
+ intel_iommu=on
+
+or if you have AMD hardware:
+
+ amd_iommu=on
+
+to GRUB_CMDLINE_LINUX_DEFAULT.
+
+After that, make sure you run 'update-grub' to update grub.
+
+Second, you have to make sure the following modules are loaded.
+This can be achieved by adding them to */etc/modules*:
+
+ vfio
+ vfio_iommu_type1
+ vfio_pci
+ vfio_virqfd
+
+After changing anything module-related, you need to refresh your
+initramfs with
+
+----
+update-initramfs -u -k all
+----
+
+Finally, reboot and check that it is indeed enabled.
+
+----
+dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
+----
+
+should display that IOMMU, Directed I/O or Interrupt Remapping is enabled.
+(The exact message can vary, depending on hardware and kernel version.)
+
+It is also important that the device(s) you want to pass through
+are in a separate IOMMU group. This can be checked with:
+
+----
+find /sys/kernel/iommu_groups/ -type l
+----
+
+It is okay if the device is in an IOMMU group together with its functions
+(e.g. a GPU with the HDMI Audio device) or with its root port or PCI(e) bridge.
+
+.PCI(e) slots
+[NOTE]
+====
+Some platforms handle their PCI(e) slots differently, so if you
+do not get the desired IOMMU group separation, it may be helpful to
+try to put the card in another PCI(e) slot.
+====
+
+.Unsafe interrupts
+[NOTE]
+====
+For some platforms, it may be necessary to allow unsafe interrupts.
+This can most easily be enabled by adding the following line
+to a .conf file in */etc/modprobe.d/*:
+
+ options vfio_iommu_type1 allow_unsafe_interrupts=1
+
+Please be aware that this option can make your system unstable.
+====
+
+Host Device Passthrough
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The most used variant of PCI(e) passthrough is to pass through a whole
+PCI(e) card, for example a GPU or network card.
+
+Host Configuration
+^^^^^^^^^^^^^^^^^^
+
+In this case, the host must not use the card. This can be achieved by two
+methods:
+
+Either add the IDs to the options of the vfio-pci module. This works
+by adding
+
+ options vfio-pci ids=1234:5678,4321:8765
+
+to a .conf file in */etc/modprobe.d/*, where 1234:5678 and 4321:8765 are
+the vendor and device IDs obtained by:
+
+----
+lspci -nn
+----
+
+Or simply blacklist the driver completely on the host with
+
+ blacklist DRIVERNAME
+
+also in a .conf file in */etc/modprobe.d/*. Again update the initramfs
+and reboot after that.
+
+VM Configuration
+^^^^^^^^^^^^^^^^
+
+To pass through the device, set the *hostpciX* option on the VM with
+
+----
+qm set VMID -hostpci0 00:02.0
+----
+
+If your device has multiple functions, you can pass them through all together
+with the shortened syntax
+
+ 00:02
+
+There are some options which may be necessary, depending on the device
+and guest OS.
+
+* *x-vga=on|off* marks the PCI(e) device as the primary GPU of the VM.
+With this enabled, the *vga* parameter of the config will be ignored.
+* *pcie=on|off* tells {pve} to use a PCIe or PCI port. Some guest/device
+combinations require PCIe rather than PCI (only available for q35 machine types).
+* *rombar=on|off* makes the firmware ROM visible to the guest. Default is on.
+Some PCI(e) devices need this disabled.
+* *romfile=<path>* is an optional path to a ROM file for the device to use.
+This is a relative path under */usr/share/kvm/*.
+
+An example of PCIe passthrough with a GPU set to primary:
+
+----
+qm set VMID -hostpci0 02:00,pcie=on,x-vga=on
+----
+
+Other considerations
+^^^^^^^^^^^^^^^^^^^^
+
+When passing through a GPU, the best compatibility is reached when using
+q35 as machine type, OVMF instead of SeaBIOS and PCIe instead of PCI.
+Note that if you want to use OVMF for GPU passthrough, the GPU needs
+to have an EFI capable ROM, otherwise use SeaBIOS instead.
+
+SR-IOV
+~~~~~~
+
+Another variant of passing through PCI(e) devices is to use the hardware
+virtualization features of your devices.
+
+SR-IOV (Single-root input/output virtualization) enables a single device
+to provide multiple VFs (virtual functions) to the system, so that each
+VF can be used in a different VM, with full hardware features, better
+performance and lower latency than software virtualized devices.
+
+The most used devices for this are NICs with SR-IOV, which can provide
+multiple VFs per physical port, allowing features such as
+checksum offloading to be used inside a VM, reducing CPU overhead.
+
+Host Configuration
+^^^^^^^^^^^^^^^^^^
+
+Generally, there are two methods for enabling virtual functions on a device.
+
+In some cases, there is an option for the driver module, e.g. for some
+Intel drivers
+
+ max_vfs=4
+
+which could be put in a .conf file in */etc/modprobe.d/*.
+(Do not forget to update your initramfs after that.)
+
+Please refer to your driver module documentation for the exact
+parameters and options.
+
+The second (more generic) approach is via sysfs.
+If a device and driver support this, you can change the number of VFs on
+the fly. For example, to set up 4 VFs on device 0000:01:00.0, run:
+
+----
+echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
+----
+
+To make this change persistent, you can use sysfsutils.
+Just install it via
+
+----
+apt install sysfsutils
+----
+
+and configure it via */etc/sysfs.conf* or */etc/sysfs.d/*.
+
+VM Configuration
+^^^^^^^^^^^^^^^^
+
+After creating VFs, you should see them as separate PCI(e) devices, which
+can be passed through like a normal PCI(e) device.
+
+Other considerations
+^^^^^^^^^^^^^^^^^^^^
+
+For this feature, platform support is especially important. It may be necessary
+to enable this feature in the BIOS or to use a specific PCI(e) port for it
+to work. If in doubt, consult the manual of the platform or contact the vendor.
diff --git a/qm.adoc b/qm.adoc
index 5cf672d..0d453c8 100644
--- a/qm.adoc
+++ b/qm.adoc
@@ -1021,6 +1021,9 @@ ifndef::wiki[]
include::qm-cloud-init.adoc[]
endif::wiki[]
+ifndef::wiki[]
+include::qm-pci-passthrough.adoc[]
+endif::wiki[]
Managing Virtual Machines with `qm`
--
2.11.0