[pve-devel] [PATCH pve-docs] update the PCI(e) docs
Dominik Csapak
d.csapak at proxmox.com
Tue Mar 14 14:44:28 CET 2023
a few comments inline
On 3/14/23 13:48, Noel Ullreich wrote:
> A little update to the PCI(e) docs with the plan of reworking the PCI
> wiki as well.
>
> Some questions and reasoning to the patch:
> * I would only mention the ACS patch in the PCI examples wiki, since it is a
> last-ditch effort to get IOMMU to work and who knows how long we will support
> the patch.
> * Should I move the blacklisting example to the example-wiki and just link to it?
> I don't want people blindly copy-pasting commands. Same goes for the softdep
> example.
first, these comments are not part of the commit message and should go below
the '---' part of the message
yes, i'd only ever mention the acs patch in the wiki, not in the reference
docs.
also the blacklisting example can stay here, but i'd make it more generic
(see comment further down)
>
> Signed-off-by: Noel Ullreich <n.ullreich at proxmox.com>
> ---
> qm-pci-passthrough.adoc | 87 +++++++++++++++++++++++++++++++++++------
> 1 file changed, 75 insertions(+), 12 deletions(-)
>
> diff --git a/qm-pci-passthrough.adoc b/qm-pci-passthrough.adoc
> index df6cf21..ed17b9c 100644
> --- a/qm-pci-passthrough.adoc
> +++ b/qm-pci-passthrough.adoc
> @@ -16,16 +16,17 @@ device anymore on the host or in any other VM.
> General Requirements
> ~~~~~~~~~~~~~~~~~~~~
>
> -Since passthrough is a feature which also needs hardware support, there are
> -some requirements to check and preparations to be done to make it work.
> -
> +Since passthrough is preformed on real hardware, the hardware needs to fulfill
> +some requirements. A brief overview of these requirements is given below, for more
> +information on specific devices, see
> +https://pve.proxmox.com/wiki/PCI_Passthrough[PCI Passthrough Examples].
this reads a bit weird: '[...]on real hardware, the hardware[...]'
i'd just go with: '[...]on real hardware, it[...]'
that should be clear enough
>
> Hardware
> ^^^^^^^^
> Your hardware needs to support `IOMMU` (*I*/*O* **M**emory **M**anagement
> **U**nit) interrupt remapping, this includes the CPU and the mainboard.
>
> -Generally, Intel systems with VT-d, and AMD systems with AMD-Vi support this.
> +Generally, Intel systems with VT-d and AMD systems with AMD-Vi support this.
> But it is not guaranteed that everything will work out of the box, due
> to bad hardware implementation and missing or low quality drivers.
>
> @@ -44,8 +45,8 @@ some configuration to enable PCI(e) passthrough.
>
> .IOMMU
>
> -First, you have to enable IOMMU support in your BIOS/UEFI. Usually the
> -corresponding setting is called `IOMMU` or `VT-d`,but you should find the exact
> +First, you will have to enable IOMMU support in your BIOS/UEFI. Usually the
> +corresponding setting is called `IOMMU` or `VT-d`, but you should find the exact
> option name in the manual of your motherboard.
>
> For Intel CPUs, you may also need to enable the IOMMU on the
> @@ -72,6 +73,9 @@ hardware IOMMU. To enable these options, add:
>
> to the xref:sysboot_edit_kernel_cmdline[kernel commandline].
>
> +For a complete list of kernel commandline options (of kernel 5.15), see
> +https://www.kernel.org/doc/html/v5.15/admin-guide/kernel-parameters.html[kernel.org].
> +
imho this should be in the 'edit kernel cmdline' section
and shouldn't reference a specific version, but rather read like this:

for a complete list, see
https://www.kernel.org/doc/html/v<YOUR-KERNEL-VERSION>/admin...
replace <YOUR-KERNEL-VERSION> with the major.minor version (e.g. 5.15)

that way we don't have to update the link on every kernel version bump
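
as a rough sketch of what 'major.minor' means in practice, something like the
following could derive the link from a release string such as the output of
`uname -r`. the helper name `kernel_doc_url` is made up for illustration, the
path is the one from the patch above, and i'm assuming kernel.org actually
hosts a docs tree for the version in question, which may not hold for every
release:

```shell
#!/bin/sh
# kernel_doc_url RELEASE: print the kernel-parameters doc URL for a kernel
# release string (hypothetical helper, not part of any Proxmox tooling).
kernel_doc_url() {
    release="$1"
    major="${release%%.*}"      # everything before the first dot
    rest="${release#*.}"        # everything after the first dot
    minor="${rest%%[!0-9]*}"    # leading digits of the remainder
    printf 'https://www.kernel.org/doc/html/v%s.%s/admin-guide/kernel-parameters.html\n' \
        "$major" "$minor"
}

# Print the link for the currently running kernel.
kernel_doc_url "$(uname -r)"
```

this is only meant to show the substitution, the docs themselves should still
just carry the placeholder.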
> .Kernel Modules
>
> You have to make sure the following modules are loaded. This can be achieved by
> @@ -92,6 +96,14 @@ After changing anything modules related, you need to refresh your
> # update-initramfs -u -k all
> ----
>
> +To check if the modules are being loaded, the output of
> +
> +----
> +# lsmod | grep vfio
> +----
> +
> +should include the four modules from above.
> +
> .Finish Configuration
>
> Finally reboot to bring the changes into effect and check that it is indeed
> @@ -105,8 +117,22 @@ should display that `IOMMU`, `Directed I/O` or `Interrupt Remapping` is
> enabled, depending on hardware and kernel the exact message can vary.
>
> It is also important that the device(s) you want to pass through
> -are in a *separate* `IOMMU` group. This can be checked with:
> +are in a *separate* `IOMMU` group. This can be checked either with:
>
> +* a call to the {pve} API:
> ++
> +----
> +# pvesh get /nodes/{nodename}/hardware/pci --pci-class-blacklist ""
> +----
> +
> +* a bash oneliner:
> ++
> +----
> +# for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
> +----
> +
> +* this command, although it gives less information than the other two:
> ++
i'd only give one option (preferably the pvesh one) since the user does not
need three commands to do it. also mentioning 'this also exists, but is inferior'
does not make much sense
> ----
> # find /sys/kernel/iommu_groups/ -type l
> ----
> @@ -148,6 +174,10 @@ desktop software (for example, VNC or RDP) inside the guest.
>
> If you want to use the GPU as a hardware accelerator, for example, for
> programs using OpenCL or CUDA, this is not required.
> +In this case, to use NoVNC or SPICE, you might need to unset the 'primary GPU'
> +flag(see xref:qm_pci_passthrough_vm_config[VM configuration]) and make sure the
> +GPU is not phyiscally connected to a monitor.
that's not completely correct, instead of unsetting 'primary gpu' one can also
set a specific display. and why shouldn't the user connect the gpu to a monitor?
this does not make a difference for the virtual display 99% of the time
> +
>
> Host Device Passthrough
> ~~~~~~~~~~~~~~~~~~~~~~~
> @@ -159,8 +189,8 @@ PCI(e) card, for example a GPU or a network card.
> Host Configuration
> ^^^^^^^^^^^^^^^^^^
>
> -In this case, the host must not use the card. There are two methods to achieve
> -this:
> +{pve} tries to automatically make the PCI(e) device unavailable for the host.
> +However, if this doesn't work, there are two things that can be done:
>
> * pass the device IDs to the options of the 'vfio-pci' modules by adding
> +
> @@ -175,7 +205,7 @@ the vendor and device IDs obtained by:
> # lspci -nn
> ----
>
> -* blacklist the driver completely on the host, ensuring that it is free to bind
> +* blacklist the driver on the host completely, ensuring that it is free to bind
> for passthrough, with
> +
> ----
> @@ -183,11 +213,46 @@ for passthrough, with
> ----
> +
> in a .conf file in */etc/modprobe.d/*.
> ++
> +To find the drivername, execute
> ++
> +----
> +# lspci -k
> +----
> ++
> +for example:
> ++
> +----
> +# lspci -k | grep -A 3 "VGA"
> +
> +// The output tells us, that the drivers are called `nvidia`
> +01:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1)
> + Subsystem: Micro-Star International Co., Ltd. [MSI] GP108 [GeForce GT 1030]
> + Kernel driver in use: nvidia
> + Kernel modules: nvidia
> +----
> ++
> +Now we can blacklist the drivers by writing them into a .conf file:
> ++
> +----
> +echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
> +----
this could stay here, but i'd replace the 'nvidia' in the example
with 'some-module'. maybe i'd even replace the whole lspci output with
dummy info where it also says 'some-module'.
then even if someone copy-pastes it, it should not have a harmful effect
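
a minimal sketch of what i mean by the generic variant. 'some-module' is a
deliberate placeholder and not a real driver name, and the snippet writes to a
scratch file from `mktemp` so it can be tried without touching the host, on a
real system the target would be a .conf file in /etc/modprobe.d/:

```shell
#!/bin/sh
# 'some-module' is a placeholder; the real name would come from the
# "Kernel driver in use" line of `lspci -k`.
module="some-module"

# Scratch file standing in for a .conf file under /etc/modprobe.d/.
conf="$(mktemp)"

# Append the blacklist entry, mirroring the echo from the patch.
echo "blacklist ${module}" >> "$conf"

# Show what was written.
cat "$conf"
```

blacklisting a module that does not exist should be harmless, which is the
point of using the placeholder.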
>
> For both methods you need to
> xref:qm_pci_passthrough_update_initramfs[update the `initramfs`] again and
> reboot after that.
>
> +Should this not work, you might need to set a soft dependency to load the gpu
> +modules before loading 'vfio-pci'. This can be done with the 'softdep' flag, see
> +also the manpages on 'modprobe.d' for more information.
> +
> +For example, if you are using a NVIDIA gpu and using the 'nouveau' drivers:
> +
> +----
> +# echo "softdep nouveau pre: vfio-pci" >> /etc/modprobe.d/nouveau.conf
> +----
> +
> +
same here, just use 'some-module'
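
roughly like this, again with 'some-module' as a placeholder. the helper name
`softdep_line` is made up, and the entry only mirrors the form used in the
patch ('softdep <module> pre: vfio-pci'), see the modprobe.d manpage for the
exact semantics:

```shell
#!/bin/sh
# softdep_line MODULE: print a modprobe.d softdep entry asking for vfio-pci
# to be loaded before MODULE (hypothetical helper for illustration only).
softdep_line() {
    printf 'softdep %s pre: vfio-pci\n' "$1"
}

# 'some-module' is a placeholder, not a real driver name.
softdep_line "some-module"
```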
> .Verify Configuration
>
> To check if your changes were successful, you can use
> @@ -262,7 +327,6 @@ For example:
> # qm set VMID -hostpci0 02:00,device-id=0x10f6,sub-vendor-id=0x0000
> ----
>
> -
> Other considerations
> ^^^^^^^^^^^^^^^^^^^^
>
> @@ -288,7 +352,6 @@ Currently, the most common use case for this are NICs (**N**etwork
> physical port. This allows using features such as checksum offloading, etc. to
> be used inside a VM, reducing the (host) CPU overhead.
>
> -
> Host Configuration
> ^^^^^^^^^^^^^^^^^^
>