PCI(E) Passthrough
The steps outlined in this article have been tested only on Proxmox Virtual Environment 8.
My systems have Intel CPUs and Nvidia GPUs, so there are no instructions specific to AMD hardware. AMD hardware is generally easier to handle on Linux, so equivalent instructions should not be hard to find.
Why Passthrough?
Passthrough is needed for direct hardware access in VMs. GPUs, disks, network interface cards and any other PCI(E) devices can be passed through to VMs in Proxmox.
Setting up Passthrough
PVE Node Configuration
Intel Nodes
IOMMU and VFIO
IOMMU stands for Input-Output Memory Management Unit. It allows the system to map virtual memory addresses to physical ones and is required for PCI(E) passthrough. VFIO stands for Virtual Function I/O; it is the Linux kernel subsystem that gives VMs direct access to hardware devices.
There is also a setting called IOMMU passthrough mode, which may be required for better performance. I don't believe there are any downsides, so make sure to enable it. Add the following parameters to the kernel cmdline: intel_iommu=on and iommu=pt.
LINE IN FILE: /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
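Editing /etc/default/grub does nothing by itself; the bootloader configuration has to be regenerated and the node rebooted. On a node that boots via GRUB:
update-grub
If your node boots via systemd-boot instead (for example a UEFI ZFS-on-root install), the kernel cmdline lives in /etc/kernel/cmdline and is applied with proxmox-boot-tool refresh, so check which bootloader your node actually uses.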
Now we need to configure PVE to load the VFIO kernel modules on boot. Append the following lines to /etc/modules.
APPEND TO FILE: /etc/modules
vfio
vfio_iommu_type1
vfio_pci
Now we need to rebuild the initramfs. Use the following command:
update-initramfs -u -k all
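After rebooting, you can sanity-check that the modules loaded and the IOMMU is active. The exact dmesg wording varies between kernel versions, so treat this as a rough check:
lsmod | grep vfio
dmesg | grep -e DMAR -e IOMMU
The first command should list the vfio modules added above; the second should show lines indicating that the IOMMU/DMAR has been enabled.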
Other config options (unsafe interrupts and the like)
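One example: some older boards only support interrupt remapping in "unsafe" mode, and VMs with passed-through devices will refuse to start until it is allowed. If you hit that, the usual workaround is a modprobe option like the one below (the filename is arbitrary); only enable it if you actually need it, since it weakens isolation between VMs:
FILE: /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1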
Blacklisting Drivers
Some hardware devices are much easier to pass through if their driver is blacklisted on the PVE host. This is especially true for GPUs.
Nvidia
To blacklist Nvidia drivers create the file /etc/modprobe.d/nvidia.blacklist.conf
with the following contents:
FILE: /etc/modprobe.d/nvidia.blacklist.conf
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist nvidia_drm
Intel
To blacklist Intel GPU drivers create the file /etc/modprobe.d/intel_gpu.blacklist.conf
with the following contents:
FILE: /etc/modprobe.d/intel_gpu.blacklist.conf
blacklist snd_hda_intel
blacklist snd_hda_codec_hdmi
blacklist i915
Make sure to rebuild your initramfs after making changes to /etc/modprobe.d/:
update-initramfs -u -k all
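Optionally, you can also bind the device to vfio-pci explicitly by vendor/device ID so the host never claims it at all. The example IDs below correspond to an RTX 3070 and its audio function and are only placeholders; substitute the IDs shown by lspci -nn for your own card:
FILE: /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:2484,10de:228b
Rebuild the initramfs after this change as well.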
Verify Drivers Are Not Loaded
After rebuilding your initramfs and rebooting the node, run the following command to verify that the device drivers are not being loaded.
lspci -nnk
Each PCI(E) device will be listed in the output. Find the device you want to pass through and make sure its entry either includes the line
Kernel driver in use: vfio-pci
or has no Kernel driver in use line at all.
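To avoid scrolling through the full list you can filter the output, for example for an Nvidia card (adjust the search string to your device):
lspci -nnk | grep -A 3 -i nvidia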
PVE Cluster Configuration
Now that our PVE hosts are configured, we can set up cluster-wide Resource Mappings to enable easy passthrough to VMs. These can be configured in the web UI or via the pvesh CLI, which is a bit more involved.
In the web UI, under Datacenter, select Resource Mappings and, under PCI Devices, click Add. This opens the menu for mapping PCI devices. Select the correct node and find the device you intend to pass through.
In this example I am going to pass through my Nvidia RTX 3070. It shows up as two PCI functions: one for the GPU itself and one for its audio controller. I am going to map these together by selecting the entry for Pass through all functions as one device.
The name of the mapping must be unique per node; however, multiple nodes can each have a device mapped under the same name. This is useful if you have two or more identical PVE nodes, because it simplifies migrating VMs from node to node.
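For completeness, the same mapping can be created from the shell with pvesh. The invocation below is only a sketch: the mapping name, node name, PCI path, and vendor:device ID are placeholders, and the exact --map format should be verified with pvesh usage /cluster/mapping/pci before relying on it:
pvesh create /cluster/mapping/pci --id rtx3070 --map node=pve1,path=0000:01:00,id=10de:2484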
VM Configuration
In order to pass through a PCI(E) device to a VM, the VM must be created with the following settings:
- Machine Type: q35
- BIOS: OVMF (UEFI)
Once the VM is created, under the Hardware tab, click Add and select PCI Device. Select the correct Device in the Mapped Device drop-down. Check the Advanced box, and make sure PCI-Express is selected.
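The equivalent qm commands look roughly like the following, assuming a VM ID of 100 and a mapping named rtx3070 (both placeholders); the mapping= form of hostpci is the PVE 8 resource-mapping syntax, while older guides pass a raw PCI address instead:
qm set 100 --machine q35 --bios ovmf
qm set 100 --hostpci0 mapping=rtx3070,pcie=1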
If passing through a GPU, selecting the Primary GPU checkbox will set the device as the main video adapter. This will cause the built-in VM console to fail. Do not enable this unless you have a physical display connected to the GPU or you have already configured a remote access protocol.
Other Considerations
There is a large variety of options, configurations, and seemingly unexplained behaviors when working with PCI passthrough. A few of my notes are listed below, as well as links to the Proxmox Wiki, which reiterates the instructions above and adds some more situation-specific knowledge.
When passing through PCI devices to a VM, the VFIO subsystem needs to pin all of the VM's memory up front so it can be mapped for DMA. This means that memory ballooning will not function. It also means that in certain situations Proxmox will throw an error when starting a VM with a PCI device attached: if the PVE host is running near memory capacity, or its memory is heavily fragmented, allocating and pinning the memory can take long enough that the VM start timeout is reached and an error is thrown. If you wait a few more minutes, the VM will usually finish starting anyway.