QEMU crash with DPDK 22.11 app on Proxmox 8
Knight, Joshua
Joshua.Knight at netscout.com
Wed Aug 28 16:56:48 CEST 2024
We are seeing an issue on Proxmox 8 hosts: the QEMU process backing a guest crashes when a DPDK application is started inside that guest.
* Proxmox 8.2.4 with QEMU 9.0.2-2
* Guest running Ubuntu 22.04; the application is DPDK 22.11 testpmd
* Using virtio network interfaces that are up/connected
* Binding interfaces with the (legacy) igb_uio driver
When the application starts, the SSH connection to the VM drops and the Proxmox UI shows the VM as powered off.
root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s20
root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s21
root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s22
root@karma06:~/dpdk-22.11# python3 /root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py --bind=igb_uio enp6s23
root@karma06:~/dpdk-22.11# /root/dpdk-22.11/res/usr/local/bin/dpdk-testpmd -- -i --port-topology=chained --rxq=1 --txq=1 --rss-ip
EAL: Detected CPU lcores: 6
EAL: Detected NUMA nodes: 1
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:12.0 (socket -1)
eth_virtio_pci_init(): Failed to init PCI device
EAL: Requested device 0000:06:12.0 cannot be used
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:13.0 (socket -1)
eth_virtio_pci_init(): Failed to init PCI device
EAL: Requested device 0000:06:13.0 cannot be used
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:14.0 (socket -1)
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:15.0 (socket -1)
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:16.0 (socket -1)
EAL: Probe PCI driver: net_virtio (1af4:1000) device: 0000:06:17.0 (socket -1)
TELEMETRY: No legacy callbacks, legacy socket not created
Interactive-mode selected
Warning: NUMA should be configured manually by using --port-numa-config and --ring-numa-config parameters along with --numa.
testpmd: create a new mbuf pool <mb_pool_0>: n=187456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
client_loop: send disconnect: Broken pipe
The host’s system log shows a failed QEMU assertion:
karma QEMU[27334]: kvm: ../accel/kvm/kvm-all.c:1836: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
Under GDB, we can see QEMU abort on this assertion:
Thread 10 "CPU 0/KVM" received signal SIGABRT, Aborted.
[Switching to Thread 0x7d999cc006c0 (LWP 36256)]
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007d99a10a9e8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 0x00007d99a105afb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007d99a1045472 in __GI_abort () at ./stdlib/abort.c:79
#4 0x00007d99a1045395 in __assert_fail_base (fmt=0x7d99a11b9a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x5a9eb5a20f5e "ret == 0", file=file@entry=0x5a9eb5a021a5 "../accel/kvm/kvm-all.c", line=line@entry=1836,
function=function@entry=0x5a9eb5a03ca0 <__PRETTY_FUNCTION__.23> "kvm_irqchip_commit_routes") at ./assert/assert.c:92
#5 0x00007d99a1053eb2 in __GI___assert_fail (assertion=assertion@entry=0x5a9eb5a20f5e "ret == 0",
file=file@entry=0x5a9eb5a021a5 "../accel/kvm/kvm-all.c", line=line@entry=1836,
function=function@entry=0x5a9eb5a03ca0 <__PRETTY_FUNCTION__.23> "kvm_irqchip_commit_routes") at ./assert/assert.c:101
#6 0x00005a9eb566248c in kvm_irqchip_commit_routes (s=0x5a9eb79eed10) at ../accel/kvm/kvm-all.c:1836
#7 kvm_irqchip_commit_routes (s=0x5a9eb79eed10) at ../accel/kvm/kvm-all.c:1821
#8 0x00005a9eb540bed2 in virtio_pci_one_vector_unmask (proxy=proxy@entry=0x5a9eb9f5ada0, queue_no=queue_no@entry=4294967295,
vector=vector@entry=0, msg=..., n=0x5a9eb9f63368) at ../hw/virtio/virtio-pci.c:991
#9 0x00005a9eb540c09c in virtio_pci_vector_unmask (dev=0x5a9eb9f5ada0, vector=0, msg=...) at ../hw/virtio/virtio-pci.c:1056
#10 0x00005a9eb536ff62 in msix_fire_vector_notifier (is_masked=false, vector=0, dev=0x5a9eb9f5ada0) at ../hw/pci/msix.c:120
#11 msix_handle_mask_update (dev=0x5a9eb9f5ada0, vector=0, was_masked=<optimized out>) at ../hw/pci/msix.c:140
#12 0x00005a9eb5602260 in memory_region_write_accessor (mr=0x5a9eb9f5b3e0, addr=12, value=<optimized out>, size=4, shift=<optimized out>,
mask=<optimized out>, attrs=...) at ../system/memory.c:497
#13 0x00005a9eb5602f4e in access_with_adjusted_size (addr=addr@entry=12, value=value@entry=0x7d999cbfae58, size=size@entry=4,
access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x5a9eb56021e0 <memory_region_write_accessor>,
mr=<optimized out>, attrs=...) at ../system/memory.c:573
#14 0x00005a9eb560403c in memory_region_dispatch_write (mr=mr@entry=0x5a9eb9f5b3e0, addr=addr@entry=12, data=<optimized out>,
op=<optimized out>, attrs=attrs@entry=...) at ../system/memory.c:1528
#15 0x00005a9eb560b95f in flatview_write_continue_step (attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028 "", mr_addr=12,
l=l@entry=0x7d999cbfaf80, mr=0x5a9eb9f5b3e0, len=4) at ../system/physmem.c:2713
#16 0x00005a9eb560bbed in flatview_write_continue (mr=<optimized out>, l=<optimized out>, mr_addr=<optimized out>, len=4, ptr=0xfdf8500c,
attrs=..., addr=4260909068, fv=0x7d8d6c0796b0) at ../system/physmem.c:2743
#17 flatview_write (fv=0x7d8d6c0796b0, addr=addr@entry=4260909068, attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028, len=len@entry=4)
at ../system/physmem.c:2774
#18 0x00005a9eb560f251 in address_space_write (len=4, buf=0x7d99a3433028, attrs=..., addr=4260909068, as=0x5a9eb66f1f20 <address_space_memory>)
at ../system/physmem.c:2894
#19 address_space_rw (as=0x5a9eb66f1f20 <address_space_memory>, addr=4260909068, attrs=attrs@entry=..., buf=buf@entry=0x7d99a3433028, len=4,
is_write=<optimized out>) at ../system/physmem.c:2904
#20 0x00005a9eb56660e8 in kvm_cpu_exec (cpu=cpu@entry=0x5a9eb81e6890) at ../accel/kvm/kvm-all.c:2917
#21 0x00005a9eb56676d5 in kvm_vcpu_thread_fn (arg=arg@entry=0x5a9eb81e6890) at ../accel/kvm/kvm-accel-ops.c:50
#22 0x00005a9eb581dfe8 in qemu_thread_start (args=0x5a9eb81ee390) at ../util/qemu-thread-posix.c:541
#23 0x00007d99a10a8134 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#24 0x00007d99a11287dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
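A few of the raw numbers in this backtrace can be decoded with plain arithmetic, shown in the shell sketch below. It assumes (this is not confirmed here from QEMU source) that the region written in frame #12 is the virtio device’s MSI-X table, whose entries the PCI specification lays out as 16 bytes each with the Vector Control dword at offset 12. ADDR and MR_OFFSET are illustrative variable names holding values copied from frames #12 and #16-#19.

```shell
#!/bin/sh
# Decode values copied from the backtrace (frames #8, #12, #16-#19).
# Assumption: the memory region in frame #12 is the MSI-X table, where each
# entry is 16 bytes and the Vector Control dword sits at offset 12.
ADDR=4260909068   # guest-physical write address (frames #17-#19)
MR_OFFSET=12      # offset inside the memory region (frame #12, addr=12)

# Same value as ptr=0xfdf8500c in frame #16
printf 'guest physical address: 0x%x\n' "$ADDR"

# Which MSI-X table entry, and which field inside that entry
printf 'MSI-X entry: %d, field offset: %d\n' $((MR_OFFSET / 16)) $((MR_OFFSET % 16))

# queue_no from frame #8, reinterpreted as a signed 32-bit integer
printf 'queue_no as int32: %d\n' $((4294967295 - 4294967296))
```

Under that assumption, the trapped 4-byte write at region offset 12 targets entry 0’s Vector Control field, which fits frames #10-#11 reporting vector 0 being unmasked (is_masked=false), and queue_no in frame #8 is simply -1 stored in an unsigned 32-bit field.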
Interestingly, this backtrace appears to match an existing QEMU issue that is marked as fixed, and that fix should already be included in QEMU 9.0.2, the version running on this Proxmox host:
https://gitlab.com/qemu-project/qemu/-/issues/1928
We’ve found a workaround: binding the interfaces for DPDK with the vfio-pci driver instead of the deprecated igb_uio driver. With vfio-pci, the VM does not crash. Still, has anyone hit this before, or is it a known issue? I would not expect any operation in the guest to be able to crash QEMU, and it’s odd that this crash is supposedly fixed in 9.0.2.
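For reference, the vfio-pci workaround could be scripted in the guest roughly as below. This is a sketch, not the exact commands used in this report. Assumptions: the guest has no virtual IOMMU, so VFIO’s unsafe no-IOMMU mode has to be enabled; the PCI addresses 0000:06:14.0 to 0000:06:17.0 are taken from the EAL probe log and are assumed to be the NICs named enp6s20 to enp6s23 (slot 20 decimal is 0x14); run() and bind_to_vfio() are hypothetical helper names. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: bind the DPDK NICs to vfio-pci instead of igb_uio.
# Assumptions (hypothetical, adjust to your setup):
#   - no vIOMMU in the guest, so VFIO no-IOMMU mode is enabled;
#   - 0000:06:14.0-0000:06:17.0 are the NICs formerly bound as enp6s20-enp6s23.
# DRY_RUN defaults to 1 (print only); run as root with DRY_RUN=0 to apply.
DEVBIND=/root/dpdk-22.11/res/usr/local/bin/dpdk-devbind.py
DEVICES="0000:06:14.0 0000:06:15.0 0000:06:16.0 0000:06:17.0"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" != 0 ]; then
        echo "$*"
    else
        "$@"
    fi
}

bind_to_vfio() {
    run modprobe vfio-pci
    # Only needed without a vIOMMU: allow VFIO to run without IOMMU protection.
    run sh -c 'echo 1 > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode'
    for dev in $DEVICES; do
        run python3 "$DEVBIND" --bind=vfio-pci "$dev"
    done
    # Show the resulting driver bindings.
    run python3 "$DEVBIND" --status
}

bind_to_vfio
```

Binding by PCI address rather than interface name is a deliberate choice here: once a NIC is bound to igb_uio, its kernel interface name no longer exists, while the PCI address stays valid.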
We’ve been able to reproduce this on Proxmox 8.0, 8.1, and 8.2, on both AMD and Intel processors. The crash does not occur on earlier releases such as Proxmox 6.4, nor with earlier DPDK versions such as 20.08.
Thanks,
Josh
More information about the pve-user mailing list