[pve-devel] [PATCH qemu 2/2] cherry-pick some more stable fixes
Fiona Ebner
f.ebner at proxmox.com
Wed Aug 13 18:04:00 CEST 2025
Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
---
...t-Fix-VLAN-filter-table-reset-timing.patch | 107 +++++++++++++++
...width-of-third-operand-of-VINSERTx12.patch | 46 +++++++
...mem-fix-use-after-free-with-dispatch.patch | 126 ++++++++++++++++++
...y-one-and-invalid-access-in-virtqueu.patch | 86 ++++++++++++
debian/patches/series | 4 +
5 files changed, 369 insertions(+)
create mode 100644 debian/patches/extra/0003-virtio-net-Fix-VLAN-filter-table-reset-timing.patch
create mode 100644 debian/patches/extra/0004-target-i386-fix-width-of-third-operand-of-VINSERTx12.patch
create mode 100644 debian/patches/extra/0005-system-physmem-fix-use-after-free-with-dispatch.patch
create mode 100644 debian/patches/extra/0006-virtio-fix-off-by-one-and-invalid-access-in-virtqueu.patch
diff --git a/debian/patches/extra/0003-virtio-net-Fix-VLAN-filter-table-reset-timing.patch b/debian/patches/extra/0003-virtio-net-Fix-VLAN-filter-table-reset-timing.patch
new file mode 100644
index 0000000..9cfadff
--- /dev/null
+++ b/debian/patches/extra/0003-virtio-net-Fix-VLAN-filter-table-reset-timing.patch
@@ -0,0 +1,107 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Akihiko Odaki <odaki at rsg.ci.i.u-tokyo.ac.jp>
+Date: Sun, 27 Jul 2025 15:22:36 +0900
+Subject: [PATCH] virtio-net: Fix VLAN filter table reset timing
+
+Problem
+-------
+
+The expected initial state of the table depends on feature negotiation:
+
+With VIRTIO_NET_F_CTRL_VLAN:
+ The table must be empty in accordance with the specification.
+Without VIRTIO_NET_F_CTRL_VLAN:
+ The table must be filled to permit all VLAN traffic.
+
+Prior to commit 06b636a1e2ad ("virtio-net: do not reset vlan filtering
+at set_features"), virtio_net_set_features() always reset the VLAN
+table. That commit changed the behavior to skip table reset when
+VIRTIO_NET_F_CTRL_VLAN was negotiated, assuming the table would be
+properly cleared during device reset and remain stable.
+
+However, this assumption breaks when a driver renegotiates features:
+1. Initial negotiation without VIRTIO_NET_F_CTRL_VLAN (table filled)
+2. Renegotiation with VIRTIO_NET_F_CTRL_VLAN (table will not be cleared)
+
+The problem was exacerbated by commit 0caed25cd171 ("virtio: Call
+set_features during reset"), which triggered virtio_net_set_features()
+during device reset, exposing the bug whenever VIRTIO_NET_F_CTRL_VLAN
+was negotiated after a device reset.
+
+Solution
+--------
+
+Fix the issue by initializing the table when virtio_net_set_features()
+is called to change the VIRTIO_NET_F_CTRL_VLAN bit of
+vdev->guest_features.
+
+This approach ensures the correct table state regardless of feature
+negotiation sequence by performing initialization in
+virtio_net_set_features() as QEMU did prior to commit 06b636a1e2ad
+("virtio-net: do not reset vlan filtering at set_features").
+
+This change still preserves the goal of the commit, which was to avoid
+resetting the table during migration, by checking whether the
+VIRTIO_NET_F_CTRL_VLAN bit of vdev->guest_features is being changed;
+vdev->guest_features is set before virtio_net_set_features() gets called
+during migration.
+
+It also avoids resetting the table when the driver sets a feature
+bitmask with no change for the VIRTIO_NET_F_CTRL_VLAN bit, which makes
+the operation idempotent and its semantics cleaner.
+
+Additionally, this change ensures the table is initialized after
+feature negotiation and before the DRIVER_OK status bit is set, for
+compatibility with Linux drivers before commit 50c0ada627f5
+("virtio-net: fix race between ndo_open() and virtio_device_ready()"),
+which did not guarantee that the DRIVER_OK status bit was set before
+modifying the table.
+
+Fixes: 06b636a1e2ad ("virtio-net: do not reset vlan filtering at set_features")
+Cc: qemu-stable at nongnu.org
+Reported-by: Konstantin Shkolnyy <kshk at linux.ibm.com>
+Signed-off-by: Akihiko Odaki <odaki at rsg.ci.i.u-tokyo.ac.jp>
+Tested-by: Konstantin Shkolnyy <kshk at linux.ibm.com>
+Tested-by: Lei Yang <leiyang at redhat.com>
+Message-Id: <20250727-vlan-v3-1-bbee738619b1 at rsg.ci.i.u-tokyo.ac.jp>
+Tested-by: Konstantin Shkolnyy <kshk at linux.ibm.com>
+Reviewed-by: Michael S. Tsirkin <mst at redhat.com>
+Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
+(cherry picked from commit 6071d13c6a37493a6b26e1609b09a98aa058038a)
+Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
+---
+ hw/net/virtio-net.c | 7 ++++---
+ 1 file changed, 4 insertions(+), 3 deletions(-)
+
+diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
+index 5f908e5bca..8a0ea4cff5 100644
+--- a/hw/net/virtio-net.c
++++ b/hw/net/virtio-net.c
+@@ -997,8 +997,9 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
+ vhost_net_save_acked_features(nc->peer);
+ }
+
+- if (!virtio_has_feature(features, VIRTIO_NET_F_CTRL_VLAN)) {
+- memset(n->vlans, 0xff, MAX_VLAN >> 3);
++ if (virtio_has_feature(vdev->guest_features ^ features, VIRTIO_NET_F_CTRL_VLAN)) {
++ bool vlan = virtio_has_feature(features, VIRTIO_NET_F_CTRL_VLAN);
++ memset(n->vlans, vlan ? 0 : 0xff, MAX_VLAN >> 3);
+ }
+
+ if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
+@@ -3896,6 +3897,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
+ n->mac_table.macs = g_malloc0(MAC_TABLE_ENTRIES * ETH_ALEN);
+
+ n->vlans = g_malloc0(MAX_VLAN >> 3);
++ memset(n->vlans, 0xff, MAX_VLAN >> 3);
+
+ nc = qemu_get_queue(n->nic);
+ nc->rxfilter_notify_enabled = 1;
+@@ -3986,7 +3988,6 @@ static void virtio_net_reset(VirtIODevice *vdev)
+ memset(n->mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN);
+ memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
+ qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
+- memset(n->vlans, 0, MAX_VLAN >> 3);
+
+ /* Flush any async TX */
+ for (i = 0; i < n->max_queue_pairs; i++) {
diff --git a/debian/patches/extra/0004-target-i386-fix-width-of-third-operand-of-VINSERTx12.patch b/debian/patches/extra/0004-target-i386-fix-width-of-third-operand-of-VINSERTx12.patch
new file mode 100644
index 0000000..e15333e
--- /dev/null
+++ b/debian/patches/extra/0004-target-i386-fix-width-of-third-operand-of-VINSERTx12.patch
@@ -0,0 +1,46 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Paolo Bonzini <pbonzini at redhat.com>
+Date: Fri, 25 Jul 2025 01:10:12 +0200
+Subject: [PATCH] target/i386: fix width of third operand of VINSERTx128
+
+Table A-5 of the Intel manual incorrectly lists the third operand of
+VINSERTx128 as Wqq, but it is actually a 128-bit value. This is
+visible when W is a memory operand close to the end of the page.
+
+Fixes the recently-added poly1305_kunit test in linux-next.
+
+(No testcase yet, but I plan to modify test-avx2 to use memory
+close to the end of the page. This would work because the test
+vectors correctly have the memory operand as xmm2/m128).
+
+Reported-by: Eric Biggers <ebiggers at kernel.org>
+Tested-by: Eric Biggers <ebiggers at kernel.org>
+Cc: Ard Biesheuvel <ardb at kernel.org>
+Cc: "Jason A. Donenfeld" <Jason at zx2c4.com>
+Cc: Guenter Roeck <linux at roeck-us.net>
+Cc: qemu-stable at nongnu.org
+Fixes: 79068477686 ("target/i386: reimplement 0x0f 0x3a, add AVX", 2022-10-18)
+Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
+(cherry picked from commit feea87cd6b645d5166bdd304aac88f47f63dc2ef)
+Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
+---
+ target/i386/tcg/decode-new.c.inc | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
+index cda32ee678..f4cfc196b8 100644
+--- a/target/i386/tcg/decode-new.c.inc
++++ b/target/i386/tcg/decode-new.c.inc
+@@ -878,10 +878,10 @@ static const X86OpEntry opcodes_0F3A[256] = {
+ [0x0e] = X86_OP_ENTRY4(VPBLENDW, V,x, H,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
+ [0x0f] = X86_OP_ENTRY4(PALIGNR, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
+
+- [0x18] = X86_OP_ENTRY4(VINSERTx128, V,qq, H,qq, W,qq, vex6 chk(W0) cpuid(AVX) p_66),
++ [0x18] = X86_OP_ENTRY4(VINSERTx128, V,qq, H,qq, W,dq, vex6 chk(W0) cpuid(AVX) p_66),
+ [0x19] = X86_OP_ENTRY3(VEXTRACTx128, W,dq, V,qq, I,b, vex6 chk(W0) cpuid(AVX) p_66),
+
+- [0x38] = X86_OP_ENTRY4(VINSERTx128, V,qq, H,qq, W,qq, vex6 chk(W0) cpuid(AVX2) p_66),
++ [0x38] = X86_OP_ENTRY4(VINSERTx128, V,qq, H,qq, W,dq, vex6 chk(W0) cpuid(AVX2) p_66),
+ [0x39] = X86_OP_ENTRY3(VEXTRACTx128, W,dq, V,qq, I,b, vex6 chk(W0) cpuid(AVX2) p_66),
+
+ /* Listed incorrectly as type 4 */
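To illustrate why the operand width matters here: decoding the third operand as 256 bits (qq) makes the emulated load cover twice the bytes the instruction architecturally reads, so a correctly placed 16-byte source right at the end of a page makes QEMU touch the following page and fault spuriously. A toy calculation follows; the page size, address, and helper are made up for the example and are not QEMU code:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096u

    /* Does an access of 'width' bytes starting at 'addr' stay within one page? */
    static int fits_in_page(uint32_t addr, unsigned width)
    {
        return (addr % PAGE_SIZE) + width <= PAGE_SIZE;
    }

    int main(void)
    {
        uint32_t operand = 2 * PAGE_SIZE - 16;  /* m128 source at the end of a page */

        printf("read as 128-bit (correct): %s\n",
               fits_in_page(operand, 16) ? "stays in page" : "crosses into next page");
        printf("read as 256-bit (old bug): %s\n",
               fits_in_page(operand, 32) ? "stays in page" : "crosses into next page");
        return 0;
    }
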
diff --git a/debian/patches/extra/0005-system-physmem-fix-use-after-free-with-dispatch.patch b/debian/patches/extra/0005-system-physmem-fix-use-after-free-with-dispatch.patch
new file mode 100644
index 0000000..a1acfe6
--- /dev/null
+++ b/debian/patches/extra/0005-system-physmem-fix-use-after-free-with-dispatch.patch
@@ -0,0 +1,126 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Pierrick Bouvier <pierrick.bouvier at linaro.org>
+Date: Thu, 24 Jul 2025 09:11:42 -0700
+Subject: [PATCH] system/physmem: fix use-after-free with dispatch
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+A use-after-free bug was reported when booting a Linux kernel during the
+PCI setup phase. It is quite hard to reproduce (it needs SMP and is
+favored by having several PCI devices with BARs and a specific Linux
+config, the Debian default one in this case).
+
+After investigation (see the associated bug ticket), it appears that,
+under specific conditions, we might access a cached AddressSpaceDispatch
+that has meanwhile been reclaimed by the RCU thread.
+In the Linux boot scenario, during the PCI phase, memory regions are
+destroyed/recreated, which exposes the bug.
+
+The core of the issue is that we cache the dispatch associated to
+current cpu in cpu->cpu_ases[asidx].memory_dispatch. It is updated with
+tcg_commit, which runs asynchronously on a given cpu.
+At some point, we leave the RCU critical section, and the RCU thread
+starts reclaiming it, but tcg_commit has not yet been invoked, resulting in
+the use-after-free.
+
+It's not the first problem around this area, and commit 0d58c660689 [1]
+("softmmu: Use async_run_on_cpu in tcg_commit") already tried to
+address it. It did a good job, but it seems that we found a specific
+situation where it's not enough.
+
+This patch takes a simple approach: remove the cached value creating the
+issue, and make sure we always get the current mapping for address
+space, using address_space_to_dispatch(cpu->cpu_ases[asidx].as).
+It is equivalent to qatomic_rcu_read(&as->current_map)->dispatch;
+this is not really costly, as we just need two dereferences,
+including one atomic (RCU) read, which is negligible considering we are
+already on the MMU slow path anyway.
+
+Note that tcg_commit is still needed, as it's taking care of flushing
+TLB, removing previously mapped entries.
+
+Another solution would be to directly cache values under the dispatch
+(dispatches themselves are not refcounted), keep an active reference on
+the associated memory section, and release it when appropriate (tricky).
+Given the time already spent debugging this area now and previously, I
+strongly prefer eliminating the root of the issue, instead of adding
+more complexity for a hypothetical performance gain. RCU is precisely
+used to ensure good performance when reading data, so caching is not as
+beneficial as it might seem IMHO.
+
+[1] https://gitlab.com/qemu-project/qemu/-/commit/0d58c660689f6da1e3feff8a997014003d928b3b
+
+Cc: qemu-stable at nongnu.org
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3040
+Signed-off-by: Pierrick Bouvier <pierrick.bouvier at linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson at linaro.org>
+Reviewed-by: Michael Tokarev <mjt at tls.msk.ru>
+Tested-by: Michael Tokarev <mjt at tls.msk.ru>
+Message-ID: <20250724161142.2803091-1-pierrick.bouvier at linaro.org>
+Signed-off-by: Philippe Mathieu-Daudé <philmd at linaro.org>
+(cherry picked from commit 2865bf1c5795371ef441e9029bc22566ccff8299)
+Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
+---
+ system/physmem.c | 15 +++------------
+ 1 file changed, 3 insertions(+), 12 deletions(-)
+
+diff --git a/system/physmem.c b/system/physmem.c
+index 333a5eb94d..32f5895b80 100644
+--- a/system/physmem.c
++++ b/system/physmem.c
+@@ -164,13 +164,11 @@ static bool ram_is_cpr_compatible(RAMBlock *rb);
+ * CPUAddressSpace: all the information a CPU needs about an AddressSpace
+ * @cpu: the CPU whose AddressSpace this is
+ * @as: the AddressSpace itself
+- * @memory_dispatch: its dispatch pointer (cached, RCU protected)
+ * @tcg_as_listener: listener for tracking changes to the AddressSpace
+ */
+ typedef struct CPUAddressSpace {
+ CPUState *cpu;
+ AddressSpace *as;
+- struct AddressSpaceDispatch *memory_dispatch;
+ MemoryListener tcg_as_listener;
+ } CPUAddressSpace;
+
+@@ -689,7 +687,7 @@ address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr orig_addr,
+ IOMMUTLBEntry iotlb;
+ int iommu_idx;
+ hwaddr addr = orig_addr;
+- AddressSpaceDispatch *d = cpu->cpu_ases[asidx].memory_dispatch;
++ AddressSpaceDispatch *d = address_space_to_dispatch(cpu->cpu_ases[asidx].as);
+
+ for (;;) {
+ section = address_space_translate_internal(d, addr, &addr, plen, false);
+@@ -2673,7 +2671,7 @@ MemoryRegionSection *iotlb_to_section(CPUState *cpu,
+ {
+ int asidx = cpu_asidx_from_attrs(cpu, attrs);
+ CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx];
+- AddressSpaceDispatch *d = cpuas->memory_dispatch;
++ AddressSpaceDispatch *d = address_space_to_dispatch(cpuas->as);
+ int section_index = index & ~TARGET_PAGE_MASK;
+ MemoryRegionSection *ret;
+
+@@ -2750,9 +2748,6 @@ static void tcg_log_global_after_sync(MemoryListener *listener)
+
+ static void tcg_commit_cpu(CPUState *cpu, run_on_cpu_data data)
+ {
+- CPUAddressSpace *cpuas = data.host_ptr;
+-
+- cpuas->memory_dispatch = address_space_to_dispatch(cpuas->as);
+ tlb_flush(cpu);
+ }
+
+@@ -2768,11 +2763,7 @@ static void tcg_commit(MemoryListener *listener)
+ cpu = cpuas->cpu;
+
+ /*
+- * Defer changes to as->memory_dispatch until the cpu is quiescent.
+- * Otherwise we race between (1) other cpu threads and (2) ongoing
+- * i/o for the current cpu thread, with data cached by mmu_lookup().
+- *
+- * In addition, queueing the work function will kick the cpu back to
++ * Queueing the work function will kick the cpu back to
+ * the main loop, which will end the RCU critical section and reclaim
+ * the memory data structures.
+ *
diff --git a/debian/patches/extra/0006-virtio-fix-off-by-one-and-invalid-access-in-virtqueu.patch b/debian/patches/extra/0006-virtio-fix-off-by-one-and-invalid-access-in-virtqueu.patch
new file mode 100644
index 0000000..68c67da
--- /dev/null
+++ b/debian/patches/extra/0006-virtio-fix-off-by-one-and-invalid-access-in-virtqueu.patch
@@ -0,0 +1,86 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Jonah Palmer <jonah.palmer at oracle.com>
+Date: Mon, 21 Jul 2025 15:02:08 +0000
+Subject: [PATCH] virtio: fix off-by-one and invalid access in
+ virtqueue_ordered_fill
+
+Commit b44135daa372 introduced virtqueue_ordered_fill for
+VIRTIO_F_IN_ORDER support but had a few issues:
+
+* The while loop condition used 'steps <= max_steps' but should've been
+ 'steps < max_steps' since reaching steps == max_steps would indicate
+ that we didn't find an element, which is an error. Without this
+ change, the code would attempt to read invalid data at an index
+ outside of our search range.
+
+* Incremented 'steps' using the next chain's ndescs instead of the
+ current one.
+
+This patch corrects the loop bounds and synchronizes 'steps' and index
+increments.
+
+We also add a defensive sanity check against malicious or invalid
+descriptor counts to avoid a potential infinite loop and DoS.
+
+Fixes: b44135daa372 ("virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support")
+Reported-by: terrynini <terrynini38514 at gmail.com>
+Signed-off-by: Jonah Palmer <jonah.palmer at oracle.com>
+Message-Id: <20250721150208.2409779-1-jonah.palmer at oracle.com>
+Reviewed-by: Si-Wei Liu <si-wei.liu at oracle.com>
+Acked-by: Jason Wang <jasowang at redhat.com>
+Reviewed-by: Michael S. Tsirkin <mst at redhat.com>
+Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
+(cherry picked from commit 6fcf5ebafad65adc19a616260ca7dc90005785d1)
+Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
+---
+ hw/virtio/virtio.c | 22 ++++++++++++++++------
+ 1 file changed, 16 insertions(+), 6 deletions(-)
+
+diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
+index ec54573feb..b756f49867 100644
+--- a/hw/virtio/virtio.c
++++ b/hw/virtio/virtio.c
+@@ -929,18 +929,18 @@ static void virtqueue_packed_fill(VirtQueue *vq, const VirtQueueElement *elem,
+ static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement *elem,
+ unsigned int len)
+ {
+- unsigned int i, steps, max_steps;
++ unsigned int i, steps, max_steps, ndescs;
+
+ i = vq->used_idx % vq->vring.num;
+ steps = 0;
+ /*
+- * We shouldn't need to increase 'i' by more than the distance
+- * between used_idx and last_avail_idx.
++ * We shouldn't need to increase 'i' by more than or equal to
++ * the distance between used_idx and last_avail_idx (max_steps).
+ */
+ max_steps = (vq->last_avail_idx - vq->used_idx) % vq->vring.num;
+
+ /* Search for element in vq->used_elems */
+- while (steps <= max_steps) {
++ while (steps < max_steps) {
+ /* Found element, set length and mark as filled */
+ if (vq->used_elems[i].index == elem->index) {
+ vq->used_elems[i].len = len;
+@@ -948,8 +948,18 @@ static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement *elem,
+ break;
+ }
+
+- i += vq->used_elems[i].ndescs;
+- steps += vq->used_elems[i].ndescs;
++ ndescs = vq->used_elems[i].ndescs;
++
++ /* Defensive sanity check */
++ if (unlikely(ndescs == 0 || ndescs > vq->vring.num)) {
++ qemu_log_mask(LOG_GUEST_ERROR,
++ "%s: %s invalid ndescs %u at position %u\n",
++ __func__, vq->vdev->name, ndescs, i);
++ return;
++ }
++
++ i += ndescs;
++ steps += ndescs;
+
+ if (i >= vq->vring.num) {
+ i -= vq->vring.num;
diff --git a/debian/patches/series b/debian/patches/series
index 6e5fd69..c0e5a0d 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1,5 +1,9 @@
extra/0001-monitor-qmp-fix-race-with-clients-disconnecting-earl.patch
extra/0002-ide-avoid-potential-deadlock-when-draining-during-tr.patch
+extra/0003-virtio-net-Fix-VLAN-filter-table-reset-timing.patch
+extra/0004-target-i386-fix-width-of-third-operand-of-VINSERTx12.patch
+extra/0005-system-physmem-fix-use-after-free-with-dispatch.patch
+extra/0006-virtio-fix-off-by-one-and-invalid-access-in-virtqueu.patch
bitmap-mirror/0001-drive-mirror-add-support-for-sync-bitmap-mode-never.patch
bitmap-mirror/0002-drive-mirror-add-support-for-conditional-and-always-.patch
bitmap-mirror/0003-mirror-add-check-for-bitmap-mode-without-bitmap.patch
--
2.47.2