[pve-devel] [PATCH kernel] cherry-pick fix for KVM vCPU page fault loop

Fiona Ebner f.ebner at proxmox.com
Mon Sep 4 11:39:20 CEST 2023


The mailing list thread [0] (found by Friedrich, many thanks!) leading
up to this patch sounds very familiar to issues users reported in the
community forum [1] and enterprise support channel, where a VM would
be stuck for no discernable reason with all vCPU threads spinning.

[0]: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com/T/#u
[1]: https://forum.proxmox.com/threads/127459/

Suggested-by: Friedrich Weber <f.weber at proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
---
 ...an-sign-extension-bug-with-mmu_seq-t.patch | 75 +++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 patches/kernel/0013-KVM-x86-mmu-Fix-an-sign-extension-bug-with-mmu_seq-t.patch

diff --git a/patches/kernel/0013-KVM-x86-mmu-Fix-an-sign-extension-bug-with-mmu_seq-t.patch b/patches/kernel/0013-KVM-x86-mmu-Fix-an-sign-extension-bug-with-mmu_seq-t.patch
new file mode 100644
index 0000000..18c268e
--- /dev/null
+++ b/patches/kernel/0013-KVM-x86-mmu-Fix-an-sign-extension-bug-with-mmu_seq-t.patch
@@ -0,0 +1,75 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Sean Christopherson <seanjc at google.com>
+Date: Wed, 23 Aug 2023 18:01:04 -0700
+Subject: [PATCH] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that
+ hangs vCPUs
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq in
+kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
+how KVM tracks the sequence counter snapshot.
+
+Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
+when checking to see if a page fault is stale, as the sequence count is
+stored as an "unsigned long" everywhere else in KVM.  This fixes a bug
+where KVM will effectively hang vCPUs due to always thinking page faults
+are stale, which results in KVM refusing to "fix" faults.
+
+mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
+KVM is handling page faults to detect if userspace mappings relevant to
+the guest were invalidated between snapshotting the counter and acquiring
+mmu_lock, i.e. to ensure that the userspace mapping KVM is using to
+resolve the page fault is fresh.  If KVM sees that the counter has
+changed, KVM simply resumes the guest without fixing the fault.
+
+What _should_ happen is that the source of the mmu_notifier invalidations
+eventually goes away, mmu_invalidate_seq becomes stable, and KVM can once
+again fix guest page fault(s).
+
+But for a long-lived VM and/or a VM that the host just doesn't particularly
+like, it's possible for a VM to be on the receiving end of 2 billion (with
+a B) mmu_notifier invalidations.  When that happens, bit 31 will be set in
+mmu_invalidate_seq.  This causes the value to be turned into a 32-bit
+negative value when implicitly cast to an "int" by is_page_fault_stale(),
+and then sign-extended into a 64-bit unsigned when the signed "int" is
+implicitly cast back to an "unsigned long" on the call to
+mmu_invalidate_retry_hva().
+
+As a result of the casting and sign-extension, given a sequence counter of
+e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing
+
+	if (0x8002dc25 != 0xffffffff8002dc25)
+
+and signals that the page fault is stale and needs to be retried even
+though the sequence counter is stable, and KVM effectively hangs any vCPU
+that takes a page fault (EPT violation or #NPF when TDP is enabled).
+
+Reported-by: Brian Rak <brak at vultr.com>
+Reported-by: Amaan Cheval <amaan.cheval at gmail.com>
+Reported-by: Eric Wheeler <kvm at lists.ewheeler.net>
+Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
+Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
+Signed-off-by: Sean Christopherson <seanjc at google.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
+(cherry-picked from commit 82d811ff566594de3676f35808e8a9e19c5c864c in stable v6.1.51)
+Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
+---
+ arch/x86/kvm/mmu/mmu.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
+index 3220c1285984..c42ba5cde7a4 100644
+--- a/arch/x86/kvm/mmu/mmu.c
++++ b/arch/x86/kvm/mmu/mmu.c
+@@ -4261,7 +4261,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+  * root was invalidated by a memslot update or a relevant mmu_notifier fired.
+  */
+ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
+-				struct kvm_page_fault *fault, int mmu_seq)
++				struct kvm_page_fault *fault,
++				unsigned long mmu_seq)
+ {
+ 	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
+ 
-- 
2.39.2






More information about the pve-devel mailing list