[pve-devel] applied: [RFC kernel] cherry-pick scheduler fix to avoid temporary VM freezes on NUMA hosts

Thomas Lamprecht t.lamprecht at proxmox.com
Mon Mar 11 13:51:13 CET 2024


Am 17/01/2024 um 15:45 schrieb Friedrich Weber:
> Users have been reporting [1] that VMs occasionally become
> unresponsive with high CPU usage for some time (varying between ~1 and
> more than 60 seconds). After that time, the guests come back and
> continue running. Windows VMs seem most affected (not responding to
> pings during the hang, RDP sessions time out), but we also got reports
> about Linux VMs (reporting soft lockups). The issue was not present on
> host kernel 5.15 and was first reported with kernel 6.2. Users
> reported that the issue becomes easier to trigger the more memory is
> assigned to the guests. Setting mitigations=off was reported to
> alleviate (but not eliminate) the issue. For most users the issue
> seems to disappear after (also) disabling KSM [2], but some users
> reported freezes even with KSM disabled [3].
> 
> It turned out the reports concerned NUMA hosts only, and that the
> freezes correlated with runs of the NUMA balancer [4]. Users reported
> that disabling the NUMA balancer resolves the issue (even with KSM
> enabled).
> 
> We put together a Linux VM reproducer, ran a git-bisect on the kernel
> to find the commit introducing the issue and asked upstream for help
> [5]. As it turned out, an upstream bug report was recently opened [6]
> and a preliminary fix to the KVM TDP MMU was proposed [7]. With that
> patch [7] on top of kernel 6.7, the reproducer does not trigger
> freezes anymore. As of now, the patch (or its v2 [8]) is not yet
> merged in the mainline kernel, and backporting it may be difficult due
> to dependencies on other KVM changes [9].
> 
> However, the bug report [6] also prompted an upstream developer to
> propose a patch to the kernel scheduler logic that decides whether a
> contended spinlock/rwlock should be dropped [10]. Without the patch,
> PREEMPT_DYNAMIC kernels (such as ours) would always drop contended
> locks. With the patch, the kernel only drops contended locks if the
> kernel is currently set to preempt=full. As noted in the commit
> message [10], this can (counter-intuitively) improve KVM performance.
> Our kernel defaults to preempt=voluntary (according to
> /sys/kernel/debug/sched/preempt), so with the patch it does not drop
> contended locks anymore, and the reproducer does not trigger freezes
> anymore. Hence, backport [10] to our kernel.
> 
> [1] https://forum.proxmox.com/threads/130727/
> [2] https://forum.proxmox.com/threads/130727/page-4#post-575886
> [3] https://forum.proxmox.com/threads/130727/page-8#post-617587
> [4] https://www.kernel.org/doc/html/latest/admin-guide/sysctl/kernel.html#numa-balancing
> [5] https://lore.kernel.org/kvm/832697b9-3652-422d-a019-8c0574a188ac@proxmox.com/
> [6] https://bugzilla.kernel.org/show_bug.cgi?id=218259
> [7] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com/
> [8] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@google.com/
> [9] https://lore.kernel.org/kvm/Zaa654hwFKba_7pf@google.com/
> [10] https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com/
> 
> Signed-off-by: Friedrich Weber <f.weber at proxmox.com>
> ---
> 
> Notes:
>     This RFC is not meant to be applied immediately, but is intended to
>     sum up the current state of the issue and point out potential fixes.
>     
>     The patch [10] backported in this RFC hasn't been reviewed upstream
>     yet. And while it fixes the reproducer, it is not certain that it will
>     fix freezes seen by users on real-world workloads. Hence, it would be
>     desirable to also apply some variant of [7] [8] once it is applied
>     upstream, however there may be difficulties backporting it, as noted
>     above.
>     
>     So, in any case, for now it might make sense to monitor how upstream
>     handles the situation, and then react accordingly. I'll continue to
>     participate upstream and send a v2 in due time.
> 
>  ...spinlocks-on-contention-iff-kernel-i.patch | 78 +++++++++++++++++++
>  1 file changed, 78 insertions(+)
>  create mode 100644 patches/kernel/0018-sched-core-Drop-spinlocks-on-contention-iff-kernel-i.patch
> 
>

this was actually already applied for 6.5.13-1, thanks!
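
For anyone wanting to check which preemption mode a host is running, the
following is a minimal sketch, not part of the applied patch. It assumes
that /sys/kernel/debug/sched/preempt lists the available modes with the
active one in parentheses (e.g. "none (voluntary) full"), and that debugfs
is mounted and readable (usually root only). The helper names are
hypothetical, and the rule in drops_contended_locks() only restates the
behavior described in the quoted commit message for a kernel that carries
the patch.

```python
import re
from pathlib import Path

# Path mentioned in the quoted message; needs debugfs mounted, usually root only.
PREEMPT_PATH = Path("/sys/kernel/debug/sched/preempt")


def parse_preempt_mode(text):
    """Return the active mode, assumed to be the entry in parentheses."""
    match = re.search(r"\((\w+)\)", text)
    if match is None:
        raise ValueError(f"no active mode found in {text!r}")
    return match.group(1)


def drops_contended_locks(mode):
    """Restate the patched behavior from the quoted commit message:
    contended locks are only dropped when running with preempt=full."""
    return mode == "full"


def read_preempt_mode(path=PREEMPT_PATH):
    """Read and parse the preempt mode file (hypothetical helper)."""
    return parse_preempt_mode(path.read_text())


if __name__ == "__main__":
    try:
        mode = read_preempt_mode()
    except (OSError, ValueError) as exc:
        print(f"could not determine preempt mode: {exc}")
    else:
        verdict = "drops" if drops_contended_locks(mode) else "does not drop"
        print(f"preempt={mode}: a patched kernel {verdict} contended locks")
```

On a host defaulting to preempt=voluntary, as described above, the patched
kernel would be reported as not dropping contended locks.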



