[pve-devel] [RFC pve-kernel] split lock detection: warn more prominently about misery mode
Friedrich Weber
f.weber at proxmox.com
Tue Apr 9 14:48:25 CEST 2024
On processors with the `split_lock_detect` feature, the kernel will
detect user-space split locks and, by default, apply a "misery mode"
to offending tasks [1]. When a split lock is detected, the misery mode
artificially slows down the task by forcing a 10ms and serialization
with other offending tasks. This can also apply to vCPU threads of VM
guests that perform unaligned memory access. In this case, the misery
mode can result in temporary vCPU freezes of 10ms or more.
Currently, correlating observed vCPU freezes with split lock detection
is hard for users: Split lock detection does log a warning, but the
warning does not mention the slowdown, and is logged only once for
each task, which makes troubleshooting hard in case of long-running
VMs. Currently, vCPU threads of OVMF VMs seem to already produce one
such warning on boot, which masks any split locks taken later by the
guest OS.
To ease troubleshooting, add a patch that warns more prominently about
misery mode. With this patch, the kernel warns every time a task is
artificially slowed down. Even though the warnings are rate-limited by
the `printk_ratelimit` mechanism, there is a risk of large volume of
warnings, but this seems preferable to the misery mode going
unnoticed.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/cpu/intel.c?id=727209376f49
Signed-off-by: Friedrich Weber <f.weber at proxmox.com>
---
Notes:
With this patch applied, I see a risk that some users will get a *lot*
of warnings about misery mode. By default, these are rate-limited to
10 messages per 5 seconds, but depending on the workload, this can
still mean a lot of messages. This is good for driving user
attention towards the split locks, but still, I wonder whether this
might spam the journal too much. What do you think?
Also, currently there is no option to stop the warnings without
disabling misery mode (or split lock detection in general). Should
there be such an option? I could imagine making the
`split_lock_mitigate` sysctl ternary:
2: Enable misery mode and warn every time (new behavior, new default)
1: Enable misery mode but warn only once (old behavior)
0: Disable misery mode and warn only once
... or even completely decouple misery mode and warning behavior?
What do you think?
...lways-warn-when-applying-misery-mode.patch | 67 +++++++++++++++++++
1 file changed, 67 insertions(+)
create mode 100644 patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch
diff --git a/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch b/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch
new file mode 100644
index 0000000..658de5e
--- /dev/null
+++ b/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch
@@ -0,0 +1,67 @@
+From f8db8223735c529a1dcaaf41217024ee4091a464 Mon Sep 17 00:00:00 2001
+From: Friedrich Weber <f.weber at proxmox.com>
+Date: Tue, 9 Apr 2024 10:57:43 +0200
+Subject: [PATCH] x86/split_lock: always warn when applying misery to task
+
+Since commits b041b525dab9 ("x86/split_lock: Make life miserable for
+split lockers") and 727209376f49 ("x86/split_lock: Add sysctl to
+control the misery mode"), split lock detection artificially slows
+down the offending task ("misery mode") if the `split_lock_mitigate`
+sysctl is enabled (which it is by default). The corresponding warning
+does not mention the slowdown, and is logged only once for each task.
+In case of long-running VMs, this makes it hard to correlate observed
+vCPU freezes to the split lock detection misery mode.
+
+To make troubleshooting easier, change the warning behavior:
+
+- If the `split_lock_mitigate` sysctl is enabled, warn every time
+ misery is applied to a task, and rephrase the warning. Even though
+ the warnings are rate-limited by the `printk_ratelimit` mechanism,
+ this may result in a large volume of warnings for split-lock-heavy
+ workloads, but this seems preferable to the misery mode going
+ unnoticed.
+- If the `split_lock_mitigate` sysctl is disabled, warn only once (as
+ before), but mention explicitly that no further action is taken and
+ subsequent traps will not be logged.
+
+Signed-off-by: Friedrich Weber <f.weber at proxmox.com>
+---
+ arch/x86/kernel/cpu/intel.c | 13 +++++++++----
+ 1 file changed, 9 insertions(+), 4 deletions(-)
+
+diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
+index 40dec9b56f87..dad937a438e6 100644
+--- a/arch/x86/kernel/cpu/intel.c
++++ b/arch/x86/kernel/cpu/intel.c
+@@ -1164,12 +1164,11 @@ static void split_lock_warn(unsigned long ip)
+ struct delayed_work *work;
+ int cpu;
+
+- if (!current->reported_split_lock)
+- pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
++ if (sysctl_sld_mitigate) {
++ pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx. "
++ "Artificially slowing down task as mitigation because split_lock_mitigate=1\n",
+ current->comm, current->pid, ip);
+- current->reported_split_lock = 1;
+
+- if (sysctl_sld_mitigate) {
+ /*
+ * misery factor #1:
+ * sleep 10ms before trying to execute split lock.
+@@ -1184,6 +1183,12 @@ static void split_lock_warn(unsigned long ip)
+ return;
+ work = &sl_reenable_unlock;
+ } else {
++ if (!current->reported_split_lock)
++ pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx. "
++ "Taking no further action. Subsequent split_lock traps will not be logged\n",
++ current->comm, current->pid, ip);
++ current->reported_split_lock = 1;
++
+ work = &sl_reenable;
+ }
+
+--
+2.39.2
+
--
2.39.2
More information about the pve-devel
mailing list