[pve-devel] [RFC pve-kernel] split lock detection: warn more prominently about misery mode

Tue Jun 11 16:08:51 CEST 2024

Am 09.04.24 um 14:48 schrieb Friedrich Weber:
> On processors with the `split_lock_detect` feature, the kernel will
> detect user-space split locks and, by default, apply a "misery mode"
> to offending tasks [1]. When a split lock is detected, the misery mode
> artificially slows down the task by forcing a 10ms and serialization
> with other offending tasks. This can also apply to vCPU threads of VM
> guests that perform unaligned memory access. In this case, the misery
> mode can result in temporary vCPU freezes of 10ms or more.
> 
> Currently, correlating observed vCPU freezes with split lock detection
> is hard for users: Split lock detection does log a warning, but the
> warning does not mention the slowdown, and is logged only once for
> each task, which makes troubleshooting hard in case of long-running
> VMs. Currently, vCPU threads of OVMF VMs seem to already produce one
> such warning on boot, which masks any split locks taken later by the
> guest OS.
> 
> To ease troubleshooting, add a patch that warns more prominently about
> misery mode. With this patch, the kernel warns every time a task is
> artificially slowed down. Even though the warnings are rate-limited by
> the `printk_ratelimit` mechanism, there is a risk of large volume of
> warnings, but this seems preferable to the misery mode going
> unnoticed.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/cpu/intel.c?id=727209376f49
> 
> Signed-off-by: Friedrich Weber <f.weber at proxmox.com>
> ---
> 
> Notes:
>     With this patch applied, I see a risk that some users will get a *lot*
>     of warnings about misery mode. By default, these are rate-limited to
>     10 messages per 5 seconds, but depending on the workload, this can
>     still mean a lot of messages. This is good for driving user
>     attention towards the split locks, but still, I wonder whether this
>     might spam the journal too much. What do you think?
>     

I know this is an upper limit, but 10 messages per 5 seconds is too much
IMHO. In particular, because certain guest OSes cannot be changed to
avoid taking split locks and users will get scared by these messages.
Unfortunately, reported_split_lock is just a flag and cannot be used as
a counter to log every N taken split locks, which for the right N might
be closer to what we/users actually want. Turning the flag into a
counter would mean wasting more space in the "current" struct, not sure
if that's worth it.

>     Also, currently there is no option to stop the warnings without
>     disabling misery mode (or split lock detection in general). Should
>     there be such an option? I could imagine making the
>     `split_lock_mitigate` sysctl ternary:
>     
>     2: Enable misery mode and warn every time (new behavior, new default)
>     1: Enable misery mode but warn only once (old behavior)
>     0: Disable misery mode and warn only once
>     
>     ... or even completely decouple misery mode and warning behavior?
>     What do you think?
> 

If we go for the current patch, I'm in favor of having a switch to turn
the frequent warnings off. I feel like it should be a separate switch then.

>  ...lways-warn-when-applying-misery-mode.patch | 67 +++++++++++++++++++
>  1 file changed, 67 insertions(+)
>  create mode 100644 patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch
> 
> diff --git a/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch b/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch
> new file mode 100644
> index 0000000..658de5e
> --- /dev/null
> +++ b/patches/kernel/0012-x86-split_lock-always-warn-when-applying-misery-mode.patch
> @@ -0,0 +1,67 @@
> +From f8db8223735c529a1dcaaf41217024ee4091a464 Mon Sep 17 00:00:00 2001
> +From: Friedrich Weber <f.weber at proxmox.com>
> +Date: Tue, 9 Apr 2024 10:57:43 +0200
> +Subject: [PATCH] x86/split_lock: always warn when applying misery to task
> +
> +Since commits b041b525dab9 ("x86/split_lock: Make life miserable for
> +split lockers") and 727209376f49 ("x86/split_lock: Add sysctl to
> +control the misery mode"), split lock detection artificially slows
> +down the offending task ("misery mode") if the `split_lock_mitigate`
> +sysctl is enabled (which it is by default). The corresponding warning
> +does not mention the slowdown, and is logged only once for each task.
> +In case of long-running VMs, this makes it hard to correlate observed
> +vCPU freezes to the split lock detection misery mode.
> +

A start would be to extend the warning to mention that it won't be
logged again and there can be slowdowns.

Maybe you can also ask upstream for feedback and suggestions.

> +To make troubleshooting easier, change the warning behavior:
> +
> +- If the `split_lock_mitigate` sysctl is enabled, warn every time
> +  misery is applied to a task, and rephrase the warning. Even though
> +  the warnings are rate-limited by the `printk_ratelimit` mechanism,
> +  this may result in a large volume of warnings for split-lock-heavy
> +  workloads, but this seems preferable to the misery mode going
> +  unnoticed.
> +- If the `split_lock_mitigate` sysctl is disabled, warn only once (as
> +  before), but mention explicitly that no further action is taken and
> +  subsequent traps will not be logged.
> +
> +Signed-off-by: Friedrich Weber <f.weber at proxmox.com>

As for the patch itself:

Reviewed-by: Fiona Ebner <f.ebner at proxmox.com>