[pve-devel] [PATCH zfsonlinux] cherry-pick fix for overgrown dnode cache
Friedrich Weber
f.weber at proxmox.com
Mon Jul 28 11:52:23 CEST 2025
On 23/07/2025 20:15, Stoiko Ivanov wrote:
> the following patch seems applicable and might fix an issue observed
> in our enterprise support a while ago. containers run in their own
> cgroups, thus were probably not scanned by the kernel shrinker - this
> resulted in Dnode cache numbers of 300+% reported in arc_summary.
>
> FWICT the issue was introduced in ZFS 2.2.7
> (commit 5f73630e9cbea5efa23d16809f06e0d08523b241 see:
> https://github.com/openzfs/zfs/issues/17052#issuecomment-3065907783)
> but I assume that the increase of zfs_arc_max by default makes it
> trigger OOMs far easier.
>
> The discussion of the PR was quite instructive:
> https://github.com/openzfs/zfs/pull/17542
>
> minimally tested on a pair of trixie VMs (building + running
> replication of a couple of containers)
>
> Suggested-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
> Signed-off-by: Stoiko Ivanov <s.ivanov at proxmox.com>
> ---
FWIW, tested this by setting up ZFS on root with an additional RAID-0
for guest disks, creating a Debian container and running a script [1] in it.
ARC size targets as reported by arc_summary:
    Target size (adaptive):    100.0 %  794.0 MiB
    Min size (hard limit):      31.3 %  248.2 MiB
    Max size (high water):        3:1   794.0 MiB
With 6.14.8-2-pve (ZFS 2.3.3 without this patch), the Dnode cache and
ARC grow considerably while the script is running, and both stay at that
size after the script has exited:
    ARC size (current):        294.8 %    2.3 GiB
    Dnode cache target:         10.0 %   79.4 MiB
    Dnode cache size:         1181.9 %  938.5 MiB
The same happens on:
- 6.8.12-13-pve (ZFS 2.2.8)
- 6.8.12-8-pve (ZFS 2.2.7)
With 6.8.12-6-pve (ZFS 2.2.6), the Dnode cache size still grows to >100%
and seems to stay there, but the ARC manages to stay below 100%:
    ARC size (current):         96.8 %  768.9 MiB
    Dnode cache target:         10.0 %   79.4 MiB
    Dnode cache size:          333.6 %  264.9 MiB
With this patch on top of 6.8.12-9-pve, the Dnode Cache and ARC still
grow while the script is running, but both shrink again to <100% quickly
afterwards (within a minute or so):
    ARC size (current):         30.9 %  245.3 MiB
    Dnode cache target:         10.0 %   79.4 MiB
    Dnode cache size:           99.0 %   78.6 MiB
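
For anyone who wants to track these two percentages without running the
full arc_summary, here is a minimal sketch that computes them from the
raw kstats. It is not part of the patch or of my test setup. The function
name arc_usage is made up for illustration, and the sketch assumes the
usual /proc/spl/kstat/zfs/arcstats layout ("name type data" rows) with
the fields size, c_max, dnode_size and arc_dnode_limit:

```shell
#!/bin/bash
# Sketch (assumption: OpenZFS arcstats kstat with "name type data" rows).
# Prints ARC size relative to c_max and dnode cache size relative to
# arc_dnode_limit, i.e. the two percentages quoted from arc_summary.
# arc_usage is a hypothetical helper name, not an existing tool.
arc_usage() {
    local stats="${1:-/proc/spl/kstat/zfs/arcstats}"
    awk '
        $1 == "size"            { size  = $3 }
        $1 == "c_max"           { cmax  = $3 }
        $1 == "dnode_size"      { dnode = $3 }
        $1 == "arc_dnode_limit" { limit = $3 }
        END {
            # Bail out if the fields needed as divisors were not found.
            if (cmax == 0 || limit == 0) {
                print "arc_usage: required fields not found" > "/dev/stderr"
                exit 1
            }
            printf "ARC size: %.1f %% of max\n", 100 * size / cmax
            printf "Dnode cache size: %.1f %% of target\n", 100 * dnode / limit
        }
    ' "$stats"
}
```

Calling arc_usage without an argument reads the live kstat file. Running
it under watch (after exporting the function) or in a sleep loop should
show whether the values drop back below 100% once the workload is done.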
We have an issue in enterprise support with a container-heavy workload
on ZFS 2.2.7 that is likely affected by this. However, they also saw
high Dnode cache size and >100% ARC on ZFS 2.2.6 -- the latter I
couldn't reproduce with ZFS 2.2.6, but perhaps I missed some factor.
Might be nice if we could make the fix available on PVE 8 as well,
though I'm not sure how easily this can be backported to ZFS 2.2.
[1]
#!/bin/bash
# Create 100 x 100 x 100 small files, using one background subshell
# per top-level directory.
for h in $(seq 100); do
    (
        mkdir "dir-$h"
        cd "dir-$h" || exit 1
        for i in $(seq 100); do
            mkdir "$i"
            for j in $(seq 100); do
                echo test > "$i/$j.txt"
            done
        done
    )&
done
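
Note that the script above returns as soon as the subshells are started,
so "after the script has exited" is not necessarily the point at which
all files exist. For a more controlled reproduction, a variant along the
following lines might be useful. It is only a sketch: the function name
make_tree and its parameters are hypothetical and were not used for the
measurements in this mail.

```shell
#!/bin/bash
# Hypothetical variant of the reproducer (not used for the results above):
# creates dirs x subdirs x files small files below a base directory and
# returns only after all background workers have finished.
make_tree() {
    local base="$1" dirs="${2:-100}" subdirs="${3:-100}" files="${4:-100}"
    local h
    mkdir -p "$base" || return 1
    for h in $(seq "$dirs"); do
        (
            mkdir "$base/dir-$h" || exit 1
            cd "$base/dir-$h" || exit 1
            for i in $(seq "$subdirs"); do
                mkdir "$i"
                for j in $(seq "$files"); do
                    echo test > "$i/$j.txt"
                done
            done
        ) &
    done
    # Waits for all background jobs of the current shell, which includes
    # every subshell started in the loop above.
    wait
}
```

With the defaults, make_tree /some/dir creates the same number of files
as the original script. Smaller counts (for example make_tree /some/dir
10 10 10) make it easier to check how the Dnode cache size scales with
the number of files.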