[pve-devel] [PATCH zfsonlinux] cherry-pick fix for overgrown dnode cache

Stoiko Ivanov s.ivanov at proxmox.com
Mon Jul 28 18:33:31 CEST 2025


On Mon, 28 Jul 2025 11:52:23 +0200
Friedrich Weber <f.weber at proxmox.com> wrote:

> On 23/07/2025 20:15, Stoiko Ivanov wrote:
> > the following patch seems applicable and might fix an issue observed
> > in our enterprise support a while ago. containers run in their own
> > cgroups, thus were probably not scanned by the kernel shrinker - this
> > resulted in Dnode cache numbers of 300+% reported in arc_summary.
> > 
> > FWICT the issue was introduced in ZFS 2.2.7
> > (commit 5f73630e9cbea5efa23d16809f06e0d08523b241 see:
> > https://github.com/openzfs/zfs/issues/17052#issuecomment-3065907783)
> > but I assume that the increase of zfs_arc_max by default makes it
> > trigger OOMs far easier.
> > 
> > The discussion of the PR was quite instructive:
> > https://github.com/openzfs/zfs/pull/17542
> > 
> > minimally tested on a pair of trixie VMs (building + running
> > replication of a couple of containers)
> > 
> > Suggested-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
> > Signed-off-by: Stoiko Ivanov <s.ivanov at proxmox.com>
> > ---  
> 
> FWIW, tested this by setting up ZFS on root with an additional RAID-0
> for guest disks, creating a Debian container and running a script [1] in it.
> 
> ARC size targets as reported by arc_summary:
> 
>         Target size (adaptive):                       100.0 %  794.0 MiB
>         Min size (hard limit):                         31.3 %  248.2 MiB
>         Max size (high water):                            3:1  794.0 MiB
> 
> With 6.14.8-2-pve (ZFS 2.3.3 without this patch), the Dnode Cache and
> ARC grow considerably while the script is running and both stay like
> this after the script has exited:
> 
> ARC size (current):                                   294.8 %    2.3 GiB
>         Dnode cache target:                            10.0 %   79.4 MiB
>         Dnode cache size:                            1181.9 %  938.5 MiB
> 
> Same on
> - 6.8.12-13-pve (ZFS 2.2.8)
> - 6.8.12-8-pve (ZFS 2.2.7)
> 
> With 6.8.12-6-pve (ZFS 2.2.6), the Dnode cache size still grows to >100%
> and seems to stay there, but the ARC manages to stay below 100%:
> 
> ARC size (current):                                    96.8 %  768.9 MiB
>         Dnode cache target:                            10.0 %   79.4 MiB
>         Dnode cache size:                             333.6 %  264.9 MiB
> 
> With this patch on top of 6.8.12-9-pve, the Dnode Cache and ARC still
> grow while the script is running, but both shrink again to <100% quickly
> afterwards (within a minute or so):
> 
> ARC size (current):                                    30.9 %  245.3 MiB
>         Dnode cache target:                            10.0 %   79.4 MiB
>         Dnode cache size:                              99.0 %   78.6 MiB
> 
> We have an issue in enterprise support with a container-heavy workload
> on ZFS 2.2.7 that is likely affected by this. However, they also saw
> high Dnode cache size and >100% ARC on ZFS 2.2.6 -- the latter I
> couldn't reproduce with ZFS 2.2.6, but perhaps I missed some factor.
Thanks for the tests, short reproducer and feedback!

> 
> Might be nice if we could make the fix available on PVE 8 as well,
> though I'm not sure how easily this can be backported to ZFS 2.2.
I sent my attempt at a backport:
https://lore.proxmox.com/pve-devel/20250728163041.1287899-1-s.ivanov@proxmox.com/T/#u

> 
> [1]
> 
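> #!/bin/bash
> # Spawn 100 background subshells; each creates 100 directories of
> # 100 small files (1,000,000 files in total) to populate the dnode cache.
> for h in $(seq 100); do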
> 	(
> 	mkdir "dir-$h"
> 	cd "dir-$h" || exit 1
> 	for i in $(seq 100);
> 	do
> 		mkdir "$i"
> 		for j in $(seq 100);
> 		do
> 			echo test > "$i/$j.txt"
> 		done
> 	done
> 	)&
> done
> 
> 
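
To watch the dnode cache ratio quoted above without the full arc_summary
output, something like the following could be used. It is only a rough
sketch and not part of the patch or of the test setup: the function name
dnode_cache_pct is made up for illustration, and it assumes that
/proc/spl/kstat/zfs/arcstats exposes dnode_size and arc_dnode_limit as
"name type data" rows, which matches the percentages arc_summary printed
in the tests above (size relative to target).

```shell
#!/bin/bash
# Sketch: print the dnode cache size as a percentage of its limit.
# Assumption: the kstat file has "name type data" rows and contains the
# fields dnode_size and arc_dnode_limit (both byte counts).
dnode_cache_pct() {
	# Optional first argument: path to an arcstats-formatted file.
	local stats="${1:-/proc/spl/kstat/zfs/arcstats}"
	LC_ALL=C awk '
		$1 == "dnode_size"      { size = $3 }
		$1 == "arc_dnode_limit" { limit = $3 }
		END {
			# Fail instead of dividing by zero if the limit is missing.
			if (limit <= 0) exit 1
			printf "%.1f\n", 100 * size / limit
		}
	' "$stats"
}
```

Calling dnode_cache_pct without arguments every few seconds while the
reproducer runs should show whether the value drops back below 100 once
the script has finished.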




