[PVE-User] Local XFS storage pool - Fragmentation issues

Emmanuel Kasper e.kasper at proxmox.com
Thu Oct 19 10:07:20 CEST 2017

On 10/18/2017 02:03 PM, Markos Vakondios wrote:
> Hello List,
> My use case, is to provide a large hardware-RAID backed volume (hence no
> ZFS) to PVE for continuous decent performance writes by a running container
> (cctv dvr).
> I have 6 x 6TB SATA drives in a hardware RAID 10 configuration.
> It's formatted as a ~10TB XFS device:
> /dev/sdb on /10TB type xfs (rw,relatime,attr2,inode64,noquota)
> I created a local directory storage pool on my PVE host and assigned a
> 9.8TB mount point  to a running container (DVR)
> pve:/10TB/images/101# ls -lah /10TB/images/101/
> total 6.3T
> drwxr----- 2 root root   30 Feb 15  2017 .
> drwxr-xr-x 4 root root   38 Feb 15  2017 ..
> -rw-r----- 1 root root 9.8T Oct 18 14:03 vm-101-disk-1.raw
> After a couple of months, I started receiving the following XFS errors on
> the host:
> Oct 17 06:26:09 pve kernel: [5629844.259800] XFS: loop0(28500) possible
> memory allocation deadlock size 72832 in kmem_alloc (mode:0x2400240)
> leaving me with an unusable filesystem from inside the container (no
> writes),  unless I drop the host's page cache:
> echo 1 > /proc/sys/vm/drop_caches
> or even need to drop all pagecache, dentries and inodes:
> echo 3 > /proc/sys/vm/drop_caches (remember to sync first!)

I can't comment on the fragmentation issue because I use ext4 on all my
test lab machines, but if you notice improvements when forcing a cache
flush, you should use sysctl tunables to prevent the page cache and the
dentry/inode caches from growing too much.

I have seen this behaviour on Linux servers with a large amount of RAM
and mechanical disks. The page cache and the dentry cache grow to as much
as half of the server's memory.
Then it can happen that the kernel tries to reclaim 8GB or more at once,
flushing all that data to the disks with the highest priority, which
causes a long I/O wait at the kernel level and stalls the system.
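
To check whether a host is in this state, the relevant counters can be
read from /proc/meminfo. The following is only a minimal sketch: the
function name cache_summary is made up for illustration, and the fields
it picks (MemTotal, Cached, Dirty, Writeback, SReclaimable) are the ones
I would assume to be most relevant here.

```shell
#!/bin/sh
# Print selected /proc/meminfo fields in MiB.
# Optional argument: path to a meminfo-style file (defaults to /proc/meminfo).
# cache_summary is a hypothetical helper name, not an existing tool.
cache_summary() {
    awk '
        $1 == "MemTotal:" || $1 == "Cached:" || $1 == "Dirty:" ||
        $1 == "Writeback:" || $1 == "SReclaimable:" {
            # /proc/meminfo reports these values in kB
            printf "%-14s %8d MiB\n", $1, $2 / 1024
        }
    ' "${1:-/proc/meminfo}"
}

# Only run against the live system when the file is actually there.
if [ -r /proc/meminfo ]; then
    cache_summary
fi
```

Running this a few times while the container is writing should show
whether Cached and Dirty keep climbing towards a large share of MemTotal.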

To avoid this situation you can have a look

and follow the hints for "improve latency for interactive system"
for dirty_ratio, dirty_background_ratio, and vfs_cache_pressure.

The hints from this page will make the kernel flush a smaller cache more
often, increasing the responsiveness of a system with slow hard disks.
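
As a rough sketch of how these three tunables can be inspected and
changed with sysctl: the function names below are hypothetical, and the
values 5, 10 and 150 are assumptions chosen only for illustration, not
the numbers from the hints page and not tested recommendations for this
workload.

```shell
#!/bin/sh
# Print the current values of the three tunables.
# Optional argument: directory to read from (defaults to /proc/sys/vm).
show_vm_tunables() {
    dir="${1:-/proc/sys/vm}"
    for key in dirty_background_ratio dirty_ratio vfs_cache_pressure; do
        if [ -r "$dir/$key" ]; then
            printf 'vm.%s = %s\n' "$key" "$(cat "$dir/$key")"
        fi
    done
}

# Apply example values at runtime (needs root, lost on reboot).
# The numbers are assumptions for illustration only.
apply_example_tuning() {
    sysctl -w vm.dirty_background_ratio=5   # start background writeback earlier
    sysctl -w vm.dirty_ratio=10             # throttle writers at a lower dirty level
    sysctl -w vm.vfs_cache_pressure=150     # reclaim dentry/inode caches more eagerly
}

# Reading is harmless; apply_example_tuning is deliberately not called here.
show_vm_tunables
```

If values like these turn out to help, they can be made persistent by
putting the same "vm.dirty_ratio = 10" style lines into a file under
/etc/sysctl.d/.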
