Eneko Lacunza elacunza at binovo.es
Mon Jul 12 09:42:57 CEST 2021

On 12/7/21 at 8:53, Rainer Krienke wrote:
> I run a 5-node PVE cluster with pve-manager/6.4-8/185e14db (running
> kernel: 5.4.119-1-pve). The storage backend is an HDD-based "external"
> Ceph cluster running Ceph 14.2.16 with 144 OSDs on 9 hosts. Currently
> there are about 70 VMs running on this PVE cluster, all Linux (Ubuntu,
> SLES).
> The problem I have is that writes on the VMs have become slower and
> slower over time, and, e.g., running Linux updates (apt upgrade) on the
> VMs takes longer and longer. The reason seems to be a steadily rising
> write IOPS rate on the storage side. Of course, the number of VMs has
> also grown over time to the current count, which raises these numbers.
> During the week I see rates on the Ceph side of up to 1000 write IOPS
> and about 300 read IOPS. The really strange thing, however, is that
> even on weekends, when the services the VMs offer are hardly used at
> all, there is still a fairly high write rate of about 400 IOPS, whereas
> the read rate is only about 50 IOPS. The bytes read/written are minimal
> at that time, with only about 100 KBytes/s read and about 5 MBytes/s
> written.
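The weekend figures imply quite small writes. A quick back-of-the-envelope check, assuming "MBytes"/"KBytes" mean 10^6/10^3 bytes (`avg_io_size_kb` is just an illustrative helper):

```python
def avg_io_size_kb(bytes_per_sec: float, iops: float) -> float:
    """Average size of a single I/O in kB (10**3 bytes), given throughput and op rate."""
    return bytes_per_sec / iops / 1000


# Weekend figures quoted above (decimal units assumed).
write_kb = avg_io_size_kb(5 * 10**6, 400)  # 5 MB/s spread over 400 write IOPS
read_kb = avg_io_size_kb(100 * 10**3, 50)  # 100 kB/s spread over 50 read IOPS
print(f"avg write: {write_kb:.1f} kB, avg read: {read_kb:.1f} kB")
# prints "avg write: 12.5 kB, avg read: 2.0 kB"
```

Around 12.5 kB per write points to many small writes (journal, log or metadata updates, for example) rather than bulk data, though that is an inference from averages, not a measurement.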

I don't think you should have I/O problems with a healthy Ceph cluster
of 144 OSDs on 9 hosts; it should be able to handle more than that. I'd
suspect that some host or OSDs are performing poorly and dragging down
the whole cluster's performance...
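One way to look for such a misbehaving OSD is to compare per-OSD latencies, e.g. the commit/apply latency columns reported by `ceph osd perf`, against the cluster median. A minimal sketch; `slow_osds`, the 3x-median factor and the 10 ms floor are arbitrary assumptions, and the sample values are made up:

```python
from statistics import median
from typing import Dict, List


def slow_osds(latency_ms: Dict[int, float], factor: float = 3.0,
              floor_ms: float = 10.0) -> List[int]:
    """Return ids of OSDs whose latency exceeds `factor` x the median.

    The absolute floor keeps an idle cluster (medians near 0 ms) from
    flagging OSDs over a few milliseconds of noise. Both thresholds are
    assumptions, not Ceph defaults.
    """
    if not latency_ms:
        return []
    threshold = max(factor * median(latency_ms.values()), floor_ms)
    return sorted(osd for osd, lat in latency_ms.items() if lat > threshold)


# Hypothetical per-OSD latencies in ms, not measurements from either cluster.
sample = {0: 8.0, 1: 9.0, 2: 7.0, 3: 95.0, 4: 10.0, 5: 11.0}
print(slow_osds(sample))  # [3]
```

Because a replicated write is only acknowledged once every OSD in its placement group has committed it, a single slow OSD can delay writes for many VMs at once.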

> So what I am looking for is what could be causing the "always there"
> write IOPS rate of about 400. My guess is that it could be caused by
> file time (mtime, ctime, atime) updates being written to the VMs'
> filesystems. If this were true, then using lazytime in /etc/fstab on
> all VMs could help avoid this behaviour.
> But on the other hand, all VMs use the (safe) "Writeback" cache
> setting. So shouldn't this cache mode also cache writes caused by
> file time updates?
> If yes, then I have to look for other reasons for my write IOPS
> problem, although I have no idea what they could be at the moment.
> Any suggestions?
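With the writeback cache mode, the guest's flush requests (issued on fsync() and by filesystem journal commits; ext4 commits its journal every 5 seconds by default) are still passed down to the storage, so the cache can merge repeated writes to the same block between flushes but cannot make them disappear. A toy model of that effect, not of how QEMU or librbd actually implement caching (`backend_writes` and the event encoding are illustrative assumptions):

```python
from typing import Iterable, Optional

FLUSH = None  # an event of None stands for a guest flush request


def backend_writes(events: Iterable[Optional[int]]) -> int:
    """Count writes that reach the backend through a toy write-back cache.

    An int event dirties that block number; FLUSH writes every dirty block
    to the backend and clears the dirty set. Cache size limits, timers and
    eviction are ignored; blocks still dirty at the end are counted too.
    """
    dirty: set = set()
    writes = 0
    for event in events:
        if event is FLUSH:
            writes += len(dirty)
            dirty.clear()
        else:
            dirty.add(event)
    return writes + len(dirty)


# The same (hypothetical) inode block 7 updated 100 times:
print(backend_writes([7] * 100))         # 1   (all updates merged)
print(backend_writes([7, FLUSH] * 100))  # 100 (a flush after every update)
```

So timestamp updates can still reach Ceph if they are flushed often; whether lazytime would reduce that here is something to measure rather than assume.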

We have a cluster with 62 VMs running (mostly Linux, but also some
Windows). Right now I'm seeing 5-15 MB/s reads and 5-35 MB/s writes,
with ~500 read IOPS and ~200 write IOPS. This is with two pools, one
based on 4 SSD OSDs and the other on 11 HDD OSDs. The HDD pool has 45
VMs running on it, and apt upgrade performance is good...
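Normalising the figures in this thread per VM gives a rough comparison (illustrative arithmetic only; the samples were taken at different times and under different loads, and Rainer's VM count is approximate):

```python
def write_iops_per_vm(total_write_iops: float, vms: int) -> float:
    """Average write IOPS contributed by each VM."""
    return total_write_iops / vms


print(f"{write_iops_per_vm(400, 70):.1f}")   # 5.7  (Rainer, weekend baseline)
print(f"{write_iops_per_vm(1000, 70):.1f}")  # 14.3 (Rainer, weekday peak)
print(f"{write_iops_per_vm(200, 62):.1f}")   # 3.2  (this cluster, right now)
```

By this rough measure, the idle weekend baseline alone is already above what this cluster sees under its current load.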


