[PVE-User] High write iops rate from idle VMs running on a PVE cluster with ceph storage
Eneko Lacunza
elacunza at binovo.es
Mon Jul 12 09:42:57 CEST 2021
Hi Rainer,
On 12 July 2021 at 8:53, Rainer Krienke wrote:
> Hello,
>
> I run a 5 node PVE cluster with pve-manager/6.4-8/185e14db (running
> kernel: 5.4.119-1-pve). The storage backend is a HDD based "external"
> ceph cluster running Ceph 14.2.16 with 144 OSDs on 9 hosts. Currently
> there are about 70 VMs running on this PVE cluster, all Linux (Ubuntu,
> SLES).
>
> The problem I have is that writing on the VMs has become slower and
> slower over time, and e.g. running Linux updates (e.g. apt upgrade) on
> the VMs takes longer and longer. The reason seems to be a steadily
> rising write IOPS rate on the storage side. Of course, over time the
> number of VMs has also increased to the current number, causing higher
> numbers.
>
> On weekdays I can see rates on the ceph side of up to 1000 write IOPS
> and about 300 read IOPS. The really strange thing, however, is that
> even at weekends, when the services the VMs offer are hardly used at
> all, there is still a quite high write rate of about 400 IOPS,
> whereas the read rate is then only about 50 IOPS. The bytes
> read/written are minimal at this time, with only about 100 KBytes/sec
> read and about 5 MBytes/sec written.
I don't think you should have I/O problems with a Ceph cluster with 144
OSDs on 9 hosts if they are healthy; it should be able to sustain more
than that. I'd suspect some host or OSDs are performing poorly and
dragging down the whole cluster's performance...
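
One way to look for such outliers, as a minimal sketch only: compare each
OSD's commit latency against the median of all OSDs. The sketch assumes the
plain-text table printed by `ceph osd perf` (columns: osd,
commit_latency(ms), apply_latency(ms)). The function names, the 3x factor,
the 10 ms floor and the sample numbers are all made up for illustration and
are not taken from either cluster.

```python
from statistics import median

def parse_osd_perf(text):
    """Parse the plain-text table from `ceph osd perf`.

    Assumed layout (one OSD per line): osd id, commit latency in ms,
    apply latency in ms. Header and malformed lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and all(p.isdigit() for p in parts):
            osd, commit_ms, apply_ms = map(int, parts)
            rows.append((osd, commit_ms, apply_ms))
    return rows

def slow_osds(rows, factor=3.0, floor_ms=10):
    """Return ids of OSDs whose commit latency is far above the median.

    factor and floor_ms are arbitrary illustration values, not tuned.
    """
    if not rows:
        return []
    med = median(commit_ms for _, commit_ms, _ in rows)
    threshold = max(med * factor, floor_ms)
    return sorted(osd for osd, commit_ms, _ in rows if commit_ms > threshold)

# Hypothetical sample output, NOT measured on any real cluster.
SAMPLE = """\
osd commit_latency(ms) apply_latency(ms)
  0                  12                12
  1                  15                15
  2                 240               240
  3                  11                11
"""

if __name__ == "__main__":
    print(slow_osds(parse_osd_perf(SAMPLE)))
```

A single snapshot can mislead, so it would be worth sampling several times
under load before blaming a particular disk or host.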
>
> So what I am looking for is what could be causing the "always there"
> write IOPS rate of about 400. My guess is that it could be caused by
> file time (mtime, ctime, atime) updates written to the VMs'
> filesystems. If this were true, then using lazytime in /etc/fstab on
> all VMs could help to avoid this behaviour.
>
> But on the other hand all VMs use the (safe) "Writeback"-cache
> setting. So shouldn't this cache mode also cache writes caused by
> updates for file times?
>
> If yes, then I have to look for other reasons for my write IOPS
> problem, although I have no idea about this at the moment. Any
> suggestions?
We have a cluster with 62 VMs running (mostly Linux but also some
Windows). Right now I'm seeing 5-15MB/s reads and 5-35MB/s writes, with
~500 read IOPS and ~200 write IOPS. This is with two pools, one based on
4 SSD OSDs and the other on 11 HDD OSDs. The HDD pool has 45 VMs
running on it and apt upgrade performance is good...
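
To put the figures from this thread side by side, here is a rough
back-of-the-envelope sketch. It only divides the numbers quoted above and
assumes 1 MB = 1000 KB; the helper names are made up for illustration. The
two sets of numbers were taken under different conditions (an idle weekend
versus a normal working day), so treat the comparison as indicative only.

```python
def avg_write_size_kb(write_mb_per_s, write_iops):
    """Average size of one write in KB (assuming 1 MB = 1000 KB)."""
    return write_mb_per_s * 1000 / write_iops

def write_iops_per_vm(write_iops, vm_count):
    """Average write IOPS generated per VM."""
    return write_iops / vm_count

if __name__ == "__main__":
    # Weekend figures from the quoted message: ~400 write IOPS,
    # ~5 MBytes/sec written, about 70 VMs.
    print(avg_write_size_kb(5, 400))             # average KB per write
    print(round(write_iops_per_vm(400, 70), 1))  # write IOPS per VM

    # Figures from this reply: ~200 write IOPS, 62 VMs.
    print(round(write_iops_per_vm(200, 62), 1))  # write IOPS per VM
```

With the weekend figures this works out to about 12.5 KB per write and
roughly 5.7 write IOPS per VM, against roughly 3.2 write IOPS per VM here,
which points to many small writes rather than bulk data.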
Cheers
Eneko Lacunza
Zuzendari teknikoa | Director técnico
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/