[PVE-User] High write iops rate from idle VMs running on a PVE cluster with ceph storage

Alex Chekholko alex at calicolabs.com
Mon Jul 12 19:12:59 CEST 2021


You can try something like 'ceph osd perf' and look for outliers.
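
For example, something along these lines (a rough sketch; the exact
column layout of 'ceph osd perf' can differ between Ceph releases, and
<osd-id> is a placeholder):

    # Per-OSD commit/apply latency in ms; sorting makes outliers obvious
    ceph osd perf | sort -n -k2 | tail -20

    # Map a suspect OSD back to the host it lives on
    ceph osd find <osd-id>

If one OSD or one host shows latencies far above the rest, that is
usually what drags down the whole cluster.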

On Mon, Jul 12, 2021 at 12:43 AM Eneko Lacunza via pve-user <
pve-user at lists.proxmox.com> wrote:

> Hi Rainer,
>
> On 12/7/21 at 8:53, Rainer Krienke wrote:
> > Hello,
> >
> > I run a 5-node PVE cluster with pve-manager/6.4-8/185e14db (running
> > kernel: 5.4.119-1-pve). The storage backend is an HDD-based "external"
> > Ceph cluster running Ceph 14.2.16 with 144 OSDs on 9 hosts. Currently
> > there are about 70 VMs running on this PVE cluster, all Linux (Ubuntu,
> > SLES).
> >
> > The problem I have is that writing in the VMs has become slower and
> > slower over time, and e.g. running Linux updates (apt upgrade) on the
> > VMs takes longer and longer. The reason seems to be a steadily rising
> > write IOPS rate on the storage side. Of course, the number of VMs has
> > also grown to the current count over time, which pushes the numbers up.
> >
> > During the week I can see rates on the Ceph side of up to 1000 write
> > IOPS and about 300 read IOPS. The really strange thing, however, is
> > that even at weekends, when the services the VMs offer are hardly used
> > at all, there is still a quite high write rate of about 400 IOPS,
> > whereas the read rate is then only about 50 IOPS. The bytes
> > read/written are minimal at this time, with only about 100 KB/s read
> > and about 5 MB/s written.
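
To narrow down where that baseline write load comes from, something
like the following may help (a sketch; 'rbd perf image iotop' needs
Nautilus or newer, and the pool name is a placeholder):

    # Client IO broken down per pool
    ceph osd pool stats

    # Per-RBD-image write IOPS, to spot the noisiest VM disks
    rbd perf image iotop --pool <rbd-pool-name>
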
>
> I don't think you should have I/O problems with a healthy Ceph cluster
> of 144 OSDs on 9 hosts; it should be able to sustain far more than
> that. I'd suspect some host or OSD performing poorly and dragging down
> the whole cluster's performance...
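
Once 'ceph osd perf' points at a suspect, it can be checked directly,
roughly like this (a sketch; the OSD id is a placeholder, and the bench
writes real data to that OSD, so run it off-peak):

    # Write 128 MiB in 4 MiB chunks to one OSD and report the throughput;
    # compare the result against OSDs on other hosts
    ceph tell osd.<id> bench 134217728 4194304
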
>
> >
> > So what I am looking for is what could be causing this "always there"
> > write IOPS rate of about 400. My guess is that it comes from file
> > time (mtime, ctime, atime) updates being written to the VMs'
> > filesystems. If this is true, then using lazytime in /etc/fstab on
> > all VMs could help to avoid this behaviour.
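
For reference, a minimal sketch of such an fstab entry (device, mount
point and filesystem are placeholders); with lazytime the kernel keeps
timestamp-only updates in memory and only writes them out together with
other inode changes, on sync/unmount, or after roughly 24 hours:

    # /etc/fstab inside a guest
    /dev/vda1   /   ext4   defaults,lazytime   0   1

    # Or apply it to a running system without a reboot
    mount -o remount,lazytime /
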
> >
> > But on the other hand, all VMs use the (safe) "Writeback" cache
> > setting. Shouldn't this cache mode also absorb the writes caused by
> > file time updates?
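
For reference, the cache mode of a VM disk can be inspected and changed
from the PVE host roughly like this (the VM id 100, the scsi0 slot and
the volume name are placeholders; repeat the volume spec exactly as
'qm config' prints it when setting options):

    # Show the disk lines of a VM; look for a cache=... option
    qm config 100

    # Enable writeback caching on one disk
    qm set 100 --scsi0 local-lvm:vm-100-disk-0,cache=writeback

One thing worth keeping in mind is that the safe writeback mode still
honours flush requests from the guest, so periodic journal commits are
not held back in the cache indefinitely.
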
> >
> > If so, then I have to look for other causes of my write IOPS problem,
> > although I have no idea what they might be at the moment. Any
> > suggestions?
>
> We have a cluster with 62 VMs running (mostly Linux but also some
> Windows). Right now I'm seeing 5-15 MB/s reads and 5-35 MB/s writes,
> with ~500 read IOPS and ~200 write IOPS. This is with two pools, one
> backed by 4 SSD OSDs and the other by 11 HDD OSDs. The HDD pool has 45
> VMs running on it and apt upgrade performance is good...
>
> Cheers
>
> Eneko Lacunza
> Technical Director
> Binovo IT Human Project
>
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
>
> https://www.youtube.com/user/CANALBINOVO
> https://www.linkedin.com/company/37269706/
>