[PVE-User] I/O performance regression with kernel 4.15.x, ceph 12.2.x

Uwe Sauter uwe.sauter.de at gmail.com
Wed May 16 14:27:05 CEST 2018


Hi all,

I was able to investigate further and it comes down to this:

The blocking I/O inside the VMs was caused by "X slow requests are blocked > 32 sec" warnings on at least one OSD. I was only able
to observe this when the cluster was running kernel 4.15.17. When the cluster runs on 4.13.16, no blocked requests occur (as far as
I have seen so far).
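
For reference, this is roughly how I spot the affected OSDs (standard Ceph CLI; the OSD id is
just an example, and the daemon command has to be run on the host that carries that OSD):

   # list which OSDs currently report slow/blocked requests
   ceph health detail | grep -i blocked

   # then inspect the operations that are stuck on one of the reported OSDs
   ceph daemon osd.12 dump_ops_in_flight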


Has anyone else repeatedly seen OSDs with blocked requests when running 4.15.17, or is it just me?

Regards,

	Uwe


On 09.05.2018 at 11:51, Uwe Sauter wrote:
> Hi,
> 
> since kernel 4.15.x was released in the pve-no-subscription repository, I have been seeing
> I/O performance regressions that lead to 100% iowait in VMs, dropped (audit) log records,
> and general instability.
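> 
> (For completeness, a sketch of how I watch the iowait inside a guest, using plain
> sysstat/procps tools, nothing Proxmox-specific:
> 
>    iostat -x 2    # the %iowait column sits at or near 100 while the problem is present
>    vmstat 2       # same picture in the "wa" column
> )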
> 
> All VMs that show this behavior run up-to-date CentOS 7 on Ceph-backed storage
> with kvm64 as the CPU type.
> 
> This behavior presents itself as soon as one or more hosts are running kernel 4.15.x
> (I tried 4.15.15 and 4.15.17), which leads me to conclude that it must be related
> to the combination of this kernel and Ceph (and not to the Meltdown/Spectre
> patches that are included in those kernels).
> Once all hosts are booted back into kernel 4.13.16, the situation calms down
> almost immediately and the VMs go back to running with low-percentage iowait.
> 
> VM kernels have not been changed in the two weeks since 4.15.x was released.
> 
> I experimented with the "pcid" CPU flag for the VMs but cannot say whether this had
> any positive or negative effect on the issue.
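> 
> (A sketch of what I did, assuming the cpu "flags" syntax of the qemu-server version
> listed below; 100 is just an example VMID:
> 
>    qm set 100 -cpu kvm64,flags=+pcid
> 
> and then checked inside the guest that the flag shows up with
> "grep pcid /proc/cpuinfo".)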
> 
> 
> Does anyone else see this behavior?
> 
> Any suggestions on further debugging?
> 
> Thanks,
> 
>   Uwe
> 
> 
> ####### hardware ########
> 4x dual-socket Xeon E5-2670 (Sandybridge), 64GB RAM, 3 Ceph OSD disks
> 2x dual-socket Xeon E5606   (Westmere),    96GB RAM, 6 Ceph OSD disks
> 10GbE connection between all hosts
> #########################
> 
> ######### pveversion -v #########
> proxmox-ve: 5.1-43 (running kernel: 4.13.16-2-pve)
> pve-manager: 5.1-52 (running version: 5.1-52/ba597a64)
> pve-kernel-4.13: 5.1-44
> pve-kernel-4.15: 5.1-4
> pve-kernel-4.15.17-1-pve: 4.15.17-8
> pve-kernel-4.13.16-2-pve: 4.13.16-48
> ceph: 12.2.5-pve1
> corosync: 2.4.2-pve5
> criu: 2.11.1-1~bpo90
> glusterfs-client: 3.8.8-1
> ksm-control-daemon: 1.2-2
> libjs-extjs: 6.0.1-2
> libpve-access-control: 5.0-8
> libpve-apiclient-perl: 2.0-4
> libpve-common-perl: 5.0-31
> libpve-guest-common-perl: 2.0-15
> libpve-http-server-perl: 2.0-8
> libpve-storage-perl: 5.0-21
> libqb0: 1.0.1-1
> lvm2: 2.02.168-pve6
> lxc-pve: 3.0.0-3
> lxcfs: 3.0.0-1
> novnc-pve: 0.6-4
> proxmox-widget-toolkit: 1.0-17
> pve-cluster: 5.0-27
> pve-container: 2.0-22
> pve-docs: 5.1-17
> pve-firewall: 3.0-8
> pve-firmware: 2.0-4
> pve-ha-manager: 2.0-5
> pve-i18n: 1.0-4
> pve-libspice-server1: 0.12.8-3
> pve-qemu-kvm: 2.11.1-5
> pve-xtermjs: 1.0-3
> qemu-server: 5.0-25
> smartmontools: 6.5+svn4324-1
> spiceterm: 3.0-5
> vncterm: 1.5-3
> zfsutils-linux: 0.7.8-pve1~bpo9
> #################################
> 



