[PVE-User] Spillover issue
Eneko Lacunza
elacunza at binovo.es
Tue Mar 24 10:34:15 CET 2020
Hi all,
We're seeing a BlueFS spillover issue with Ceph 14.2.8.
We originally had a 1 GiB RocksDB (block.db) partition per OSD:
# ceph health detail
HEALTH_WARN BlueFS spillover detected on 3 OSD
BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
    osd.3 spilled over 78 MiB metadata from 'db' device (1024 MiB used of 1024 MiB) to slow device
    osd.4 spilled over 78 MiB metadata from 'db' device (1024 MiB used of 1024 MiB) to slow device
    osd.5 spilled over 84 MiB metadata from 'db' device (1024 MiB used of 1024 MiB) to slow device
We have since created new 6 GiB partitions for RocksDB, copied the original partitions over, and extended them with "ceph-bluestore-tool bluefs-bdev-expand" (rough sketch of the steps after the output below). Now we get:
# ceph health detail
HEALTH_WARN BlueFS spillover detected on 3 OSD
BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
    osd.3 spilled over 5 MiB metadata from 'db' device (555 MiB used of 6.0 GiB) to slow device
    osd.4 spilled over 5 MiB metadata from 'db' device (552 MiB used of 6.0 GiB) to slow device
    osd.5 spilled over 5 MiB metadata from 'db' device (561 MiB used of 6.0 GiB) to slow device
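For reference, the expansion was done roughly like this per OSD (osd.5 shown; /dev/sdX1 and /dev/sdY1 are placeholders for the old 1 GiB and the new 6 GiB db partitions, and the block.db link may be referenced differently on your setup):
systemctl stop ceph-osd@5
dd if=/dev/sdX1 of=/dev/sdY1 bs=1M                        # copy the old db partition onto the new, larger one
ln -sf /dev/sdY1 /var/lib/ceph/osd/ceph-5/block.db        # repoint block.db at the new partition
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-5
systemctl start ceph-osd@5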
Issuing "ceph daemon osd.X compact" doesn't help, but shows the
following transitional state:
1. ceph daemon osd.5 compact {
"elapsed_time": 5.4560688339999999
}
# ceph health detail
HEALTH_WARN BlueFS spillover detected on 3 OSD
BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
    osd.3 spilled over 5 MiB metadata from 'db' device (556 MiB used of 6.0 GiB) to slow device
    osd.4 spilled over 5 MiB metadata from 'db' device (552 MiB used of 6.0 GiB) to slow device
    osd.5 spilled over 5 MiB metadata from 'db' device (1.1 GiB used of 6.0 GiB) to slow device
(...and after a while...)
# ceph health detail
HEALTH_WARN BlueFS spillover detected on 3 OSD
BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
    osd.3 spilled over 5 MiB metadata from 'db' device (556 MiB used of 6.0 GiB) to slow device
    osd.4 spilled over 5 MiB metadata from 'db' device (552 MiB used of 6.0 GiB) to slow device
    osd.5 spilled over 5 MiB metadata from 'db' device (551 MiB used of 6.0 GiB) to slow device
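In case it's useful, the same numbers can also be read directly from the OSD's BlueFS perf counters (osd.5 as an example; jq is only used here to filter the JSON):
# ceph daemon osd.5 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'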
I may be overlooking something; any ideas? I also just found the following Ceph issue:
https://tracker.ceph.com/issues/38745
5 MiB of metadata on the slow device isn't a big problem, but the cluster is permanently in HEALTH_WARN state... :)
# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-2-pve)
pve-manager: 6.1-7 (running version: 6.1-7/13e58d5e)
pve-kernel-helper: 6.1-7
pve-kernel-5.3: 6.1-5
pve-kernel-4.15: 5.4-14
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-4.15.18-26-pve: 4.15.18-54
pve-kernel-4.15.18-25-pve: 4.15.18-53
pve-kernel-4.15.18-12-pve: 4.15.18-36
pve-kernel-4.15.18-2-pve: 4.15.18-21
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph: 14.2.8-pve1
ceph-fuse: 14.2.8-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-12
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-4
libpve-storage-perl: 6.1-4
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-19
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-8
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-3
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-6
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
Thanks a lot
Eneko
--
Technical Director
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es