[PVE-User] Spillover issue
Alwin Antreich
a.antreich at proxmox.com
Tue Mar 24 12:24:23 CET 2020
Hello Eneko,
On Tue, Mar 24, 2020 at 10:34:15AM +0100, Eneko Lacunza wrote:
> Hi all,
>
> We're seeing a spillover issue with Ceph, using 14.2.8:
>
> We originally had a 1 GB RocksDB partition:
>
> 1. ceph health detail
> HEALTH_WARN BlueFS spillover detected on 3 OSD
> BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
> osd.3 spilled over 78 MiB metadata from 'db' device (1024 MiB used
> of 1024 MiB) to slow device
> osd.4 spilled over 78 MiB metadata from 'db' device (1024 MiB used
> of 1024 MiB) to slow device
> osd.5 spilled over 84 MiB metadata from 'db' device (1024 MiB used
> of 1024 MiB) to slow device
>
> We created new 6 GiB partitions for RocksDB, copied the original partition,
> then extended it with "ceph-bluestore-tool bluefs-bdev-expand".
> Now we get:
>
> 1. ceph health detail
> HEALTH_WARN BlueFS spillover detected on 3 OSD
> BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
> osd.3 spilled over 5 MiB metadata from 'db' device (555 MiB used of
> 6.0 GiB) to slow device
> osd.4 spilled over 5 MiB metadata from 'db' device (552 MiB used of
> 6.0 GiB) to slow device
> osd.5 spilled over 5 MiB metadata from 'db' device (561 MiB used of
> 6.0 GiB) to slow device
>
> Issuing "ceph daemon osd.X compact" doesn't help, but shows the following
> transitional state:
>
> 1. ceph daemon osd.5 compact
> {
>     "elapsed_time": 5.4560688339999999
> }
> 2. ceph health detail
> HEALTH_WARN BlueFS spillover detected on 3 OSD
> BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
> osd.3 spilled over 5 MiB metadata from 'db' device (556 MiB used of
> 6.0 GiB) to slow device
> osd.4 spilled over 5 MiB metadata from 'db' device (552 MiB used of
> 6.0 GiB) to slow device
> osd.5 spilled over 5 MiB metadata from 'db' device (1.1 GiB used of
> 6.0 GiB) to slow device
> (...and after a while...)
> 3. ceph health detail
> HEALTH_WARN BlueFS spillover detected on 3 OSD
> BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
> osd.3 spilled over 5 MiB metadata from 'db' device (556 MiB used of
> 6.0 GiB) to slow device
> osd.4 spilled over 5 MiB metadata from 'db' device (552 MiB used of
> 6.0 GiB) to slow device
> osd.5 spilled over 5 MiB metadata from 'db' device (551 MiB used of
> 6.0 GiB) to slow device
>
> I may be overlooking something, any ideas? I also just found the following
> Ceph issue:
>
> https://tracker.ceph.com/issues/38745
>
> 5 MiB of metadata on the slow device isn't a big problem, but the cluster is
> permanently in a health warning state... :)
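For reference, the expansion you describe would look roughly like the following
per OSD. This is only a sketch: the OSD id and device paths are placeholders,
and depending on how the OSD was created the DB device reference (block.db
symlink / LVM tags) may need to be updated as well.

systemctl stop ceph-osd@5                               # stop the OSD before touching its DB
dd if=/dev/sdX2 of=/dev/sdY1 bs=1M status=progress      # copy the old 1 GiB DB partition onto the new 6 GiB one
chown ceph:ceph /dev/sdY1                               # keep the device usable by the ceph user
ln -sf /dev/sdY1 /var/lib/ceph/osd/ceph-5/block.db      # repoint block.db to the new partition
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-5
systemctl start ceph-osd@5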
The DB/WAL device is too small, so all new metadata has to be written to
the slow device. This will destroy performance.
I think the reported size changes as the DB gets compacted.
The easiest way is to destroy and re-create the OSD with a bigger DB/WAL.
The guideline from Facebook for RocksDB is 3/30/300 GB; with the default
settings, RocksDB only keeps a level on the fast device if the whole level
fits there, so sizes in between bring little benefit.
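On a Proxmox VE 6 node that could look roughly like the sketch below; the OSD
id, device paths and the 30 GB DB size are only examples, and the pveceph
option names should be checked against the locally installed version.

ceph osd out osd.5                              # let the data drain off the OSD
while ! ceph osd safe-to-destroy osd.5; do sleep 60; done
pveceph osd destroy 5 --cleanup                 # remove the OSD and wipe its devices
pveceph osd create /dev/sdX --db_dev /dev/nvme0n1 --db_size 30   # re-create with a 30 GB DB (example devices)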
--
Cheers,
Alwin