[PVE-User] Spillover issue
Alwin Antreich
a.antreich at proxmox.com
Wed Mar 25 11:55:12 CET 2020
On Wed, Mar 25, 2020 at 08:43:41AM +0100, Eneko Lacunza wrote:
> Hi Alwin,
>
> On 24/3/20 at 14:54, Alwin Antreich wrote:
> > On Tue, Mar 24, 2020 at 01:12:03PM +0100, Eneko Lacunza wrote:
> > > Hi Allwin,
> > >
> > > On 24/3/20 at 12:24, Alwin Antreich wrote:
> > > > On Tue, Mar 24, 2020 at 10:34:15AM +0100, Eneko Lacunza wrote:
> > > > > We're seeing a spillover issue with Ceph, using 14.2.8:
> > > [...]
> > > > > 3. ceph health detail
> > > > > HEALTH_WARN BlueFS spillover detected on 3 OSD
> > > > > BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
> > > > > osd.3 spilled over 5 MiB metadata from 'db' device (556 MiB used of
> > > > > 6.0 GiB) to slow device
> > > > > osd.4 spilled over 5 MiB metadata from 'db' device (552 MiB used of
> > > > > 6.0 GiB) to slow device
> > > > > osd.5 spilled over 5 MiB metadata from 'db' device (551 MiB used of
> > > > > 6.0 GiB) to slow device
> > > > >
> > > > > I may be overlooking something, any ideas? I also just found the following
> > > > > Ceph issue:
> > > > >
> > > > > https://tracker.ceph.com/issues/38745
> > > > >
> > > > > 5 MiB of metadata on the slow device isn't a big problem, but the cluster is
> > > > > permanently in a health warning state... :)
> > > > The DB/WAL device is too small and all the new metadata has to be written
> > > > to the slow device. This will destroy performance.
> > > >
> > > > I think the size changes, as the DB gets compacted.
> > > Yes. But it isn't too small... it's 6 GiB and there's only ~560MiB of data.
> > Yes, true. I meant the used size. But the message is odd.
> >
> > You should find the compaction stats in the OSD log files. It could be,
> > as reasoned in the bug tracker, that the compaction needs too much space
> > and spills over to the slow device. Additionally, if not set up separately,
> > the WAL will take up 512 MB on the DB device.
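
For example (osd.3 and the default log location are just placeholders, adjust
to your setup), the periodic RocksDB stats dump can be pulled out of the OSD
log with something like:

    # the stats dumps follow the "DUMPING STATS" marker in the OSD log
    grep -A 40 'DUMPING STATS' /var/log/ceph/ceph-osd.3.log | less
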
> I don't see any indication that compaction needs too much space:
>
> 2020-03-24 14:24:04.883 7f03ffbee700 4 rocksdb: [db/db_impl.cc:777] -------
> DUMPING STATS -------
> 2020-03-24 14:24:04.883 7f03ffbee700 4 rocksdb: [db/db_impl.cc:778]
> ** DB Stats **
> Uptime(secs): 15000.1 total, 600.0 interval
> Cumulative writes: 4646 writes, 18K keys, 4646 commit groups, 1.0 writes per
> commit group, ingest: 0.01 GB, 0.00 MB/s
> Cumulative WAL: 4646 writes, 1891 syncs, 2.46 writes per sync, written: 0.01
> GB, 0.00 MB/s
> Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
> Interval writes: 163 writes, 637 keys, 163 commit groups, 1.0 writes per
> commit group, ingest: 0.63 MB, 0.00 MB/s
> Interval WAL: 163 writes, 67 syncs, 2.40 writes per sync, written: 0.00 MB,
> 0.00 MB/s
> Interval stall: 00:00:0.000 H:M:S, 0.0 percent
>
> ** Compaction Stats [default] **
> Level    Files    Size      Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   L0      0/0     0.00 KB     0.0       0.0     0.0       0.0        0.0       0.0        0.0    1.0       0.0      33.4       0.02               0.00          2     0.009      0        0
>   L1      0/0     0.00 KB     0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.8     162.1     134.6       0.09               0.06          1     0.092   127K      10K
>   L2      9/0   538.64 MB     0.2       0.5     0.0       0.5        0.5       0.0        0.0   43.6     102.7     101.2       5.32               1.31          1     5.325  1496K     110K
>  Sum      9/0   538.64 MB     0.0       0.5     0.0       0.5        0.5       0.0        0.0  961.1     103.3     101.5       5.43               1.37          4     1.358  1623K     121K
>  Int      0/0     0.00 KB     0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.0       0.0       0.00               0.00          0     0.000      0        0
>
> ** Compaction Stats [default] **
> Priority  Files    Size      Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Low      0/0     0.00 KB     0.0       0.5     0.0       0.5        0.5       0.0        0.0    0.0     103.7     101.7       5.42               1.36          2     2.708  1623K     121K
>  High      0/0     0.00 KB     0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.0      43.9       0.01               0.00          1     0.013      0        0
>  User      0/0     0.00 KB     0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.0       0.4       0.00               0.00          1     0.004      0        0
> Uptime(secs): 15000.1 total, 600.0 interval
> Flush(GB): cumulative 0.001, interval 0.000
> AddFile(GB): cumulative 0.000, interval 0.000
> AddFile(Total Files): cumulative 0, interval 0
> AddFile(L0 Files): cumulative 0, interval 0
> AddFile(Keys): cumulative 0, interval 0
> Cumulative compaction: 0.54 GB write, 0.04 MB/s write, 0.55 GB read, 0.04
> MB/s read, 5.4 seconds
> Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s
> read, 0.0 seconds
> Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0
> level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for
> pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0
> memtable_compaction, 0 memtable_slowdown, interval 0 total count
>
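As a side note, a compaction can also be triggered by hand through the admin
socket (osd.3 again only as an example, run on the node hosting the OSD);
whether that actually moves the spilled metadata back to the DB device seems
to vary:

    ceph daemon osd.3 compact
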
> I see the following in a perf dump:
>
> "bluefs": {
> "gift_bytes": 0,
> "reclaim_bytes": 0,
> "db_total_bytes": 6442442752,
> "db_used_bytes": 696246272,
> "wal_total_bytes": 0,
> "wal_used_bytes": 0,
> "slow_total_bytes": 40004222976,
> "slow_used_bytes": 5242880,
> "num_files": 20,
> "log_bytes": 41631744,
> "log_compactions": 0,
> "logged_bytes": 40550400,
> "files_written_wal": 2,
> "files_written_sst": 41,
> "bytes_written_wal": 102040973,
> "bytes_written_sst": 2233090674,
> "bytes_written_slow": 0,
> "max_bytes_wal": 0,
> "max_bytes_db": 1153425408,
> "max_bytes_slow": 0,
> "read_random_count": 127832,
> "read_random_bytes": 2761102524,
> "read_random_disk_count": 19206,
> "read_random_disk_bytes": 2330400597,
> "read_random_buffer_count": 108844,
> "read_random_buffer_bytes": 430701927,
> "read_count": 21457,
> "read_bytes": 1087948189,
> "read_prefetch_count": 21438,
> "read_prefetch_bytes": 1086853927
> },
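
For reference, that bluefs section comes straight from the admin socket
(osd.3 as an example; jq is only used for filtering and is optional):

    ceph daemon osd.3 perf dump | jq '.bluefs'

And the "slow_used_bytes": 5242880 is exactly the 5 MiB that ceph health
detail complains about.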
>
>
> > If the above doesn't give any information then you may need to export
> > the bluefs (RocksDB). Then you can run the kvstore-tool on it.
> I'll look into trying this, although I'd say it's some kind of bug.
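
Roughly, that would look like the following (untested here; stop the OSD
first, and the OSD id and output directory are only examples):

    systemctl stop ceph-osd@3
    ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-3 --out-dir /root/osd3-bluefs
    # the export mirrors the RocksDB directories (db/, db.slow/, db.wal/),
    # which ceph-kvstore-tool can then inspect, e.g.:
    ceph-kvstore-tool rocksdb /root/osd3-bluefs/db stats
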
> >
> > > > The easiest way is to destroy and re-create the OSD with a bigger
> > > > DB/WAL. The guideline from Facebook for RocksDB is 3/30/300 GB.
> > > It's well below the 3GiB limit in the guideline ;)
> > For now. ;)
> The cluster is 2 years old now and the amount of data is quite stable, so I
> think it will hold for some time ;)
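
Should it ever come to re-creating the OSDs with a bigger DB, on PVE that
would be roughly (device names, OSD id and the 30 GB DB size are only
examples, double-check before running anything):

    ceph osd out 3                    # and wait for the rebalance to finish
    pveceph osd destroy 3 --cleanup
    pveceph osd create /dev/sdX --db_dev /dev/sdY --db_size 30
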
Hm... Igor reckons that this seems to be normal.
https://tracker.ceph.com/issues/38745#note-28
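
If the permanent HEALTH_WARN is the only thing bothering you, the check
itself can be switched off (cluster-wide setting, so use with care):

    ceph config set osd bluestore_warn_on_bluefs_spillover false
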
--
Cheers,
Alwin