[PVE-User] Spillover issue

Eneko Lacunza elacunza at binovo.es
Wed Mar 25 08:43:41 CET 2020


Hi Alwin,

On 24/3/20 at 14:54, Alwin Antreich wrote:
> On Tue, Mar 24, 2020 at 01:12:03PM +0100, Eneko Lacunza wrote:
>> Hi Alwin,
>>
>> On 24/3/20 at 12:24, Alwin Antreich wrote:
>>> On Tue, Mar 24, 2020 at 10:34:15AM +0100, Eneko Lacunza wrote:
>>>> We're seeing a spillover issue with Ceph, using 14.2.8:
>> [...]
>>>> 3. ceph health detail
>>>>      HEALTH_WARN BlueFS spillover detected on 3 OSD
>>>>      BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
>>>>      osd.3 spilled over 5 MiB metadata from 'db' device (556 MiB used of
>>>>      6.0 GiB) to slow device
>>>>      osd.4 spilled over 5 MiB metadata from 'db' device (552 MiB used of
>>>>      6.0 GiB) to slow device
>>>>      osd.5 spilled over 5 MiB metadata from 'db' device (551 MiB used of
>>>>      6.0 GiB) to slow device
>>>>
>>>> I may be overlooking something, any ideas? I also just found the following
>>>> Ceph issue:
>>>>
>>>> https://tracker.ceph.com/issues/38745
>>>>
>>>> 5 MiB of metadata on the slow device isn't a big problem, but the cluster is
>>>> permanently in HEALTH_WARN state... :)
>>> The DB/WAL device is too small, so all the new metadata has to be written
>>> to the slow device. This will destroy performance.
>>>
>>> I think the size changes, as the DB gets compacted.
>> Yes. But it isn't too small... it's 6 GiB and there's only ~560MiB of data.
> Yes, true. I meant the used size. But the message is odd.
>
> You should find the compaction stats in the OSD log files. It could be,
> as reasoned in the bug tracker, that the compaction needs too much space
> and spills over to the slow device. Additionally, if not set separately,
> the WAL will take up 512 MB on the DB device.
I don't see any indication that compaction needs too much space:

2020-03-24 14:24:04.883 7f03ffbee700  4 rocksdb: [db/db_impl.cc:777] ------- DUMPING STATS -------
2020-03-24 14:24:04.883 7f03ffbee700  4 rocksdb: [db/db_impl.cc:778]
** DB Stats **
Uptime(secs): 15000.1 total, 600.0 interval
Cumulative writes: 4646 writes, 18K keys, 4646 commit groups, 1.0 writes per commit group, ingest: 0.01 GB, 0.00 MB/s
Cumulative WAL: 4646 writes, 1891 syncs, 2.46 writes per sync, written: 0.01 GB, 0.00 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 163 writes, 637 keys, 163 commit groups, 1.0 writes per commit group, ingest: 0.63 MB, 0.00 MB/s
Interval WAL: 163 writes, 67 syncs, 2.40 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent

** Compaction Stats [default] **
Level    Files   Size       Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0     0.00 KB    0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     33.4      0.02              0.00         2    0.009       0       0
  L1      0/0     0.00 KB    0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.8    162.1    134.6      0.09              0.06         1    0.092    127K     10K
  L2      9/0   538.64 MB    0.2      0.5     0.0      0.5       0.5      0.0       0.0  43.6    102.7    101.2      5.32              1.31         1    5.325   1496K    110K
 Sum      9/0   538.64 MB    0.0      0.5     0.0      0.5       0.5      0.0       0.0 961.1    103.3    101.5      5.43              1.37         4    1.358   1623K    121K
 Int      0/0     0.00 KB    0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0       0

** Compaction Stats [default] **
Priority    Files   Size       Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Low      0/0     0.00 KB    0.0      0.5     0.0      0.5       0.5      0.0       0.0   0.0    103.7    101.7      5.42              1.36         2    2.708   1623K    121K
    High      0/0     0.00 KB    0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0     43.9      0.01              0.00         1    0.013       0       0
    User      0/0     0.00 KB    0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.4      0.00              0.00         1    0.004       0       0
Uptime(secs): 15000.1 total, 600.0 interval
Flush(GB): cumulative 0.001, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.54 GB write, 0.04 MB/s write, 0.55 GB read, 0.04 MB/s read, 5.4 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
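
The stats above come from the OSD log file (/var/log/ceph/ceph-osd.*.log). For
what it's worth, a compaction can also be forced by hand to check whether the
spillover clears afterwards; osd.3 below is just an example id:

    # Ask the OSD to compact its RocksDB; repeat for each affected OSD
    ceph tell osd.3 compact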

I see the following in a perf dump:

     "bluefs": {
         "gift_bytes": 0,
         "reclaim_bytes": 0,
         "db_total_bytes": 6442442752,
         "db_used_bytes": 696246272,
         "wal_total_bytes": 0,
         "wal_used_bytes": 0,
         "slow_total_bytes": 40004222976,
         "slow_used_bytes": 5242880,
         "num_files": 20,
         "log_bytes": 41631744,
         "log_compactions": 0,
         "logged_bytes": 40550400,
         "files_written_wal": 2,
         "files_written_sst": 41,
         "bytes_written_wal": 102040973,
         "bytes_written_sst": 2233090674,
         "bytes_written_slow": 0,
         "max_bytes_wal": 0,
         "max_bytes_db": 1153425408,
         "max_bytes_slow": 0,
         "read_random_count": 127832,
         "read_random_bytes": 2761102524,
         "read_random_disk_count": 19206,
         "read_random_disk_bytes": 2330400597,
         "read_random_buffer_count": 108844,
         "read_random_buffer_bytes": 430701927,
         "read_count": 21457,
         "read_bytes": 1087948189,
         "read_prefetch_count": 21438,
         "read_prefetch_bytes": 1086853927
     },
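
For reference, the counters above come from the OSD's admin socket; assuming
osd.3 and jq available on the node, something like:

    # Dump only the bluefs section of the OSD's performance counters
    ceph daemon osd.3 perf dump | jq .bluefs

Note that slow_used_bytes = 5242880 is exactly the 5 MiB reported in the
health warning.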


> If the above doesn't give any information, then you may need to export
> the BlueFS (RocksDB). Then you can run the kvstore-tool on it.
I'll try this, although I'd say it's some kind of bug.
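If it helps, my understanding is that the export would look roughly like this
(OSD id and output directory are just examples, and the OSD has to be stopped
while exporting):

    # Stop the OSD, export its BlueFS volume, then inspect the RocksDB inside it
    systemctl stop ceph-osd@3
    ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-3 --out-dir /root/osd3-bluefs
    # e.g. list the keys in the exported DB with the kvstore tool
    ceph-kvstore-tool rocksdb /root/osd3-bluefs/db list
    systemctl start ceph-osd@3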
>
>>> The easiest way is to destroy and re-create the OSD with a bigger
>>> DB/WAL. The guideline from Facebook for RocksDB is 3/30/300 GB.
>> It's well below the 3 GiB limit in the guideline ;)
> For now. ;)
The cluster is 2 years old now and the amount of data is quite stable; I
think it will hold for some time ;)
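
In the meantime, to keep the cluster out of HEALTH_WARN there is, if I
remember correctly, a bluestore_warn_on_bluefs_spillover option that can be
disabled; it only hides the warning, it doesn't fix the spillover:

    # Stop reporting BlueFS spillover as a health warning (cosmetic only)
    ceph config set osd bluestore_warn_on_bluefs_spillover false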

Thanks a lot
Eneko

-- 
Technical Director
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es



