[PVE-User] Ceph bluestore OSD Journal/DB disk size

Eneko Lacunza elacunza at binovo.es
Wed May 29 10:30:33 CEST 2019


Hi all,

I have noticed that our office Proxmox cluster has a Bluestore OSD with
a very small DB partition. This OSD was created from the GUI on 12th
March this year.

This node has 4 OSDs:
- osd.12: bluestore, all SSD
- osd.3: bluestore, SSD db + spinning
- osd.2: filestore, SSD journal + spinning
- osd.4: filestore, SSD journal + spinning

We have two pools in the cluster (SSD and HDD).

I see that for osd.3, block.db points to /dev/sde8, which is only 1G in size:

# lsblk
NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda            8:0    0   1,8T  0 disk
├─sda1         8:1    0   100M  0 part /var/lib/ceph/osd/ceph-12
└─sda2         8:2    0   1,8T  0 part
sdb            8:16   0 931,5G  0 disk
├─sdb1         8:17   0   100M  0 part /var/lib/ceph/osd/ceph-3
└─sdb2         8:18   0 931,4G  0 part
sdc            8:32   0 931,5G  0 disk
└─sdc1         8:33   0 931,5G  0 part /var/lib/ceph/osd/ceph-4
sdd            8:48   0 931,5G  0 disk
└─sdd1         8:49   0 931,5G  0 part /var/lib/ceph/osd/ceph-2
sde            8:64   0 186,3G  0 disk
├─sde1         8:65   0  1007K  0 part
├─sde2         8:66   0   127M  0 part /boot/efi
├─sde3         8:67   0  59,9G  0 part
│ ├─pve-root 253:0    0  18,6G  0 lvm  /
│ └─pve-swap 253:1    0   952M  0 lvm  [SWAP]
├─sde5         8:69   0     5G  0 part
├─sde6         8:70   0     5G  0 part
└─sde8         8:72   0     1G  0 part
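You can also confirm which partition block.db resolves to, and how big it is, without reading through the whole lsblk listing. The following is a minimal sketch. It assumes a ceph-disk style OSD data directory where block.db is a symlink (possibly going through /dev/disk/by-partuuid), and that sysfs reports device sizes in 512-byte sectors. The helper names are hypothetical. LVM-based OSDs would resolve to a /dev/dm-* node instead.

```python
import os


def block_db_device(osd_dir: str) -> str:
    """Follow the block.db symlink(s) of an OSD data dir to the final device node."""
    return os.path.realpath(os.path.join(osd_dir, "block.db"))


def partition_size_bytes(dev_path: str, sysfs_root: str = "/sys/class/block") -> int:
    """Read a block device's size from sysfs, which reports it in 512-byte sectors."""
    name = os.path.basename(dev_path)
    with open(os.path.join(sysfs_root, name, "size")) as f:
        return int(f.read().strip()) * 512


# Example (run on the OSD node; paths are the ones from this cluster):
#   dev = block_db_device("/var/lib/ceph/osd/ceph-3")
#   print(dev, partition_size_bytes(dev) / 1024**3, "GiB")
```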

This was created from the GUI. As far as I can see, the GUI currently
doesn't allow specifying the journal/DB partition size... (I can't test
the whole process up to the actual creation...)

I think 1GB may be too small as a default, and that it could be
preventing the full DB from being placed in that partition, as discussed
in these ceph-users mailing list messages:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030740.html
https://www.spinics.net/lists/ceph-devel/msg39315.html

Maybe 3GB would be a better default? Also, for OSD nodes that are not
very dense, 30GB (or whatever the next level is) seems feasible too.
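The 3GB/30GB steps can be derived from RocksDB's level sizing. The following is a minimal sketch of that arithmetic. It assumes the RocksDB defaults max_bytes_for_level_base = 256 MiB and max_bytes_for_level_multiplier = 10 (check the bluestore_rocksdb_options actually in use). It also assumes that a level only benefits from the DB device if it fits there entirely. The model ignores the WAL, L0, and compaction headroom.

```python
# Simplified model of RocksDB level sizing. Both constants are ASSUMED
# RocksDB defaults, not values read from this cluster.
LEVEL_BASE = 256 * 1024**2   # assumed max_bytes_for_level_base (L1 target)
MULTIPLIER = 10              # assumed max_bytes_for_level_multiplier


def cumulative_level_sizes(levels: int = 4) -> list:
    """Bytes needed to hold L1..Ln entirely, for n = 1..levels."""
    sizes, total, level = [], 0, LEVEL_BASE
    for _ in range(levels):
        total += level
        sizes.append(total)
        level *= MULTIPLIER
    return sizes


def levels_that_fit(db_bytes: int, levels: int = 4) -> int:
    """Number of whole levels that fit in a DB partition of db_bytes."""
    return sum(1 for s in cumulative_level_sizes(levels) if s <= db_bytes)


if __name__ == "__main__":
    for gib in (1, 3, 30, 300):
        print(f"{gib:>4} GiB partition -> {levels_that_fit(gib * 1024**3)} whole level(s)")
```

Under these assumptions, a 1 GiB partition only holds L1 (about 0.25 GiB). L2 and beyond would then live on the slow device, which would fit with osd.3 showing DB data on slow storage.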

I see the following in a perf dump of osd.3:
     "bluefs": {
         "gift_bytes": 0,
         "reclaim_bytes": 0,
         "db_total_bytes": 1073733632,
         "db_used_bytes": 664797184,
         "wal_total_bytes": 0,
         "wal_used_bytes": 0,
         "slow_total_bytes": 40004222976,
         "slow_used_bytes": 1228931072,
         "num_files": 19,
         "log_bytes": 1318912,
         "log_compactions": 1,
         "logged_bytes": 164077568,
         "files_written_wal": 2,
         "files_written_sst": 17,
         "bytes_written_wal": 1599916960,
         "bytes_written_sst": 752941742
     },

So, 665MB of the 1GB DB partition is used, and there is another 1.2GB
of DB data on slow storage...
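To check other OSDs for the same problem, you can summarize the bluefs counters with a small script. The following is a minimal sketch. It assumes the JSON comes from `ceph daemon osd.N perf dump`, and that any non-zero slow_used_bytes means RocksDB data has spilled to the slow device. The function name is hypothetical.

```python
import json


def bluefs_summary(perf_dump_json: str) -> dict:
    """Summarize BlueFS DB usage and spillover from an OSD perf dump (JSON text)."""
    bluefs = json.loads(perf_dump_json)["bluefs"]
    db_total = bluefs["db_total_bytes"]
    db_used = bluefs["db_used_bytes"]
    slow_used = bluefs["slow_used_bytes"]
    return {
        # Share of the dedicated DB partition currently in use
        "db_used_pct": 100.0 * db_used / db_total if db_total else 0.0,
        # BlueFS bytes stored on the slow (main) device instead of the DB partition
        "spilled_bytes": slow_used,
        "spillover": slow_used > 0,
    }


# Values copied from the osd.3 dump above (other counters omitted)
SAMPLE = """{"bluefs": {"db_total_bytes": 1073733632,
                        "db_used_bytes": 664797184,
                        "slow_used_bytes": 1228931072}}"""

if __name__ == "__main__":
    s = bluefs_summary(SAMPLE)
    print(f"DB used: {s['db_used_pct']:.1f}%, spilled: {s['spilled_bytes'] / 1e9:.2f} GB")
```

For the osd.3 numbers, this prints "DB used: 61.9%, spilled: 1.23 GB".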

Thanks a lot
Eneko

-- 
Technical Director
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es