[PVE-User] Ceph: Some trouble creating OSD with journal on a software raid device...
Marco Gaiarin
gaio at sv.lnf.it
Thu Oct 13 12:13:18 CEST 2016
I'm a bit confused.
I'm trying to create 4 OSDs on a server where the OS resides on a
RAID-1. On the same (pair of) disks there are four 50GB partitions for
the journals (the two disks are SSDs).
Better shown with a command:
root at vedovanera:~# blkid
/dev/sdf1: UUID="75103d23-83a6-9f5d-eb1e-f021e729041b" UUID_SUB="70aa73ab-585c-5df1-dfef-bbc847766504" LABEL="vedovanera:0" TYPE="linux_raid_member" PARTUUID="180187f1-01"
/dev/sdf2: UUID="e21df9d5-3230-f991-d70d-f948704a7594" UUID_SUB="fe105e88-252a-97ab-d543-c2c2a89499d0" LABEL="vedovanera:1" TYPE="linux_raid_member" PARTUUID="180187f1-02"
/dev/sdf5: UUID="ba35e389-814d-dc29-8818-c9e86d9d8f08" UUID_SUB="66104965-2a53-c142-43fb-1bc35f66bf41" LABEL="vedovanera:2" TYPE="linux_raid_member" PARTUUID="180187f1-05"
/dev/sdf6: UUID="90778432-b426-51e9-b0a2-48d76ef24364" UUID_SUB="00db7dd7-f0fd-ea54-e52f-0a8725ed7866" LABEL="vedovanera:3" TYPE="linux_raid_member" PARTUUID="180187f1-06"
/dev/sdf7: UUID="09be7173-4edc-1e14-5e06-dfdcd677943c" UUID_SUB="876a79e0-be59-6153-cede-97aefcdec849" LABEL="vedovanera:4" TYPE="linux_raid_member" PARTUUID="180187f1-07"
/dev/sdf8: UUID="fd54393a-2969-7f9f-8e29-f4120dc4ab00" UUID_SUB="d576b4c8-dfc5-8ddd-25f2-9b0da0c7241c" LABEL="vedovanera:5" TYPE="linux_raid_member" PARTUUID="180187f1-08"
/dev/sda: PTUUID="cf6dccb4-4f6f-472a-9f1a-5945de4f1703" PTTYPE="gpt"
/dev/sdc: PTUUID="3ecc2e48-b12d-4cb1-add8-87f0e611b7e8" PTTYPE="gpt"
/dev/sde1: UUID="75103d23-83a6-9f5d-eb1e-f021e729041b" UUID_SUB="ab4416c0-a715-ef87-466a-6a58096eb2b9" LABEL="vedovanera:0" TYPE="linux_raid_member" PARTUUID="03210f34-01"
/dev/sde2: UUID="e21df9d5-3230-f991-d70d-f948704a7594" UUID_SUB="2355caea-4102-7269-38be-22779790c388" LABEL="vedovanera:1" TYPE="linux_raid_member" PARTUUID="03210f34-02"
/dev/sde5: UUID="ba35e389-814d-dc29-8818-c9e86d9d8f08" UUID_SUB="b3211065-8c5d-3fa5-8f57-2a50ef461a34" LABEL="vedovanera:2" TYPE="linux_raid_member" PARTUUID="03210f34-05"
/dev/sde6: UUID="90778432-b426-51e9-b0a2-48d76ef24364" UUID_SUB="296a78cf-0e97-62f6-d136-cefb9abffa3e" LABEL="vedovanera:3" TYPE="linux_raid_member" PARTUUID="03210f34-06"
/dev/sde7: UUID="09be7173-4edc-1e14-5e06-dfdcd677943c" UUID_SUB="36667e33-d801-c114-cb59-8770b66fc98d" LABEL="vedovanera:4" TYPE="linux_raid_member" PARTUUID="03210f34-07"
/dev/sde8: UUID="fd54393a-2969-7f9f-8e29-f4120dc4ab00" UUID_SUB="b5eac45a-2693-e2c5-3e00-2b8a33658a00" LABEL="vedovanera:5" TYPE="linux_raid_member" PARTUUID="03210f34-08"
/dev/sdb: PTUUID="000e025c" PTTYPE="dos"
/dev/md0: UUID="a751e134-b3ed-450c-b694-664d80f07c68" TYPE="ext4"
/dev/sdd: PTUUID="000b1250" PTTYPE="dos"
/dev/md1: UUID="8bd0c899-0317-4d20-a781-ff662e92b0b1" TYPE="swap"
/dev/md2: PTUUID="a7eb14f0-d2f9-4552-8e2d-b5165e654ea8" PTTYPE="gpt"
/dev/md3: PTUUID="ba4073c3-fab2-41e9-9612-28d28ae6468d" PTTYPE="gpt"
/dev/md4: PTUUID="c3dfbbfa-28da-4bc8-88fd-b49785e7e212" PTTYPE="gpt"
/dev/md5: PTUUID="c616bbf8-41f0-4e62-b77f-b0e8eeb624e2" PTTYPE="gpt"
'md0' is /, 'md1' the swap, md2-5 are the journal (cache) partitions, and sda-d
are the disks for the OSDs.
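(Just to double-check the layout, the composition of the md arrays can be
inspected with something like the following; I omit the output, since it
obviously depends on the box:)

cat /proc/mdstat
mdadm --detail /dev/md2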
Proxmox correctly sees the 4 OSD candidate disks, but it does not see the
journal partitions, so I've used the command line:
root at vedovanera:~# pveceph createosd /dev/sda --journal_dev /dev/md2
command '/sbin/zpool list -HPLv' failed: open3: exec of /sbin/zpool list -HPLv failed at /usr/share/perl5/PVE/Tools.pm line 409.
create OSD on /dev/sda (xfs)
using device '/dev/md2' for journal
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.
****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
meta-data=/dev/sda1 isize=2048 agcount=4, agsize=122094597 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=0 finobt=0
data = bsize=4096 blocks=488378385, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal log bsize=4096 blocks=238466, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
So it seems all went well... but the OSD does not show up in the web interface,
although it seems to be ''counted'' (I have 2 OSDs working on another server):
root at vedovanera:~# ceph -s
cluster 8794c124-c2ec-4e81-8631-742992159bd6
health HEALTH_WARN
64 pgs degraded
64 pgs stale
64 pgs stuck degraded
64 pgs stuck stale
64 pgs stuck unclean
64 pgs stuck undersized
64 pgs undersized
noout flag(s) set
monmap e2: 2 mons at {0=10.27.251.7:6789/0,1=10.27.251.8:6789/0}
election epoch 6, quorum 0,1 0,1
osdmap e29: 3 osds: 2 up, 2 in
flags noout
pgmap v42: 64 pgs, 1 pools, 0 bytes data, 0 objects
67200 kB used, 3724 GB / 3724 GB avail
64 stale+active+undersized+degraded
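(A quick way to check whether the new OSD actually registered, and where it sits
in the CRUSH map, is I suppose something like the following; I'm only listing
the commands here, not my output:)

ceph osd tree
ceph-disk list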
The previous command also created a partition (of 5GB) on md2:
root at vedovanera:~# blkid | grep md2
/dev/md2: PTUUID="a7eb14f0-d2f9-4552-8e2d-b5165e654ea8" PTTYPE="gpt"
/dev/md2p1: PARTLABEL="ceph journal" PARTUUID="d1ccfdb2-539e-4e6a-ad60-be100304832b"
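(If I understand correctly, the 5GB comes from the default 'osd journal size'
of 5120 MB; presumably it can be tuned in /etc/pve/ceph.conf before creating
the OSD, along these lines, 10240 being just an example value:)

[osd]
    osd journal size = 10240    # size in MB, i.e. 10 GB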
Now, if I destroy the OSD:
root at vedovanera:~# pveceph destroyosd 2
destroy OSD osd.2
/etc/init.d/ceph: osd.2 not found (/etc/pve/ceph.conf defines mon.0 mon.1, /var/lib/ceph defines )
command 'setsid service ceph -c /etc/pve/ceph.conf stop osd.2' failed: exit code 1
Remove osd.2 from the CRUSH map
Remove the osd.2 authentication key.
Remove OSD osd.2
Unmount OSD osd.2 from /var/lib/ceph/osd/ceph-2
umount: /var/lib/ceph/osd/ceph-2: mountpoint not found
command 'umount /var/lib/ceph/osd/ceph-2' failed: exit code 32
then delete the /dev/md2p1 partition and recreate it (type Linux) at 50GB, zap the sda disk,
and redo the OSD creation, it works, albeit with some strange ''warnings'':
root at vedovanera:~# pveceph createosd /dev/sda --journal_dev /dev/md2p1
command '/sbin/zpool list -HPLv' failed: open3: exec of /sbin/zpool list -HPLv failed at /usr/share/perl5/PVE/Tools.pm line 409.
create OSD on /dev/sda (xfs)
using device '/dev/md2p1' for journal
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.
****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
WARNING:ceph-disk:Journal /dev/md2p1 was not prepared with ceph-disk. Symlinking directly.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
meta-data=/dev/sda1 isize=2048 agcount=4, agsize=122094597 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=0 finobt=0
data = bsize=4096 blocks=488378385, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal log bsize=4096 blocks=238466, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
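(I suspect the 'Journal /dev/md2p1 was not prepared with ceph-disk' warning
shows up because I created the partition by hand with type 'Linux' instead of
the ceph journal partition type. If I read things right, tagging the partition
with the journal type GUID before creating the OSD should make ceph-disk happy;
a sketch, with the device and partition number assumed to match my setup:)

ceph-disk zap /dev/sda                                               # wipe the data disk
sgdisk --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/md2    # mark md2p1 as a ceph journal
partprobe /dev/md2                                                   # re-read the partition table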
Now the OSD shows up in the PVE web interface and seems to work as expected.
I've also tried to ''reformat'' the journal, i.e. stop the OSD, then flush and recreate it:
root at vedovanera:~# ceph-osd -i 2 --flush-journal
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
2016-10-13 12:06:14.209250 7ffb7c596880 -1 flushed journal /var/lib/ceph/osd/ceph-2/journal for object store /var/lib/ceph/osd/ceph-2
root at vedovanera:~# ceph-osd -i 2 --mkjournal
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
2016-10-13 12:06:45.034323 7f774cef7880 -1 created new journal /var/lib/ceph/osd/ceph-2/journal for object store /var/lib/ceph/osd/ceph-2
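(For completeness, the whole sequence, including the stop/start steps not shown
above, is more or less:)

service ceph stop osd.2
ceph-osd -i 2 --flush-journal
ceph-osd -i 2 --mkjournal
service ceph start osd.2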
The OSD restarts correctly, but I still suspect I'm doing something
wrong...
Thanks.
--
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
Associazione ``La Nostra Famiglia'' http://www.lanostrafamiglia.it/
Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento (PN)
marco.gaiarin(at)lanostrafamiglia.it t +39-0434-842711 f +39-0434-842797
Donate your 5 PER MILLE to LA NOSTRA FAMIGLIA!
http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123
(tax code 00307430132, category ONLUS or RICERCA SANITARIA)