[PVE-User] Broken Ceph Cluster when adding new one - Proxmox 5.0 & Ceph Luminous
Phil Schwarz
infolist at schwarz-fr.net
Fri Jul 21 23:22:12 CEST 2017
Hi,
after some investigations (and getting the cluster back), it seems that
I've got an issue with pveceph creating an OSD:
ceph-disk zap /dev/sdc
pveceph createosd /dev/sdc -bluestore 0 -fstype xfs
Unknown option: bluestore
According to the doc (1), it should be OK.
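To rule out a docs/version mismatch, the options the installed pveceph
actually accepts can be listed with its built-in help (exact output
depends on the pve-manager version):

pveceph help createosd

If -bluestore is missing there, the installed tool simply predates the
documentation in (1).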
pveceph createosd /dev/sdc -fstype xfs
--> This should use filestore; that's on purpose.
create OSD on /dev/sdc (xfs)
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.
****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
Setting name!
partNum is 1
REALLY setting name!
The operation has completed successfully.
The operation has completed successfully.
meta-data=/dev/sdc1              isize=2048   agcount=4, agsize=6400 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=25600, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=864, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
The operation has completed successfully.
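Before looking at the crushmap, a quick sanity check that ceph-disk
actually prepared and activated the disk (assuming the new OSD got id 6):

ceph-disk list
systemctl status ceph-osd@6
journalctl -u ceph-osd@6 --since today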
But the OSD never shows up in the GUI, nor in the crushmap:
ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 7.47186 root default
-2 3.17259     host H1
 1 1.81360         osd.1       up  1.00000          1.00000
 3 1.35899         osd.3       up  1.00000          1.00000
-3 0.67699     host H2
 0 0.67699         osd.0       up  1.00000          1.00000
-4 1.80869     host H3
 2 0.44969         osd.2       up  1.00000          1.00000
 4 1.35899         osd.4       up  1.00000          1.00000
-5 1.81360     host H4
 5 1.81360         osd.5       up  1.00000          1.00000
 6       0 osd.6            down        0          1.00000
The osd.6 should appear under H5.
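If the osd.6 daemon itself turns out to be healthy, the missing crushmap
entry could be added by hand; a sketch, assuming the bucket name H5 and
a placeholder weight of 1.0 (it should match the disk size in TiB):

ceph osd crush add-bucket H5 host
ceph osd crush move H5 root=default
ceph osd crush add osd.6 1.0 host=H5
ceph osd in osd.6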
Thanks
Best regards
(1) : https://pve.proxmox.com/pve-docs/pveceph.1.html
On 15/07/2017 at 16:02, Phil Schwarz wrote:
> Hi,
>
> short version:
> I broke my cluster!
>
> Long version, with context:
> With a 4-node Proxmox cluster.
> The nodes are all Proxmox 5.05 + Ceph Luminous with filestore:
> - 3 mon+OSD
> - 1 LXC+OSD
>
> It was working fine.
> I added a fifth node (proxmox+ceph) today and broke everything...
>
> Though every node can ping each other, the web GUI is full of
> red-crossed nodes. No LXC is seen though they're up and alive.
> However, every other Proxmox node is manageable through the web GUI...
>
> In the logs, I've got tons of the same message on 2 out of 3 mons:
>
> " failed to decode message of type 80 v6: buffer::malformed_input: void
> pg_history_t::decode(ceph::buffer::list::iterator&) unknown encoding
> version > 7"
>
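> That decode error smells like mixed daemon versions talking to each
> other. A quick way to compare what each node actually runs (note:
> "ceph versions" itself is new in Luminous, so it may fail against
> older mons):
>
> ceph versions
> ceph daemon mon.$(hostname -s) version   # locally, on each mon node
>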
> Thanks for your answers.
> Best regards
>
> While investigating, I wondered about my config.
> A question about the /etc/hosts file:
> Should I use the private replication LAN IPs or the public ones?
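> For context, the public/replication split lives in ceph.conf rather
> than /etc/hosts (placeholder subnets, not my actual config):
>
> [global]
>     public network = 192.168.0.0/24    # mons and client traffic
>     cluster network = 10.10.10.0/24    # OSD replication traffic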