[PVE-User] How to configure the best for CEPH

Jean-Laurent Ivars jl.ivars at ipgenius.fr
Wed Mar 16 20:39:44 CET 2016

Hello everyone,

First of all, I am sorry if my English is not very good, but hopefully you will understand what I say (you know the reputation of French people…).

I would be very happy if some people took the time to read this email, and even more so if I got answers.

I have a two-host cluster set up with ZFS, with each host replicated to the other using the pvesync script, among other things. My VMs are running on these hosts for now, but I am impatient to migrate to my new infrastructure. I decided to change my infrastructure because I would really like to take advantage of CEPH for replication, expandability, live migration and maybe even a high-availability setup.

After reading a lot of documentation, books and forums, I decided to go with CEPH storage, which seems to be the way to go for me.

My servers are hosted by OVH and, from what I read and with the budget I have, the best option with CEPH storage in mind seemed to be the following servers: https://www.ovh.com/fr/serveurs_dedies/details-servers.xml?range=HOST&id=2016-HOST-32H
With the following storage options: no HW RAID, 2 x 300 GB SSD and 2 x 2 TB HDD
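
To put the HDD option in perspective, here is a rough sketch of the usable capacity of a replicated CEPH pool. The two 2 TB HDDs per server come from the option above; the number of hosts (3) and the replica count (3) are assumptions chosen only for illustration, and the calculation ignores filesystem overhead and the free space CEPH needs to rebalance.

```python
def usable_capacity_tb(hosts: int, hdds_per_host: int, hdd_tb: float, replicas: int) -> float:
    """Raw HDD capacity divided by the replica count.

    Ignores filesystem overhead and rebalancing headroom, so the real
    usable figure will be lower.
    """
    if min(hosts, hdds_per_host, replicas) < 1:
        raise ValueError("hosts, hdds_per_host and replicas must all be >= 1")
    raw_tb = hosts * hdds_per_host * hdd_tb
    return raw_tb / replicas


# 2 HDDs of 2 TB per host: from the server option above.
# 3 hosts and 3 replicas: assumptions, not stated in this message.
print(usable_capacity_tb(hosts=3, hdds_per_host=2, hdd_tb=2.0, replicas=3))
```

With these assumed values, 12 TB of raw HDD space gives about 4 TB of usable space.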

One of the reasons I chose these models is the 10Gb VRACK option, as I understood that CEPH needs a fast network to be efficient. Of course, in a perfect world the best would be to have a lot of disks for OSDs, two more SSDs for my system and two bonded 10Gb NICs, but this is the closest I can afford in the OVH product range.

I have already installed the cluster and set up different VLANs for cluster and storage traffic, set up the hosts files and installed CEPH. Everything went seamlessly, except that the OVH installer creates an MBR partition table on the SSDs and CEPH needs a GPT one. I managed to convert the partition tables, so I thought I was all set for the CEPH configuration.
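
As a way to double-check which kind of partition table a disk carries after such a conversion, here is a minimal sketch that inspects the first two sectors. It relies on two on-disk facts: the MBR boot signature `0x55 0xAA` sits at byte offset 510, and a GPT header starts at LBA 1 with the ASCII signature `EFI PART`. The 512-byte logical sector size and the function names are assumptions made for this illustration.

```python
SECTOR_SIZE = 512  # assumed logical sector size


def partition_table_type(first_two_sectors: bytes, sector_size: int = SECTOR_SIZE) -> str:
    """Classify a disk as 'gpt', 'mbr' or 'unknown' from its first two sectors."""
    if len(first_two_sectors) < 2 * sector_size:
        raise ValueError("need at least two sectors of data")
    has_boot_signature = first_two_sectors[510:512] == b"\x55\xaa"
    has_gpt_header = first_two_sectors[sector_size:sector_size + 8] == b"EFI PART"
    # A GPT disk also carries a protective MBR, so the GPT header is checked first.
    if has_gpt_header:
        return "gpt"
    if has_boot_signature:
        return "mbr"
    return "unknown"


def read_table_type(device: str) -> str:
    """Read the first two sectors of a block device (needs root) and classify it."""
    with open(device, "rb") as disk:
        return partition_table_type(disk.read(2 * SECTOR_SIZE))
```

Run as root, `read_table_type("/dev/sda")` should then return `"gpt"` for a converted disk; this only reads from the device and never writes to it.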

For now, my partitioning scheme is the following (the message was rejected as too big for the mailing list, so here is a link): https://www.ipgenius.fr/tools/pveceph.png

I know that it would be better to give CEPH the whole disks, but I have to put my system somewhere… I was thinking that even if it is not the best (I can't afford more), these settings would work. So I tried to create the OSDs with my SSD journal partition using the appropriate command, but it didn't seem to work, and I assume it's because CEPH doesn't want partitions but entire drives:

root@pvegra1 ~ # pveceph createosd /dev/sdc -journal_dev /dev/sda4
create OSD on /dev/sdc (xfs)
using device '/dev/sda4' for journal
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
WARNING:ceph-disk:Journal /dev/sda4 was not prepared with ceph-disk. Symlinking directly.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
meta-data=/dev/sdc1              isize=2048   agcount=4, agsize=122094597 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=488378385, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=238466, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
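
As a quick sanity check on the output above, the `data` line of the mkfs.xfs summary gives the block size and block count of the new filesystem, and multiplying them yields its size. The sketch below only does that arithmetic on the line copied from the log; the helper name is hypothetical.

```python
import re

# The "data" line copied verbatim from the mkfs.xfs summary above.
MKFS_DATA_LINE = "data     =                       bsize=4096   blocks=488378385, imaxpct=5"


def xfs_data_bytes(line: str) -> int:
    """Return block size times block count parsed from a mkfs.xfs 'data' line."""
    match = re.search(r"bsize=(\d+)\s+blocks=(\d+)", line)
    if match is None:
        raise ValueError("no bsize/blocks pair found in line")
    block_size, block_count = int(match.group(1)), int(match.group(2))
    return block_size * block_count


size = xfs_data_bytes(MKFS_DATA_LINE)
print(f"{size} bytes = {size / 1e12:.2f} TB = {size / 2**40:.2f} TiB")
```

The result is about 2 TB, which suggests that the data partition /dev/sdc1 was created across the whole HDD; it says nothing about whether the OSD itself was registered and started.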

I saw the following threads:
https://forum.proxmox.com/threads/ceph-server-feedback.17909/
https://forum.proxmox.com/threads/ceph-server-why-block-devices-and-not-partitions.17863/

But this kind of setup seems to suffer from performance issues and is not officially supported. I am not comfortable with that because, at the moment, I only have a community subscription from Proxmox, but I want to be able to move to a different plan to get support from them if I need it, and if I go this way I am afraid they will tell me it is an unsupported configuration.

OVH can provide USB keys, so I could install the system on one and keep my whole disks for CEPH, but I think that is not supported either. Moreover, I fear for performance and long-term stability with this solution.

Maybe I could use one SSD for the system and journal partitions (but again, it is a mix that is not really supported) and dedicate the other SSD to CEPH… but with this solution I lose my system RAID protection… and a lot of SSD space…
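
For a rough idea of how much SSD space the journals themselves need, the CEPH documentation gives the rule of thumb `osd journal size = 2 * (expected throughput * filestore max sync interval)`. The sketch below applies it; the 100 MB/s HDD throughput is an assumption, and the 5 second sync interval is what I understand to be the default, so both should be checked against the real hardware and configuration.

```python
def journal_size_mb(expected_throughput_mb_s: float, max_sync_interval_s: float = 5.0) -> float:
    """Rule of thumb: 2 * expected throughput * filestore max sync interval."""
    if expected_throughput_mb_s <= 0 or max_sync_interval_s <= 0:
        raise ValueError("throughput and sync interval must be positive")
    return 2 * expected_throughput_mb_s * max_sync_interval_s


# Assumption: each 2 TB HDD sustains roughly 100 MB/s.
per_osd_mb = journal_size_mb(expected_throughput_mb_s=100.0)
# Two HDD-backed OSDs per host, as in the server option described earlier.
per_host_mb = 2 * per_osd_mb
print(f"journal per OSD: {per_osd_mb:.0f} MB, per host: {per_host_mb:.0f} MB")
```

Under these assumptions the journals need only a few GB per host, a small fraction of a 300 GB SSD.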

I'm a little confused about the best partitioning scheme and how to obtain a configuration that is stable, supported and performant, with the least space lost.

Should I continue with my partitioning scheme, which seems the most appropriate in my case even if it is not the best supported, or do I need to completely rethink my install?

Can someone please give me advice? I'm all ears :)
Thanks a lot to anyone taking the time to read this mail and give me good advice.

P.S. If someone from the official Proxmox support team sees this message: if I buy a subscription with tickets, can I be assisted with this kind of question? And if I buy a subscription, I will also ask for help configuring CEPH in the best way: SSD pool, normal-speed pool, how to set redundancy, how to make snapshots, how to make backups and so on… Is this the kind of thing you can help me with?

Jean-Laurent Ivars 
Responsable Technique | Technical Manager
22, rue Robert - 13007 Marseille 
Tel: 09 84 56 64 30 - Mobile: 
Linkedin <http://fr.linkedin.com/in/jlivars/>   |  Viadeo <http://www.viadeo.com/fr/profile/jean-laurent.ivars>   |  www.ipgenius.fr <https://www.ipgenius.fr/>