[PVE-User] How to configure the best for CEPH

Jean-Laurent Ivars jl.ivars at ipgenius.fr
Thu Mar 17 10:51:59 CET 2016

Thank you so much for your answer :)

My answers below...

Best regards,

Jean-Laurent Ivars 
Responsable Technique | Technical Manager
22, rue Robert - 13007 Marseille 
Tel: 09 84 56 64 30 - Mobile: 
Linkedin: http://fr.linkedin.com/in/jlivars/  |  Viadeo: http://www.viadeo.com/fr/profile/jean-laurent.ivars  |  https://www.ipgenius.fr/
> On 17 March 2016 at 09:50, Eneko Lacunza <elacunza at binovo.es> wrote:
> Hi Jean-Laurent,
> On 16/03/16 at 20:39, Jean-Laurent Ivars wrote:
>> I have a 2-host cluster set up with ZFS, replicated to each other with the pvesync script among other things, and my VMs are running on these hosts for now, but I am impatient to migrate to my new infrastructure. I decided to change infrastructure because I really want to take advantage of CEPH for replication, expansion, live migration, and maybe even a high-availability setup.
>> After having read a lot of documentations/books/forums, I decided to go with CEPH storage which seem to be the way to go for me.
>> My servers are hosted by OVH and, from what I read and with the budget I have, the best option with CEPH storage in mind seemed to be the following servers: https://www.ovh.com/fr/serveurs_dedies/details-servers.xml?range=HOST&id=2016-HOST-32H
>> With the following storage options: no HW RAID, 2×300GB SSD and 2×2TB HDD.
> About the SSD, what exact brand/model are they? I can't find this info on OVH web.

The models are INTEL SSDSC2BB30; you can find information here: https://www.ovh.com/fr/serveurs_dedies/avantages-disques-ssd.xml
They are datacenter SSDs and they have Power Loss Imminent protection.

>> One of the reasons I chose these models is the 10Gb vRack option, since I understood that CEPH needs a fast network to be efficient. Of course, in a perfect world the best would be to have a lot of disks for OSDs, two more SSDs for my system, and two bonded 10Gb NICs, but this is the closest I can afford in the OVH product range.
> In your configuration I doubt very much you'll be able to leverage 10Gb NICs; I have a 3-node setup with 3 OSDs each in our office, with a 1Gbit network, and Ceph hardly uses 200-300Mbps. Maybe you'll have a bit lower latency, but that will be all.

OK! That's good news :)

>> I have already installed the cluster and set up separate VLANs for cluster and storage traffic, set the hosts files, and installed CEPH. Everything went seamlessly except that the OVH installer creates an MBR partition table on the SSDs while CEPH needs GPT, but I managed to convert the partition tables, so I thought I was all set for CEPH configuration.
>> For now, my partitioning scheme is the following (the message was rejected as too big for the mailing list, so here is a link): https://www.ipgenius.fr/tools/pveceph.png
> Seems quite good, though a bit more room for the root filesystem would be good, since you have 300GB of disk... :) Also see below.

OK, thank you :) About the system: I have never seen the Proxmox installation use more than 2GB, so I think 10GB is enough...

>> I know that it would be better to give CEPH whole disks, but I have to put my system somewhere… I was thinking that even if it’s not the best (I can’t afford more), these settings would work… So I tried to give CEPH the OSDs with my SSD journal partition using the appropriate command, but it didn’t seem to work, and I assume that’s because CEPH doesn’t want partitions but whole hard drives…
>> root at pvegra1 ~ # pveceph createosd /dev/sdc -journal_dev /dev/sda4
>> create OSD on /dev/sdc (xfs)
>> using device '/dev/sda4' for journal
>> Creating new GPT entries.
>> GPT data structures destroyed! You may now partition the disk using fdisk or
>> other utilities.
>> Creating new GPT entries.
>> The operation has completed successfully.
>> WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
>> WARNING:ceph-disk:Journal /dev/sda4 was not prepared with ceph-disk. Symlinking directly.
>> Setting name!
>> partNum is 0
>> REALLY setting name!
>> The operation has completed successfully.
>> meta-data=/dev/sdc1              isize=2048   agcount=4, agsize=122094597 blks
>>          =                       sectsz=512   attr=2, projid32bit=1
>>          =                       crc=0        finobt=0
>> data     =                       bsize=4096   blocks=488378385, imaxpct=5
>>          =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
>> log      =internal log           bsize=4096   blocks=238466, version=2
>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>> Warning: The kernel is still using the old partition table.
>> The new table will be used at the next reboot.
>> The operation has completed successfully.
>> I saw the following threads:
>> https://forum.proxmox.com/threads/ceph-server-feedback.17909/
>> https://forum.proxmox.com/threads/ceph-server-why-block-devices-and-not-partitions.17863/
>> But this kind of setup seems to suffer from performance issues and isn’t officially supported, and I am not comfortable with that: at the moment I only have a community subscription from Proxmox, but I want to be able to move to a different plan to get support from them if I need it, and if I go this way I’m afraid they will tell me it’s an unsupported configuration.

So you aren’t "shocked" that I want to use partitions instead of whole drives in my configuration?

>> OVH can provide USB keys, so I could install the system on one and keep the whole disks for CEPH, but I think that is not supported either. Moreover, I fear for performance and stability over time with this solution.
>> Maybe I could use one SSD for the system and journal partitions (but again, that’s a mix that isn’t really supported) and dedicate the other SSD to CEPH… but with this solution I lose my system RAID protection… and a lot of SSD space...
>> I’m a little confused about the best partitioning scheme and how to end up with a stable, supported, performant configuration that wastes the least space.
>> Should I continue with my partitioning scheme even if it’s not the best supported (it seems the most appropriate in my case), or do I need to completely rethink my install?
>> Please can someone give me advice, I’m all yours :)
>> Thanks a lot to anyone taking the time to read this mail and give me good advice.
> I suggest you only mirror the swap and root partitions. Then use one SSD for each OSD's journal.
> So to fix your problems, please try the following:
> - Remove all OSDs from Proxmox GUI (or CLI)
> - Remove journal partitions
> - Remove journal partition mirrors
> - Now we have 2 partitions on each SSD (swap and root), mirrored.
> - Create OSDs from the Proxmox GUI, using a different SSD for the journal of each OSD. If you can't do this, the SSD drives don't have a GPT partition table.
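For reference, those steps map onto CLI commands roughly like this. This is only a sketch: the OSD ids, device names and md device are assumptions to adapt to each node, and `pveceph destroyosd`/`createosd` are the CLI equivalents of the GUI actions (the `createosd` form is the same one shown in my log above):

```shell
# Assumed layout: OSDs 0 and 1, HDDs /dev/sdc and /dev/sdd,
# SSDs /dev/sda and /dev/sdb, journal partitions numbered 4.
pveceph destroyosd 0                # remove each existing OSD
pveceph destroyosd 1
mdadm --stop /dev/md4               # stop the journal partition mirror (if any)
sgdisk --delete=4 /dev/sda          # drop the old journal partitions
sgdisk --delete=4 /dev/sdb
# recreate the OSDs, one whole SSD per OSD journal
pveceph createosd /dev/sdc -journal_dev /dev/sda
pveceph createosd /dev/sdd -journal_dev /dev/sdb
```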

Thank you very much for your suggestion; I am going to follow your advice, changing only one thing: as a French ML user told me, swap is not really a good idea. My system won't really need it, and if it does, it would not be good for overall performance, since swap can cause intensive I/O, so I should not add it to a setup which is already soliciting the SSDs enough...
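Leaving swap out persistently just means commenting the swap entries out of /etc/fstab and running `swapoff -a` once as root. A sketch, demonstrated here on a scratch copy with made-up UUIDs rather than the real /etc/fstab:

```shell
# Scratch fstab with a swap entry (UUIDs are hypothetical, for the demo only);
# on a real node you would edit /etc/fstab itself, as root.
printf 'UUID=aaaa-bbbb / ext4 errors=remount-ro 0 1\nUUID=cccc-dddd none swap sw 0 0\n' > fstab.demo
# Comment out every swap line so it is not activated at boot
sed -i '/\sswap\s/s/^/#/' fstab.demo
cat fstab.demo
# On the node itself, follow up with:  swapoff -a
```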

>> P.S. If someone from the official Proxmox support team sees this message, can you tell me whether, if I buy a subscription with tickets, I can get assistance with this kind of question? And if I buy a subscription, I will also ask for help configuring CEPH for the best: SSD pool, normal-speed pool, how to set redundancy, how to make snapshots, how to make backups, and so on… is that the kind of thing you can help me with?
> You need to first buy a subscription.

I already have a community subscription, but what I was really asking is: if I buy a higher one, is this the kind of question the support team can answer?

> Good luck
> Eneko

Thank you again for taking the time to answer me :)
> -- 
> Zuzendari Teknikoa / Director Técnico
> Binovo IT Human Project, S.L.
> Telf. 943493611
>       943324914
> Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
> www.binovo.es
_______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
