[PVE-User] How to configure the best for CEPH

Jean-Laurent Ivars jl.ivars at ipgenius.fr
Thu Mar 17 11:40:31 CET 2016

Thank you again for your advice and recommendations.

Have a nice day :)

Jean-Laurent Ivars 
Responsable Technique | Technical Manager
22, rue Robert - 13007 Marseille 
Tel: 09 84 56 64 30 - Mobile: 
Linkedin <http://fr.linkedin.com/in/jlivars/>   |  Viadeo <http://www.viadeo.com/fr/profile/jean-laurent.ivars>   |  www.ipgenius.fr <https://www.ipgenius.fr/>
> On 17 March 2016, at 11:28, Eneko Lacunza <elacunza at binovo.es> wrote:
> Hi,
>> On 17/03/16 at 10:51, Jean-Laurent Ivars wrote:
>>> On 16/03/16 at 20:39, Jean-Laurent Ivars wrote:
>>>> I have a 2-host cluster set up with ZFS, replicated between the hosts with the pvesync script among other things, and my VMs are running on these hosts for now, but I am impatient to migrate to my new infrastructure. I decided to change my infrastructure because I really want to take advantage of Ceph for replication, expansion, live migration, and maybe even a high-availability setup.
>>>> After having read a lot of documentation/books/forums, I decided to go with Ceph storage, which seems to be the way to go for me.
>>>> My servers are hosted by OVH and, from what I read and within my budget, the best option with Ceph storage in mind seemed to be the following server: https://www.ovh.com/fr/serveurs_dedies/details-servers.xml?range=HOST&id=2016-HOST-32H
>>>> With the following storage options: no HW RAID, 2×300 GB SSD and 2×2 TB HDD.
>>> About the SSDs, what exact brand/model are they? I can't find this info on the OVH web site.
>> The models are INTEL SSDSC2BB30; you can find information here: https://www.ovh.com/fr/serveurs_dedies/avantages-disques-ssd.xml
>> They are datacenter SSDs and they have Power Loss Imminent protection.
> Ok, they should perform well for Ceph, I have one of those in a setup. You should monitor their wear-out though, as they are rated only for 0.3 drive writes per day.
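For reference, 0.3 DWPD can be turned into a rough lifetime-writes budget. A back-of-the-envelope sketch (the 300 GB capacity comes from the thread; the 5-year warranty period is an assumption typical for this drive class, check the drive's datasheet):

```shell
#!/bin/sh
# Rough endurance estimate: 0.3 drive writes per day on a 300 GB SSD
# over an assumed 5-year warranty period.
DWPD=0.3
CAPACITY_GB=300
YEARS=5

awk -v d="$DWPD" -v c="$CAPACITY_GB" -v y="$YEARS" 'BEGIN {
    per_day  = d * c                      # GB written per day within rating
    total_tb = per_day * 365 * y / 1000   # total TB written over the period
    printf "rated writes: %.0f GB/day, ~%.0f TBW over %d years\n",
           per_day, total_tb, y
}'
```

Ceph journals multiply write traffic on the journal SSD (every OSD write hits the journal first), so comparing this budget against the SMART media-wearout indicator from time to time is worthwhile.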
>>>> I know it would be better to give Ceph whole disks, but I have to put my system somewhere… I was thinking that even if it's not ideal (I can't afford more), these settings would work. So I tried to give Ceph the OSDs with my SSD journal partition using the appropriate command, but it didn't seem to work, and I assume it's because Ceph doesn't want partitions but entire hard drives…
>>>> root at pvegra1 ~ # pveceph createosd /dev/sdc -journal_dev /dev/sda4
>>>> create OSD on /dev/sdc (xfs)
>>>> using device '/dev/sda4' for journal
>>>> Creating new GPT entries.
>>>> GPT data structures destroyed! You may now partition the disk using fdisk or
>>>> other utilities.
>>>> Creating new GPT entries.
>>>> The operation has completed successfully.
>>>> WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
>>>> WARNING:ceph-disk:Journal /dev/sda4 was not prepared with ceph-disk. Symlinking directly.
>>>> Setting name!
>>>> partNum is 0
>>>> REALLY setting name!
>>>> The operation has completed successfully.
>>>> meta-data=/dev/sdc1              isize=2048   agcount=4, agsize=122094597 blks
>>>>          =                       sectsz=512   attr=2, projid32bit=1
>>>>          =                       crc=0        finobt=0
>>>> data     =                       bsize=4096   blocks=488378385, imaxpct=5
>>>>          =                       sunit=0      swidth=0 blks
>>>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
>>>> log      =internal log           bsize=4096   blocks=238466, version=2
>>>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>>> Warning: The kernel is still using the old partition table.
>>>> The new table will be used at the next reboot.
>>>> The operation has completed successfully.
>>>> I saw the following threads:
>>>> https://forum.proxmox.com/threads/ceph-server-feedback.17909/
>>>> https://forum.proxmox.com/threads/ceph-server-why-block-devices-and-not-partitions.17863/
>>>> But this kind of setup seems to suffer from performance issues and it's not officially supported, which makes me uneasy: at the moment I only have a community subscription from Proxmox, but I want to be able to move to a higher plan to get support from them if I need it, and if I go this way I'm afraid they will tell me it's an unsupported configuration.
>> So you aren't "shocked" that I want to use partitions instead of whole drives in my configuration?
> OSD journals are always a partition. :-) That is what Proxmox does from the GUI: it creates a new partition for the journal on the journal drive; if you don't choose a journal drive, it creates 2 partitions on the OSD disk, one for the journal and the other for data.
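As an illustration of the above, an OSD with its journal on a separate SSD can also be created from the CLI (a sketch only: /dev/sdb and /dev/sdc are placeholder device names, adjust to your hardware before running anything):

```shell
# Sketch -- device names are placeholders for a journal SSD (/dev/sdb)
# and a data HDD (/dev/sdc). This mirrors what the GUI does: Proxmox
# carves a journal partition out of the SSD and gives the rest of the
# HDD to the OSD data partition.
pveceph createosd /dev/sdc -journal_dev /dev/sdb

# Inspect the result: ceph-disk shows which partition backs each journal,
# sgdisk prints the GPT partition table of the journal SSD.
ceph-disk list
sgdisk -p /dev/sdb
```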
>>>> OVH can provide USB keys, so I could install the system on one and give the whole disks to Ceph, but I think that is not supported either. Moreover, I fear for performance and long-term stability with this solution.
>>>> Maybe I could use one SSD for the system and journal partitions (but again, that's a mix that isn't really supported) and dedicate the other SSD to Ceph… but with this solution I lose my system RAID protection… and a lot of SSD space...
>>>> I'm a little confused about the best partitioning scheme and how to obtain a stable, supported, performant configuration with the least wasted space.
>>>> Should I continue with my partitioning scheme even if it's not the best supported (it seems the most appropriate in my case), or do I need to completely rethink my install?
>>>> Please can someone give me advice, I’m all yours :)
>>>> Thanks a lot to anyone taking the time to read this mail and give me good advice.
>>> I suggest you only mirror the swap and root partitions. Then use one SSD for each OSD's journal.
>>> So to fix your problems, please try the following:
>>> - Remove all OSDs from Proxmox GUI (or CLI)
>>> - Remove journal partitions
>>> - Remove journal partition mirrors
>>> - Now we have 2 partitions on each SSD (swap and root), mirrored.
>>> - Create OSDs from the Proxmox GUI, using a different SSD disk for each OSD's journal. If you can't do this, the SSD drives don't have a GPT partition table.
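The steps above might look like this on the CLI (a sketch under assumptions: the OSD id 0, an mdadm mirror /dev/md4 holding the old journal, and partition number 4 on each SSD are all placeholders; verify against your actual layout first, and note the OSD steps can equally be done from the Proxmox GUI):

```shell
# Sketch only -- IDs, device names and partition numbers are placeholders.

# 1. Stop and destroy the existing OSD (repeat for each OSD):
pveceph destroyosd 0

# 2. Stop the old mirrored journal device (assuming an mdadm mirror):
mdadm --stop /dev/md4

# 3. Delete the old journal partition on each SSD (partition 4 here),
#    leaving only the mirrored swap and root partitions:
sgdisk -d 4 /dev/sda
sgdisk -d 4 /dev/sdb

# 4. Recreate the OSDs, one SSD per OSD journal:
pveceph createosd /dev/sdc -journal_dev /dev/sda
pveceph createosd /dev/sdd -journal_dev /dev/sdb
```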
>> Thank you very much for your suggestion. I am going to follow your advice, changing only one thing: as a French ML user told me, swap is not really a good idea here. My system won't really need it, and if it does, it would hurt overall performance: swap can cause intensive I/O, so I should not add it to a setup which is already soliciting the SSDs enough...
> I have seen problems with too much swap, but 1-2 GB shouldn't be a problem. In fact, new Proxmox ISOs will limit swap to a max size of 8 or 4 GB (I don't recall right now).
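If swap I/O hitting the SSDs is the concern, a middle ground (a common Linux tuning, not something prescribed in this thread) is to keep a small swap but make the kernel reluctant to use it:

```shell
# Discourage the kernel from swapping unless memory pressure is real,
# which reduces background I/O on the SSDs holding the journals:
sysctl vm.swappiness=10
echo 'vm.swappiness = 10' >> /etc/sysctl.conf

# Check current swap devices and sizes:
swapon -s
free -m
```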
>>>> P.S. If someone from the official Proxmox support team sees this message, can you tell me whether, with a subscription that includes tickets, I could be assisted with this kind of question? And if I buy a subscription, I will also ask for help configuring Ceph for the best: SSD pool, normal-speed pool, how to set redundancy, how to make snapshots, how to make backups, and so on… is that the kind of thing you can help me with?
>>> You need to first buy a subscription.
>> I already have a community subscription, but what I was really asking is: if I buy a higher one, is this the kind of question support can answer?
> Maybe better write directly to Dietmar or Martin to ask about this :)
>> Thank you again for taking the time to answer me :)
> You're welcome!
> Cheers
> Eneko
> -- 
> Zuzendari Teknikoa / Director Técnico
> Binovo IT Human Project, S.L.
> Telf. 943493611
>       943324914
> Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
> www.binovo.es
> _______________________________________________
> pve-user mailing list
> pve-user at pve.proxmox.com
> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
