[PVE-User] How to configure the best for CEPH

Eneko Lacunza elacunza at binovo.es
Thu Mar 17 11:28:06 CET 2016


Hi,

On 17/03/16 at 10:51, Jean-Laurent Ivars wrote:
> On 16/03/16 at 20:39, Jean-Laurent Ivars wrote:
>>> I have a 2-host cluster set up with ZFS, with the hosts replicated
>>> to each other using the pvesync script among other things. My VMs
>>> are running on these hosts for now, but I am eager to migrate to my
>>> new infrastructure. I decided to change my infrastructure because I
>>> would really like to take advantage of Ceph for replication,
>>> expandability, live migration and maybe even a high-availability
>>> setup.
>>>
>>> After reading a lot of documentation, books and forums, I decided to
>>> go with Ceph storage, which seems to be the right choice for me.
>>>
>>> My servers are hosted by OVH. From what I have read, and with the
>>> budget I have, the best option with Ceph storage in mind seemed to
>>> be the following servers:
>>> https://www.ovh.com/fr/serveurs_dedies/details-servers.xml?range=HOST&id=2016-HOST-32H 
>>>
>>> With the following storage options: no HW RAID, 2 x 300 GB SSD and
>>> 2 x 2 TB HDD
>> About the SSDs: what exact brand/model are they? I can't find this
>> info on the OVH website.
>
> The model is INTEL SSDSC2BB30; you can find information here:
> https://www.ovh.com/fr/serveurs_dedies/avantages-disques-ssd.xml
> They are datacenter SSDs and they have Power Loss Imminent protection.
OK, they should perform well for Ceph; I have one of those in a setup.
You should monitor their wear-out though, as they are rated for only 0.3
drive writes per day.
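
To put that rating in numbers, here is a rough sketch of the arithmetic.
The 300 GB capacity and the 0.3 drive-writes-per-day figure come from this
thread; the 5-year rating period and the 150 GB/day example workload are
assumptions for illustration only, so check the drive's datasheet and its
SMART data for real values.

```python
# Rough endurance arithmetic for a DWPD-rated SSD.
# Capacity (300 GB) and rating (0.3 DWPD) are taken from this thread;
# the 5-year rating period and the 150 GB/day workload are ASSUMPTIONS.

def daily_write_budget_gb(capacity_gb, dwpd):
    """GB that can be written per day while staying within the rating."""
    return capacity_gb * dwpd

def rated_total_writes_tb(capacity_gb, dwpd, rating_years):
    """Total writes (decimal TB) covered by the rating over its period."""
    return daily_write_budget_gb(capacity_gb, dwpd) * 365 * rating_years / 1000

def years_until_rated_writes(capacity_gb, dwpd, rating_years, observed_gb_per_day):
    """Years until the rated total is reached at the observed write rate."""
    total_gb = rated_total_writes_tb(capacity_gb, dwpd, rating_years) * 1000
    return total_gb / (observed_gb_per_day * 365)

if __name__ == "__main__":
    budget = daily_write_budget_gb(300, 0.3)
    total = rated_total_writes_tb(300, 0.3, 5)
    years = years_until_rated_writes(300, 0.3, 5, 150)
    print(f"daily budget:  {budget:.0f} GB/day")
    print(f"rated total:   {total:.2f} TB")
    print(f"at 150 GB/day: {years:.1f} years")
```

With these numbers the budget is about 90 GB of writes per day per SSD.
Keep in mind that with the journal on the SSD, every write to the OSD
goes through the journal first, so the SSD sees all of the OSD's write
traffic.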
>>
>>>
>>> I know that it would be better to give Ceph the whole disks, but I
>>> have to put my system somewhere… I was thinking that even if it’s
>>> not the best setup (I can’t afford more), these settings would work…
>>> So I tried to create the OSDs with the journal on an SSD partition
>>> using the appropriate command, but it didn’t seem to work, and I
>>> assume it's because Ceph doesn’t want partitions but entire drives…
>>>
>>> root@pvegra1 ~ # pveceph createosd /dev/sdc -journal_dev /dev/sda4
>>> create OSD on /dev/sdc (xfs)
>>> using device '/dev/sda4' for journal
>>> Creating new GPT entries.
>>> GPT data structures destroyed! You may now partition the disk using fdisk or
>>> other utilities.
>>> Creating new GPT entries.
>>> The operation has completed successfully.
>>> WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
>>> WARNING:ceph-disk:Journal /dev/sda4 was not prepared with ceph-disk. Symlinking directly.
>>> Setting name!
>>> partNum is 0
>>> REALLY setting name!
>>> The operation has completed successfully.
>>> meta-data=/dev/sdc1              isize=2048   agcount=4, agsize=122094597 blks
>>>          =                       sectsz=512   attr=2, projid32bit=1
>>>          =                       crc=0        finobt=0
>>> data     =                       bsize=4096   blocks=488378385, imaxpct=5
>>>          =                       sunit=0      swidth=0 blks
>>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
>>> log      =internal log           bsize=4096   blocks=238466, version=2
>>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>> Warning: The kernel is still using the old partition table.
>>> The new table will be used at the next reboot.
>>> The operation has completed successfully.
>>>
>>> I saw the following threads:
>>> https://forum.proxmox.com/threads/ceph-server-feedback.17909/
>>> https://forum.proxmox.com/threads/ceph-server-why-block-devices-and-not-partitions.17863/
>>>
>>> But this kind of setup seems to suffer from performance issues, and
>>> it is not officially supported. I am not comfortable with that: at
>>> the moment I only have a community subscription from Proxmox, but I
>>> want to be able to move to a different plan to get support from them
>>> if I need it, and if I go this way I’m afraid they will tell me it’s
>>> an unsupported configuration.
>>>
>
> So you aren’t "shocked" that I want to use partitions instead of whole
> drives in my configuration?
OSD journals are always a partition. :-) That is what Proxmox does from
the GUI: it creates a new partition for the journal on the journal drive;
if you don't choose a journal drive, it creates 2 partitions on the
OSD disk, one for the journal and the other for data.
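
As a rough illustration of the dedicated-journal-drive case, the sketch
below pairs each OSD data disk with a journal drive and prints one
`pveceph createosd` command per OSD, passing the whole SSD so that the
tool creates the journal partition itself. The
`pveceph createosd ... -journal_dev ...` form is the one from the quoted
transcript; the device names other than /dev/sdc and the helper name
`plan_osd_commands` are hypothetical, and it assumes each SSD has a GPT
partition table with unallocated space left. It only builds the command
lines; nothing is executed.

```python
# Sketch: pair each OSD data disk with a journal drive and build the
# corresponding "pveceph createosd" command lines (nothing is executed).
# Device names are HYPOTHETICAL; adjust them to the real hardware.

def plan_osd_commands(data_disks, journal_drives):
    """Return one command per data disk, spreading journals round-robin."""
    if not journal_drives:
        # No journal drive chosen: journal and data share the OSD disk.
        return [f"pveceph createosd {disk}" for disk in data_disks]
    commands = []
    for index, disk in enumerate(data_disks):
        # Pick the journal drive for this OSD in round-robin order.
        journal = journal_drives[index % len(journal_drives)]
        commands.append(f"pveceph createosd {disk} -journal_dev {journal}")
    return commands

if __name__ == "__main__":
    # Assumed layout: two HDDs for data, two SSDs for system + journals.
    for command in plan_osd_commands(["/dev/sdc", "/dev/sdd"],
                                     ["/dev/sda", "/dev/sdb"]):
        print(command)
```

Review the printed commands by hand before running any of them, since
creating an OSD wipes the data disk.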
>
>>> OVH can provide USB keys, so I could install the system on one and
>>> keep my whole disks for Ceph, but I think that is not supported
>>> either. Moreover, I worry about performance and long-term stability
>>> with this solution.
>>>
>>> Maybe I could use one SSD for the system and journal partitions (but
>>> again, that mix is not really supported) and dedicate the other SSD
>>> to Ceph… but with this solution I lose my system RAID protection…
>>> and a lot of SSD space...
>>>
>>> I’m a little confused about the best partitioning scheme, and about
>>> how to obtain a configuration that is stable, supported and
>>> performant, with the least space lost.
>>>
>>> Should I continue with my partitioning scheme even if it’s not the
>>> best supported, since it seems the most appropriate in my case, or
>>> do I need to completely rethink my install?
>>>
>>> Can someone please give me advice? I’m all ears :)
>>> Thanks a lot to anyone taking the time to read this mail and give
>>> me good advice.
>> I suggest you mirror only the swap and root partitions. Then use one
>> SSD for each OSD's journal.
>>
>> So to fix your problems, please try the following:
>> - Remove all OSDs from Proxmox GUI (or CLI)
>> - Remove journal partitions
>> - Remove journal partition mirrors
>> - Now we have 2 partitions on each SSD (swap and root), mirrored.
>> - Create the OSDs from the Proxmox GUI, using a different SSD for the
>> journal of each OSD. If you can't do this, the SSD drives don't have
>> a GPT partition table.
>
> Thank you very much for your suggestion. I am going to follow your
> advice, changing only one thing: as a user on the French mailing list
> told me, swap is not really a good idea. My system won’t really need
> it, and if it did, that would be bad for overall performance: it can
> cause intensive I/O, so I should not add it to a setup that already
> puts enough load on the SSDs...
I have seen problems with too much swap, but 1-2 GB shouldn't be a
problem. In fact, new Proxmox ISOs will limit swap to a maximum size of
8 or 4 GB (I don't recall which right now).
>
>>>
>>> P.S. If someone from the official Proxmox support team sees this
>>> message, can you tell me whether, if I buy a subscription with
>>> tickets, I can get assistance with this kind of question? If I buy
>>> a subscription, I will also ask for help configuring Ceph as well as
>>> possible: SSD pool, normal-speed pool, how to set redundancy, how to
>>> make snapshots, how to make backups and so on… Is that the kind of
>>> thing you can help me with?
>> You need to first buy a subscription.
>>
>
> I already have a community subscription, but what I was really asking
> is whether, if I buy a higher one, this is the kind of question
> support can answer.
It may be better to write directly to Dietmar or Martin to ask about this :)
>
> Thank you again for taking the time to answer me :)
You're welcome!

Cheers
Eneko

-- 
Technical Director
Binovo IT Human Project, S.L.
Telf. 943493611
       943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es
