[PVE-User] How to configure the best for CEPH
Eneko Lacunza
elacunza at binovo.es
Thu Mar 17 11:28:06 CET 2016
Hi,
On 17/03/16 at 10:51, Jean-Laurent Ivars wrote:
> On 16/03/16 at 20:39, Jean-Laurent Ivars wrote:
>>> I have a 2-host cluster set up with ZFS, replicated to each other
>>> with the pvesync script among other things, and my VMs are running on
>>> these hosts for now, but I am impatient to migrate to my
>>> new infrastructure. I decided to change my infrastructure because I
>>> would really like to take advantage of Ceph for replication,
>>> expansion capabilities, live migration and maybe even a high
>>> availability setup.
>>>
>>> After having read a lot of documentation/books/forums, I decided to
>>> go with Ceph storage, which seems to be the way to go for me.
>>>
>>> My servers are hosted by OVH and, from what I read and with the
>>> budget I have, the best options with Ceph storage in mind seemed to
>>> be the following servers:
>>> https://www.ovh.com/fr/serveurs_dedies/details-servers.xml?range=HOST&id=2016-HOST-32H
>>>
>>> With the following storage options: no HW RAID, 2x300GB SSD and
>>> 2x2TB HDD
>> About the SSDs, what exact brand/model are they? I can't find this
>> info on the OVH web site.
>
> The models are INTEL SSDSC2BB30, you can find information here:
> https://www.ovh.com/fr/serveurs_dedies/avantages-disques-ssd.xml
> They are datacenter SSD and they have the Power Loss Imminent protection.
OK, they should perform well for Ceph; I have one of those in a setup.
You should monitor their wear-out though, as they are rated for only 0.3
drive writes per day.
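For example, something along these lines will show the relevant SMART
attributes on Intel DC-series drives (the device names below are only
examples, adjust them to your disks; needs the smartmontools package):

# Intel DC-series SSDs report a Media_Wearout_Indicator attribute that
# starts at 100 and counts down towards 0 as the NAND wears out.
apt-get install smartmontools
smartctl -A /dev/sda | grep -i -e Media_Wearout -e Total_LBAs_Written
smartctl -A /dev/sdb | grep -i -e Media_Wearout -e Total_LBAs_Written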
>>
>>>
>>> I know that it would be better to give Ceph the whole disks but I
>>> have to put my system somewhere… I was thinking that even if it’s
>>> not the best (I can’t afford more), these settings would work… So I
>>> tried to give Ceph the OSDs with my SSD journal partition using
>>> the appropriate command, but it didn’t seem to work and I assume it's
>>> because Ceph doesn’t want partitions but an entire hard drive…
>>>
>>> root@pvegra1 ~ # pveceph createosd /dev/sdc -journal_dev /dev/sda4
>>> create OSD on /dev/sdc (xfs)
>>> using device '/dev/sda4' for journal
>>> Creating new GPT entries.
>>> GPT data structures destroyed! You may now partition the disk using
>>> fdisk or
>>> other utilities.
>>> Creating new GPT entries.
>>> The operation has completed successfully.
>>> WARNING:ceph-disk:OSD will not be hot-swappable if journal is not
>>> the same device as the osd data
>>> WARNING:ceph-disk:Journal /dev/sda4 was not prepared with ceph-disk.
>>> Symlinking directly.
>>> Setting name!
>>> partNum is 0
>>> REALLY setting name!
>>> The operation has completed successfully.
>>> meta-data=/dev/sdc1 isize=2048 agcount=4, agsize=122094597 blks
>>> = sectsz=512 attr=2, projid32bit=1
>>> = crc=0 finobt=0
>>> data = bsize=4096 blocks=488378385, imaxpct=5
>>> = sunit=0 swidth=0 blks
>>> naming =version 2 bsize=4096 ascii-ci=0 ftype=0
>>> log =internal log bsize=4096 blocks=238466, version=2
>>> = sectsz=512 sunit=0 blks, lazy-count=1
>>> realtime =none extsz=4096 blocks=0, rtextents=0
>>> Warning: The kernel is still using the old partition table.
>>> The new table will be used at the next reboot.
>>> The operation has completed successfully.
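Just as a side note: after an attempt like the one above you can check
whether the OSD actually registered and where its journal ended up,
roughly like this (paths and ids are examples, adjust to your node):

# Does the new OSD show up in the CRUSH tree and come up/in?
ceph osd tree

# How did ceph-disk lay out the data and journal partitions?
ceph-disk list

# Is the journal symlink pointing at the intended partition?
ls -l /var/lib/ceph/osd/ceph-*/journal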
>>>
>>> I saw the following threads :
>>> https://forum.proxmox.com/threads/ceph-server-feedback.17909/
>>> https://forum.proxmox.com/threads/ceph-server-why-block-devices-and-not-partitions.17863/
>>>
>>> But this kind of setup seems to suffer from performance issues and it’s
>>> not officially supported, and I am not comfortable with that,
>>> because at the moment I only have a community subscription from
>>> Proxmox but I want to be able to move to a higher plan to get
>>> support from them if I need it, and if I go this way I’m afraid they
>>> will tell me it’s an unsupported configuration.
>>>
>
> So you aren’t « shocked » that I want to use partitions instead of whole
> drives in my configuration?
OSD journals are always a partition. :-) That is what Proxmox does from
the GUI: it creates a new partition for the journal on the journal drive; if
you don't choose a journal drive, then it creates 2 partitions on the
OSD disk, one for the journal and the other for data.
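As a rough illustration of what that looks like on disk (just a sketch,
assuming the journal lives on /dev/sda, the OSD data on /dev/sdc and
OSD id 0; adjust to your setup):

# Journal SSD: one small journal partition per OSD, created by
# ceph-disk/pveceph (typically 5 GB by default in this Ceph generation).
sgdisk -p /dev/sda

# OSD disk: a single XFS data partition, with a 'journal' symlink inside
# it pointing back at the journal partition on the SSD.
sgdisk -p /dev/sdc
ls -l /var/lib/ceph/osd/ceph-0/journal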
>
>>> OVH can provide USB keys so I could install the system on them and get
>>> my whole disks for Ceph, but I think that is not supported either.
>>> Moreover, I fear for performance and long-term stability with
>>> this solution.
>>>
>>> Maybe I could use one SSD for the system and journal partitions (but
>>> again it’s a mix that is not really supported) and the other SSD dedicated
>>> to Ceph… but with this solution I lose my system RAID protection…
>>> and a lot of SSD space...
>>>
>>> I’m a little bit confused about the best partitioning scheme and how
>>> to obtain a stable, supported and performant configuration with the
>>> least space lost.
>>>
>>> Should I continue with my partitioning scheme even if it’s not the
>>> best supported (it seems the most appropriate in my case), or do I need
>>> to completely rethink my install?
>>>
>>> Please can someone give me advice, I’m all yours :)
>>> Thanks a lot to anyone taking the time to read this mail and give
>>> me good advice.
>> I suggest you only mirror the swap and root partitions. Then use one SSD
>> for each OSD's journal.
>>
>> So to fix your problems, please try the following:
>> - Remove all OSDs from the Proxmox GUI (or CLI)
>> - Remove the journal partitions
>> - Remove the journal partition mirrors
>> - Now we have 2 partitions on each SSD (swap and root), mirrored.
>> - Create the OSDs from the Proxmox GUI, using a different SSD disk for the
>> journal of each OSD. If you can't do this, it is because the SSD drives
>> don't have a GPT partition table.
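By the way, if you prefer the CLI over the GUI, the steps I quoted above
map roughly to something like this (only a sketch; the OSD ids, md device,
partition numbers and disk names below are placeholders, so double-check
yours with cat /proc/mdstat and sgdisk -p before deleting or zapping
anything):

# 1. Remove the existing OSDs (repeat for each OSD id; 0 is an example)
ceph osd out 0
pveceph destroyosd 0

# 2. Stop the mdadm mirror that held the journals and delete those
#    partitions on both SSDs (md4 / partition 4 are placeholders)
mdadm --stop /dev/md4
sgdisk -d 4 /dev/sda
sgdisk -d 4 /dev/sdb

# 3. Wipe the data disks and recreate the OSDs, one journal SSD per OSD;
#    pveceph/ceph-disk creates the journal partition on the SSD itself
ceph-disk zap /dev/sdc
ceph-disk zap /dev/sdd
pveceph createosd /dev/sdc -journal_dev /dev/sda
pveceph createosd /dev/sdd -journal_dev /dev/sdb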
>
> Thank you very much for your suggestion, I am going to follow your
> advice, only changing one thing: as a French ML user told me, swap is
> not really a good idea, my system won’t really need it and if it does
> it would not be good for overall performance: it can cause intensive
> IO access, so I should not add this to my setup, which is soliciting the
> SSDs enough...
I have seen problems with too much swap, but 1-2 GB shouldn't be a
problem. In fact, new Proxmox ISOs will limit swap to a maximum size of 8 or 4
GB (I don't recall right now).
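If you do keep a small swap, a couple of GB is enough; for instance (just
a sketch, the partition name is an example only):

# Format and enable a small swap partition
mkswap /dev/sda3
swapon /dev/sda3

# Optionally make the kernel less eager to swap, to spare the SSDs
sysctl -w vm.swappiness=10
echo 'vm.swappiness = 10' >> /etc/sysctl.conf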
>
>>>
>>> P.S. If someone from the official Proxmox support team sees this
>>> message, can you tell me whether, if I buy a subscription with tickets, I
>>> can be assisted with this kind of question? And if I buy a subscription,
>>> I will also ask for help to configure Ceph for the best: SSD pool, normal
>>> speed pool, how to set redundancy, how to make snapshots, how to
>>> make backups and so on and so on… is that the kind of thing you can
>>> help me with?
>> You need to first buy a subscription.
>>
>
> I already have a community subscription, but what I was really asking
> is: IF I buy a higher one, is this the kind of question the support can
> answer for me?
Maybe better write directly to Dietmar or Martin to ask about this :)
>
> Thank you again for taking the time to answer me :)
You're welcome!
Cheers
Eneko
--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es