[PVE-User] How to configure the best for CEPH
Eneko Lacunza
elacunza at binovo.es
Thu Mar 17 09:50:26 CET 2016
Hi Jean-Laurent,
On 16/03/16 at 20:39, Jean-Laurent Ivars wrote:
> I have a 2-host cluster set up with ZFS, replicated between the hosts
> with the pvesync script among other things, and my VMs are running on
> these hosts for now, but I am impatient to migrate to my new
> infrastructure. I decided to change my infrastructure because I really
> would like to take advantage of CEPH for replication, expansion
> capabilities, live migration and maybe even a high-availability setup.
>
> After having read a lot of documentation/books/forums, I decided to
> go with CEPH storage, which seems to be the way to go for me.
>
> My servers are hosted by OVH and, from what I read and with the budget
> I have, the best option with CEPH storage in mind seemed to be the
> following servers:
> https://www.ovh.com/fr/serveurs_dedies/details-servers.xml?range=HOST&id=2016-HOST-32H
>
> With the following storage options: no HW RAID, 2x300GB SSD and 2x2TB HDD
About the SSDs, what exact brand/model are they? I can't find this info
on OVH's website.
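If you can check on the running system, something like this should show
the model (lsblk is enough; smartctl needs the smartmontools package):

  lsblk -o NAME,MODEL,SIZE
  smartctl -i /dev/sda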
>
> One of the reasons I chose this model is the 10Gb vRack option, and I
> understood that CEPH needs a fast network to be efficient. Of course,
> in a perfect world the best would be to have a lot of disks for OSDs,
> two more SSDs for my system and two bonded 10Gb NICs, but this is the
> closest I can afford in the OVH product range.
In your configuration, I doubt very much you'll be able to leverage 10Gb
NICs; I have a 3-node setup with 3 OSDs per node in our office, on a
1 Gbit network, and ceph hardly uses 200-300 Mbps. Maybe you'll get a
bit lower latency, but that will be all.
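If you want to see what the vRack actually delivers between two nodes,
a quick iperf test is enough (the address below is just a placeholder
for one of your storage-VLAN IPs):

  # on the first node
  iperf -s
  # on the second node
  iperf -c <storage VLAN IP of the first node>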
>
> I already installed the cluster and set up different VLANs for the
> cluster and storage traffic, set up the hosts files and installed CEPH.
> Everything went seamlessly except for the fact that the OVH
> installation creates an MBR partition table on the SSDs and CEPH needs
> GPT, but I managed to convert the partition tables, so I thought I was
> all set for the CEPH configuration.
>
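(Side note: that MBR-to-GPT conversion can be done in place with sgdisk
from the gdisk package, something like "sgdisk -g /dev/sda" -- I don't
know which tool you actually used, and backing up the partition table
first is a good idea.)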
> For now, my partitioning scheme is the following (the original message
> was rejected as too big for the mailing list, so here is a link):
> https://www.ipgenius.fr/tools/pveceph.png
Seems quite good; maybe a bit more room for the root filesystem would be
good, you have 300GB of disk... :) Also see below.
>
> I know that it would be better to give CEPH whole disks, but I have to
> put my system somewhere… I was thinking that even if it's not the best
> (I can't afford more), this setup would work… So I tried to create the
> OSD with my SSD journal partition using the appropriate command, but it
> didn't seem to work, and I assume it's because CEPH doesn't want
> partitions but whole drives…
>
> root@pvegra1 ~ # pveceph createosd /dev/sdc -journal_dev /dev/sda4
> create OSD on /dev/sdc (xfs)
> using device '/dev/sda4' for journal
> Creating new GPT entries.
> GPT data structures destroyed! You may now partition the disk using
> fdisk or
> other utilities.
> Creating new GPT entries.
> The operation has completed successfully.
> WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the
> same device as the osd data
> WARNING:ceph-disk:Journal /dev/sda4 was not prepared with ceph-disk.
> Symlinking directly.
> Setting name!
> partNum is 0
> REALLY setting name!
> The operation has completed successfully.
> meta-data=/dev/sdc1 isize=2048 agcount=4, agsize=122094597 blks
> = sectsz=512 attr=2, projid32bit=1
> = crc=0 finobt=0
> data = bsize=4096 blocks=488378385, imaxpct=5
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0 ftype=0
> log =internal log bsize=4096 blocks=238466, version=2
> = sectsz=512 sunit=0 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
> Warning: The kernel is still using the old partition table.
> The new table will be used at the next reboot.
> The operation has completed successfully.
>
> I saw the following threads :
> https://forum.proxmox.com/threads/ceph-server-feedback.17909/
> https://forum.proxmox.com/threads/ceph-server-why-block-devices-and-not-partitions.17863/
>
> But this kind of setup seems to suffer from performance issues and it's
> not officially supported, and I am not comfortable with that: at the
> moment I only have a community subscription from Proxmox, but I want to
> be able to move to a different plan to get support from them if I need
> it, and if I go this way, I'm afraid they will tell me it's an
> unsupported configuration.
>
> OVH can provide USB keys, so I could install the system on one of those
> and keep the whole disks for CEPH, but I think that is not supported
> either. Moreover, I fear for performance and long-term stability with
> this solution.
>
> Maybe I could use one SSD for the system and journal partitions (but
> again that is a mix which is not really supported) and dedicate the
> other SSD to CEPH… but with this solution I lose my system RAID
> protection… and a lot of SSD space...
>
> I'm a little bit confused about the best partitioning scheme and how to
> obtain a stable, supported and performant configuration with as little
> wasted space as possible.
>
> Should I continue with my partitioning scheme, which seems the most
> appropriate in my case even if it's not the best supported, or do I
> need to completely rethink my install?
>
> Please, can someone give me advice? I'm all yours :)
> Thanks a lot to anyone taking the time to read this mail and give me
> good advice.
I suggest you only mirror the swap and root partitions, then use one SSD
for each OSD's journal.
So to fix your problems, please try the following:
- Remove all OSDs from the Proxmox GUI (or CLI)
- Remove the journal partition mirrors
- Remove the journal partitions
- Now you have 2 partitions on each SSD (swap and root), mirrored.
- Create the OSDs from the Proxmox GUI, using a different SSD for each
OSD's journal (a rough command sketch follows below). If the GUI won't
let you pick the SSDs, they don't have a GPT partition table.
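Roughly, from the CLI it could look like this (the OSD id, the mdadm
device and the partition numbers are assumptions based on your
screenshot, adjust them to your actual layout):

  # remove the OSD that was created against the journal partition
  pveceph destroyosd 0

  # if the journal partitions are mirrored with mdadm, stop the array,
  # wipe its members and delete the partitions on both SSDs
  mdadm --stop /dev/md4
  mdadm --zero-superblock /dev/sda4 /dev/sdb4
  sgdisk --delete=4 /dev/sda
  sgdisk --delete=4 /dev/sdb

  # recreate the OSDs, pointing each journal at a different SSD;
  # ceph-disk will create the journal partition in the free GPT space
  pveceph createosd /dev/sdc -journal_dev /dev/sda
  pveceph createosd /dev/sdd -journal_dev /dev/sdb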
>
> P.S. If someone from the official Proxmox support team sees this
> message, can you tell me whether, if I buy a subscription with tickets,
> I can be assisted with this kind of question? And if I buy a
> subscription, I will also ask for help to configure CEPH for the best:
> SSD pool, normal-speed pool, how to set redundancy, how to make
> snapshots, how to make backups and so on and so on… is that the kind of
> thing you can help me with?
You will need to buy a subscription first.
Good luck
Eneko
--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es