<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Hi Jean-Laurent,<br>
<br>
On 16/03/16 at 20:39, Jean-Laurent Ivars wrote:<br>
</div>
<blockquote
cite="mid:3BAE8E69-5D5F-4A99-9C74-C89E0E171EA4@ipgenius.fr"
type="cite">
<div class="">I have a two-host cluster set up with ZFS, with the
hosts replicated to each other using the pvesync script among other
things. My VMs are running on these hosts for now, but I am eager
to migrate to my new infrastructure. I decided to
change my infrastructure because I would really like to take
advantage of CEPH for replication, expandability, live
migration and maybe even a high-availability setup.
<div class=""><br class="">
</div>
<div class="">After reading a lot of
documentation, books and forums, I decided to go with CEPH storage,
which seems to be the way to go for me.</div>
<div class=""><br class="">
</div>
<div class="">My servers are hosted by OVH and, from what I read
and with the budget I have, the best option with CEPH storage
in mind seemed to be the following servers: <a
moz-do-not-send="true"
href="https://www.ovh.com/fr/serveurs_dedies/details-servers.xml?range=HOST&amp;id=2016-HOST-32H"
class="">https://www.ovh.com/fr/serveurs_dedies/details-servers.xml?range=HOST&amp;id=2016-HOST-32H</a> </div>
<div class="">With the following storage options: no HW RAID,
2x300 GB SSD and 2x2 TB HDD</div>
</div>
</blockquote>
About the SSDs, what exact brand/model are they? I can't find this
info on the OVH website.<br>
<blockquote
cite="mid:3BAE8E69-5D5F-4A99-9C74-C89E0E171EA4@ipgenius.fr"
type="cite">
<div class=""><br class="">
</div>
<div class="">One of the reasons I chose these models is the 10Gb
VRACK option, as I understood that CEPH needs a fast network to
be efficient. Of course, in a perfect world, the best would be to
have a lot of disks for OSDs, two more SSDs for my system and two
bonded 10Gb NICs, but this is the closest I can afford in
the OVH product range.</div>
</blockquote>
In your configuration, I very much doubt you'll be able to make full
use of 10Gb NICs; I have a 3-node setup with 3 OSDs each in our office,
on a 1 Gbit network, and Ceph barely uses 200-300 Mbps. You may get
slightly lower latency, but that will be all.<br>
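<br>
To put those numbers in perspective, here is a minimal sketch of the
arithmetic. The helper name <code>link_utilization</code> is made up for
illustration, and it assumes the 200-300 Mbps seen on the 1 Gbit network
would stay the same on a faster link, which may not hold for your
workload.<br>
<pre>
```python
def link_utilization(observed_mbps, link_gbps):
    """Return the fraction of a link's nominal capacity used by observed traffic."""
    link_mbps = link_gbps * 1000  # 1 Gbit/s = 1000 Mbit/s
    return observed_mbps / link_mbps


# 200 and 300 Mbps are the figures quoted above; the link speeds are 1 and 10 Gbit.
for observed in (200, 300):
    on_1g = link_utilization(observed, 1)
    on_10g = link_utilization(observed, 10)
    print(f"{observed} Mbps: {on_1g:.0%} of 1 Gbit, {on_10g:.0%} of 10 Gbit")
```
</pre>
Under that assumption the same traffic would fill only a few percent of
a 10 Gbit link.<br>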
<blockquote
cite="mid:3BAE8E69-5D5F-4A99-9C74-C89E0E171EA4@ipgenius.fr"
type="cite">
<div class=""><br class="">
</div>
<div class="">I have already installed the cluster and set up
different VLANs for cluster and storage traffic, set the hosts files
and installed CEPH. Everything went smoothly except that the
OVH installation creates an MBR partition table on the SSDs and CEPH
needs a GPT one, but I managed to convert the partition tables, so
I thought I was all set for the CEPH configuration.</div>
<div class=""><br class="">
</div>
<div class=""><u class="">For now, my partitioning scheme is the
following:</u> (the message was rejected as too big for the
mailing list, so here is a link) <a
moz-do-not-send="true"
href="https://www.ipgenius.fr/tools/pveceph.png" class="">https://www.ipgenius.fr/tools/pveceph.png</a></div>
</blockquote>
<br>
Seems quite good, though a bit more room for the root filesystem
would be good; you have 300GB of disk... :) Also see below.<br>
<br>
<blockquote
cite="mid:3BAE8E69-5D5F-4A99-9C74-C89E0E171EA4@ipgenius.fr"
type="cite">
<div class=""><br class="">
</div>
<div class="">I know it would be better to give CEPH the
whole disks, but I have to put my system somewhere… I was
thinking that even if it’s not ideal (I can’t afford more),
this layout would work… So I tried to create the OSDs
with my SSD journal partition using the appropriate command, but
it didn’t seem to work, and I assume that's because CEPH doesn’t want
partitions but entire drives…</div>
<div class=""><br class="">
</div>
<div class="">
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);"><span class="" style="color:
rgb(195, 55, 32);">root</span><span class="" style="color:
rgb(175, 173, 36);">@</span><span class="" style="color:
rgb(52, 187, 199);">pvegra1 </span><span class=""
style="color: rgb(175, 173, 36);">~ </span><span class=""
style="color: rgb(213, 59, 211);"># </span>pveceph createosd
/dev/sdc -journal_dev /dev/sda4</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">create OSD on /dev/sdc (xfs)</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">using device '/dev/sda4' for
journal</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">Creating new GPT entries.</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">GPT data structures
destroyed! You may now partition the disk using fdisk or</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">other utilities.</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">Creating new GPT entries.</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">The operation has completed
successfully.</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">WARNING:ceph-disk:OSD will
not be hot-swappable if journal is not the same device as the
osd data</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">WARNING:ceph-disk:Journal
/dev/sda4 was not prepared with ceph-disk. Symlinking
directly.</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">Setting name!</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">partNum is 0</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">REALLY setting name!</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">The operation has completed
successfully.</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">meta-data=/dev/sdc1
isize=2048 agcount=4, agsize=122094597 blks</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);"> =
sectsz=512 attr=2, projid32bit=1</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);"> =
crc=0 finobt=0</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">data =
bsize=4096 blocks=488378385, imaxpct=5</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);"> =
sunit=0 swidth=0 blks</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">naming =version 2
bsize=4096 ascii-ci=0 ftype=0</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">log =internal log
bsize=4096 blocks=238466, version=2</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);"> =
sectsz=512 sunit=0 blks, lazy-count=1</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">realtime =none
extsz=4096 blocks=0, rtextents=0</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">Warning: The kernel is still
using the old partition table.</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">The new table will be used at
the next reboot.</div>
<div class="" style="margin: 0px; line-height: normal;
font-family: 'Andale Mono'; color: rgb(41, 249, 20);
background-color: rgb(0, 0, 0);">The operation has completed
successfully.</div>
</div>
<div class=""><br class="">
</div>
<div class="">I saw the following threads:</div>
<div class=""><a moz-do-not-send="true"
href="https://forum.proxmox.com/threads/ceph-server-feedback.17909/"
class="">https://forum.proxmox.com/threads/ceph-server-feedback.17909/</a> </div>
<div class=""><a moz-do-not-send="true"
href="https://forum.proxmox.com/threads/ceph-server-why-block-devices-and-not-partitions.17863/"
class="">https://forum.proxmox.com/threads/ceph-server-why-block-devices-and-not-partitions.17863/</a><br
class="">
<div class=""><br class="webkit-block-placeholder">
</div>
<div class="">But this kind of setup seems to suffer from
performance issues and is not officially supported, and I am
not comfortable with that: at the moment I only
have a community subscription from Proxmox, but I want to be able
to move to a different plan to get support from them if I need
it, and if I go this way, I’m afraid they will tell me it’s an
unsupported configuration.</div>
<div class=""><br class="">
</div>
<div class="">OVH can provide USB keys, so I could install the
system on one and keep my whole disks for CEPH, but I think that
is not supported either. Moreover, I worry about performance and
long-term stability with this solution.</div>
<div class=""><br class="">
</div>
<div class="">Maybe I could use one SSD for the system and
journal partitions (but again it’s a mix that is not really supported)
and the other SSD dedicated to CEPH… but with this solution I
lose my system RAID protection… and a lot of SSD space...</div>
<div class=""><br class="">
</div>
<div class="">I’m a little confused about the best
partitioning scheme and how to obtain a configuration that is
stable, supported and performant, and that wastes as little space
as possible.</div>
<div class=""><br class="">
</div>
<div class="">Should I continue with my partitioning scheme, even
if it’s not the best supported, since it seems the most appropriate
in my case, or do I need to completely rethink my install?</div>
<div class=""><br class="">
</div>
<div class="">Can someone please give me advice? I’m all ears
:)</div>
<div class="">Thanks a lot to anyone taking the time to read
this mail and give me good advice.</div>
</div>
</blockquote>
I suggest you only mirror swap and root partitions. Then use one SSD
for each OSD's journal.<br>
<br>
So to fix your problems, please try the following:<br>
- Remove all OSDs from Proxmox GUI (or CLI)<br>
- Remove journal partitions<br>
- Remove journal partition mirrors<br>
- You should now have 2 partitions on each SSD (swap and root), mirrored.<br>
- Create the OSDs from the Proxmox GUI, using a different SSD disk for the
journal of each OSD. If you can't do this, the SSD drives don't have a GPT
partition table.<br>
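<br>
If you prefer the CLI for the last step, the pairing of OSD disks and
journal SSDs can be sketched as below. This is only an illustration: it
prints command lines and runs nothing, the helper name
<code>build_createosd_commands</code> is hypothetical, /dev/sdd and
/dev/sdb are assumed names for your second HDD and SSD, and it assumes the
<code>pveceph createosd ... -journal_dev ...</code> form from your message
also accepts a whole SSD rather than a partition.<br>
<pre>
```python
def build_createosd_commands(osd_to_journal):
    """Build one 'pveceph createosd' command line per OSD data disk.

    osd_to_journal maps each OSD data disk to the SSD that holds its journal.
    """
    journals = list(osd_to_journal.values())
    # Each OSD should use a different SSD for its journal.
    if len(set(journals)) != len(journals):
        raise ValueError("each OSD needs its own journal SSD")
    return [
        f"pveceph createosd {osd} -journal_dev {ssd}"
        for osd, ssd in sorted(osd_to_journal.items())
    ]


# /dev/sdc and /dev/sda appear in the quoted command; /dev/sdd and /dev/sdb
# are assumed device names, so check them against your own hosts.
layout = {"/dev/sdc": "/dev/sda", "/dev/sdd": "/dev/sdb"}
for command in build_createosd_commands(layout):
    print(command)
```
</pre>
Review the printed lines before running anything, since creating an OSD
wipes the data disk.<br>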
<blockquote
cite="mid:3BAE8E69-5D5F-4A99-9C74-C89E0E171EA4@ipgenius.fr"
type="cite">
<div class="">
<div class=""><br class="webkit-block-placeholder">
</div>
<div class="">P.S. If someone from the official Proxmox support
team sees this message, can you tell me whether I could get
assistance with this kind of question if I buy a subscription with tickets?
And if I buy a subscription, I will also ask for help configuring
CEPH as well as possible: SSD pool, normal-speed pool, how to set
redundancy, how to make snapshots, how to make backups and so
on… Is that the kind of thing you can help me with?</div>
</div>
</blockquote>
You need to first buy a subscription.<br>
<br>
Good luck<br>
Eneko<br>
<br>
<pre class="moz-signature" cols="72">--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
<a class="moz-txt-link-abbreviated" href="http://www.binovo.es">www.binovo.es</a></pre>
</body>
</html>