[PVE-User] Best configuration among these : Episode 2

Philippe Schwarz phil at schwarz-fr.net
Tue Feb 24 13:53:15 CET 2015


Hi, thanks for your answers.
On 24/02/2015 11:21, Eneko Lacunza wrote:
> Hi,
> 
> On 24/02/15 10:39, Philippe Schwarz wrote:
>> So, I modified my setup and have a few questions. No more SAN (no
>> more SPOF), but an increase in the hardware specs of the 3 remaining
>> servers.
>> 
>> 
>> About the storage network:
>> 
>> I can't afford losing the cluster because of the network failing.
>> I'm on the way to using a Netgear XS708E, a 10GbE switch. It should
>> be doubled to be redundant, but then I would have to double the
>> 10GbE NICs too. Too expensive a solution! So I plan to set up an
>> active/passive bond: 1x10GbE (active) + 1x1GbE (passive), the 10GbE
>> port plugged into the XS708E, the 1GbE port plugged into a cheap
>> 1GbE switch, and both switches connected with a single (or dual,
>> LACP) link.
>> 
>> The other 10GbE port (it's an Intel dual-port 10GbE NIC) will be
>> connected to the LAN using the same principle (active 10GbE +
>> passive 1GbE bond).
>> 
>> Is there an issue with that?
> I never did something like this myself. I think you should be OK
> with a 1Gbit network for the number of OSDs you're listing. The only
> exception could be the 5xSSD setup, but you'll need lots of CPU
> power to make use of all the IOPS in that setup.
> 
OK, but I think I'll move away from the expensive 5xSSD solution.

> I have 3 small Proxmox-Ceph clusters and none of them surpasses
> 250Mbps at peak use (and that's for backups), and there are some
> DB-heavy processes running every two hours. Normally, with such a
> small number of OSDs you'll be limited by the IOPS of the magnetic
> hard disks.
OK, so a single 10GbE link is far from today's limit for us. Good for
the future.
I'm going to give the poor man's failover solution a try.
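
For the record, here is roughly what I have in mind for the storage
bond in /etc/network/interfaces (untested sketch; eth2 as the 10GbE
port, eth0 as the 1GbE port and the 10.10.10.x address are just
placeholders to adapt):

  auto bond0
  iface bond0 inet static
      address 10.10.10.1
      netmask 255.255.255.0
      slaves eth2 eth0
      bond_mode active-backup
      bond_primary eth2
      bond_miimon 100
      # eth2 (10GbE) carries all the traffic; eth0 (1GbE) only takes
      # over when the link monitor (miimon, every 100 ms) sees eth2 down

The LAN bond would follow the same pattern with the second 10GbE port.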

>> About the Ceph journal: Because of the smart use of a fast device
>> for the journal and a slow/cheap/large device for the data, I wonder
>> which solution would be best:
>> 1. 1 SSD 200GB (Proxmox) + 1 PCIe SSD 400GB Intel P3700 (journal)
>>    + 4 SATA 1TB = 2200€
>> 2. 1 SSD 200GB (Proxmox) + 1 SSD 200GB (journal) + 4 SATA 1TB = 900€
>> 3. 1 SSD 200GB (Proxmox) + 5 SSD 1TB, no journal = 2600€
>> Not the same price, but not the same performance either... The
>> non-PCIe SSDs would be either Intel S3700 200GB or Samsung 850 Pro
>> 1TB. Any clue?
> The Samsung 840 Pro is total shit for Ceph, so I won't even test the
> 850. The Ceph community consensus seems to be the Intel S3700 200GB,
> so I won't look elsewhere.
Valuable information, indeed!

> 
> Currently Ceph (Firefly) has some performance bottlenecks with SSD
> drives and can't use all their performance, so I don't think going
> with a PCIe SSD for the journals will help you, unless you use the
> same disk for several OSD journals. I'd choose option #2; you could
> even put Proxmox on the same SSD as the journals. You can use the
> money saved for more OSD disks; generally people use 1 SSD per 3-4
> OSDs. You can also put the journals of 2 OSDs on one SSD, and the
> other 2 journals plus Proxmox on the other.
Interesting setup. It doubles the IOPS available for the journals but
halves the MTBF, because the failure of either SSD means the failure
of the node. Incidentally, it also halves the number of writes on each
SSD and so doubles its expected lifetime. Interesting!
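
If I go with that layout, I suppose the OSD creation on each node
would look something like this (just a sketch; the device names are
placeholders and I'd have to double-check the exact pveceph syntax):

  # /dev/sdb : SSD #1 (Proxmox + journals of the first 2 OSDs)
  # /dev/sdc : SSD #2 (journals of the other 2 OSDs)
  pveceph createosd /dev/sdd -journal_dev /dev/sdb
  pveceph createosd /dev/sde -journal_dev /dev/sdb
  pveceph createosd /dev/sdf -journal_dev /dev/sdc
  pveceph createosd /dev/sdg -journal_dev /dev/sdc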

> 
>> About the RAID controller: RAID is mandatory on this controller,
>> but JBOD (or single-disk array) mode will be used. The Dell H730 is
>> sold with either 1GB or 2GB of non-volatile cache. Is the difference
>> of 300€ (the price of a good SSD) worth it?
> I wouldn't invest in doubling the NV cache. Better to put the money
> into more OSDs and SSDs :)
That was my opinion too, good.
> 
>> Other hardware considerations: My Proxmox cluster will be made of
>> 1 Samba + 1 WS (Trend) + 1 WS (WSUS) + 1 WS (AutoCAD licenses) +
>> 1 Squid + 1 LTSP + many other small servers (apt-proxy, Xibo, ...).
>> So, except for Squid+SquidGuard, nothing really CPU/RAM/IOPS hungry.
>> Before Ceph (the previous idea was a ZFS/FreeBSD SAN), I planned to
>> use 64 GB of RAM and dual E5-2630 CPUs. Should I go up to 96 GB and
>> dual E5-2650 CPUs? (Not sure I can afford both.)
> Calculate 1 GB for each OSD, 1 GB for the Ceph monitor, then some
> for Proxmox and cache, maybe 4 GB or so. If the remaining RAM is
> enough for your VMs to fit on 2 servers, then you're OK. If that is
> a Xeon E5-2650, I think you're OK with one: count 1 GHz for each OSD
> and 1 core for metadata. This will leave 4-5 cores for your VMs with
> 1 CPU on each server.
And a free socket in case of need.
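
Just to check the arithmetic with my numbers (assuming 4 OSDs and
64 GB of RAM per node):

  64 GB - (4 OSDs x 1 GB) - 1 GB (monitor) - 4 GB (Proxmox + cache)
  ~= 55 GB left for the VMs per node, i.e. ~110 GB across the two
  remaining nodes if one server is down.

That should be plenty for the small servers listed above.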

>> About Proxmox only: Is it possible to set up a fourth node as a
>> Ceph server only, not joining it to the existing Proxmox cluster
>> but joining it to the Ceph cluster (I have to find the real term)?
>> I don't see any issue with that.
> It is possible, but you would lose the integration of the Ceph
> administration on that fourth node. It would be a bit strange; I
> wouldn't do it. You can join it to the Proxmox cluster and not run
> any VMs on that Proxmox node.
Yes, you're right, it's weird. But it was already difficult to get
the money for a 3-node Proxmox license...
"Hey, you, the open-source guy, I thought your solution was all free..."
I've heard that a few times recently ;-)
I hope I'll get the money for a fourth node. We'll see later.
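
If that fourth node does materialize, joining it to the existing
Proxmox cluster should just be a matter of (the IP below is of course
a placeholder for one of the current nodes):

  # on the new, fourth node
  pvecm add 192.168.1.10
  # then simply never create or migrate VMs onto it, and only run
  # the Ceph monitor/OSDs there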

>> Last one: Should I reduce the costs of those 3 servers to be able
>> to buy a fourth one (next year) to increase my number of Ceph OSDs
>> (it won't be a Proxmox server), or not? I didn't find benchmarks on
>> how the performance increases with the number of servers (and so of
>> OSDs).
> As said before, I would extend the Proxmox/Ceph cluster so that it
> is consistent across servers. You plan to use the Proxmox Ceph Server
> integration, right?
Of course.

> Generally speaking, more OSDs = better performance. Take into
> account that Ceph is optimized for many users/VMs accessing the
> storage, not for fast access by a single user/VM. You can also add
> more OSDs/SSDs to the existing servers.
Yes, the tests I have read put the focus on this point: it's
important not to test from a single client. Ceph's purpose is to
spread the load across many clients, so testing (and designing) it
with a single one makes no sense.
The more OSDs (disks), the better.
Those cheap 7200rpm 1TB SATA drives are good enough for this purpose,
I think, aren't they?
The most difficult part will be getting the caddies...
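
For benchmarking, I suppose something like rados bench with several
concurrent operations (and ideally run from several clients at once)
is the right way to measure it, rather than a single dd from one VM;
for example (the pool name "test" is just a placeholder):

  # 60 seconds of writes with 16 concurrent operations; keep the
  # objects so the read test has something to read
  rados bench -p test 60 write -t 16 --no-cleanup
  # then the matching sequential read test
  rados bench -p test 60 seq -t 16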

> Cheers,
> Eneko

Thanks for all your answers; you're a brilliant Ceph evangelist ;-)
You've probably made a new fan!

Best regards.



