[PVE-User] Ceph Journal Performance

Lindsay Mathieson lindsay.mathieson at gmail.com
Wed Nov 5 01:52:15 CET 2014

On 3 November 2014 18:10, Eneko Lacunza <elacunza at binovo.es> wrote:
> Hi Lindsay,

Thanks for the informative reply, Eneko; most helpful.

> Four drives per server would be better, but using an SSD for journals will
> help you a lot; it could even give you better performance than 4 OSDs per
> server. He ran a 2-OSD setup for some months, with journals on Intel SSD
> 320s and about 20 VMs, and it worked quite well (didn't test performance).

I finally got round to testing Ceph with an SSD journal. It took me a
while, as I had to use a GParted boot ISO to repartition the OS SSD and
free up space, since Ceph doesn't seem to like LVM volumes as journals.

I had to create the OSD from the command line (pveceph createosd), as
the web UI didn't list my SSD partitions.
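For reference, a minimal dry-run sketch of that command line, built in
Python. It assumes the `pveceph createosd <data disk> -journal_dev
<journal device>` syntax of Proxmox VE 3.x (check `man pveceph` on your
version), and the device names /dev/sdb and /dev/sda4 are hypothetical
placeholders for the data disk and the SSD journal partition.

```python
import shlex
from typing import List


def createosd_argv(data_dev: str, journal_dev: str) -> List[str]:
    """Build the argv for an OSD whose journal lives on a separate device.

    Assumes the Proxmox VE 3.x `pveceph createosd <dev> -journal_dev <dev>`
    syntax; verify against `man pveceph` before running it for real.
    """
    return ["pveceph", "createosd", data_dev, "-journal_dev", journal_dev]


if __name__ == "__main__":
    # Hypothetical devices: /dev/sdb = data disk, /dev/sda4 = SSD partition.
    argv = createosd_argv("/dev/sdb", "/dev/sda4")
    # Dry run: print the shell-quoted command instead of executing it.
    print(shlex.join(argv))
```

Printing rather than executing keeps it safe to try; pass the list to
subprocess.run only once the devices are confirmed.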

It made a huge difference: raw VM I/O increased from 3 MB/s to 40 MB/s.
Multiple VMs were much more responsive and quite usable.

Overall, I seemed to get I/O similar to what I was getting with
Gluster when I implemented an SSD cache for it (ext4 with an SSD
journal). However, Ceph seemed to cope better with high loads. In one
of my stress tests, starting 7 VMs simultaneously, Gluster seemed to
fail, with some of the VMs reporting I/O errors and crashing.

With Ceph, by contrast, they were very slow :) but all started normally.

The results are good enough that I think I will get a dedicated journal
SSD and add a couple of extra disks, though I have to work on our
network link. It's two bonded 1 Gb ports, but it's maxing out at
90 MB/s; it should do better. That's probably because I'm only using
balance-rr. We have a managed switch with LACP, but I have to move it :)
and replug everything.
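A rough back-of-the-envelope check of those link numbers, assuming
1 Gb/s per port and ignoring Ethernet/IP/TCP overhead (real payload is
somewhat lower than the raw line rate):

```python
def link_ceiling_mb_s(gbit_per_s: float, links: int = 1) -> float:
    """Raw line-rate ceiling in MB/s (10^6 bytes), ignoring protocol overhead."""
    return gbit_per_s * links * 1000 / 8


single = link_ceiling_mb_s(1)           # one 1 Gb/s port
bonded = link_ceiling_mb_s(1, links=2)  # two ports, if traffic spread perfectly
observed = 90                           # MB/s, the figure reported above

print(f"single link ceiling: {single:.0f} MB/s")
print(f"2-port bond ceiling: {bonded:.0f} MB/s")
print(f"observed is {observed / single:.0%} of one link")
```

So 90 MB/s is below what even one port can carry in theory. Note that
with LACP (802.3ad) each flow is normally hashed onto a single link, so
the two-port figure is only approachable with several concurrent
connections, such as traffic to multiple OSDs.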

> Keep in mind that you usually won't see sequential I/O; almost all of it
> will be random, because I/O from different VMs gets mixed together.

That's definitely where the SSD has helped. VMs are much more responsive now.
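To put numbers on why random I/O hurts spinning disks so much:
throughput is simply IOPS multiplied by block size. A minimal sketch,
assuming 4 KiB writes and rough, illustrative IOPS figures rather than
anything measured in this thread:

```python
def iops_to_mb_s(iops: float, block_kib: float = 4) -> float:
    """Throughput in MB/s (10^6 bytes) for a given IOPS rate and block size in KiB."""
    return iops * block_kib * 1024 / 1_000_000


# Illustrative, assumed figures; not measurements from this thread.
HDD_RANDOM_WRITE_IOPS = 100    # rough figure often quoted for a 7200 rpm disk
SSD_RANDOM_WRITE_IOPS = 5_000  # deliberately conservative SSD figure

for name, iops in [("HDD", HDD_RANDOM_WRITE_IOPS), ("SSD", SSD_RANDOM_WRITE_IOPS)]:
    print(f"{name}: {iops_to_mb_s(iops):.2f} MB/s of 4 KiB random writes")
```

The ratio is the point: at small block sizes, a disk limited to roughly
a hundred seeks per second delivers well under 1 MB/s, whereas an SSD
absorbs the same random writes with far more headroom.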

>>> For 2 drives, DRBD may be a better choice.

Yes, I looked at that, but it's not flexible enough, as we want to
expand, and it's way too fiddly to set up.

> Proxmox/Ceph will create a separate partition (5 GB by default) for each
> OSD's journal. Check your SSD's write IOPS too.

Can a journal be too large? If I gave 20 GB+ to a journal for 3 TB
drives, would it be used, or is that just a waste?
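One way to reason about this is the FileStore rule of thumb from the
Ceph OSD configuration documentation: journal size = 2 x expected
throughput x filestore max sync interval, where expected throughput is
the slower of disk and network. A minimal sketch, assuming a 3 TB disk
sustains about 150 MB/s (an assumption, not a measurement), using the
90 MB/s network figure above and Ceph's default 5 s sync interval:

```python
def journal_size_mb(disk_mb_s: float, network_mb_s: float,
                    max_sync_interval_s: float = 5.0) -> float:
    """Rule-of-thumb FileStore journal size in MB.

    osd journal size = 2 * expected throughput * filestore max sync interval,
    taking the expected throughput as the slower of disk and network.
    """
    expected_mb_s = min(disk_mb_s, network_mb_s)
    return 2 * expected_mb_s * max_sync_interval_s


# Assumed: ~150 MB/s per disk; 90 MB/s network as reported above;
# 5 s is Ceph's default filestore max sync interval.
size_mb = journal_size_mb(disk_mb_s=150, network_mb_s=90)
print(f"suggested journal: {size_mb:.0f} MB")
```

Under these assumptions the rule suggests well under the 5 GB default,
so much of a 20 GB journal would likely go unused unless the sync
interval were raised; treat it as a starting point, not a definitive
answer.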