[PVE-User] Ceph Journal Performance

Eneko Lacunza elacunza at binovo.es
Wed Nov 5 17:34:04 CET 2014

Hi Lindsay,

On 05/11/14 01:52, Lindsay Mathieson wrote:
> Thanks for the informative reply Eneko, most helpful. 
I'm glad that my response was helpful, thanks :)
>> 4 drives per server will be better, but using SSD for journals will help you
>> a lot, and could even give you better performance than 4 OSDs per server. We
>> had a 2-OSD setup with journals on Intel SSD 320s for some months, with about
>> 20 VMs working quite well. (didn't test performance)
> I finally got round to testing Ceph with an SSD journal. It took me a
> while, as I had to use a GParted boot ISO to repartition the OS SSD to
> free up space; Ceph doesn't seem to like LVM volumes for journals.
> I had to create the OSD from the command line (pveceph createosd), as
> the web UI didn't list my SSD partitions.
> It did make a huge difference: raw VM I/O increased from 3 MB/s to 40 MB/s.
> Multiple VMs were much more responsive, quite usable.
> Overall, I seemed to get similar I/O to what I was getting with
> Gluster when I implemented an SSD cache for it (ext4 with an SSD
> journal). However, Ceph seemed to cope better with high loads. In one
> of my stress tests, starting 7 VMs simultaneously, Gluster seemed to
> fail, with some of the VMs reporting I/O errors and crashing,
> whereas with Ceph they were very slow :) but all started normally.
> The results are good enough that I think I will get a dedicated journal
> SSD and add a couple of extra disks, though I have to work on our network
> link. It's 2 bonded 1Gb ports, but it's maxing out at 90M/s; it should do
> better. That's probably because I'm only using balance-rr. We have a
> managed switch with LACP, but I have to move it :) Need to replug
> everything ...
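As a rough sanity check on the network figure quoted above, here is a
small sketch of the arithmetic. It assumes "90M/s" means 90 MB/s and
ignores Ethernet/TCP overhead, so the real ceiling is somewhat lower;
the function name is only for illustration.

```python
# Theoretical payload ceiling of gigabit links, ignoring protocol overhead.
BITS_PER_GBIT = 1_000_000_000  # decimal gigabit, as used for Ethernet speeds


def link_mb_per_s(gbit: float) -> float:
    """Convert a link speed in Gbit/s to MB/s (decimal megabytes)."""
    return gbit * BITS_PER_GBIT / 8 / 1_000_000


single = link_mb_per_s(1)   # one 1 Gbit/s port
bonded = 2 * single         # two ports, if traffic were spread perfectly
observed = 90.0             # assumption: "90M/s" in the quoted text is MB/s

print(f"single link: {single:.0f} MB/s, two links: {bonded:.0f} MB/s")
print(f"observed is {observed / single:.0%} of one link, "
      f"{observed / bonded:.0%} of two")
```

Under those assumptions, the observed rate is below what even one port
could carry, so the bond is not adding usable bandwidth yet.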
When choosing the new SSD, look at random write IOPS at 100% span;
that's the important performance factor for Ceph.

Thanks for sharing. I haven't used GlusterFS, but it's interesting to
know about those I/O errors.
>> Proxmox/ceph will create a separate partition (5GB default) for each OSD's
>> journal. Check your SSD's write IOPS too.
> Can the journal size be too large? If I gave 20GB+ to a journal for 3TB
> drives, would it be used or is that just a waste?
Dmitry seems to have more experience with this, check his reply; I
haven't done such tuning yet. I guess a larger journal should help to
avoid hitting the journal limit when there's lots of I/O happening at
the same time and it arrives too fast to be written back to the OSD
itself :)
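
As a rough starting point, the Ceph documentation's rule of thumb for
filestore journals is: journal size = 2 * (expected throughput *
filestore max sync interval). A minimal sketch of that arithmetic is
below; the 100 MB/s throughput is an assumed example value, not a
measurement from this thread, and the function name is hypothetical.

```python
def journal_size_mb(expected_throughput_mb_s: float,
                    max_sync_interval_s: float = 5.0) -> float:
    """Rule-of-thumb journal size in MB.

    expected_throughput_mb_s: the lower of disk and network throughput.
    max_sync_interval_s: filestore max sync interval (Ceph default: 5 s).
    """
    return 2 * expected_throughput_mb_s * max_sync_interval_s


# Assumed example: an OSD disk that sustains about 100 MB/s.
suggested = journal_size_mb(100)
print(f"suggested journal size: {suggested:.0f} MB")

# For comparison, the sizes mentioned in this thread:
proxmox_default_mb = 5 * 1024   # 5 GB default partition
large_journal_mb = 20 * 1024    # the 20 GB journal being asked about
print(f"5 GB default is {proxmox_default_mb / suggested:.1f}x the suggestion")
print(f"20 GB journal is {large_journal_mb / suggested:.1f}x the suggestion")
```

By that rule, space beyond the suggested size would only be used if the
sync interval or the throughput were higher than assumed here.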


Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943575997
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
