[PVE-User] Poor CEPH performance? or normal?
ronny+pve-user at aasen.cx
Fri Jul 27 14:05:53 CEST 2018
rbd striping is a per-image setting. You may need to recreate the rbd image
with striping enabled and migrate the data over.
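For example (a sketch only; the pool and image names are placeholders, and the stripe values are illustrative, not tuned advice), a new striped image can be created and the data copied over with qemu-img:

```shell
# Create a new image with fancy striping.
# stripe-unit = 64 KiB, stripe-count = 16 spreads each stripe over 16 objects.
rbd create mypool/vm-disk-striped --size 100G \
    --stripe-unit 65536 --stripe-count 16

# Copy the data from the old image into the new one.
# -n tells qemu-img not to create the target, so the striping set above is kept.
qemu-img convert -n -f raw -O raw \
    rbd:mypool/vm-disk rbd:mypool/vm-disk-striped

# Verify the stripe settings on the new image.
rbd info mypool/vm-disk-striped
```

These commands need a live ceph cluster, so treat them as a template rather than something to paste verbatim.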
On 07/26/18 12:25, Mark Adams wrote:
> Hi Ronny,
> Thanks for your suggestions. Do you know if it is possible to change an
> existing rbd pool to striping? Or does this have to be done on first setup?
> On Wed, 25 Jul 2018, 19:20 Ronny Aasen, <ronny+pve-user at aasen.cx> wrote:
>> On 25. juli 2018 02:19, Mark Adams wrote:
>>> Hi All,
>>> I have a proxmox 5.1 + ceph cluster of 3 nodes, each with 12 x WD 10TB
>>> drives. The network is 10Gbps on X550-T2, with a separate network for the
>>> ceph traffic.
>>> I have 1 VM currently running on this cluster, which is debian stretch with
>>> a zpool on it. I'm zfs sending into it, but only getting around ~15 MiB/s
>>> write speed. Does this sound right? It seems very slow to me.
>>> Not only that, but while this zfs send is running I cannot do any
>>> parallel sends to any other zfs datasets inside the same VM. They just
>>> seem to hang, then eventually say "dataset is busy".
>>> Any pointers or insights greatly appreciated!
>> Alwin gave you some good advice about filesystems and VMs; I wanted to
>> say a little about ceph.
>> With 3 nodes and the default and recommended size=3 pools, you cannot
>> tolerate any node failures. IOW, if you lose a node, or need to do
>> lengthy maintenance on it, you are running degraded. I always have a
>> 4th "failure domain" node, so my cluster can self-heal (one of ceph's
>> killer features) from a node failure. Your cluster should be able to do
>> the same.
>> Spinning OSDs with bluestore benefit greatly from SSD DB/WALs. If your
>> OSDs currently have their DB/WAL on the spinning disk itself, you can gain
>> a lot of performance by moving the DB/WAL to an SSD or better.
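>> A sketch of how a new OSD can be created with its DB on a faster device
>> (the device paths are placeholders for this host's disks; the SSD would
>> normally be carved into one partition or LV per OSD first):

```shell
# Create a bluestore OSD on the spinner, with its RocksDB (and WAL,
# which follows the DB when not given separately) on the NVMe partition.
ceph-volume lvm create --bluestore \
    --data /dev/sdb \
    --block.db /dev/nvme0n1p1
```
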
>> Ceph gains performance with scale (number of OSD nodes). So while ceph's
>> aggregate performance is awesome, an individual single thread will not
>> be amazing. A given set of data will exist on all 3 nodes, and you will
>> hit 100% of the nodes with any write. So by using ceph with 3 nodes you
>> give ceph the worst case for performance: with 4 nodes a write would hit
>> 75% of the cluster, with 6 nodes it would hit 50%. You see where this is
>> going...
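>> The arithmetic behind those percentages is just replicas divided by
>> nodes: with size=3, every write touches 3 OSD nodes, so the share of the
>> cluster hit per write is 3/n:

```shell
# Fraction of OSD nodes touched by a single write with size=3 replication.
for nodes in 3 4 6 12; do
  awk -v n="$nodes" \
    'BEGIN { printf "%2d nodes: %3.0f%% of nodes hit per write\n", n, 100 * 3 / n }'
done
```
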
>> But a single write will only hit one disk in each of the 3 nodes, and will
>> not perform better than the disk it hits. You can cheat more performance
>> out of it with rbd caching, and it is important for performance to get a
>> higher queue depth. AFAIK zfs uses a queue depth of 1, which is the worst
>> possible case for ceph. You may have some success by buffering on one or
>> both ends of the transfer.
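>> One common way to buffer both ends is mbuffer (a sketch; the hostnames,
>> dataset names, and snapshot names are placeholders, and mbuffer must be
>> installed on both sides):

```shell
# Buffer the zfs send stream on both the sending and the receiving side.
# -s 128k = block size, -m 1G = buffer memory; tune both for your setup.
zfs send tank/data@snap1 \
  | mbuffer -s 128k -m 1G \
  | ssh vm-host 'mbuffer -s 128k -m 1G | zfs receive -F pool/data'
```

The buffers absorb the bursty, high-latency acks from ceph so the zfs send side can keep streaming instead of stalling at queue depth 1.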
>> If the VM has an RBD disk, you may (or may not) benefit from rbd fancy
>> striping, since operations can hit more OSDs in parallel.
>> good luck
>> Ronny Aasen
>>  http://docs.ceph.com/docs/master/architecture/#data-striping
>> pve-user mailing list
>> pve-user at pve.proxmox.com