[PVE-User] Ceph + ZFS?
lindsay.mathieson at gmail.com
Wed Jun 1 23:46:50 CEST 2016
On 2/06/2016 6:10 AM, Jeremy McCoy wrote:
> I am new here and am working on designing a Proxmox cluster. Just wondering if
> anyone has tried doing Ceph on ZFS (instead of XFS on LVM or whatever pveceph
> sets up) and, if so, how you went about implementing it. I have 4 hosts that
> each have 1 spinner and 1 SSD to offer to the cluster.
Been there ... Ceph does not like ZFS; apparently COW filesystems perform
badly under Ceph's workload. It's recommended against (as is ext4, btw).
Ceph does not do well on small setups - with only one OSD/disk per node
you will get *terrible* performance.
The 9.x versions of Ceph have a latency bug that triggers under memory
pressure (common on combined compute/storage nodes), and 10.x IMO is even
less friendly to small setups. Also, as you've probably noticed, it's a
maintenance headache for small shops, especially when things go wrong.
> Are there any pitfalls to be aware of here? My goal is to mainly run LXC
> containers (plus a few KVM VMs) on distributed storage, and I was hoping to take
> advantage of ZFS's caching, compression, and data integrity features. I am also
> open to doing GlusterFS or something else, but it looked like Proxmox does not
> support LXC containers running on that yet.
Probably because LXC doesn't support the native Gluster API (gfapi); I
imagine the same problem exists with Ceph/RBD.
However, there is also the Gluster FUSE mount that Proxmox automatically
creates (/mnt/pve/<glusterid>); you should be able to set that up as
shared directory storage and use it with LXC. I'll have a test of that
myself later today.
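For reference, a sketch of what the directory storage entry in
/etc/pve/storage.cfg might look like - the storage name "gluster-fuse" and
the content types are my assumptions, not something I've tested, and
<glusterid> is whatever ID your Gluster storage uses:

```
dir: gluster-fuse
        path /mnt/pve/<glusterid>
        content rootdir,images
        shared 1
```

The "shared 1" flag tells Proxmox the path holds the same data on every
node, so containers can migrate between hosts.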
Gluster works well with ZFS - I get excellent performance, maxing out my
network for writes and much better IOPS than I was getting with Ceph.
Enabling lz4 compression gave me a 33% saving on space with no noticeable
impact on performance.
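Enabling lz4 and checking the actual savings is a one-liner each; the
dataset name "tank/gluster" below is just a placeholder for whatever
dataset backs your Gluster brick:

```
# Enable lz4 compression on the dataset backing the brick
# ("tank/gluster" is a placeholder name)
zfs set compression=lz4 tank/gluster

# See the ratio ZFS is actually achieving
zfs get compressratio tank/gluster
```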
If you can afford it, I recommend using ZFS RAID1 or, better yet, ZFS
RAID10 per node. Apart from the extra redundancy, it has much better
read/write performance than a single disk. And it's much easier to
replace a failed disk in a ZFS mirror than it is to replace a failed
Gluster or Ceph node.
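To illustrate (device names are placeholders for your own disks): a
4-disk RAID10 pool is just striped mirrors, and swapping a failed mirror
member is a single command rather than a rebalance of the whole cluster:

```
# Hypothetical 4-disk ZFS RAID10 pool (two striped mirrors)
zpool create tank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd

# Replace a failed disk in one of the mirrors; ZFS resilvers
# only that mirror, not the whole pool
zpool replace tank /dev/sdb /dev/sde
```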