[PVE-User] Ceph or Gluster
Lindsay Mathieson
lindsay.mathieson at gmail.com
Sat Apr 23 02:36:41 CEST 2016
On 23/04/2016 7:50 AM, Brian :: wrote:
> With NVME journals on a 3 node 4 OSD cluster
Well your hardware is rather better than mine :) I'm just using
consumer-grade SSDs for journals, which won't have anywhere near the
performance of NVMe.
> if I do a quick dd of a
> 1GB file on a VM I can see 2.34Gbps on the storage network straight
> away so if I was only using 1Gbps here the network would be a
> bottleneck. If I perform the same in 2 VMs traffic hits 4.19Gbps on
> the storage network.
>
> The throughput in the VM is 1073741824 bytes (1.1 GB) copied, 3.43556
> s, 313 MB/s (R=3)
dd isn't really a good test of throughput - it's too easy for the kernel
and filesystem to optimise it away. bonnie++ or even CrystalDiskMark
(Windows VM) would be interesting.
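Something like fio with --direct=1 is also more representative, since it
bypasses the page cache. A rough sketch - the test file path and sizes
below are just examples:

    # sequential throughput, bypassing the page cache
    fio --name=seqwrite --filename=/root/fio.test --size=1G --bs=1M \
        --rw=write --direct=1 --ioengine=libaio --iodepth=16

    # random 4k writes - closer to what a busy VM actually does
    fio --name=randwrite --filename=/root/fio.test --size=1G --bs=4k \
        --rw=randwrite --direct=1 --ioengine=libaio --iodepth=32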
>
> Would be very interested in hearing more about your gluster setup.. I
> don't know anything about it - how many nodes are involved?
POOMA U summary:
Red Hat offers two cluster filesystems, ceph and gluster - gluster
actually predates ceph, though ceph definitely has more attention now.
http://gluster.org
gluster replicates a filesystem directly, whereas ceph rbd is a pure
block-based replication system (ignoring rgw etc). CephFS only reached
stable in the latest release, but rbd is a good match for block-based VM
images. Like ceph, gluster has a direct block-based interface for VM
images (gfapi) integrated with qemu, which offers better performance
than FUSE-based filesystems.
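Roughly, the difference shows up in how qemu addresses the image - both
talk to the storage directly rather than going through a mounted
filesystem (hostnames, volume and pool names below are just examples,
proxmox generates all of this for you):

    # gluster via gfapi - no FUSE mount in the data path
    -drive file=gluster://node1/datastore/images/101/vm-101-disk-1.qcow2,if=virtio

    # ceph via librbd
    -drive file=rbd:rbd/vm-101-disk-1,if=virtio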
One of the problems with gluster used to be its file-based replication
and healing process - it had no way of tracking block changes, so when a
node was down and a large VM image was written to, it would have to scan
and compare the entire multi-GB file for changes when the node came back
up. A non-issue for ceph, where block devices are stored in 4MB chunks
and it tracks which chunks have changed.
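You can see the chunk (object) size for any rbd image - by default order
22, i.e. 4MB objects. Pool/image names below are just examples, and the
output is from memory:

    rbd info rbd/vm-101-disk-1 | grep order
    # -> order 22 (4096 kB objects)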
However, in version 3.7 gluster introduced sharded volumes, where files
are stored in shards. Shard size is configurable and defaults to 4MB.
That has brought gluster heal performance and resource usage into the
same league as ceph, though ceph is still slightly faster I think.
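Enabling it is just a couple of volume options (gluster >= 3.7, the
volume name is an example) - AFAIK it only applies to files created
after sharding is turned on:

    gluster volume set datastore features.shard on
    gluster volume set datastore features.shard-block-size 64MB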
One huge problem I've noticed with ceph is snapshot speed. For me, via
proxmox, ceph rbd live snapshots were unusably slow - sluggish to take,
and rolling back a snapshot could literally take hours. Same problem
with restoring backups. Deal breaker for me. Gluster can use qcow2
images, and snapshot rollbacks take a couple of minutes at worst.
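For reference, the equivalent manual operations look roughly like this
(image and snapshot names are just examples):

    # qcow2 on gluster
    qemu-img snapshot -c pre-upgrade vm-101-disk-1.qcow2   # create
    qemu-img snapshot -a pre-upgrade vm-101-disk-1.qcow2   # roll back
    qemu-img snapshot -l vm-101-disk-1.qcow2               # list

    # ceph rbd
    rbd snap create rbd/vm-101-disk-1@pre-upgrade
    rbd snap rollback rbd/vm-101-disk-1@pre-upgrade

It's that rbd snap rollback that takes hours here.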
My hardware setup:
3 Proxmox nodes, VMs and ceph/gluster on all 3.
Node 1:
- Xeon E5-2620
- 64GB RAM
- ZFS RAID10
- SSD log & cache
- 4 * 3TB WD Red
- 3 * 1GbE
Node 2:
- 2 * Xeon E5-2660
- 64GB RAM
- ZFS RAID10
- SSD log & cache
- 4 * 3TB WD Red
- 3 * 1GbE
Node 3:
- Xeon E5-2620
- 64GB RAM
- ZFS RAID10
- SSD log & cache
- 6 * 600GB VelociRaptor
- 2 * 3TB WD Red
- 2 * 1GbE
Originally ceph had all the disks to itself (xfs underneath); now ceph
and gluster are both running off ZFS pools while I evaluate gluster.
Currently half the VMs are running off gluster. Not ideal, as there is a
certain amount of overhead in running both.
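For reference, the pools are roughly this shape (device names are made
up):

    # striped mirrors (RAID10) with an SSD split between SLOG and L2ARC
    zpool create tank mirror sda sdb mirror sdc sdd \
          log sde1 cache sde2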
gluster - basically the same overall setup as ceph:
- replica 3
- 64MB shard size
- caching etc is all handled by ZFS
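The volume itself is just a replica 3 create across the three nodes,
with the shard block size from earlier bumped to 64MB (hostnames and
brick paths are examples):

    gluster volume create datastore replica 3 \
          node1:/tank/gluster/brick node2:/tank/gluster/brick node3:/tank/gluster/brick
    gluster volume set datastore features.shard-block-size 64MB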
Crucial things for me:
- stability. Does it crash a lot :)
- robustness. How well does it cope with node crashes, network outages etc.
- performance. Raw speed and IOPS.
- snapshots. How easy is it to snapshot and roll back VMs? Not an issue
for everyone, but we run a lot of dev and testing VMs where easy access
to multiple snapshots is important.
- backups. How easy is it to back up and *restore*?
cheers,
--
Lindsay Mathieson