[PVE-User] Some operations wildly slow with ceph rbd

Lindsay Mathieson lindsay.mathieson at gmail.com
Mon Jan 5 15:52:50 CET 2015

I have my ceph cluster set up - 6 OSDs on 2 nodes with SSD journals. It's fast
enough for my requirements - 140 MB/s seq write, 500 MB/s seq read, and IOPS are
reasonable too.

I have 20 VMs running over rbd, and they perform quite well. Responsive desktops
and servers, no complaints there.

Where it all goes wrong is snapshots and backup restores.

Snapshots I've mentioned before: over 10 minutes to take a snapshot of a small
VM is wrong. The actual disk snapshot is near instant; it's the live state that
takes all the time. The same VM only takes 30 seconds to snapshot on cephfs or
NFS. The one rollback I tried took 40 min and was corrupted at the end.

Restores - I started a VM restore this evening. This is one I've done before
on our crappy NAS NFS server, where it takes about 30 min.

2.5 *hours* later it had reached 20%; at that rate it would take 8 hours to finish.

I cancelled it and restarted the restore to cephfs (same storage). It took
8 *minutes* to get to 20%.
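
As a rough sanity check on those two figures, here is a minimal sketch that
extrapolates each restore linearly from the time taken to reach 20%. The
function name is made up for illustration, and linear progress is an
assumption (restores do not necessarily proceed at a constant rate). On that
assumption the rbd restore projects to about 12.5 hours in total, so the
8 hour figure above is if anything optimistic, while the cephfs restore
projects to about 40 minutes.

```python
def projected_total_minutes(elapsed_minutes, percent_done):
    """Linearly extrapolate total restore time from progress so far.

    Hypothetical helper for illustration; assumes a constant restore rate.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_minutes * 100 / percent_done


# Figures from the post: 2.5 hours (150 min) vs 8 min to reach 20%.
rbd_total = projected_total_minutes(150, 20)
cephfs_total = projected_total_minutes(8, 20)

# How many times slower the restore to rbd is than the restore to cephfs.
slowdown = rbd_total / cephfs_total

print("rbd restore, projected:    %.1f hours" % (rbd_total / 60))
print("cephfs restore, projected: %.0f minutes" % cephfs_total)
print("slowdown factor:           %.2fx" % slowdown)
```

The slowdown factor does not depend on the linearity assumption as long as
both restores are compared at the same 20% mark.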

Oddly, a disk migration from cephfs to rbd is much faster - about 40 min for
the same virtual disk. It's literally much quicker to restore VMs to cephfs,
then migrate the disk to ceph rbd.

Any idea why snapshot and backup restore operations are so slow on rbd? It is
literally making the snapshot and backup functions useless for my purposes.
Surely I'm not the only one for whom this is a problem.
