[pve-devel] Speed up PVE Backup

Eneko Lacunza elacunza at binovo.es
Thu Jul 21 13:19:10 CEST 2016


On 21/07/16 at 09:34, Dietmar Maurer wrote:
>>> But you can try to assemble larger blocks, and write them once you get
>>> an out of order block...
>> Yes, this is the plan.
>>> I always thought the ceph libraries does (or should do) that anyways?
>>> (write combining)
>> Reading the docs:
>> http://docs.ceph.com/docs/hammer/rbd/rbd-config-ref/
>> It should be true when write-back rbd cache is activated. This seems to
>> be the default, but maybe we're using disk cache setting on restore too?
>> I'll try to change the disk cache setting and will report the results.
> thanks!
Looking at more docs, I found this:

"QEMU's cache settings override Ceph's default settings (i.e., settings
that are not explicitly set in the Ceph configuration file). If you
explicitly set RBD Cache
<http://docs.ceph.com/docs/hammer/rbd/rbd-config-ref/> settings in your
Ceph configuration file, your Ceph settings override the QEMU cache
settings. If you set cache settings on the QEMU command line, the QEMU
command line settings override the Ceph configuration file settings."

I have been doing tests all morning with a different backup (only one 
10GB disk) so that I could perform tests faster.

I thought maybe we were restoring without writeback cache (rbd cache),
but I have tried the following ceph.conf tweaks and conclude that rbd
cache is already enabled:

1. If I set rbd cache = true, I get the same performance.
2. If I set rbd cache writethrough until flush = false (rbd cache = true
not necessary), I get x2-x3 the restore performance. The default (true)
is a safety measure for non-flushing virtio drivers: no writeback until
a flush is detected. Disabling it is safe for a restore.

I think qmrestore isn't issuing any flush request (until maybe the end),
so the cache stays in writethrough mode for the whole restore. For the
Ceph storage backend we should set
rbd_cache_writethrough_until_flush=false for better performance.
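
For anyone who wants to try this, here is a sketch of the ceph.conf change
I mean. I'm assuming the option goes in the [client] section on the node
running the restore, which is where librbd client options normally live;
adjust if your setup differs:

```ini
[client]
# Assumption: rbd cache is already on (setting it made no difference in
# my tests); listed here only to be explicit.
rbd cache = true
# Go straight to writeback instead of staying in writethrough mode
# until the first flush arrives.
rbd cache writethrough until flush = false
```

As far as I understand, this applies to every librbd client that reads
this file, including VMs started afterwards, so treat it as a test setting
for the restore and not as a permanent one.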

Restore now runs at about 30-45MB/s vs 15MB/s before. All this may be
affected by a slow OSD (we'll have that fixed next week), so I don't
think my absolute figures are good, only the fact that there is a
noticeable improvement.

If someone can test and confirm this, it should be quite easy to patch 



Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
