[pve-devel] Speed up PVE Backup
Eneko Lacunza
elacunza at binovo.es
Wed Jul 20 12:37:43 CEST 2016
Hi again,
I've been looking around the backup/restore code a bit. I'm focused on
restore acceleration on Ceph RBD right now.
Sorry if I have gotten something wrong; I have never developed for Proxmox/QEMU.
At line 563 of
https://git.proxmox.com/?p=pve-qemu-kvm.git;a=blob;f=debian/patches/pve/0011-introduce-new-vma-archive-format.patch;h=1c26209648c210f3b18576abc2c5a23768fd7c7b;hb=HEAD
I see the function restore_write_data; it calls full_write (for
direct-to-file restore) or bdrv_write (which I suppose is a QEMU
abstraction over the block device).
This is called from restore_extents, where a comment precisely says "try
to write whole clusters to speedup restore". So we are writing chunks of
64KB-8 bytes, which gives Ceph RBD a hard time, because it means lots of
~64KB write IOPS.
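As a back-of-the-envelope illustration (my own numbers, just for scale):
restoring a 100GB image in 64KB writes means roughly 1.6 million write
requests to RBD, while 4MB writes would cut that to about 25,600 - 64
times fewer round trips to the OSDs.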
So, I suggest the following solution for your consideration (a rough
sketch in C follows below):
- Create a write buffer on startup (let's assume it's 4MB, for example; a
size Ceph RBD would like much more than 64KB). This could even be
configurable, skipping the buffer altogether if buffer_size == cluster_size.
- Wrap the current "restore_write_data" with a
"restore_write_data_with_buffer" that copies into the 4MB buffer and only
calls "restore_write_data" when the buffer is full.
- Create a new "flush_restore_write_data_buffer" to flush the write
buffer when device restore reading is complete.
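To make the idea concrete, here is a rough, untested sketch of what I have
in mind. All names, signatures and sizes below are my own placeholders,
not the actual pve-qemu-kvm code:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUFFER_SIZE (4 * 1024 * 1024)  /* 4MB; could be made configurable */

typedef struct RestoreWriteBuffer {
    unsigned char *buf;   /* staging buffer, allocated on startup */
    size_t filled;        /* bytes currently buffered */
    int64_t start_offset; /* device offset corresponding to buf[0] */
} RestoreWriteBuffer;

/* Allocate the write buffer once on startup. */
static int restore_write_buffer_init(RestoreWriteBuffer *wb)
{
    wb->buf = malloc(BUFFER_SIZE);
    wb->filled = 0;
    wb->start_offset = 0;
    return wb->buf ? 0 : -1;
}

/* Stand-in for the existing restore_write_data(), which would keep
 * calling full_write() or bdrv_write() exactly as it does today. */
static int restore_write_data(void *dev, int64_t offset,
                              const unsigned char *data, size_t len)
{
    (void)dev; (void)offset; (void)data; (void)len;
    return 0; /* pretend success in this sketch */
}

/* Flush whatever is buffered as one large write; call once per device
 * when restore reading is complete (the "flush_restore_write_data_buffer"
 * from the list above). */
static int flush_restore_write_data_buffer(void *dev, RestoreWriteBuffer *wb)
{
    int ret = 0;
    if (wb->filled > 0) {
        ret = restore_write_data(dev, wb->start_offset, wb->buf, wb->filled);
        wb->filled = 0;
    }
    return ret;
}

/* Buffering wrapper: copy into the buffer and only hit the storage when
 * the buffer is full or a write is not contiguous. Assumes len <=
 * BUFFER_SIZE, which holds for 64KB VMA clusters. */
static int restore_write_data_with_buffer(void *dev, RestoreWriteBuffer *wb,
                                          int64_t offset,
                                          const unsigned char *data,
                                          size_t len)
{
    int64_t next = wb->start_offset + (int64_t)wb->filled;

    /* Non-contiguous write, or no room left: flush first. */
    if (wb->filled > 0 &&
        (offset != next || wb->filled + len > BUFFER_SIZE)) {
        int ret = flush_restore_write_data_buffer(dev, wb);
        if (ret < 0) {
            return ret;
        }
    }
    if (wb->filled == 0) {
        wb->start_offset = offset;
    }
    memcpy(wb->buf + wb->filled, data, len);
    wb->filled += len;

    return wb->filled == BUFFER_SIZE
        ? flush_restore_write_data_buffer(dev, wb) : 0;
}

restore_extents would then call restore_write_data_with_buffer() instead
of restore_write_data(), and flush once per device at the end. With
buffer_size == cluster_size the copy could be skipped entirely, as noted
above.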
Do you think this is a good idea? If so, I will find time to implement
and test it, to check whether restore time improves.
Thanks a lot
Eneko
On 20/07/16 at 08:24, Eneko Lacunza wrote:
> On 16/02/16 at 15:52, Stefan Priebe - Profihost AG wrote:
>> On 16.02.2016 at 15:50, Dmitry Petuhov wrote:
>>> On 16.02.2016 at 13:20, Dietmar Maurer wrote:
>>>>> The storage backend is Ceph using 2x 10Gbit/s, and I'm able to read
>>>>> from it at 500-1500MB/s. See below for an example.
>>>> The backup process reads 64KB blocks, and it seems this slows down
>>>> Ceph. This is a known behavior, but I have found no solution to
>>>> speed it up.
>>> I've just written a script to speed up my backups from Ceph. It simply
>>> does the following (actually a little more):
>>> rbd snap create $SNAP
>>> rbd export $SNAP $DUMPDIR/$POOL-$VOLUME-$DATE.raw
>>> rbd snap rm $SNAP
>>> for every image in the selected pools.
>>>
>>> When exporting to a file, it's faster than my temporary HDD can write
>>> (about 120MB/s). But exporting to STDOUT ('-' instead of a filename,
>>> with or without compression) noticeably decreases speed to QEMU's
>>> levels (20-30MB/s). That's a little strange.
>>>
>>> This method is incompatible with PVE's backup/restore tools, but it is
>>> good enough for manual disaster recovery from the CLI.
>> Right - that's working for me too, but only at night, and not when a
>> single user wants a backup (incl. config) RIGHT now.
> Do we have any improvement related to this in the pipeline? Yesterday
> our 9-OSD, 3-node cluster restored a backup at 6MB/s... it was very
> boring, painful and expensive to wait for it :) (I decided to buy a
> new server to replace our 7.5-year-old IBM while waiting ;) )
>
> Our backups are slow too, but we do those during the weekend... whereas
> usually we want restores to be fast... :)
>
> Dietmar, I haven't looked at the backup/restore code, but do you
> think we could do something to read/write to storage in larger chunks
> than the current 64KB? I'm coming out of a high-workload period and
> could maybe look at this issue this summer.
>
> Thanks
> Eneko
>
--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611
943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es