[pve-devel] Improve container backup speed dramatically (factor 100-1000)

Dominik Csapak d.csapak at proxmox.com
Fri Nov 20 09:27:27 CET 2020


hi,

it seems there are some misunderstandings as to how the backup actually 
works, so i'll try to clear that up

On 11/20/20 8:18 AM, Carsten Härle wrote:
>>> Yes, that is how the current variable sized chunking algorithm works.
> ...
> "zfs diff" does not provide the information needed for our deduplication
> algorithm, so we cannot use that.
> <<
> 
> 1) Can you please outline the algorithm?

we have 2 different chunking methods:

* fixed-size chunks
* dynamic-sized chunks

fixed-size chunks, as the name implies, have a predefined, fixed size 
(e.g. 4 MiB)
in vm backups we can split the disk image into such blocks and calculate
the hash of each one

this works well in that case, since filesystems on disk tend not to
move data around, meaning if you change a byte in a file,
that one chunk will be different, but the rest will stay the same
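
as a rough sketch of that idea (not the actual pbs code; the function
name is made up and it assumes the 'sha2' crate for hashing), fixed-size
chunking of a disk image boils down to something like this:

// illustrative only: read a disk image in fixed 4 MiB blocks and hash
// each block; identical blocks give identical digests, so unchanged
// blocks deduplicate against previous backups
use sha2::{Digest, Sha256};
use std::fs::File;
use std::io::Read;

const CHUNK_SIZE: u64 = 4 * 1024 * 1024; // 4 MiB

fn hash_fixed_chunks(path: &str) -> std::io::Result<()> {
    let mut file = File::open(path)?;
    let mut offset = 0u64;
    loop {
        // read exactly one block (only the last one may be shorter)
        let mut buf = Vec::new();
        let n = (&mut file).take(CHUNK_SIZE).read_to_end(&mut buf)?;
        if n == 0 {
            break;
        }
        let digest = Sha256::digest(&buf);
        let hex: String = digest.iter().map(|b| format!("{:02x}", b)).collect();
        println!("block at {:#x}: {}", offset, hex);
        offset += n as u64;
    }
    Ok(())
}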

for dynamic-sized chunks, we calculate what is called a 'rolling hash'[0]
over a sliding window of the data, and whenever the hash fulfills a
certain condition, a chunk boundary is triggered, producing a chunk

neither of those chunking methods has any awareness of, or reference
to, files
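
to make that concrete, here is a minimal content-defined chunker sketch;
the real chunker in proxmox-backup has its own rolling hash
implementation and tuned parameters, all values and names below are
made up just to show the principle:

// a boundary is cut purely based on the rolling hash of the last WINDOW
// bytes, so the chunker has no idea about files in the stream

const WINDOW: usize = 64;                 // bytes the rolling hash covers
const MASK: u64 = (1 << 21) - 1;          // ~2 MiB average chunk size
const MIN_CHUNK: usize = 64 * 1024;       // never cut before this
const MAX_CHUNK: usize = 8 * 1024 * 1024; // always cut at this size
const PRIME: u64 = 1_099_511_628_211;     // multiplier of the polynomial hash

// returns the offsets at which chunks end
fn chunk_boundaries(data: &[u8]) -> Vec<usize> {
    // PRIME^(WINDOW-1), needed to remove the byte leaving the window
    let mut pow = 1u64;
    for _ in 0..WINDOW - 1 {
        pow = pow.wrapping_mul(PRIME);
    }

    let mut boundaries = Vec::new();
    let mut hash = 0u64;
    let mut chunk_start = 0usize;

    for i in 0..data.len() {
        // slide the window: drop the outgoing byte, add the incoming one
        if i >= WINDOW {
            hash = hash.wrapping_sub((data[i - WINDOW] as u64).wrapping_mul(pow));
        }
        hash = hash.wrapping_mul(PRIME).wrapping_add(data[i] as u64);

        let len = i + 1 - chunk_start;
        // cut when the hash happens to hit the mask (on average roughly
        // every 2 MiB), but respect the minimum and maximum chunk sizes
        if (len >= MIN_CHUNK && (hash & MASK) == 0) || len >= MAX_CHUNK {
            boundaries.push(i + 1);
            chunk_start = i + 1;
        }
    }
    if chunk_start < data.len() {
        boundaries.push(data.len());
    }
    boundaries
}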

we use this for container backups in the following way

while iterating over the filesystem/directories, we generate
a so-called 'pxar' archive, which is a streaming format
that contains metadata+data for a directory structure

while generating this data-stream we use the dynamic chunk
algorithm to generate chunks on that stream

this works well here, since if you modify/add a byte in a file,
all remaining data gets shifted, but the rolling hash will,
with a high degree of probability, find the same boundaries
it found before, so the remaining chunks will be the same
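
as a small demonstration of that resync behaviour, you can feed two
versions of a stream (one with a single byte inserted near the start)
through the chunk_boundaries() sketch above and compare the resulting
chunk digests; std's DefaultHasher just stands in for the SHA-256
digests that address the real chunks:

use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

fn chunk_digests(data: &[u8]) -> HashSet<u64> {
    let mut digests = HashSet::new();
    let mut start = 0;
    for end in chunk_boundaries(data) {
        let mut h = DefaultHasher::new();
        data[start..end].hash(&mut h);
        digests.insert(h.finish());
        start = end;
    }
    digests
}

fn main() {
    // ~16 MiB of pseudo-random data standing in for a pxar stream
    let mut old = Vec::with_capacity(16 << 20);
    let mut x = 0x1234_5678u32;
    for _ in 0..(16 << 20) {
        x = x.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
        old.push((x >> 24) as u8);
    }
    // "modify a file": insert one byte early on, shifting everything after it
    let mut new = old.clone();
    new.insert(4096, 0xff);

    let a = chunk_digests(&old);
    let b = chunk_digests(&new);
    // almost all chunks are shared, only the ones around the change are new
    println!("shared chunks: {} of {}", a.intersection(&b).count(), b.len());
}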


> 2) Why you think, it is not possible to use the changed information of the file system?

1. we would like to avoid making features of the backup
storage-dependent

2. even if we had that data, we'd have to completely read the
stream of the previous backup to insert the changes at the right
position and generate a pxar stream that can be chunked.

but then we'd have read the whole tree again, just this time from
the backup server over the network (probably slower than the local fs)
and possibly had to decrypt it (not necessary when reading from the
local fs)

so with the current pxar+dynamic chunking, this is really not
feasible

what could be possible (but would be much work) is
to create a new archive+chunking method where
the relation between files and chunks is more explicit,
but i'd guess this would blow up our index file size
(if you have a million small files, you'd now have a million
more chunks to reference, whereas before there would be
fewer but bigger chunks that combined that data)

> 3) Why does differential backup work with VMs?

for vms we can have a 'dirty bitmap' which
tracks which fixed-size blocks were written to

since we split the disk image into blocks of the same size
for the backup, there is a 1-to-1 mapping
between blocks that were written to and chunks we have to back up
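
a very simplified picture of that mapping (names and sizes are made up
for the example; in reality qemu maintains the bitmap and the backup
job reads only the marked blocks):

const BLOCK_SIZE: u64 = 4 * 1024 * 1024; // same size as the backup chunks

// block index -> byte range that has to be read and backed up
fn dirty_block_ranges(dirty_bitmap: &[bool]) -> Vec<(u64, u64)> {
    dirty_bitmap
        .iter()
        .enumerate()
        .filter(|(_, dirty)| **dirty)
        .map(|(i, _)| {
            let start = i as u64 * BLOCK_SIZE;
            (start, start + BLOCK_SIZE)
        })
        .collect()
}

fn main() {
    // e.g. only blocks 1 and 3 were written to since the last backup
    let bitmap = vec![false, true, false, true];
    for (start, end) in dirty_block_ranges(&bitmap) {
        println!("back up bytes {:#x}..{:#x}", start, end);
    }
}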

i hope this makes it clearer, if you have any questions, ideas,
etc. feel free to ask

later today/next week, i'll take the time to put
what i have written above into the documentation,
so that we have a single point of reference we can point to in
the future


kind regards
Dominik




