[pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks

Oguz Bektas o.bektas at proxmox.com
Wed Dec 9 16:56:58 CET 2020


hi,

thanks for the patch!

some comments inline

On Wed, Dec 09, 2020 at 04:25:53PM +0100, Dominik Csapak wrote:
> adds explanations for:
> * what datastores are
> * their relation with snapshots/chunks
> * basic information about chunk directory structures
> * fixed-/dynamically-sized chunks
> * special handling of encrypted chunks
> * hash collision probability
> * limitation of file-based backups
> 
> Signed-off-by: Dominik Csapak <d.csapak at proxmox.com>
> ---
>  docs/index.rst              |   1 +
>  docs/technical-overview.rst | 152 ++++++++++++++++++++++++++++++++++++
>  docs/terminology.rst        |   3 +
>  3 files changed, 156 insertions(+)
>  create mode 100644 docs/technical-overview.rst
> 
> diff --git a/docs/index.rst b/docs/index.rst
> index fffcb4fd..f3e6bf0c 100644
> --- a/docs/index.rst
> +++ b/docs/index.rst
> @@ -33,6 +33,7 @@ in the section entitled "GNU Free Documentation License".
>     pve-integration.rst
>     pxar-tool.rst
>     sysadmin.rst
> +   technical-overview.rst
>     faq.rst
>  
>  .. raw:: latex
> diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
> new file mode 100644
> index 00000000..20f937bd
> --- /dev/null
> +++ b/docs/technical-overview.rst
> @@ -0,0 +1,152 @@
> +Technical Overview
> +==================
> +
> +.. _technical_overview:
> +
> +Datastores
> +----------
> +
> +A Datastore is the logical place where :ref:`Backup Snapshots <backup_snapshot>`
> +and their chunks are stored. Snapshots consist of a manifest, blobs,
> +dynamic- and fixed-indexes (see :ref:`terminology`), and are stored in the following directory structure:
> +
> + <datastore-root>/<type>/<id>/<time>/
> +
> +The indexes contained in a snapshot reference chunks on which the deduplication
> +of datastores is based.

imo it's a bit confusing to read this sentence. maybe something like:

"The deduplication of datastores is based on chunks, which are
referenced by the indexes contained in a snapshot."

> +
> +Chunks
> +------
> +
> +A chunk is some (sometimes encrypted) data with a CRC-32 checksum at

s/sometimes/possibly ?

> +the end and a type marker at the beginning. It is identified by a
> +SHA-256 checksum of its content.
> +
> +To generate such chunks, backup data is split either into fixed-size or
> +dynamically-sized chunks. The same content will be hashed to the same
> +checksum.
> +
> +The chunks of a datastore are found in
> +
> + <datastore-root>/.chunks/
> +
> +This chunk directory is further subdivided by the first four byte of the
> +chunks checksum, so the chunk with the checksum
> +
> + a342e8151cbf439ce65f3df696b54c67a114982cc0aa751f2852c2f7acc19a8b
> +
> +lives in
> +
> + <datastore-root>/.chunks/a342/
> +
> +This is done to reduce the number of files per directory, as having
> +many files per directory can be bad for file system performance.
> +
> +These chunk directories('0000'-'ffff') will be preallocated when the datastore is

s/the/a

> +created.
> +
> +Fixed-sized Chunks
> +^^^^^^^^^^^^^^^^^^
> +
> +For block based backups (like VMs), fixed-sized chunks are used. The content
> +(disk image), is split into chunks of the same length (typically 4 MiB).
> +
> +This works very well for VM images, since the file system on the guest,
> +most often tries to allocate files in contiguous pieces, so new files get
> +new blocks, and changing existing files changes only their own blocks.

s/guest,/guest


> +
> +As an optimization, VMs in `Proxmox VE`_ can make use of 'dirty bitmaps',
> +which can track the changed blocks of an image. Since these bitmap
> +are also a representation of the image split into chunks, we have
> +a direct relation between dirty blocks of the image and chunks we have
> +to upload, so only touched chunks of the disk have to be uploaded for a backup.

s/touched/modified ?

> +
> +Dynamically-sized Chunks
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +If does not want to backup block-based systems but file-based systems,

s/If/If one

> +using fixed-sized chunks is not a good idea, since every time a file
> +would change in size, the remaining data gets shifted around and this
> +would result in many chunks changing, reducing the effect of deduplication.
> +
> +To improve this, `Proxmox Backup`_ Server uses dynamically-sized chunks
> +instead. Instead of splitting an image into fixed sizes, it generates
> +a consistent file archive and uses a rolling hash over the data
> +to calculate chunk boundaries. We use a variant of Buzhash which
> +is a cyclic polynomial algorithm.
> +
> +It works by continuously calculating a hashsum while iterating over the
s/hashsum/checksum
> +data, and on certain conditions it triggers a hash boundary.
> +
> +Assuming that there is not a change in every file of the to system to
s/not a/no
s/to//
> +be backed up, eventually the algorithm triggers the boundary on the
> +same data as a previous backup, resulting in a chunk that can be reused.
> +
> +Encrypted Chunks
> +^^^^^^^^^^^^^^^^
> +
> +A special case are encrypted chunks. Both fixed- and dynamically-sized
> +chunks can be encrypted, and their handling is slightly different
> +from normal chunks.
> +
> +The hash of encrypted chunks are calculated not with the actual (encrypted)
> +chunk content, but with the plaintext content concatenated with
> +the encryption key. This way, two chunks of the same data encrypted with
> +different keys generate two different checksums and no collisions occur for
> +multiple encryption keys.
> +
> +This is done to speed up the client part of the backup, since it only needs
> +to encrypt chunks that are actually getting uploaded and chunks that exist
> +already in the previous backup, do not need to be encrypted and uploaded.
> +
> +Caveats and Limitations
> +-----------------------
> +
> +Notes on hash collisions
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +Every hash has a chance to produce collisions, meaning two (or more) inputs
> +generating the same hashsum. For SHA-256, this chance is negligible.
s/hashsum/checksum
> +To calculate such a collision, one can use the ideas of the 'birthday problem'
> +from probability theory. For big numbers, this is actually not feasible to
s/not feasible/infeasible
> +calculate with normal computers, but there is a good approximation:
> +
> +.. math::
> +
> + p(n, d) = 1 - e^{-n^2/(2d)}
> +
> +Where `n` is the number of tries, and `d` is the number of possibilities.
> +So for example, if we assume a large datastore of 1 PiB, and an average chunk
> +size of 4 MiB, we have :math:`n = 268435456` tries, and :math:`d = 2^{256}`
> +possibilities.  Using the above formula we get that the probability of a
> +collision in that scenario is:
> +
> +.. math::
> +
> + 3.1115 * 10^{-61}
> +
> +For context, in a lottery game of 6 of 45, the chance to correctly guess all
> +6 numbers is only :math:`1.2277 * 10^{-7}`.
> +
> +So it is extremely unlikely that such a collision would occur by accident
> +in a normal datastore.
> +
> +Additionally, SHA-256 is prone to length extension attacks, but since
> +there is an upper limit for how big the chunk sizes are, this is not
> +problem, since a potential attacker cannot arbitrarily add the content.
s/problem/a problem

or

s/problem/problematic
> +
> +
> +File-based Backup
> +^^^^^^^^^^^^^^^^^
> +
> +Since there is no relation between files and dynamically-sized chunks,
> +we have to read all files again in every backup. Otherwise it would not be
> +possible to generate a format where the original chunks can be reused.
> +
> +Verification of encrypted chunks
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +For encrypted chunks, only the checksum of the original (plaintext) data
> +is available, making it impossible for the server (without the encryption key),
> +to verify its content against it. Instead only the CRC-32 checksum gets checked.
> +
> +
> diff --git a/docs/terminology.rst b/docs/terminology.rst
> index 89ec7646..3eff9780 100644
> --- a/docs/terminology.rst
> +++ b/docs/terminology.rst
> @@ -1,3 +1,5 @@
> +.. _terminology:
> +
>  Terminology
>  ===========
>  
> @@ -99,6 +101,7 @@ Backup Group
>  The tuple ``<type>/<ID>`` is called a backup group. Such a group
>  may contain one or more backup snapshots.
>  
> +.. _backup_snapshot:
>  
>  Backup Snapshot
>  ---------------
> -- 
> 2.20.1
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel at lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 





More information about the pbs-devel mailing list