[pbs-devel] [PATCH v8 proxmox-backup 58/69] docs: add section describing change detection mode

Fabian Grünbichler f.gruenbichler at proxmox.com
Tue Jun 4 14:07:55 CEST 2024


On May 28, 2024 11:42 am, Christian Ebner wrote:
> Describe the motivation and basic principle of the clients change
> detection mode and show an example invocation.
> 
> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
> ---
> changes since version 7:
> - no changes
> 
> changes since version 6:
> - add more information on metadata being compared
> - adapt and link from technical overview
> 
>  docs/backup-client.rst      | 45 +++++++++++++++++++++++++++++++++++++
>  docs/technical-overview.rst |  3 +++
>  2 files changed, 48 insertions(+)
> 
> diff --git a/docs/backup-client.rst b/docs/backup-client.rst
> index 00a1abbb3..58fcd79f0 100644
> --- a/docs/backup-client.rst
> +++ b/docs/backup-client.rst
> @@ -280,6 +280,51 @@ Multiple paths can be excluded like this:
>  
>      # proxmox-backup-client backup.pxar:./linux --exclude=/usr --exclude=/rust
>  
> +.. _client_change_detection_mode:
> +
> +Change Detection Mode
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +File-based backups containing a lot of data can take a long time, as the default
> +behavior for the Proxmox backup client is to read all data and re-encode it.

read all data and encode it into a pxar archive.

> +The encoded stream is split into variable sized chunks for efficient
> +deduplication and based on the chunk digest a decision can be made whether a

I think I'd drop the efficient deduplication, the whole point of this
section is that it is not that efficient :-P

is split into variable sized chunks. For each chunk, a digest is
calculated and used to decide whether the chunk needs ..

> +given chunk needs to be uploaded or can be indexed without upload as it is
> +already available on the server (and therefore deduplicated). For some
> +use-cases, where files do not change frequently the full re-reading is not
> +feasible and undesired.

If the backed-up files are largely unchanged, re-reading them only to then
decide that the corresponding chunks don't need to be uploaded at all is
wasteful and undesired.
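
To make the cost concrete, here is a rough sketch of the digest-based
decision described above. It is not the client's actual code: `plan_upload`
and `known_digests` are hypothetical names, and the chunks are passed in
ready-made instead of being cut from the encoded stream. The point is that
every chunk has to be read and hashed before the client can tell that it
does not need to be uploaded.

```python
import hashlib


def plan_upload(chunks, known_digests):
    """Sort chunk digests into 'upload' and 'index only' (hypothetical helper)."""
    to_upload, index_only = [], []
    for chunk in chunks:
        # The data must already have been read to compute its digest.
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in known_digests:
            # Already on the server: reference it in the index, skip the upload.
            index_only.append(digest)
        else:
            to_upload.append(digest)
    return to_upload, index_only


chunks = [b"unchanged data", b"new data"]
known = {hashlib.sha256(b"unchanged data").hexdigest()}
to_upload, index_only = plan_upload(chunks, known)
```

Only the second chunk ends up in `to_upload`, yet both had to be read.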

> +
> +The backup clients `change-detection-mode` can be switched from default to

client's

> +`metadata` based detection to reduce limitations as described above, instructing
> +the client to avoid re-reading files with unchanged metadata whenever possible.
> +When using this mode, instead of the regular pxar archive, the backup snapshot
> +is stored into two separate files: the `mpxar` containing the archives metadata

archive's

> +and the `ppxar` containing a concatenation of the file contents. This splitting
> +allows for metadata lookups without the overhead of the file contents.

for efficient metadata lookups. ?

> +Using the `change-detection-mode` set to `data` allows to create the same split
> +archive as when using the `metadata` mode, but without using a previous
> +reference and therefore reencoding all file payloads.

this part should move below, since the next paragraphs describe the
metadata mode?

> +
> +When creating the backup archives, the current file metadata is compared to the
> +one looked up in the previous `mpxar` archive.
> +The metadata comparison includes file size, file type, ownership and permission
> +information acls and attributes and most importantly the files mtime, for

something here is missing (a comma?), and s/files/file's/

> +details see the :ref:`pxar metadata archive format <pxar-meta-format>`.
> +
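
As a reading aid, the comparison described in this paragraph could be
sketched roughly as follows. This is only an illustration under assumptions:
`EntryMetadata` and `is_unchanged` are made-up names, the field list is taken
from the sentence above, and the real on-disk representation is whatever the
pxar metadata archive format defines.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EntryMetadata:
    """Hypothetical stand-in for the per-file metadata kept in the mpxar."""
    size: int
    file_type: str
    uid: int
    gid: int
    mode: int
    mtime_ns: int
    acls: tuple = ()
    xattrs: tuple = ()


def is_unchanged(previous, current):
    """A file is a re-use candidate only if every compared field matches."""
    # No previous entry means there is nothing to re-use.
    return previous is not None and previous == current


old = EntryMetadata(size=4096, file_type="file", uid=0, gid=0,
                    mode=0o644, mtime_ns=1_700_000_000_000_000_000)
touched = replace(old, mtime_ns=old.mtime_ns + 1)
```

With these values, `is_unchanged(old, old)` holds, while the entry with the
newer mtime would be re-read and re-encoded.
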
> +If unchanged, the entry is cached for possible re-use of content chunks without
> +re-reading, by indexing the already present chunks containing the contents from
> +the previous backup snapshot. Since the file might only partially re-use chunks
> +(thereby introducing wasted space in the form of padding), the decision whether
> +to re-use or re-encode the currently cached entries is delegated to when enough

is delayed/postponed

> +information is available, comparing the possible padding a threshold value.

to a
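
For readers of the thread, the postponed decision could look roughly like
the sketch below. The names, the ratio-based formulation and the 10% default
are all assumptions made for illustration; the patch only states that the
possible padding is compared to a threshold value.

```python
def should_reuse(reused_chunk_bytes, referenced_file_bytes,
                 max_padding_ratio=0.1):
    """Decide whether cached entries re-use chunks or get re-encoded.

    reused_chunk_bytes: total size of the previous chunks that would be indexed
    referenced_file_bytes: bytes in those chunks that belong to unchanged files
    max_padding_ratio: hypothetical threshold, not the client's real value
    """
    if reused_chunk_bytes <= 0:
        return False
    # Everything in the re-used chunks that is not referenced is padding.
    padding = reused_chunk_bytes - referenced_file_bytes
    return padding / reused_chunk_bytes <= max_padding_ratio


mostly_used = should_reuse(4_000_000, 3_900_000)   # 2.5% padding
mostly_wasted = should_reuse(4_000_000, 1_000_000)  # 75% padding
```

In this sketch the first case re-uses the chunks, while the second falls back
to re-encoding because too much of the re-used data would be padding.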

> +
> +The following shows an example for the client invocation with the `metadata`
> +mode:
> +
> +.. code-block:: console
> +
> +    # proxmox-backup-client backup.pxar:./linux --change-detection-mode=metadata
> +
>  .. _client_encryption:
>  
>  Encryption
> diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
> index 89835a7cc..a8b1c7268 100644
> --- a/docs/technical-overview.rst
> +++ b/docs/technical-overview.rst
> @@ -28,6 +28,9 @@ which are not chunked, e.g. the client log), or one or more indexes
>  
>  When uploading an index, the client first has to read the source data, chunk it
>  and send the data as chunks with their identifying checksum to the server.
> +When using the :ref:`change detection mode <change_detection_mode>` payload
> +chunks for unchanged files are reused from the previous snapshot, thereby not
> +reading the source data again.
>  
>  If there is a previous Snapshot in the backup group, the client can first
>  download the chunk list of the previous Snapshot. If it detects a chunk that
> -- 
> 2.39.2



