[pbs-devel] [PATCH v6 proxmox-backup 54/65] docs: add section describing change detection mode
Dominik Csapak
d.csapak at proxmox.com
Thu May 23 11:28:42 CEST 2024
Two comments here:
* I'd like the docs to go into a bit more detail about what the metadata
*is* (or link to a section where it's explained, e.g. in the mpxar format?),
because metadata can be mtime, size, inode, ctime, etc., and e.g. in borg backup
you can even choose which ones you want
* the 'technical overview' part still mentions that all data has to be read,
so a short mention of the change detection mode with a link there would be good
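To illustrate the first point, here is a minimal Rust sketch of what a selectable metadata comparison could look like. `FileMetadata`, `Field` and `is_unchanged` are hypothetical names made up for this example. They are not taken from the patch series, and the fields the client actually compares may differ.

```rust
/// Hypothetical metadata record as it could be looked up in the previous
/// `mpxar` archive; the fields are illustrative, not the client's real set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileMetadata {
    pub size: u64,
    pub mtime_ns: i64,
    pub ctime_ns: i64,
    pub inode: u64,
}

/// Metadata fields a comparison may take into account (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Field {
    Size,
    Mtime,
    Ctime,
    Inode,
}

/// Returns `true` if all selected fields match, i.e. the file is treated as
/// unchanged and its previous payload is a candidate for re-use.
pub fn is_unchanged(prev: &FileMetadata, cur: &FileMetadata, fields: &[Field]) -> bool {
    fields.iter().all(|field| match field {
        Field::Size => prev.size == cur.size,
        Field::Mtime => prev.mtime_ns == cur.mtime_ns,
        Field::Ctime => prev.ctime_ns == cur.ctime_ns,
        Field::Inode => prev.inode == cur.inode,
    })
}
```

With `[Field::Size, Field::Mtime]`, a file whose ctime changed (e.g. after a `chmod`) still counts as unchanged. Adding `Field::Ctime` makes it count as changed. Stating which fields the client compares would tell users which of these behaviors to expect.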
On 5/14/24 12:34, Christian Ebner wrote:
> Describe the motivation and basic principle of the clients change
> detection mode and show an example invocation.
>
> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
> ---
> docs/backup-client.rst | 41 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
> diff --git a/docs/backup-client.rst b/docs/backup-client.rst
> index 00a1abbb3..e48b5dd60 100644
> --- a/docs/backup-client.rst
> +++ b/docs/backup-client.rst
> @@ -280,6 +280,47 @@ Multiple paths can be excluded like this:
>
> # proxmox-backup-client backup.pxar:./linux --exclude=/usr --exclude=/rust
>
> +.. _client_change_detection_mode:
> +
> +Change detection mode
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +File-based backups containing a lot of data can take a long time, as the default
> +behavior of the Proxmox backup client is to read all data and re-encode it.
> +The encoded stream is split into variable-sized chunks for efficient
> +deduplication. Based on each chunk's digest, the client decides whether a
> +given chunk needs to be uploaded or can be indexed without upload, because it
> +is already available on the server (and is therefore deduplicated). For
> +use cases where files change infrequently, re-reading all of the data is
> +undesirable and may not even be feasible.
> +
> +The backup client's `change-detection-mode` can be switched from the default
> +to `metadata`-based detection to mitigate the limitations described above,
> +instructing the client to avoid re-reading files with unchanged metadata
> +whenever possible. In this mode, instead of a regular pxar archive, the
> +backup snapshot stores the archive in two separate files: the `mpxar`,
> +containing the archive's metadata, and the `ppxar`, containing a concatenation
> +of the file contents. This split allows metadata lookups without the overhead
> +of reading the file contents. Setting `change-detection-mode` to `data`
> +creates the same split archive as the `metadata` mode, but without using a
> +previous snapshot as reference, therefore re-encoding all file payloads.
> +
> +When creating the backup archives, the current metadata of each file is
> +compared to the metadata looked up in the previous `mpxar` archive. If it is
> +unchanged, the entry is cached for possible re-use without re-reading the file,
> +by indexing the chunks of the previous snapshot that already hold its contents.
> +Since a file might only partially re-use chunks (thereby introducing wasted
> +space in the form of padding), the decision whether to re-use or re-encode the
> +currently cached entries is deferred until enough information is available,
> +and is then made by comparing the possible padding against a threshold value.
> +
> +The following shows an example client invocation using the `metadata`
> +mode:
> +
> +.. code-block:: console
> +
> + # proxmox-backup-client backup.pxar:./linux --change-detection-mode=metadata
> +
> .. _client_encryption:
>
> Encryption
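The re-use decision described in the last paragraph of the new section can be sketched roughly as follows. The sketch assumes that cached entries are known by their payload range in the previous `ppxar` stream and that the previous chunk boundaries are available as a sorted list of end offsets. `CachedEntry`, `Decision`, `decide` and `padding_threshold` are hypothetical. This is not the client's implementation, and the patch does not state the actual threshold.

```rust
use std::collections::BTreeSet;

/// Hypothetical cached entry: the byte range of one file's payload inside
/// the previous `ppxar` stream.
#[derive(Debug, Clone, Copy)]
pub struct CachedEntry {
    pub payload_offset: u64,
    pub payload_size: u64,
}

/// Outcome for the batch of currently cached entries.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Index the previous chunks again without re-reading the files.
    Reuse,
    /// Re-read and re-encode the files' contents.
    Reencode,
}

/// `chunk_ends` holds the strictly increasing end offsets of the previous
/// payload chunks, so chunk `i` spans `[chunk_ends[i - 1], chunk_ends[i])`
/// (chunk 0 starts at offset 0). Assumes every payload range lies inside the
/// chunked stream; `padding_threshold` is a hypothetical tuning parameter.
pub fn decide(entries: &[CachedEntry], chunk_ends: &[u64], padding_threshold: u64) -> Decision {
    // Indices of all chunks that would have to be referenced; a set, so a
    // chunk shared by several entries is only counted once.
    let mut needed = BTreeSet::new();
    let mut payload_total = 0u64;

    for entry in entries.iter().filter(|e| e.payload_size > 0) {
        let start = entry.payload_offset;
        let end = start + entry.payload_size; // exclusive
        payload_total += entry.payload_size;
        // Chunk containing the first payload byte.
        let first = chunk_ends.partition_point(|&e| e <= start);
        // Chunk containing the last payload byte (end - 1).
        let last = chunk_ends.partition_point(|&e| e < end);
        needed.extend(first..=last);
    }

    // Total size of all referenced chunks.
    let referenced: u64 = needed
        .iter()
        .map(|&i| {
            let chunk_start = if i == 0 { 0 } else { chunk_ends[i - 1] };
            chunk_ends[i] - chunk_start
        })
        .sum();

    // Bytes that would be referenced without belonging to any cached entry.
    let padding = referenced.saturating_sub(payload_total);
    if padding <= padding_threshold {
        Decision::Reuse
    } else {
        Decision::Reencode
    }
}
```

For example, with 100-byte chunks ending at offsets 100, 200, 300 and 400, entries covering bytes 0..200 and 250..270 reference three chunks (300 bytes) for 220 payload bytes, i.e. 80 bytes of padding. That batch is re-used with a threshold of 100 and re-encoded with a threshold of 50.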