[pbs-devel] [PATCH v8 proxmox-backup 58/69] docs: add section describing change detection mode
Fabian Grünbichler
f.gruenbichler at proxmox.com
Tue Jun 4 14:07:55 CEST 2024
On May 28, 2024 11:42 am, Christian Ebner wrote:
> Describe the motivation and basic principle of the clients change
> detection mode and show an example invocation.
>
> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
> ---
> changes since version 7:
> - no changes
>
> changes since version 6:
> - add more information on metadata being compared
> - adapt and link from technical overview
>
> docs/backup-client.rst | 45 +++++++++++++++++++++++++++++++++++++
> docs/technical-overview.rst | 3 +++
> 2 files changed, 48 insertions(+)
>
> diff --git a/docs/backup-client.rst b/docs/backup-client.rst
> index 00a1abbb3..58fcd79f0 100644
> --- a/docs/backup-client.rst
> +++ b/docs/backup-client.rst
> @@ -280,6 +280,51 @@ Multiple paths can be excluded like this:
>
> # proxmox-backup-client backup.pxar:./linux --exclude=/usr --exclude=/rust
>
> +.. _client_change_detection_mode:
> +
> +Change Detection Mode
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +File-based backups containing a lot of data can take a long time, as the default
> +behavior for the Proxmox backup client is to read all data and re-encode it.
read all data and encode it into a pxar archive.
> +The encoded stream is split into variable sized chunks for efficient
> +deduplication and based on the chunk digest a decision can be made whether a
I think I'd drop the "efficient deduplication" part, since the whole point
of this section is that it is not that efficient :-P
is split into variable sized chunks. For each chunk, a digest is
calculated and used to decide whether the chunk needs ..
> +given chunk needs to be uploaded or can be indexed without upload as it is
> +already available on the server (and therefore deduplicated). For some
> +use-cases, where files do not change frequently the full re-reading is not
> +feasible and undesired.
If the backed up files are largely unchanged, re-reading them and only then
deciding that the corresponding chunks don't need to be uploaded at all
is (.. something something undesired ;))
> +
> +The backup clients `change-detection-mode` can be switched from default to
client's
> +`metadata` based detection to reduce limitations as described above, instructing
> +the client to avoid re-reading files with unchanged metadata whenever possible.
> +When using this mode, instead of the regular pxar archive, the backup snapshot
> +is stored into two separate files: the `mpxar` containing the archives metadata
archive's
> +and the `ppxar` containing a concatenation of the file contents. This splitting
> +allows for metadata lookups without the overhead of the file contents.
"for efficient metadata lookups."?
> +Using the `change-detection-mode` set to `data` allows to create the same split
> +archive as when using the `metadata` mode, but without using a previous
> +reference and therefore reencoding all file payloads.
this part should move below, since the next paragraphs describe the
metadata mode?
> +
> +When creating the backup archives, the current file metadata is compared to the
> +one looked up in the previous `mpxar` archive.
> +The metadata comparison includes file size, file type, ownership and permission
> +information acls and attributes and most importantly the files mtime, for
something here is missing (a comma?), and s/files/file's/
> +details see the :ref:`pxar metadata archive format <pxar-meta-format>`.
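For readers of the thread, the comparison described in this paragraph can be
sketched roughly as follows. This is only an illustrative Python sketch, not
the client's actual code: the `MetaSnapshot` type and the `snapshot` and
`is_unchanged` helpers are hypothetical names, the previous values are taken
from the file system instead of a previous `mpxar` archive, and ACLs and
extended attributes are left out for brevity.

```python
import os
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaSnapshot:
    """Hypothetical subset of the metadata listed in the patch text."""
    size: int
    file_type: int  # file type bits of st_mode
    mode: int       # permission bits of st_mode
    uid: int
    gid: int
    mtime_ns: int


def snapshot(path: str) -> MetaSnapshot:
    """Collect the compared fields without following symlinks."""
    st = os.lstat(path)
    return MetaSnapshot(
        size=st.st_size,
        file_type=stat.S_IFMT(st.st_mode),
        mode=stat.S_IMODE(st.st_mode),
        uid=st.st_uid,
        gid=st.st_gid,
        mtime_ns=st.st_mtime_ns,
    )


def is_unchanged(previous: MetaSnapshot, current: MetaSnapshot) -> bool:
    """A file is a re-use candidate only if every compared field matches."""
    return previous == current
```

Any single differing field (most commonly the mtime) makes the file count as
changed, in which case its contents have to be read and encoded again.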
> +
> +If unchanged, the entry is cached for possible re-use of content chunks without
> +re-reading, by indexing the already present chunks containing the contents from
> +the previous backup snapshot. Since the file might only partially re-use chunks
> +(thereby introducing wasted space in the form of padding), the decision whether
> +to re-use or re-encode the currently cached entries is delegated to when enough
is delayed/postponed
> +information is available, comparing the possible padding a threshold value.
to a
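The postponed decision could be illustrated with a small sketch like the one
below. It is an assumption-laden Python illustration, not the implementation
from this series: the function name `should_reuse`, its parameters and the
default threshold of 0.1 are all hypothetical, since the patch text does not
state the actual threshold value.

```python
def should_reuse(chunk_bytes: int, referenced_bytes: int,
                 threshold: float = 0.1) -> bool:
    """Decide whether cached entries re-use the previous chunks.

    chunk_bytes:      total size of the previous chunks that would be indexed
    referenced_bytes: bytes in those chunks that unchanged files still use
    threshold:        hypothetical maximum tolerated padding ratio
    """
    if referenced_bytes < 0 or referenced_bytes > chunk_bytes:
        raise ValueError("referenced_bytes must be within 0..chunk_bytes")
    if chunk_bytes == 0:
        # Nothing cached, so there is nothing to re-use.
        return False
    # Padding is the part of the re-used chunks no file refers to anymore.
    padding = chunk_bytes - referenced_bytes
    return padding / chunk_bytes <= threshold
```

With these made-up numbers, `should_reuse(100, 95)` would re-use the chunks
(5% padding), while `should_reuse(100, 50)` would fall back to re-encoding
the cached entries.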
> +
> +The following shows an example for the client invocation with the `metadata`
> +mode:
> +
> +.. code-block:: console
> +
> + # proxmox-backup-client backup.pxar:./linux --change-detection-mode=metadata
> +
> .. _client_encryption:
>
> Encryption
> diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
> index 89835a7cc..a8b1c7268 100644
> --- a/docs/technical-overview.rst
> +++ b/docs/technical-overview.rst
> @@ -28,6 +28,9 @@ which are not chunked, e.g. the client log), or one or more indexes
>
> When uploading an index, the client first has to read the source data, chunk it
> and send the data as chunks with their identifying checksum to the server.
> +When using the :ref:`change detection mode <change_detection_mode>` payload
> +chunks for unchanged files are reused from the previous snapshot, thereby not
> +reading the source data again.
>
> If there is a previous Snapshot in the backup group, the client can first
> download the chunk list of the previous Snapshot. If it detects a chunk that
> --
> 2.39.2