[pbs-devel] [PATCH v3 pxar proxmox-backup 00/58] fix #3174: improve file-level backup

Christian Ebner c.ebner at proxmox.com
Mon Apr 29 14:13:07 CEST 2024


On 3/28/24 13:36, Christian Ebner wrote:
> A big thank you to Dietmar and Fabian for the review of the previous
> version and Fabian for extensive testing and help during debugging.
> 
> This series of patches implements a metadata-based file change
> detection mechanism to speed up pxar file-level backup creation
> for unchanged files.
> 
> The chosen approach is to split pxar archives on creation via the
> proxmox-backup-client into two separate data and upload streams,
> one exclusive for regular file payloads, the other one for the rest
> of the pxar archive, which is mostly metadata.
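
To illustrate the split described above, here is a toy sketch in Python. It
is not the pxar on-disk format and not the Rust encoder from this series: the
record layout, the function name `encode_split` and the field widths are all
made up for illustration. It only shows the idea that file payloads go to one
stream while the other stream keeps the name plus an (offset, size) reference
into the payload stream.

```python
import io
import struct


def encode_split(entries, meta_out, payload_out):
    """Write a toy split archive (hypothetical layout, NOT the pxar format).

    `entries` is a list of (name, payload_bytes) pairs. Payload bytes are
    appended to `payload_out`; `meta_out` only receives the name and an
    (offset, size) reference into the payload stream.
    """
    offset = 0
    for name, payload in entries:
        encoded_name = name.encode("utf-8")
        # Metadata record: name length, name, payload offset, payload size.
        meta_out.write(struct.pack("<H", len(encoded_name)))
        meta_out.write(encoded_name)
        meta_out.write(struct.pack("<QQ", offset, len(payload)))
        # The payload itself never enters the metadata stream.
        payload_out.write(payload)
        offset += len(payload)


meta, payload = io.BytesIO(), io.BytesIO()
encode_split([("a.txt", b"hello"), ("b.txt", b"world!")], meta, payload)
```

Because the metadata stream carries no file contents, it stays small, which
is what makes it cheap to fetch and walk on the next run.
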
> 
> On consecutive runs, the metadata archive of the previous backup run,
> which is limited in size and can therefore be accessed quickly, is used
> to look up and compare the metadata of the entries to encode.
> This assumes that the connection to the Proxmox Backup Server is
> sufficiently fast to allow downloading and caching the chunks for
> that index.
> 
> Changes to regular files are detected by comparing the file's entire
> metadata object, including mtime, ACLs, etc. If no changes are detected,
> the previous payload index is used to look up chunks to possibly re-use
> in the payload stream of the new archive.
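
The comparison described above can be sketched as a plain equality check over
the whole metadata object. This is a simplified Python model, not the code
from the series: `FileMetadata`, its field set (including `size`) and
`is_unchanged` are assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FileMetadata:
    # Hypothetical subset of the fields a metadata object might carry.
    mode: int
    uid: int
    gid: int
    mtime_ns: int
    size: int
    acls: tuple = ()
    xattrs: tuple = ()


def is_unchanged(previous, current):
    """True only if a previous entry exists and every field matches."""
    return previous is not None and previous == current


old = FileMetadata(mode=0o644, uid=1000, gid=1000,
                   mtime_ns=1_700_000_000_000_000_000, size=42)
touched = replace(old, mtime_ns=old.mtime_ns + 1)
```

With this model, `is_unchanged(old, old)` holds, while a changed mtime or a
missing previous entry (`None`, as on the first run) means the file has to be
re-encoded.
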
> In order to reduce possible chunk fragmentation, the decision whether to
> re-use or re-encode a file payload is deferred until enough information
> has been gathered by adding entries to a look-ahead cache. If the padding
> introduced by reusing chunks stays below a threshold, the entries are
> referenced and the chunks are re-used and injected into the pxar payload
> upload stream; otherwise they are discarded and the files are encoded
> regularly.
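
A minimal sketch of the padding-based decision, in Python. It assumes that
padding is measured as a ratio of the bytes in the touched chunks, that the
previous chunks cover the payload stream contiguously, and that the threshold
is 10%. None of these details are confirmed by the cover letter, and
`padding_ratio` and `decide` are hypothetical names.

```python
def padding_ratio(chunks, ranges):
    """Fraction of bytes in the touched chunks that no cached entry references.

    `chunks` holds (start, end) byte ranges of the previous payload chunks,
    `ranges` holds (start, end) payload ranges of the cached files. All
    ranges are half-open; `ranges` must not overlap each other and must lie
    within the chunks.
    """
    touched = [
        (c_start, c_end)
        for c_start, c_end in chunks
        if any(start < c_end and end > c_start for start, end in ranges)
    ]
    total = sum(end - start for start, end in touched)
    if total == 0:
        return 0.0
    used = sum(end - start for start, end in ranges)
    return (total - used) / total


def decide(chunks, ranges, threshold=0.1):
    """Re-use the chunks only if the padding does not exceed the threshold."""
    return "reuse" if padding_ratio(chunks, ranges) <= threshold else "re-encode"


chunks = [(0, 100), (100, 200), (200, 300)]
dense = [(0, 100), (100, 195)]   # files fill almost all of two chunks
sparse = [(0, 10), (250, 260)]   # two small files in otherwise unused chunks
```

Here `decide(chunks, dense)` yields `"reuse"` (5 of 200 bytes are padding),
while `decide(chunks, sparse)` yields `"re-encode"` (180 of 200 bytes would
be padding).
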
> 
> The following lists the most notable changes included in this series
> since version 2:
> - many bugfixes for incorrect archive encoding caused by wrong offset
>    generation, plus additional sanity checks, so that encoding fails
>    rather than producing an incorrectly encoded archive
> - different approach for deciding whether to re-use or re-encode the
>    entries. Previously, the entries were encoded once a cached payload
>    size threshold was reached. Now, the padding introduced by reusable
>    chunks is tracked, and the entries are re-used only if the padding
>    does not exceed the set threshold. This reduces the possible
>    padding, at the cost of re-encoding more entries. It also avoids
>    re-using chunks that now contain large padding holes because files
>    contained within them were moved or removed.
> - added headers for metadata archive and payload file
> - added documentation
> 
> With these patches, a backup run is now invoked as:
> ```bash
> proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
> ```
> During the first run, no reference index is available; the pxar archive
> will nevertheless be split into the two parts.
> Subsequent backups will then use the pxar archive accessor and index
> files of the previous run to perform file change detection.
> 
> As benchmarks, the Linux source code as well as the COCO dataset for
> computer vision and pattern recognition can be used.
> The benchmarks can be performed by running:
> ```bash
> proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
> proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
> proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
> ```
> 
> The above command invocations assume that the default repository and
> credentials are set as environment variables; they can, however, also be
> passed as additional optional parameters instead.
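
As a rough illustration of the two ways to supply the repository, here is a
small Python helper that builds the `proxmox-backup-client backup` invocation
from the cover letter. `PBS_REPOSITORY` and `--repository` are the client's
documented environment variable and option (credentials can similarly come
from `PBS_PASSWORD`). Whether the test-suite subcommands accept the same
option is not verified here. The helper name and the repository string are
placeholders.

```python
import os


def backup_invocation(label, source, repository=None, environ=None):
    """Return (argv, env) for a metadata change detection backup run.

    The repository comes from the explicit argument if given; otherwise
    PBS_REPOSITORY must be present in the environment.
    """
    env = dict(os.environ if environ is None else environ)
    argv = [
        "proxmox-backup-client",
        "backup",
        f"{label}.pxar:{source}",
        "--change-detection-mode=metadata",
    ]
    if repository is not None:
        argv += ["--repository", repository]
    elif "PBS_REPOSITORY" not in env:
        raise ValueError("set PBS_REPOSITORY or pass a repository explicitly")
    return argv, env


# Placeholder repository string, not a real server.
argv, env = backup_invocation(
    "root", "/", repository="user@pbs@pbs.example.com:store"
)
```

The result could then be handed to `subprocess.run(argv, env=env, check=True)`
on a machine where the client is installed.
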
> 
> pxar:
> 
> Christian Ebner (14):
>    encoder: fix two typos in comments
>    format/examples: add PXAR_PAYLOAD_REF entry header
>    decoder: add method to read payload references
>    decoder: factor out skip part from skip_entry
>    encoder: add optional output writer for file payloads
>    encoder: move to stack based state tracking
>    decoder/accessor: add optional payload input stream
>    encoder: add payload reference capability
>    encoder: add payload position capability
>    encoder: add payload advance capability
>    encoder/format: finish payload stream with marker
>    format: add payload stream start marker
>    format: add pxar format version entry
>    format/encoder/decoder: add entry type cli params
> 
>   examples/apxar.rs            |   2 +-
>   examples/mk-format-hashes.rs |  21 ++
>   examples/pxarcmd.rs          |   7 +-
>   src/accessor/aio.rs          |  10 +-
>   src/accessor/mod.rs          |  52 +++-
>   src/accessor/sync.rs         |   8 +-
>   src/decoder/aio.rs           |  14 +-
>   src/decoder/mod.rs           | 191 ++++++++++++--
>   src/decoder/sync.rs          |  15 +-
>   src/encoder/aio.rs           |  87 +++++--
>   src/encoder/mod.rs           | 475 +++++++++++++++++++++++++----------
>   src/encoder/sync.rs          |  67 ++++-
>   src/format/mod.rs            |  63 +++++
>   src/lib.rs                   |   9 +
>   tests/simple/main.rs         |   3 +
>   15 files changed, 827 insertions(+), 197 deletions(-)
> 
> proxmox-backup:
> 
> Christian Ebner (44):
>    client: pxar: switch to stack based encoder state
>    client: backup writer: only borrow http client
>    client: backup: factor out extension from backup target
>    client: backup: early check for fixed index type
>    client: pxar: combine writer params into struct
>    client: backup: split payload to dedicated stream
>    client: helper: add helpers for creating reader instances
>    client: helper: add method for split archive name mapping
>    client: restore: read payload from dedicated index
>    tools: cover meta extension for pxar archives
>    restore: cover meta extension for pxar archives
>    client: mount: make split pxar archives mountable
>    api: datastore: refactor getting local chunk reader
>    api: datastore: attach optional payload chunk reader
>    catalog: shell: factor out pxar fuse reader instantiation
>    catalog: shell: redirect payload reader for split streams
>    www: cover meta extension for pxar archives
>    pxar: add optional payload input for achive restore
>    pxar: add more context to extraction error
>    client: pxar: include payload offset in output
>    pxar: show padding in debug output on archive list
>    datastore: dynamic index: add method to get digest
>    client: pxar: helper for lookup of reusable dynamic entries
>    upload stream: impl reused chunk injector
>    client: chunk stream: add struct to hold injection state
>    client: chunk stream: add dynamic entries injection queues
>    specs: add backup detection mode specification
>    client: implement prepare reference method
>    client: pxar: implement store to insert chunks on caching
>    client: pxar: add previous reference to archiver
>    client: pxar: add method for metadata comparison
>    pxar: caching: add look-ahead cache types
>    client: pxar: add look-ahead caching
>    fix #3174: client: pxar: enable caching and meta comparison
>    client: backup: increase average chunk size for metadata
>    client: backup writer: add injected chunk count to stats
>    pxar: create: show chunk injection stats debug output
>    client: pxar: add entry kind format version
>    client: pxar: opt encode cli exclude patterns as CliParams
>    client: pxar: add flow chart for metadata change detection
>    docs: describe file format for split payload files
>    docs: add section describing change detection mode
>    test-suite: add detection mode change benchmark
>    test-suite: add bin to deb, add shell completions
> 
>   Cargo.toml                                    |   1 +
>   Makefile                                      |  13 +-
>   debian/proxmox-backup-client.bash-completion  |   1 +
>   debian/proxmox-backup-client.install          |   2 +
>   debian/proxmox-backup-test-suite.bc           |   8 +
>   docs/backup-client.rst                        |  33 +
>   docs/file-formats.rst                         |  32 +
>   docs/meta-format-overview.dot                 |  50 ++
>   examples/test_chunk_speed2.rs                 |   2 +-
>   examples/upload-speed.rs                      |   2 +-
>   pbs-client/src/backup_specification.rs        |  40 +
>   pbs-client/src/backup_writer.rs               | 103 ++-
>   pbs-client/src/chunk_stream.rs                |  60 +-
>   pbs-client/src/inject_reused_chunks.rs        | 152 ++++
>   pbs-client/src/lib.rs                         |   3 +-
>   pbs-client/src/pxar/create.rs                 | 779 +++++++++++++++++-
>   pbs-client/src/pxar/extract.rs                |   2 +
>   ...t-metadata-based-file-change-detection.svg |   1 +
>   ...t-metadata-based-file-change-detection.txt |  12 +
>   pbs-client/src/pxar/look_ahead_cache.rs       |  38 +
>   pbs-client/src/pxar/mod.rs                    |   3 +-
>   pbs-client/src/pxar/tools.rs                  | 123 ++-
>   pbs-client/src/pxar_backup_stream.rs          |  57 +-
>   pbs-client/src/tools/mod.rs                   |   5 +-
>   pbs-datastore/src/dynamic_index.rs            |   5 +
>   pbs-pxar-fuse/src/lib.rs                      |   2 +-
>   proxmox-backup-client/src/benchmark.rs        |   2 +-
>   proxmox-backup-client/src/catalog.rs          |  42 +-
>   proxmox-backup-client/src/helper.rs           |  64 ++
>   proxmox-backup-client/src/main.rs             | 281 ++++++-
>   proxmox-backup-client/src/mount.rs            |  54 +-
>   proxmox-backup-test-suite/Cargo.toml          |  18 +
>   .../src/detection_mode_bench.rs               | 294 +++++++
>   proxmox-backup-test-suite/src/main.rs         |  17 +
>   proxmox-file-restore/src/main.rs              |  20 +-
>   .../src/proxmox_restore_daemon/api.rs         |  16 +-
>   pxar-bin/src/main.rs                          |  53 +-
>   src/api2/admin/datastore.rs                   |  47 +-
>   src/api2/tape/restore.rs                      |   4 +-
>   src/bin/proxmox_backup_debug/diff.rs          |   2 +-
>   src/tape/file_formats/snapshot_archive.rs     |   9 +-
>   tests/catar.rs                                |   4 +-
>   www/datastore/Content.js                      |   6 +-
>   zsh-completions/_proxmox-backup-test-suite    |  13 +
>   44 files changed, 2219 insertions(+), 256 deletions(-)
>   create mode 100644 debian/proxmox-backup-test-suite.bc
>   create mode 100644 docs/meta-format-overview.dot
>   create mode 100644 pbs-client/src/inject_reused_chunks.rs
>   create mode 100644 pbs-client/src/pxar/flow-chart-metadata-based-file-change-detection.svg
>   create mode 100644 pbs-client/src/pxar/flow-chart-metadata-based-file-change-detection.txt
>   create mode 100644 pbs-client/src/pxar/look_ahead_cache.rs
>   create mode 100644 proxmox-backup-client/src/helper.rs
>   create mode 100644 proxmox-backup-test-suite/Cargo.toml
>   create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
>   create mode 100644 proxmox-backup-test-suite/src/main.rs
>   create mode 100644 zsh-completions/_proxmox-backup-test-suite
> 
An updated version of the patch series is available at:
https://lists.proxmox.com/pipermail/pbs-devel/2024-April/009104.html



