[pbs-devel] [PATCH v3 pxar proxmox-backup 00/58] fix #3174: improve file-level backup
Christian Ebner
c.ebner at proxmox.com
Mon Apr 29 14:13:07 CEST 2024
On 3/28/24 13:36, Christian Ebner wrote:
> A big thank you to Dietmar and Fabian for the review of the previous
> version and Fabian for extensive testing and help during debugging.
>
> This series of patches implements a metadata-based file change
> detection mechanism for improved pxar file level backup creation speed
> for unchanged files.
>
> The chosen approach is to split pxar archives on creation via the
> proxmox-backup-client into two separate data and upload streams,
> one exclusive for regular file payloads, the other one for the rest
> of the pxar archive, which is mostly metadata.
>
> On consecutive runs, the metadata archive of the previous backup run,
> which is limited in size and can therefore be accessed quickly, is used
> to look up and compare the metadata for entries to encode.
> This assumes that the connection speed to the Proxmox Backup Server is
> sufficiently fast, allowing the download and caching of the chunks for
> that index.
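
To illustrate the lookup-and-compare step described above, here is a minimal sketch. It is not the code from the series: `EntryMetadata`, `PreviousMetadata` and `is_unchanged` are hypothetical names, the field set is a simplified assumption, and a plain `HashMap` stands in for the pxar accessor on the previous metadata archive.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical, simplified stand-in for the metadata stored per entry.
/// The real pxar metadata object carries more fields than shown here.
#[derive(Debug, Clone, PartialEq, Eq)]
struct EntryMetadata {
    mode: u32,
    uid: u32,
    gid: u32,
    size: u64,
    mtime_ns: i64,
    acls: Vec<String>,
}

/// Hypothetical view of the previous run's metadata archive, keyed by path.
type PreviousMetadata = HashMap<PathBuf, EntryMetadata>;

/// An entry counts as unchanged only if it existed in the previous run
/// and every metadata field compares equal.
fn is_unchanged(previous: &PreviousMetadata, path: &Path, current: &EntryMetadata) -> bool {
    previous.get(path).map_or(false, |old| old == current)
}
```

With such a helper, an entry whose mtime differs from the previous run, or which is missing from the previous archive, would be treated as changed and re-encoded.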
>
> Changes to regular files are detected by comparing the file's entire
> metadata object, including mtime, ACLs, etc. If no changes are detected,
> the previous payload index is used to look up chunks to possibly re-use
> in the payload stream of the new archive.
> In order to reduce possible chunk fragmentation, the decision whether to
> re-use or re-encode a file payload is deferred until enough information
> is gathered by adding entries to a look-ahead cache. If the padding
> introduced by reusing chunks falls below a threshold, the entries are
> referenced, the chunks are re-used and injected into the pxar payload
> upload stream; otherwise they are discarded and the files are encoded
> regularly.
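
The re-use decision described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation from the series: `CachedFile`, `Decision`, `chunk_size` and `decide` are hypothetical names, the threshold is assumed to be a ratio of padding to re-used chunk bytes, cached files are assumed to be sorted by offset, and per-entry headers in the payload stream are ignored.

```rust
/// Outcome for a batch of cached, unchanged entries (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    Reuse,
    Reencode,
}

/// Hypothetical cache entry: byte range of a file payload in the
/// previous payload stream.
#[derive(Debug, Clone, Copy)]
struct CachedFile {
    offset: u64,
    size: u64,
}

/// Size of chunk `index`, given the sorted exclusive end offsets of all chunks.
fn chunk_size(chunk_ends: &[u64], index: usize) -> u64 {
    let start = if index == 0 { 0 } else { chunk_ends[index - 1] };
    chunk_ends[index] - start
}

/// Re-use the chunks only if the bytes in them that belong to no cached file
/// (the padding) stay within `max_padding_ratio` of the re-used chunk bytes.
fn decide(files: &[CachedFile], chunk_ends: &[u64], max_padding_ratio: f64) -> Decision {
    let mut payload_bytes = 0u64;
    let mut chunk_bytes = 0u64;
    let mut next_uncounted = 0usize; // first chunk index not yet accounted for

    for file in files.iter().filter(|f| f.size > 0) {
        // First and last chunk overlapping the file's byte range.
        let first = chunk_ends.partition_point(|&end| end <= file.offset);
        let last = chunk_ends.partition_point(|&end| end < file.offset + file.size);
        if last >= chunk_ends.len() {
            // Payload not fully covered by the previous index.
            return Decision::Reencode;
        }
        // Count each touched chunk exactly once.
        for index in first.max(next_uncounted)..=last {
            chunk_bytes += chunk_size(chunk_ends, index);
        }
        next_uncounted = next_uncounted.max(last + 1);
        payload_bytes += file.size;
    }

    if chunk_bytes == 0 {
        return Decision::Reencode; // nothing to re-use
    }
    let padding = chunk_bytes.saturating_sub(payload_bytes);
    if padding as f64 / chunk_bytes as f64 <= max_padding_ratio {
        Decision::Reuse
    } else {
        Decision::Reencode
    }
}
```

For example, with chunks ending at offsets 100, 200 and 300, two files covering bytes 0..190 touch 200 chunk bytes with 10 bytes of padding and would be re-used at a ratio threshold of 0.1, while a single 10-byte file inside the second chunk would drag in 90 bytes of padding and be re-encoded.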
>
> The following lists the most notable changes included in this series
> since version 2:
> - many bugfixes regarding incorrect archive encoding caused by wrong
> offset generation, additional sanity checks, and failing on encoding
> rather than producing an incorrectly encoded archive
> - different approach for deciding whether to re-use or re-encode the
> entries. Previously, the entries were encoded when a cached
> payload size threshold was reached. Now, the padding introduced by
> reusable chunks is tracked, and the entries are re-used only if the
> padding does not exceed the set threshold. This reduces the possible
> padding, at the cost of re-encoding more entries. It also avoids
> re-using chunks which now have large padding holes because of
> moved/removed files contained within.
> - added headers for metadata archive and payload file
> - added documentation
>
> An invocation of a backup run with these patches is now:
> ```bash
> proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
> ```
> During the first run, no reference index is available; the pxar archive
> is nevertheless split into the two parts.
> Subsequent backups utilize the pxar archive accessor and index files of
> the previous run to perform file change detection.
>
> As benchmarks, the Linux source code as well as the COCO dataset for
> computer vision and pattern recognition can be used.
> The benchmarks can be performed by running:
> ```bash
> proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
> proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
> proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
> ```
>
> The above command invocations assume the default repository and
> credentials to be set as environment variables; they can alternatively
> be passed as additional optional parameters.
>
> pxar:
>
> Christian Ebner (14):
> encoder: fix two typos in comments
> format/examples: add PXAR_PAYLOAD_REF entry header
> decoder: add method to read payload references
> decoder: factor out skip part from skip_entry
> encoder: add optional output writer for file payloads
> encoder: move to stack based state tracking
> decoder/accessor: add optional payload input stream
> encoder: add payload reference capability
> encoder: add payload position capability
> encoder: add payload advance capability
> encoder/format: finish payload stream with marker
> format: add payload stream start marker
> format: add pxar format version entry
> format/encoder/decoder: add entry type cli params
>
> examples/apxar.rs | 2 +-
> examples/mk-format-hashes.rs | 21 ++
> examples/pxarcmd.rs | 7 +-
> src/accessor/aio.rs | 10 +-
> src/accessor/mod.rs | 52 +++-
> src/accessor/sync.rs | 8 +-
> src/decoder/aio.rs | 14 +-
> src/decoder/mod.rs | 191 ++++++++++++--
> src/decoder/sync.rs | 15 +-
> src/encoder/aio.rs | 87 +++++--
> src/encoder/mod.rs | 475 +++++++++++++++++++++++++----------
> src/encoder/sync.rs | 67 ++++-
> src/format/mod.rs | 63 +++++
> src/lib.rs | 9 +
> tests/simple/main.rs | 3 +
> 15 files changed, 827 insertions(+), 197 deletions(-)
>
> proxmox-backup:
>
> Christian Ebner (44):
> client: pxar: switch to stack based encoder state
> client: backup writer: only borrow http client
> client: backup: factor out extension from backup target
> client: backup: early check for fixed index type
> client: pxar: combine writer params into struct
> client: backup: split payload to dedicated stream
> client: helper: add helpers for creating reader instances
> client: helper: add method for split archive name mapping
> client: restore: read payload from dedicated index
> tools: cover meta extension for pxar archives
> restore: cover meta extension for pxar archives
> client: mount: make split pxar archives mountable
> api: datastore: refactor getting local chunk reader
> api: datastore: attach optional payload chunk reader
> catalog: shell: factor out pxar fuse reader instantiation
> catalog: shell: redirect payload reader for split streams
> www: cover meta extension for pxar archives
> pxar: add optional payload input for achive restore
> pxar: add more context to extraction error
> client: pxar: include payload offset in output
> pxar: show padding in debug output on archive list
> datastore: dynamic index: add method to get digest
> client: pxar: helper for lookup of reusable dynamic entries
> upload stream: impl reused chunk injector
> client: chunk stream: add struct to hold injection state
> client: chunk stream: add dynamic entries injection queues
> specs: add backup detection mode specification
> client: implement prepare reference method
> client: pxar: implement store to insert chunks on caching
> client: pxar: add previous reference to archiver
> client: pxar: add method for metadata comparison
> pxar: caching: add look-ahead cache types
> client: pxar: add look-ahead caching
> fix #3174: client: pxar: enable caching and meta comparison
> client: backup: increase average chunk size for metadata
> client: backup writer: add injected chunk count to stats
> pxar: create: show chunk injection stats debug output
> client: pxar: add entry kind format version
> client: pxar: opt encode cli exclude patterns as CliParams
> client: pxar: add flow chart for metadata change detection
> docs: describe file format for split payload files
> docs: add section describing change detection mode
> test-suite: add detection mode change benchmark
> test-suite: add bin to deb, add shell completions
>
> Cargo.toml | 1 +
> Makefile | 13 +-
> debian/proxmox-backup-client.bash-completion | 1 +
> debian/proxmox-backup-client.install | 2 +
> debian/proxmox-backup-test-suite.bc | 8 +
> docs/backup-client.rst | 33 +
> docs/file-formats.rst | 32 +
> docs/meta-format-overview.dot | 50 ++
> examples/test_chunk_speed2.rs | 2 +-
> examples/upload-speed.rs | 2 +-
> pbs-client/src/backup_specification.rs | 40 +
> pbs-client/src/backup_writer.rs | 103 ++-
> pbs-client/src/chunk_stream.rs | 60 +-
> pbs-client/src/inject_reused_chunks.rs | 152 ++++
> pbs-client/src/lib.rs | 3 +-
> pbs-client/src/pxar/create.rs | 779 +++++++++++++++++-
> pbs-client/src/pxar/extract.rs | 2 +
> ...t-metadata-based-file-change-detection.svg | 1 +
> ...t-metadata-based-file-change-detection.txt | 12 +
> pbs-client/src/pxar/look_ahead_cache.rs | 38 +
> pbs-client/src/pxar/mod.rs | 3 +-
> pbs-client/src/pxar/tools.rs | 123 ++-
> pbs-client/src/pxar_backup_stream.rs | 57 +-
> pbs-client/src/tools/mod.rs | 5 +-
> pbs-datastore/src/dynamic_index.rs | 5 +
> pbs-pxar-fuse/src/lib.rs | 2 +-
> proxmox-backup-client/src/benchmark.rs | 2 +-
> proxmox-backup-client/src/catalog.rs | 42 +-
> proxmox-backup-client/src/helper.rs | 64 ++
> proxmox-backup-client/src/main.rs | 281 ++++++-
> proxmox-backup-client/src/mount.rs | 54 +-
> proxmox-backup-test-suite/Cargo.toml | 18 +
> .../src/detection_mode_bench.rs | 294 +++++++
> proxmox-backup-test-suite/src/main.rs | 17 +
> proxmox-file-restore/src/main.rs | 20 +-
> .../src/proxmox_restore_daemon/api.rs | 16 +-
> pxar-bin/src/main.rs | 53 +-
> src/api2/admin/datastore.rs | 47 +-
> src/api2/tape/restore.rs | 4 +-
> src/bin/proxmox_backup_debug/diff.rs | 2 +-
> src/tape/file_formats/snapshot_archive.rs | 9 +-
> tests/catar.rs | 4 +-
> www/datastore/Content.js | 6 +-
> zsh-completions/_proxmox-backup-test-suite | 13 +
> 44 files changed, 2219 insertions(+), 256 deletions(-)
> create mode 100644 debian/proxmox-backup-test-suite.bc
> create mode 100644 docs/meta-format-overview.dot
> create mode 100644 pbs-client/src/inject_reused_chunks.rs
> create mode 100644 pbs-client/src/pxar/flow-chart-metadata-based-file-change-detection.svg
> create mode 100644 pbs-client/src/pxar/flow-chart-metadata-based-file-change-detection.txt
> create mode 100644 pbs-client/src/pxar/look_ahead_cache.rs
> create mode 100644 proxmox-backup-client/src/helper.rs
> create mode 100644 proxmox-backup-test-suite/Cargo.toml
> create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
> create mode 100644 proxmox-backup-test-suite/src/main.rs
> create mode 100644 zsh-completions/_proxmox-backup-test-suite
>
An updated version of this patch series is available at:
https://lists.proxmox.com/pipermail/pbs-devel/2024-April/009104.html