[pbs-devel] [PATCH v3 pxar proxmox-backup 00/58] fix #3174: improve file-level backup

Christian Ebner c.ebner at proxmox.com
Thu Mar 28 13:36:09 CET 2024


A big thank you to Dietmar and Fabian for reviewing the previous
version, and to Fabian for extensive testing and help during debugging.

This series of patches implements a metadata-based file change
detection mechanism to speed up pxar file-level backup creation for
unchanged files.

The chosen approach is to split pxar archives on creation via the
proxmox-backup-client into two separate data and upload streams:
one exclusively for regular file payloads, the other for the rest
of the pxar archive, which is mostly metadata.
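To illustrate the split, here is a minimal toy sketch, not the actual pxar encoder: file contents are appended to a payload stream, while the metadata stream only records where each payload starts and how long it is. `SplitEncoder` and `PayloadRef` are hypothetical names, and a map stands in for the real metadata archive encoding.

```rust
use std::collections::BTreeMap;

/// Location of a file payload inside the payload stream (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PayloadRef {
    pub offset: u64,
    pub size: u64,
}

/// Toy encoder splitting an archive into a metadata and a payload stream.
#[derive(Default)]
pub struct SplitEncoder {
    /// Stand-in for the metadata stream: path -> payload reference.
    pub metadata: BTreeMap<String, PayloadRef>,
    /// Payload stream: concatenated regular file contents.
    pub payload: Vec<u8>,
}

impl SplitEncoder {
    /// Append a regular file: its contents go to the payload stream,
    /// the metadata stream only stores where they were written.
    pub fn add_file(&mut self, path: &str, contents: &[u8]) -> PayloadRef {
        let payload_ref = PayloadRef {
            offset: self.payload.len() as u64,
            size: contents.len() as u64,
        };
        self.payload.extend_from_slice(contents);
        self.metadata.insert(path.to_string(), payload_ref);
        payload_ref
    }

    /// Resolve a path to its contents via the recorded payload reference.
    pub fn read(&self, path: &str) -> Option<&[u8]> {
        let r = self.metadata.get(path)?;
        let start = r.offset as usize;
        self.payload.get(start..start + r.size as usize)
    }
}
```

Since the metadata side holds only small references, it stays compact, which is what makes downloading and scanning it on the next run cheap.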

On consecutive runs, the metadata archive of the previous backup run,
which is limited in size and can therefore be accessed rapidly, is used
to look up and compare the metadata of the entries to encode.
This assumes that the connection to the Proxmox Backup Server is
sufficiently fast to allow downloading and caching the chunks for
that index.

Changes to regular files are detected by comparing the file's complete
metadata object, including mtime, ACLs, etc. If no changes are detected,
the previous payload index is used to look up chunks which can possibly
be re-used in the payload stream of the new archive.
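A minimal sketch of the comparison step, assuming the previous run's metadata archive has been read into a map from path to metadata plus payload reference. `FileMetadata`, `PreviousEntry` and `reusable_payload` are hypothetical names, and the field set only approximates what pxar actually stores. On a first run there is no reference (`None`), so nothing is reusable.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the metadata compared per file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileMetadata {
    pub mode: u32,
    pub uid: u32,
    pub gid: u32,
    pub mtime_secs: i64,
    pub mtime_nsecs: u32,
    pub size: u64,
    pub acl: Vec<String>,
    pub xattrs: Vec<(String, Vec<u8>)>,
}

/// Location of a payload in the previous payload stream (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PayloadRef {
    pub offset: u64,
    pub size: u64,
}

/// What the previous run recorded for one path.
pub struct PreviousEntry {
    pub metadata: FileMetadata,
    pub payload: PayloadRef,
}

/// Returns the previous payload reference only if the whole metadata
/// object is unchanged; any difference (mtime, ACLs, ...) means re-encode.
pub fn reusable_payload(
    previous: Option<&HashMap<String, PreviousEntry>>,
    path: &str,
    current: &FileMetadata,
) -> Option<PayloadRef> {
    let entry = previous?.get(path)?;
    if entry.metadata == *current {
        Some(entry.payload)
    } else {
        None
    }
}
```

Comparing the full metadata object, rather than only mtime, keeps the check conservative: any attribute change forces the file to be re-encoded.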
To reduce possible chunk fragmentation, the decision whether to
re-use or re-encode a file payload is deferred until enough information
has been gathered, by adding entries to a look-ahead cache. If the
padding introduced by re-using chunks falls below a threshold, the
entries are referenced, and the chunks are re-used and injected into the
pxar payload upload stream; otherwise the entries are discarded and the
files are encoded regularly.
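The padding-based decision can be sketched as follows, under simplifying assumptions: the previous payload index is modeled as ascending chunk end offsets, cached entries do not overlap, and the threshold is a ratio of padding to re-used chunk bytes. The real series may define the threshold differently. `PreviousIndex`, `decide` and `Decision` are hypothetical names.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy)]
pub struct PayloadRef {
    pub offset: u64,
    pub size: u64,
}

/// Previous payload index, modeled as ascending chunk end offsets.
pub struct PreviousIndex {
    pub chunk_ends: Vec<u64>,
}

impl PreviousIndex {
    /// Byte range [start, end) covered by chunk `i`.
    fn chunk_range(&self, i: usize) -> (u64, u64) {
        let start = if i == 0 { 0 } else { self.chunk_ends[i - 1] };
        (start, self.chunk_ends[i])
    }

    /// Indices of all chunks overlapping the entry's payload range.
    fn overlapping_chunks<'a>(&'a self, r: PayloadRef) -> impl Iterator<Item = usize> + 'a {
        let end = r.offset + r.size;
        (0..self.chunk_ends.len()).filter(move |&i| {
            let (s, e) = self.chunk_range(i);
            s < end && e > r.offset
        })
    }
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Reuse { chunks: Vec<usize>, padding: u64 },
    Reencode,
}

/// Padding = bytes of re-used chunks not referenced by any cached entry.
pub fn decide(index: &PreviousIndex, cached: &[PayloadRef], max_padding_ratio: f64) -> Decision {
    let mut chunks = BTreeSet::new();
    let mut referenced = 0u64;
    for r in cached {
        referenced += r.size;
        chunks.extend(index.overlapping_chunks(*r));
    }
    let chunk_bytes: u64 = chunks
        .iter()
        .map(|&i| {
            let (s, e) = index.chunk_range(i);
            e - s
        })
        .sum();
    let padding = chunk_bytes.saturating_sub(referenced);
    if chunk_bytes > 0 && (padding as f64) <= max_padding_ratio * chunk_bytes as f64 {
        Decision::Reuse { chunks: chunks.into_iter().collect(), padding }
    } else {
        Decision::Reencode
    }
}
```

For example, with chunk ends `[100, 200, 300]`, entries at `0..100` and `100..190` re-use chunks 0 and 1 with 10 bytes of padding, whereas a single 10-byte entry in chunk 0 would leave 90 bytes of padding and be re-encoded at a 0.1 ratio.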

The following lists the most notable changes included in this series since
version 2:
- many bugfixes regarding incorrect archive encoding caused by wrong
  offset generation; additional sanity checks were added, so that
  encoding fails rather than producing an incorrectly encoded archive
- a different approach for deciding whether to re-use or re-encode the
  entries. Previously, the entries were encoded once a cached payload
  size threshold was reached. Now, the padding introduced by reusable
  chunks is tracked, and the entries are re-used only if the padding
  does not exceed the set threshold. This reduces the possible padding
  at the cost of re-encoding more entries. It also avoids re-using
  chunks which now contain large padding holes because files contained
  within them were moved or removed.
- added headers for metadata archive and payload file
- added documentation

With these patches, a backup run is now invoked as:
```bash
proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
```
During the first run, no reference index is available; the pxar archive
will nevertheless be split into the two parts.
Subsequent backups will then use the pxar archive accessor and the
index files of the previous run to perform file change detection.

The Linux source code as well as the COCO dataset for computer vision
and pattern recognition can be used as benchmark data.
The benchmarks can be performed by running:
```bash
proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
```

The above command invocations assume that the default repository and
credentials are set as environment variables; they can however be
passed as additional optional parameters instead.

pxar:

Christian Ebner (14):
  encoder: fix two typos in comments
  format/examples: add PXAR_PAYLOAD_REF entry header
  decoder: add method to read payload references
  decoder: factor out skip part from skip_entry
  encoder: add optional output writer for file payloads
  encoder: move to stack based state tracking
  decoder/accessor: add optional payload input stream
  encoder: add payload reference capability
  encoder: add payload position capability
  encoder: add payload advance capability
  encoder/format: finish payload stream with marker
  format: add payload stream start marker
  format: add pxar format version entry
  format/encoder/decoder: add entry type cli params

 examples/apxar.rs            |   2 +-
 examples/mk-format-hashes.rs |  21 ++
 examples/pxarcmd.rs          |   7 +-
 src/accessor/aio.rs          |  10 +-
 src/accessor/mod.rs          |  52 +++-
 src/accessor/sync.rs         |   8 +-
 src/decoder/aio.rs           |  14 +-
 src/decoder/mod.rs           | 191 ++++++++++++--
 src/decoder/sync.rs          |  15 +-
 src/encoder/aio.rs           |  87 +++++--
 src/encoder/mod.rs           | 475 +++++++++++++++++++++++++----------
 src/encoder/sync.rs          |  67 ++++-
 src/format/mod.rs            |  63 +++++
 src/lib.rs                   |   9 +
 tests/simple/main.rs         |   3 +
 15 files changed, 827 insertions(+), 197 deletions(-)
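As an illustration of what a payload reference entry in the metadata archive could carry, the following sketch serializes an offset and size into the payload stream. It assumes a layout of a 16-byte header (type tag and full entry size) followed by offset and size, all as little-endian `u64`; this mirrors the general shape of pxar entries but is an assumption here, and the type constant is a placeholder, not the real `PXAR_PAYLOAD_REF` value.

```rust
/// Placeholder type tag; NOT the real PXAR_PAYLOAD_REF hash value.
pub const HYPOTHETICAL_PAYLOAD_REF_TYPE: u64 = 0x1;

/// Header (type, full size) + body (offset, size): four u64 words.
const ENTRY_SIZE: u64 = 32;

/// Encode a payload reference as a hypothetical 32-byte entry.
pub fn encode_payload_ref(offset: u64, size: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(ENTRY_SIZE as usize);
    out.extend_from_slice(&HYPOTHETICAL_PAYLOAD_REF_TYPE.to_le_bytes());
    out.extend_from_slice(&ENTRY_SIZE.to_le_bytes());
    out.extend_from_slice(&offset.to_le_bytes());
    out.extend_from_slice(&size.to_le_bytes());
    out
}

/// Decode the entry, rejecting short buffers and unexpected headers.
pub fn decode_payload_ref(buf: &[u8]) -> Option<(u64, u64)> {
    if buf.len() < ENTRY_SIZE as usize {
        return None;
    }
    // Read the i-th little-endian u64 word of the buffer.
    let word = |i: usize| -> u64 {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(&buf[i * 8..i * 8 + 8]);
        u64::from_le_bytes(bytes)
    };
    if word(0) != HYPOTHETICAL_PAYLOAD_REF_TYPE || word(1) != ENTRY_SIZE {
        return None;
    }
    Some((word(2), word(3)))
}
```

Rejecting entries whose header does not match, instead of guessing, is in the same spirit as the sanity checks mentioned above: failing is preferable to producing or reading an inconsistent archive.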

proxmox-backup:

Christian Ebner (44):
  client: pxar: switch to stack based encoder state
  client: backup writer: only borrow http client
  client: backup: factor out extension from backup target
  client: backup: early check for fixed index type
  client: pxar: combine writer params into struct
  client: backup: split payload to dedicated stream
  client: helper: add helpers for creating reader instances
  client: helper: add method for split archive name mapping
  client: restore: read payload from dedicated index
  tools: cover meta extension for pxar archives
  restore: cover meta extension for pxar archives
  client: mount: make split pxar archives mountable
  api: datastore: refactor getting local chunk reader
  api: datastore: attach optional payload chunk reader
  catalog: shell: factor out pxar fuse reader instantiation
  catalog: shell: redirect payload reader for split streams
  www: cover meta extension for pxar archives
  pxar: add optional payload input for archive restore
  pxar: add more context to extraction error
  client: pxar: include payload offset in output
  pxar: show padding in debug output on archive list
  datastore: dynamic index: add method to get digest
  client: pxar: helper for lookup of reusable dynamic entries
  upload stream: impl reused chunk injector
  client: chunk stream: add struct to hold injection state
  client: chunk stream: add dynamic entries injection queues
  specs: add backup detection mode specification
  client: implement prepare reference method
  client: pxar: implement store to insert chunks on caching
  client: pxar: add previous reference to archiver
  client: pxar: add method for metadata comparison
  pxar: caching: add look-ahead cache types
  client: pxar: add look-ahead caching
  fix #3174: client: pxar: enable caching and meta comparison
  client: backup: increase average chunk size for metadata
  client: backup writer: add injected chunk count to stats
  pxar: create: show chunk injection stats debug output
  client: pxar: add entry kind format version
  client: pxar: opt encode cli exclude patterns as CliParams
  client: pxar: add flow chart for metadata change detection
  docs: describe file format for split payload files
  docs: add section describing change detection mode
  test-suite: add detection mode change benchmark
  test-suite: add bin to deb, add shell completions

 Cargo.toml                                    |   1 +
 Makefile                                      |  13 +-
 debian/proxmox-backup-client.bash-completion  |   1 +
 debian/proxmox-backup-client.install          |   2 +
 debian/proxmox-backup-test-suite.bc           |   8 +
 docs/backup-client.rst                        |  33 +
 docs/file-formats.rst                         |  32 +
 docs/meta-format-overview.dot                 |  50 ++
 examples/test_chunk_speed2.rs                 |   2 +-
 examples/upload-speed.rs                      |   2 +-
 pbs-client/src/backup_specification.rs        |  40 +
 pbs-client/src/backup_writer.rs               | 103 ++-
 pbs-client/src/chunk_stream.rs                |  60 +-
 pbs-client/src/inject_reused_chunks.rs        | 152 ++++
 pbs-client/src/lib.rs                         |   3 +-
 pbs-client/src/pxar/create.rs                 | 779 +++++++++++++++++-
 pbs-client/src/pxar/extract.rs                |   2 +
 ...t-metadata-based-file-change-detection.svg |   1 +
 ...t-metadata-based-file-change-detection.txt |  12 +
 pbs-client/src/pxar/look_ahead_cache.rs       |  38 +
 pbs-client/src/pxar/mod.rs                    |   3 +-
 pbs-client/src/pxar/tools.rs                  | 123 ++-
 pbs-client/src/pxar_backup_stream.rs          |  57 +-
 pbs-client/src/tools/mod.rs                   |   5 +-
 pbs-datastore/src/dynamic_index.rs            |   5 +
 pbs-pxar-fuse/src/lib.rs                      |   2 +-
 proxmox-backup-client/src/benchmark.rs        |   2 +-
 proxmox-backup-client/src/catalog.rs          |  42 +-
 proxmox-backup-client/src/helper.rs           |  64 ++
 proxmox-backup-client/src/main.rs             | 281 ++++++-
 proxmox-backup-client/src/mount.rs            |  54 +-
 proxmox-backup-test-suite/Cargo.toml          |  18 +
 .../src/detection_mode_bench.rs               | 294 +++++++
 proxmox-backup-test-suite/src/main.rs         |  17 +
 proxmox-file-restore/src/main.rs              |  20 +-
 .../src/proxmox_restore_daemon/api.rs         |  16 +-
 pxar-bin/src/main.rs                          |  53 +-
 src/api2/admin/datastore.rs                   |  47 +-
 src/api2/tape/restore.rs                      |   4 +-
 src/bin/proxmox_backup_debug/diff.rs          |   2 +-
 src/tape/file_formats/snapshot_archive.rs     |   9 +-
 tests/catar.rs                                |   4 +-
 www/datastore/Content.js                      |   6 +-
 zsh-completions/_proxmox-backup-test-suite    |  13 +
 44 files changed, 2219 insertions(+), 256 deletions(-)
 create mode 100644 debian/proxmox-backup-test-suite.bc
 create mode 100644 docs/meta-format-overview.dot
 create mode 100644 pbs-client/src/inject_reused_chunks.rs
 create mode 100644 pbs-client/src/pxar/flow-chart-metadata-based-file-change-detection.svg
 create mode 100644 pbs-client/src/pxar/flow-chart-metadata-based-file-change-detection.txt
 create mode 100644 pbs-client/src/pxar/look_ahead_cache.rs
 create mode 100644 proxmox-backup-client/src/helper.rs
 create mode 100644 proxmox-backup-test-suite/Cargo.toml
 create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
 create mode 100644 proxmox-backup-test-suite/src/main.rs
 create mode 100644 zsh-completions/_proxmox-backup-test-suite
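The effect of injecting re-used chunks into the payload upload stream can be sketched as building the new dynamic index in stream order: re-used chunks contribute only their known digest and size, while new chunks are hashed and uploaded. This assumes the dynamic index records an end offset and digest per chunk; `ChunkItem`, `build_index` and the `DefaultHasher`-based digest are hypothetical stand-ins (PBS identifies chunks by SHA-256 digests).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One element of the payload upload stream (hypothetical).
pub enum ChunkItem {
    /// Freshly chunked data that must be uploaded.
    New(Vec<u8>),
    /// Chunk known from the previous index, injected by reference.
    Reused { digest: u64, size: u64 },
}

#[derive(Debug, Default, PartialEq)]
pub struct IndexResult {
    /// (end offset, digest) per chunk, in stream order.
    pub entries: Vec<(u64, u64)>,
    /// Number of injected (re-used) chunks.
    pub injected: usize,
    /// Bytes that actually had to be uploaded.
    pub uploaded_bytes: u64,
}

/// Toy digest; a stand-in for the real SHA-256 chunk digest.
fn toy_digest(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

pub fn build_index(items: Vec<ChunkItem>) -> IndexResult {
    let mut res = IndexResult::default();
    let mut offset = 0u64;
    for item in items {
        let (size, digest) = match item {
            ChunkItem::New(data) => {
                res.uploaded_bytes += data.len() as u64;
                (data.len() as u64, toy_digest(&data))
            }
            ChunkItem::Reused { digest, size } => {
                res.injected += 1;
                (size, digest)
            }
        };
        offset += size;
        res.entries.push((offset, digest));
    }
    res
}
```

Counting injected chunks separately mirrors the kind of statistic exposed by the "add injected chunk count to stats" patch, though the real implementation is not shown here.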

-- 
2.39.2




