[pbs-devel] [RFC pxar proxmox-backup 00/20] fix #3174: improve file-level backup

Christian Ebner c.ebner at proxmox.com
Fri Sep 22 09:16:01 CEST 2023

This (still rather rough) patch series prototypes a possible approach
to speed up pxar file-level backup creation. It is intended to gather
initial feedback on the implementation approach and to uncover possible
pitfalls I might not be aware of.

The current approach is to skip encoding the payloads of regular files
whose metadata (currently mtime and size) did not change compared to a
previous backup run. Instead of re-encoding such files, a reference to
a newly introduced appendix section of the pxar archive is written. The
appendix section is created as a concatenation of indexed chunks from
the previous backup run, thereby containing the sequential file payload
at a calculated offset relative to the start of the appendix section.
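The reuse decision described above can be sketched as follows. This is a minimal illustration, not code from the series: `FileMeta` and `can_reuse` are hypothetical names, the previous catalog is modeled as a plain `HashMap` keyed by path, and mtime is assumed to be in seconds. A file is only referenced from the appendix if its path exists in the previous catalog with identical mtime and size; anything else is encoded as usual.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the per-file metadata kept in the previous catalog.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileMeta {
    mtime: i64, // modification time, seconds since the epoch (assumed unit)
    size: u64,  // payload size in bytes
}

/// A regular file's payload may be referenced from the appendix instead of
/// being re-encoded only if the previous catalog lists the same path with an
/// identical mtime and size.
fn can_reuse(path: &str, current: FileMeta, previous: &HashMap<String, FileMeta>) -> bool {
    previous.get(path).map_or(false, |prev| *prev == current)
}

fn main() {
    // Stand-in for the catalog of the previous backup run.
    let mut previous = HashMap::new();
    previous.insert("etc/hosts".to_string(), FileMeta { mtime: 1_695_000_000, size: 220 });
    previous.insert("var/log/syslog".to_string(), FileMeta { mtime: 1_695_000_100, size: 4096 });

    let candidates = [
        ("etc/hosts", FileMeta { mtime: 1_695_000_000, size: 220 }),       // unchanged
        ("var/log/syslog", FileMeta { mtime: 1_695_000_200, size: 8192 }), // modified
        ("etc/new.conf", FileMeta { mtime: 1_695_000_300, size: 10 }),     // new file
    ];
    for (path, meta) in candidates {
        let action = if can_reuse(path, meta, &previous) { "appendix ref" } else { "encode" };
        println!("{}: {}", path, action);
    }
}
```

Comparing only mtime and size is cheap but, as with any metadata-based change detection, it will miss a file that was rewritten with identical size while its mtime was preserved or reset.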

Metadata comparison and calculation of the chunks to be indexed for the
appendix section are performed using the catalog of a previous backup
as reference. In order to calculate the offsets, the current catalog
format is extended to include each file's offset within the pxar
archive byte stream. This makes it possible to find the required chunk
indices, the start padding within the concatenated chunks, and the
total number of bytes introduced by the chunks.
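The offset arithmetic above can be sketched as a lookup over the previous run's chunk list. This sketch assumes, as in a dynamic index, that each chunk is described by its end offset in the archive byte stream, and that the catalog yields the payload's byte range `start..end` in that stream; `ChunkRange` and `chunk_range` are hypothetical names. Two binary searches find the first and last overlapping chunks, from which the start padding and total byte count follow directly.

```rust
/// Chunks of the previous archive needed to cover one file payload.
#[derive(Debug, PartialEq, Eq)]
struct ChunkRange {
    first: usize,       // index of the first chunk overlapping the payload
    last: usize,        // index of the last chunk overlapping the payload (inclusive)
    start_padding: u64, // bytes before the payload in the first chunk
    total_bytes: u64,   // combined size of chunks first..=last
}

/// `end_offsets[i]` is the end offset of chunk `i` in the archive byte stream
/// (chunk `i` covers `end_offsets[i - 1]..end_offsets[i]`, chunk 0 starts at 0).
/// The payload occupies `start..end` of the same stream.
fn chunk_range(end_offsets: &[u64], start: u64, end: u64) -> Option<ChunkRange> {
    let archive_len = *end_offsets.last()?;
    if start >= end || end > archive_len {
        return None;
    }
    // First chunk ending beyond `start`, i.e. the chunk containing byte `start`.
    let first = end_offsets.partition_point(|&e| e <= start);
    // First chunk ending at or beyond `end`, i.e. the chunk containing byte `end - 1`.
    let last = end_offsets.partition_point(|&e| e < end);
    let first_chunk_start = if first == 0 { 0 } else { end_offsets[first - 1] };
    Some(ChunkRange {
        first,
        last,
        start_padding: start - first_chunk_start,
        total_bytes: end_offsets[last] - first_chunk_start,
    })
}

fn main() {
    let end_offsets: [u64; 4] = [4096, 10000, 16384, 20000];
    // Payload spanning the tail of chunk 1, all of chunk 2 and the head of chunk 3.
    let range = chunk_range(&end_offsets, 9000, 17000).unwrap();
    // Prints: ChunkRange { first: 1, last: 3, start_padding: 4904, total_bytes: 15904 }
    println!("{:?}", range);
}
```

The sketch handles a single payload; consecutive reused files that share a chunk would additionally need that chunk to be injected only once, with the later file's offset computed relative to the already appended bytes.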

During encoding, the chunks needed for the appendix section are
injected into the pxar archive after forcing a chunk boundary once
regular pxar encoding is finished. Finally, a pxar archive containing
an appendix section is marked as such by appending a final pxar goodbye
lookup table that only contains the offset to the start of the
appendix section and the total size of that section. This is needed
for random access, e.g. when mounting the archive via the FUSE
filesystem implementation.
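For random access, a reader has to turn an appendix reference into an absolute position in the archive using the start offset and size from the tail. The sketch below assumes the tail has already been decoded into an absolute appendix start offset plus size, and that a reference carries a payload offset relative to the appendix start plus the payload size; `AppendixTail`, `AppendixRef` and `resolve` are hypothetical names, not the types introduced by the series. References reaching past the end of the appendix section are rejected.

```rust
use std::ops::Range;

/// Hypothetical decoded contents of the appendix tail: where the appendix
/// section starts in the archive and how many bytes it spans.
#[derive(Debug, Clone, Copy)]
struct AppendixTail {
    appendix_start: u64,
    appendix_size: u64,
}

/// Hypothetical decoded appendix reference: payload offset relative to the
/// appendix start, plus the payload size.
#[derive(Debug, Clone, Copy)]
struct AppendixRef {
    offset: u64,
    size: u64,
}

/// Map a reference to an absolute byte range of the archive, rejecting
/// references that would point past the end of the appendix section
/// (or overflow u64 while computing it).
fn resolve(tail: AppendixTail, r: AppendixRef) -> Option<Range<u64>> {
    let end_in_appendix = r.offset.checked_add(r.size)?;
    if end_in_appendix > tail.appendix_size {
        return None;
    }
    let start = tail.appendix_start.checked_add(r.offset)?;
    let end = start.checked_add(r.size)?;
    Some(start..end)
}

fn main() {
    // Made-up values: the appendix begins where regular encoding ended, at
    // the forced chunk boundary, and spans the concatenated reused chunks.
    let tail = AppendixTail { appendix_start: 1_048_576, appendix_size: 15_904 };
    let payload = AppendixRef { offset: 4_904, size: 8_000 };
    println!("{:?}", resolve(tail, payload)); // Some(1053480..1061480)
}
```

Keeping references relative to the appendix start means the encoder can write them before it knows where the appendix will finally begin; only the tail has to carry the absolute position.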

Currently, the code assumes that the reference backup (the previous
run) is a regular backup without an appendix section, and that its
catalog already contains the required additional offset information.

An invocation therefore looks like:
proxmox-backup-client backup <label>.pxar:<source-path>
proxmox-backup-client backup <label>.pxar:<source-path> --incremental


Christian Ebner (8):
  fix #3174: encoder: impl fn new for LinkOffset
  fix #3174: decoder: factor out skip_bytes from skip_entry
  fix #3174: decoder: impl skip_bytes for sync dec
  fix #3174: metadata: impl fn to calc byte size
  fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
  fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
  fix #3174: encoder: add helper to incr encoder pos
  fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype

 examples/mk-format-hashes.rs | 11 +++++
 examples/pxarcmd.rs          |  4 +-
 src/accessor/mod.rs          | 46 ++++++++++++++++++++
 src/decoder/mod.rs           | 38 +++++++++++++---
 src/decoder/sync.rs          |  6 +++
 src/encoder/aio.rs           | 36 ++++++++++++++--
 src/encoder/mod.rs           | 84 +++++++++++++++++++++++++++++++++++-
 src/encoder/sync.rs          | 32 +++++++++++++-
 src/format/mod.rs            | 16 +++++++
 src/lib.rs                   | 54 +++++++++++++++++++++++
 10 files changed, 312 insertions(+), 15 deletions(-)


Christian Ebner (12):
  fix #3174: index: add fn index list from start/end-offsets
  fix #3174: index: add fn digest for DynamicEntry
  fix #3174: api: double catalog upload size
  fix #3174: catalog: incl pxar archives file offset
  fix #3174: archiver/extractor: impl appendix ref
  fix #3174: extractor: impl seq restore from appendix
  fix #3174: archiver: store ref to previous backup
  fix #3174: upload stream: impl reused chunk injector
  fix #3174: chunker: add forced boundaries
  fix #3174: backup writer: inject queued chunk in upload steam
  fix #3174: archiver: reuse files with unchanged metadata
  fix #3174: client: Add incremental flag to backup creation

 examples/test_chunk_speed2.rs                 |   9 +-
 pbs-client/src/backup_writer.rs               |  88 ++++---
 pbs-client/src/chunk_stream.rs                |  41 +++-
 pbs-client/src/inject_reused_chunks.rs        | 123 ++++++++++
 pbs-client/src/lib.rs                         |   1 +
 pbs-client/src/pxar/create.rs                 | 217 ++++++++++++++++--
 pbs-client/src/pxar/extract.rs                | 141 ++++++++++++
 pbs-client/src/pxar/mod.rs                    |   2 +-
 pbs-client/src/pxar/tools.rs                  |   9 +
 pbs-client/src/pxar_backup_stream.rs          |   8 +-
 pbs-datastore/src/catalog.rs                  | 122 ++++++++--
 pbs-datastore/src/dynamic_index.rs            |  38 +++
 proxmox-backup-client/src/main.rs             | 142 +++++++++++-
 .../src/proxmox_restore_daemon/api.rs         |  15 +-
 pxar-bin/src/main.rs                          |  22 +-
 src/api2/backup/upload_chunk.rs               |   4 +-
 src/tape/file_formats/snapshot_archive.rs     |   2 +-
 tests/catar.rs                                |   3 +
 18 files changed, 886 insertions(+), 101 deletions(-)
 create mode 100644 pbs-client/src/inject_reused_chunks.rs

