[pbs-devel] [RFC pxar proxmox-backup 00/20] fix #3174: improve file-level backup
Christian Ebner
c.ebner at proxmox.com
Tue Sep 26 09:15:50 CEST 2023
Thomas suggested including some form of benchmark, which would be useful not only for measuring performance but also as a regression test in a CI pipeline and/or for optimizing possible tunable parameters.
> On 22.09.2023 09:16 CEST Christian Ebner <c.ebner at proxmox.com> wrote:
>
>
> This (still rather rough) series of patches prototypes a possible
> approach to improve the pxar file level backup creation speed.
> The series is intended to get a first feedback on the implementation
> approach and to find possible pitfalls I might not be aware of.
>
> The current approach is to skip encoding of regular file payloads,
> for which metadata (currently mtime and size) did not change as
> compared to a previous backup run. Instead of re-encoding the files, a
> reference to a newly introduced appendix section of the pxar archive
> will be written. The appendix section will be created as a concatenation
> of indexed chunks from the previous backup run, thereby containing the
> sequential file payload at a calculated offset with respect to the
> starting point of the appendix section.
>
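To illustrate the reuse decision described above, here is a minimal sketch. It is not the code from the series: `CatalogEntry`, `Action` and `decide` are hypothetical names, and the previous catalog is modelled as a plain `HashMap` keyed by path. Only the comparison on mtime and size is taken from the description.

```rust
use std::collections::HashMap;

/// Per-file data assumed to be available from the previous backup's catalog.
/// Hypothetical layout, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CatalogEntry {
    pub mtime: i64,
    pub size: u64,
    /// Offset of the file within the previous pxar archive byte stream.
    pub archive_offset: u64,
}

/// What the archiver does with a regular file.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// New file or changed metadata: encode the payload as usual.
    Encode,
    /// Unchanged metadata: write a reference instead of the payload.
    Reuse { previous_offset: u64 },
}

/// Decide by comparing mtime and size against the previous catalog.
pub fn decide(
    previous: &HashMap<String, CatalogEntry>,
    path: &str,
    mtime: i64,
    size: u64,
) -> Action {
    match previous.get(path) {
        Some(entry) if entry.mtime == mtime && entry.size == size => Action::Reuse {
            previous_offset: entry.archive_offset,
        },
        _ => Action::Encode,
    }
}
```

A file missing from the previous catalog, or one whose mtime or size differs, falls through to `Action::Encode`, so the sketch never reuses a payload it cannot vouch for.
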
> Metadata comparison and calculation of the chunks to be indexed for the
> appendix section is performed using the catalog of a previous backup as
> reference. In order to be able to calculate the offsets, the current
> catalog format is extended to include the file offset with respect to
> the pxar archive byte stream. This allows finding the required chunk
> indexes, the start padding within the concatenated chunks and the total
> bytes introduced by the chunks.
>
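The offset calculation above can be sketched as follows. This is an illustration under assumptions, not the index code from the patches: the chunk index is modelled as a sorted slice of exclusive chunk end offsets, and `ChunkRange` and `chunks_for_range` are hypothetical names.

```rust
/// Chunks covering a byte range of the previous archive (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct ChunkRange {
    /// Index of the first chunk containing payload bytes.
    pub first: usize,
    /// Index of the last chunk containing payload bytes (inclusive).
    pub last: usize,
    /// Bytes in the first chunk that precede the payload start.
    pub start_padding: u64,
    /// Sum of the sizes of chunks `first..=last`.
    pub total_bytes: u64,
}

/// `chunk_ends[i]` is the exclusive end offset of chunk `i` in the archive
/// byte stream; the slice is assumed to be sorted in strictly ascending order.
pub fn chunks_for_range(chunk_ends: &[u64], start: u64, size: u64) -> Option<ChunkRange> {
    if size == 0 {
        return None;
    }
    let end = start.checked_add(size)?;
    // Reject ranges reaching past the last chunk (or an empty index).
    if end > *chunk_ends.last()? {
        return None;
    }
    // The first chunk whose end lies beyond `start` holds the first byte.
    let first = chunk_ends.partition_point(|&e| e <= start);
    // The first chunk whose end reaches `end` holds the last byte.
    let last = chunk_ends.partition_point(|&e| e < end);
    let first_start = if first == 0 { 0 } else { chunk_ends[first - 1] };
    Some(ChunkRange {
        first,
        last,
        start_padding: start - first_start,
        total_bytes: chunk_ends[last] - first_start,
    })
}
```

With chunks ending at 100, 250 and 400, a payload of 200 bytes starting at offset 120 maps to chunks 1 and 2, with 20 bytes of start padding and 300 bytes added to the appendix in total.
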
> During encoding, the chunks needed for the appendix section are injected
> in the pxar archive after forcing a chunk boundary when regular pxar
> encoding is finished. Finally, pxar archives containing an appendix
> section are marked as such by appending a final pxar goodbye lookup
> table containing only the offset to the appendix section start and the
> total size of that section, needed for random access, e.g. for mounting
> the archive via the fuse filesystem implementation.
>
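For random access, the two values stored at the end of the archive are enough to turn a reference into an absolute position. The sketch below only shows that arithmetic. `AppendixTail` and `resolve` are hypothetical names, and treating a reference as a plain (offset, length) pair relative to the appendix start is a simplifying assumption, not the on-disk format.

```rust
use std::ops::Range;

/// Location of the appendix section, as read from the end of the archive.
/// Hypothetical struct, for illustration only.
#[derive(Debug, Clone, Copy)]
pub struct AppendixTail {
    /// Absolute offset of the appendix section start in the archive.
    pub start: u64,
    /// Total size of the appendix section in bytes.
    pub size: u64,
}

/// Map a reference (offset relative to the appendix start, plus length) to an
/// absolute byte range in the archive. Returns `None` if the referenced range
/// does not fit into the appendix section or the arithmetic would overflow.
pub fn resolve(tail: AppendixTail, ref_offset: u64, len: u64) -> Option<Range<u64>> {
    let rel_end = ref_offset.checked_add(len)?;
    if rel_end > tail.size {
        return None;
    }
    let abs_start = tail.start.checked_add(ref_offset)?;
    let abs_end = tail.start.checked_add(rel_end)?;
    Some(abs_start..abs_end)
}
```

A reader such as a fuse mount would read the tail once, then call something like `resolve` for each reference it encounters and seek to the returned range.
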
> Currently, the code assumes the reference backup (for which the previous
> run is used) to be a regular backup without appendix section, and the
> catalog for that backup to already contain the required additional
> offset information.
>
> An invocation therefore looks like:
> ```bash
> proxmox-backup-client backup <label>.pxar:<source-path>
> proxmox-backup-client backup <label>.pxar:<source-path> --incremental
> ```
>
> pxar:
>
> Christian Ebner (8):
> fix #3174: encoder: impl fn new for LinkOffset
> fix #3174: decoder: factor out skip_bytes from skip_entry
> fix #3174: decoder: impl skip_bytes for sync dec
> fix #3174: metadata: impl fn to calc byte size
> fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
> fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
> fix #3174: encoder: add helper to incr encoder pos
> fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype
>
> examples/mk-format-hashes.rs | 11 +++++
> examples/pxarcmd.rs | 4 +-
> src/accessor/mod.rs | 46 ++++++++++++++++++++
> src/decoder/mod.rs | 38 +++++++++++++---
> src/decoder/sync.rs | 6 +++
> src/encoder/aio.rs | 36 ++++++++++++++--
> src/encoder/mod.rs | 84 +++++++++++++++++++++++++++++++++++-
> src/encoder/sync.rs | 32 +++++++++++++-
> src/format/mod.rs | 16 +++++++
> src/lib.rs | 54 +++++++++++++++++++++++
> 10 files changed, 312 insertions(+), 15 deletions(-)
>
> proxmox-backup:
>
> Christian Ebner (12):
> fix #3174: index: add fn index list from start/end-offsets
> fix #3174: index: add fn digest for DynamicEntry
> fix #3174: api: double catalog upload size
> fix #3174: catalog: incl pxar archives file offset
> fix #3174: archiver/extractor: impl appendix ref
> fix #3174: extractor: impl seq restore from appendix
> fix #3174: archiver: store ref to previous backup
> fix #3174: upload stream: impl reused chunk injector
> fix #3174: chunker: add forced boundaries
> fix #3174: backup writer: inject queued chunk in upload steam
> fix #3174: archiver: reuse files with unchanged metadata
> fix #3174: client: Add incremental flag to backup creation
>
> examples/test_chunk_speed2.rs | 9 +-
> pbs-client/src/backup_writer.rs | 88 ++++---
> pbs-client/src/chunk_stream.rs | 41 +++-
> pbs-client/src/inject_reused_chunks.rs | 123 ++++++++++
> pbs-client/src/lib.rs | 1 +
> pbs-client/src/pxar/create.rs | 217 ++++++++++++++++--
> pbs-client/src/pxar/extract.rs | 141 ++++++++++++
> pbs-client/src/pxar/mod.rs | 2 +-
> pbs-client/src/pxar/tools.rs | 9 +
> pbs-client/src/pxar_backup_stream.rs | 8 +-
> pbs-datastore/src/catalog.rs | 122 ++++++++--
> pbs-datastore/src/dynamic_index.rs | 38 +++
> proxmox-backup-client/src/main.rs | 142 +++++++++++-
> .../src/proxmox_restore_daemon/api.rs | 15 +-
> pxar-bin/src/main.rs | 22 +-
> src/api2/backup/upload_chunk.rs | 4 +-
> src/tape/file_formats/snapshot_archive.rs | 2 +-
> tests/catar.rs | 3 +
> 18 files changed, 886 insertions(+), 101 deletions(-)
> create mode 100644 pbs-client/src/inject_reused_chunks.rs
>
> --
> 2.39.2