[pbs-devel] [RFC pxar proxmox-backup 00/20] fix #3174: improve file-level backup

Christian Ebner c.ebner at proxmox.com
Tue Sep 26 09:15:50 CEST 2023


Thomas suggested including some form of benchmark. This would be useful not only for measuring performance, but also as a regression test in a CI pipeline and/or for optimizing possible tunable parameters.

> On 22.09.2023 09:16 CEST Christian Ebner <c.ebner at proxmox.com> wrote:
> 
>  
> This (still rather rough) series of patches prototypes a possible
> approach to improve the pxar file level backup creation speed.
> The series is intended to gather initial feedback on the implementation
> approach and to find possible pitfalls I might not be aware of.
> 
> The current approach is to skip encoding of regular file payloads
> whose metadata (currently mtime and size) did not change compared
> to a previous backup run. Instead of re-encoding such files, a
> reference to a newly introduced appendix section of the pxar archive
> is written. The appendix section is created as the concatenation
> of indexed chunks from the previous backup run, thereby containing the
> sequential file payload at a calculated offset with respect to the
> starting point of the appendix section.
> 
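To make the reuse condition concrete, here is a minimal sketch of the check described above. `FileMeta` and `can_reuse_payload` are hypothetical names used only for illustration and do not correspond to types in the patches; the only assumption taken from the description is that mtime and size are compared.

```rust
/// Metadata considered for the reuse decision.
/// Hypothetical type for illustration, not the actual catalog entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileMeta {
    /// Modification time in seconds since the epoch.
    mtime: i64,
    /// Payload size in bytes.
    size: u64,
}

/// Returns true if the payload of a regular file may be referenced from the
/// appendix instead of being re-encoded. Files unknown to the previous
/// backup (`previous == None`) always have to be encoded.
fn can_reuse_payload(current: &FileMeta, previous: Option<&FileMeta>) -> bool {
    match previous {
        Some(prev) => prev.mtime == current.mtime && prev.size == current.size,
        None => false,
    }
}
```

A file with identical mtime and size is treated as unchanged, so a modification that preserves both values would go unnoticed by a check of this kind.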
> Metadata comparison and calculation of the chunks to be indexed for the
> appendix section are performed using the catalog of a previous backup as
> reference. In order to be able to calculate the offsets, the current
> catalog format is extended to include the file offset with respect to
> the pxar archive byte stream. This allows finding the required chunk
> indices, the start padding within the concatenated chunks and the total
> bytes introduced by the chunks.
> 
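The offset calculation can be sketched roughly as follows, assuming the index of the previous archive is available as a sorted list of chunk end offsets and the byte range of the file within the previous pxar stream is known from the catalog. `ReusedChunks` and `chunks_for_range` are hypothetical names for this sketch and not the functions added by the series.

```rust
use std::ops::Range;

/// Result of mapping a byte range of the previous archive onto its chunks.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
struct ReusedChunks {
    /// Indices of the chunks covering the requested range.
    chunks: Range<usize>,
    /// Bytes between the start of the first chunk and the range start.
    start_padding: u64,
    /// Total number of bytes contributed by all covering chunks.
    total_size: u64,
}

/// `chunk_ends` holds the end offset of each chunk in the previous archive,
/// in ascending order. `start..end` is the byte range to be reused.
/// Returns `None` for empty ranges or ranges exceeding the archive.
fn chunks_for_range(chunk_ends: &[u64], start: u64, end: u64) -> Option<ReusedChunks> {
    if start >= end || end > *chunk_ends.last()? {
        return None;
    }
    // First chunk whose end offset lies beyond `start`.
    let first = chunk_ends.partition_point(|&e| e <= start);
    // First chunk whose end offset reaches `end`.
    let last = chunk_ends.partition_point(|&e| e < end);
    // A chunk starts where its predecessor ends; the first one starts at 0.
    let first_start = if first == 0 { 0 } else { chunk_ends[first - 1] };
    Some(ReusedChunks {
        chunks: first..last + 1,
        start_padding: start - first_start,
        total_size: chunk_ends[last] - first_start,
    })
}
```

For chunks ending at offsets 100, 250 and 400, the range 120..300 is covered by the chunks with index 1 and 2, with a start padding of 20 bytes and 300 bytes introduced in total.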
> During encoding, the chunks needed for the appendix section are injected
> into the pxar archive after forcing a chunk boundary once regular pxar
> encoding is finished. Finally, pxar archives containing an appendix
> section are marked as such by appending a final pxar goodbye lookup
> table containing only the offset to the appendix section start and the
> total size of that section, which are needed for random access, e.g. for
> mounting the archive via the fuse filesystem implementation.
> 
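As an illustration of why the start offset and size of the appendix are needed for random access, the following sketch resolves a reference relative to the appendix start into an absolute byte range of the archive. `AppendixTail` and `resolve_appendix_ref` are hypothetical names, and the sketch ignores any entry headers the real format places in front of the payload.

```rust
use std::ops::Range;

/// Information stored in the final lookup table, as described above.
/// Hypothetical in-memory form for illustration only.
struct AppendixTail {
    /// Offset of the appendix section start within the archive.
    start: u64,
    /// Total size of the appendix section in bytes.
    size: u64,
}

/// Translates a payload reference (offset relative to the appendix start
/// plus payload size) into an absolute byte range of the archive.
/// Returns `None` if the payload would not fit into the appendix section.
fn resolve_appendix_ref(
    tail: &AppendixTail,
    ref_offset: u64,
    payload_size: u64,
) -> Option<Range<u64>> {
    let relative_end = ref_offset.checked_add(payload_size)?;
    if relative_end > tail.size {
        return None;
    }
    let absolute_start = tail.start.checked_add(ref_offset)?;
    let absolute_end = absolute_start.checked_add(payload_size)?;
    Some(absolute_start..absolute_end)
}
```

With an appendix starting at offset 1000 and spanning 300 bytes, a reference with offset 20 and payload size 180 resolves to the archive range 1020..1200, while a reference reaching past the end of the section is rejected.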
> Currently, the code assumes the reference backup (for which the previous
> run is used) to be a regular backup without appendix section, and the
> catalog for that backup to already contain the required additional
> offset information.
> 
> An invocation therefore looks like:
> ```bash
> proxmox-backup-client backup <label>.pxar:<source-path>
> proxmox-backup-client backup <label>.pxar:<source-path> --incremental
> ```
> 
> pxar:
> 
> Christian Ebner (8):
>   fix #3174: encoder: impl fn new for LinkOffset
>   fix #3174: decoder: factor out skip_bytes from skip_entry
>   fix #3174: decoder: impl skip_bytes for sync dec
>   fix #3174: metadata: impl fn to calc byte size
>   fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
>   fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
>   fix #3174: encoder: add helper to incr encoder pos
>   fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype
> 
>  examples/mk-format-hashes.rs | 11 +++++
>  examples/pxarcmd.rs          |  4 +-
>  src/accessor/mod.rs          | 46 ++++++++++++++++++++
>  src/decoder/mod.rs           | 38 +++++++++++++---
>  src/decoder/sync.rs          |  6 +++
>  src/encoder/aio.rs           | 36 ++++++++++++++--
>  src/encoder/mod.rs           | 84 +++++++++++++++++++++++++++++++++++-
>  src/encoder/sync.rs          | 32 +++++++++++++-
>  src/format/mod.rs            | 16 +++++++
>  src/lib.rs                   | 54 +++++++++++++++++++++++
>  10 files changed, 312 insertions(+), 15 deletions(-)
> 
> proxmox-backup:
> 
> Christian Ebner (12):
>   fix #3174: index: add fn index list from start/end-offsets
>   fix #3174: index: add fn digest for DynamicEntry
>   fix #3174: api: double catalog upload size
>   fix #3174: catalog: incl pxar archives file offset
>   fix #3174: archiver/extractor: impl appendix ref
>   fix #3174: extractor: impl seq restore from appendix
>   fix #3174: archiver: store ref to previous backup
>   fix #3174: upload stream: impl reused chunk injector
>   fix #3174: chunker: add forced boundaries
>   fix #3174: backup writer: inject queued chunk in upload steam
>   fix #3174: archiver: reuse files with unchanged metadata
>   fix #3174: client: Add incremental flag to backup creation
> 
>  examples/test_chunk_speed2.rs                 |   9 +-
>  pbs-client/src/backup_writer.rs               |  88 ++++---
>  pbs-client/src/chunk_stream.rs                |  41 +++-
>  pbs-client/src/inject_reused_chunks.rs        | 123 ++++++++++
>  pbs-client/src/lib.rs                         |   1 +
>  pbs-client/src/pxar/create.rs                 | 217 ++++++++++++++++--
>  pbs-client/src/pxar/extract.rs                | 141 ++++++++++++
>  pbs-client/src/pxar/mod.rs                    |   2 +-
>  pbs-client/src/pxar/tools.rs                  |   9 +
>  pbs-client/src/pxar_backup_stream.rs          |   8 +-
>  pbs-datastore/src/catalog.rs                  | 122 ++++++++--
>  pbs-datastore/src/dynamic_index.rs            |  38 +++
>  proxmox-backup-client/src/main.rs             | 142 +++++++++++-
>  .../src/proxmox_restore_daemon/api.rs         |  15 +-
>  pxar-bin/src/main.rs                          |  22 +-
>  src/api2/backup/upload_chunk.rs               |   4 +-
>  src/tape/file_formats/snapshot_archive.rs     |   2 +-
>  tests/catar.rs                                |   3 +
>  18 files changed, 886 insertions(+), 101 deletions(-)
>  create mode 100644 pbs-client/src/inject_reused_chunks.rs
> 
> -- 
> 2.39.2




