[pbs-devel] [PATCH v3 pxar proxmox-backup 00/58] fix #3174: improve file-level backup

Fabian Grünbichler f.gruenbichler at proxmox.com
Fri Apr 5 13:39:47 CEST 2024


Quoting Christian Ebner (2024-03-28 13:36:09)
> A big thank you to Dietmar and Fabian for the review of the previous
> version and Fabian for extensive testing and help during debugging.
> 
> This series of patches implements a metadata-based file change
> detection mechanism for improved pxar file-level backup creation speed
> for unchanged files.
> 
> The chosen approach is to split pxar archives on creation via the
> proxmox-backup-client into two separate data and upload streams,
> one exclusive for regular file payloads, the other one for the rest
> of the pxar archive, which is mostly metadata.
> 
> On consecutive runs, the metadata archive of the previous backup run,
> which is limited in size and can therefore be accessed rapidly, is used
> to look up and compare the metadata for entries to encode.
> This assumes that the connection speed to the Proxmox Backup Server is
> sufficiently fast, allowing the download and caching of the chunks for
> that index.
> 
> Changes to regular files are detected by comparing each file's entire
> metadata object, including mtime, ACLs, etc. If no changes are detected,
> the previous payload index is used to look up chunks to possibly re-use
> in the payload stream of the new archive.
> In order to reduce possible chunk fragmentation, the decision whether to
> re-use or re-encode a file payload is deferred until enough information
> is gathered by adding entries to a look-ahead cache. If the padding
> introduced by reusing chunks falls below a threshold, the entries are
> referenced, the chunks are re-used and injected into the pxar payload
> upload stream, otherwise they are discarded and the files encoded
> regularly.
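
To make the reuse decision quoted above concrete, here is a rough sketch of the
threshold check. It is purely illustrative: `Decision`, `padding_ratio` and
`decide` are hypothetical names, not anything from pxar or proxmox-backup, and
treating the threshold as a fraction of the re-used chunk bytes is an
assumption. It only covers entries whose metadata was already found unchanged.

```rust
// Hypothetical sketch, NOT the pxar/proxmox-backup implementation.

/// Outcome for the entries currently held in the look-ahead cache.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    /// Reference the previous chunks and inject them into the payload stream.
    Reuse,
    /// Drop the cached references and encode the file payloads regularly.
    Reencode,
}

/// Fraction of the re-used chunk bytes that no cached file references.
/// Assumption: padding is measured relative to the total chunk size.
fn padding_ratio(chunk_bytes: u64, referenced_bytes: u64) -> f64 {
    if chunk_bytes == 0 {
        return 0.0;
    }
    chunk_bytes.saturating_sub(referenced_bytes) as f64 / chunk_bytes as f64
}

/// Decide for all cached (unchanged) files at once.
/// `payload_sizes` are the payload sizes of the cached files,
/// `chunk_bytes` is the total size of the chunks that would be re-used.
fn decide(payload_sizes: &[u64], chunk_bytes: u64, threshold: f64) -> Decision {
    // Nothing to re-use means there is nothing to reference.
    if chunk_bytes == 0 || payload_sizes.is_empty() {
        return Decision::Reencode;
    }
    let referenced: u64 = payload_sizes.iter().sum();
    if padding_ratio(chunk_bytes, referenced) < threshold {
        Decision::Reuse
    } else {
        Decision::Reencode
    }
}
```

For example, with two cached files of 40 and 35 bytes covered by 100 bytes of
chunks, a quarter of the chunk data would be padding, so a threshold of 0.3
allows re-use while a threshold of 0.2 forces re-encoding.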

There's still some not-too-fundamental refactoring in the feedback this time
around, but it's already taking shape.

A few bigger open questions:
- maybe do some test runs with different non-sliding-window chunking approaches
- what to do about the catalog? with the split archives, it would be nice to
  get rid of the overhead of having two metadata archives..
- CliParams/Prelude: what to use it for, and how (other than the parameters/CLI
  excludes)
- should we add a mode to force split archive, but no-reuse (for example,
  allowing to reset padding overhead every X backups)
- more testing, also of pathologically constructed input would be great (both
  for validation, and for performance/reuse regression testing)

Also, clippy doesn't like some of the new code, maybe you could take a look at
those warnings as well, it's mostly minor stuff like unnecessary reference
taking..

Thanks for all your work on this, I am sure this will be a big step forward in
extending the use cases where PBS makes sense :)



