[pbs-devel] [PATCH v3 proxmox-backup 49/58] client: backup: increase average chunk size for metadata

Fabian Grünbichler f.gruenbichler at proxmox.com
Fri Apr 5 11:42:39 CEST 2024


Quoting Christian Ebner (2024-03-28 13:36:58)
> Use double the average chunk size for the metadata archive as compared
> to the payload stream. This not only reduces the number of unique
> chunks produced by the metadata archive, which chunks poorly because it
> mostly sees many small, localized changes, but also has the positive
> side effect of producing larger, well-compressible chunks. The reduced
> chunk count further improves access performance, since fewer download
> requests are needed and cachability increases.
> 
> Signed-off-by: Christian Ebner <c.ebner at proxmox.com>
> ---
> changes since version 2:
> - not present in previous version
> 
>  proxmox-backup-client/src/main.rs | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
> index 66dcaa63e..4aad0ff8c 100644
> --- a/proxmox-backup-client/src/main.rs
> +++ b/proxmox-backup-client/src/main.rs
> @@ -78,6 +78,8 @@ pub(crate) use helper::*;
>  pub mod key;
>  pub mod namespace;
>  
> +const AVG_METADATA_CHUNK_SIZE: usize = 8 * 1024 * 1024;
> +
>  fn record_repository(repo: &BackupRepository) {
>      let base = match BaseDirectories::with_prefix("proxmox-backup") {
>          Ok(v) => v,
> @@ -209,7 +211,15 @@ async fn backup_directory<P: AsRef<Path>>(
>          payload_target.is_some(),
>      )?;
>  
> -    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size, None);
> +    let avg_chunk_size = if payload_stream.is_none() {
> +        chunk_size
> +    } else {
> +        chunk_size
> +            .map(|size| 2 * size)

what if the user provided us with a very small chunk size? should we have a lower bound here?
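
e.g., a minimal sketch of what I mean (MIN_AVG_METADATA_CHUNK_SIZE and its
value are made up here, assuming `chunk_size` is an `Option<usize>` as in the
patch context):

    // hypothetical lower bound for the doubled metadata chunk size, so a
    // tiny user-provided value can't degrade the metadata stream into
    // micro-chunks - the value is picked arbitrarily for illustration
    const MIN_AVG_METADATA_CHUNK_SIZE: usize = 64 * 1024;

    let avg_chunk_size = if payload_stream.is_none() {
        chunk_size
    } else {
        chunk_size
            // clamp the doubled size against the lower bound
            .map(|size| (2 * size).max(MIN_AVG_METADATA_CHUNK_SIZE))
            .or(Some(AVG_METADATA_CHUNK_SIZE))
    };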

I still wonder whether getting rid of the sliding window chunker wouldn't be a
net benefit for the split archive case. for the metadata stream it probably
doesn't matter much (it has a lot of churn, is small and compresses well).

for the payload stream, simply accumulating 1..N files (or rather, their
contents) in a chunk until a certain size threshold is reached might perform
better (as in, both be faster than the current chunker and give us more/better
re-usable chunks), see the sketch below.
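
to make that concrete, a rough sketch of the accumulating approach (purely
illustrative - names and types are made up, and this is not wired up to the
actual ChunkStream/pxar stream types):

    /// hypothetical accumulating chunker: collect whole file payloads
    /// until a size threshold is crossed, then emit the buffer as one
    /// chunk, so chunk boundaries always fall on file boundaries
    struct AccumulatingChunker {
        buffer: Vec<u8>,
        threshold: usize,
    }

    impl AccumulatingChunker {
        fn new(threshold: usize) -> Self {
            Self { buffer: Vec::new(), threshold }
        }

        /// append one file's contents; returns a finished chunk once
        /// the accumulated size reaches the threshold
        fn add_file(&mut self, contents: &[u8]) -> Option<Vec<u8>> {
            self.buffer.extend_from_slice(contents);
            if self.buffer.len() >= self.threshold {
                Some(std::mem::take(&mut self.buffer))
            } else {
                None
            }
        }

        /// flush whatever is left at the end of the stream
        fn finish(&mut self) -> Option<Vec<u8>> {
            if self.buffer.is_empty() {
                None
            } else {
                Some(std::mem::take(&mut self.buffer))
            }
        }
    }

since chunk boundaries then coincide with file boundaries, touching a single
file would invalidate only the chunks containing that file, which is where I'd
expect the better re-use to come from.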

> +            .or_else(|| Some(AVG_METADATA_CHUNK_SIZE))
> +    };
> +
> +    let mut chunk_stream = ChunkStream::new(pxar_stream, avg_chunk_size, None);
>      let (tx, rx) = mpsc::channel(10); // allow to buffer 10 chunks
>  
>      let stream = ReceiverStream::new(rx).map_err(Error::from);
> -- 
> 2.39.2
> 