[pbs-devel] applied-series: [PATCH proxmox-backup v3 0/5] improve compression throughput

Thomas Lamprecht t.lamprecht at proxmox.com
Wed Aug 7 19:06:49 CEST 2024


On 05/08/2024 11:24, Dominik Csapak wrote:
> in my tests (against current master) it improved the throughput if
> the source/target storage is fast enough (tmpfs -> tmpfs):
> 
> Type                master (MiB/s)   with my patches (MiB/s)
> .img file           ~614             ~767
> pxar one big file   ~657             ~807
> pxar small files    ~576             ~627
> 
> (these results are also in the relevant commit message)
> 
> It would be great if someone else could cross-check my results here.
> Note: the pxar code being faster than the .img code seems to stem
> from better multithreaded pipelining in that code or in tokio (the pxar
> codepath scales more directly with more cores than the .img codepath)
> 
> changes from v2:
> * use zstd_safe instead of zstd so we have access to the underlying
>   error code
> * add test for the error code handling since that's not part of the
>   public zstd api, only an implementation detail (albeit one that's
>   not likely to change soon)
> * separated the tests for the decode(encode()) roundtrip so a failure
>   can more easily be assigned to a specific codepath
> 
> changes from v1:
> * reorder patches so that the data blob writer removal is the first one
> * add tests for DataBlob that we can decode what we encoded
>   (to see that my patches don't mess up the chunk generation)
> * add new patch to cleanup the `encode` function a bit
> 
> Dominik Csapak (5):
>   remove data blob writer
>   datastore: test DataBlob encode/decode roundtrip
>   datastore: data blob: add helper and test for checking zstd_safe error
>     code
>   datastore: data blob: increase compression throughput
>   datastore: DataBlob encode: simplify code
> 
>  Cargo.toml                            |   1 +
>  pbs-datastore/Cargo.toml              |   1 +
>  pbs-datastore/src/data_blob.rs        | 193 ++++++++++++++++-------
>  pbs-datastore/src/data_blob_writer.rs | 212 --------------------------
>  pbs-datastore/src/lib.rs              |   2 -
>  tests/blob_writer.rs                  | 105 -------------
>  6 files changed, 144 insertions(+), 370 deletions(-)
>  delete mode 100644 pbs-datastore/src/data_blob_writer.rs
>  delete mode 100644 tests/blob_writer.rs
> 

Applied, with some rewording of the commit message and a slight
adaptation of the test commit.
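
For context on that test: zstd_safe hands back errors as the raw size_t
value returned by the C call, so checking for a specific condition (e.g.
the destination buffer being too small because a chunk doesn't compress)
means comparing against zstd's internal error numbering, which is exactly
the non-public implementation detail the new test pins down. Roughly, as
an illustrative sketch and not the applied helper:

    // Sketch only, assuming zstd_safe exposes the raw size_t as its
    // ErrorCode (a plain usize). zstd encodes error N as (size_t)-N,
    // and 70 is ZSTD_error_dstSize_tooSmall in current zstd releases;
    // that is an internal detail rather than public API, hence the
    // dedicated test in the series.
    fn is_dst_size_too_small(code: zstd_safe::ErrorCode) -> bool {
        code == 0usize.wrapping_sub(70)
    }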

PS: it seems the zstd crate authors aren't so sure why they use the
32 KiB buffer either, which FWICT is the underlying issue here:

https://docs.rs/zstd/latest/src/zstd/stream/zio/writer.rs.html#41-42

But it's a bit hard to follow. To me this looks less like an allocation
pattern issue on its own and more like increased overhead from processing
the data in 32 KiB chunks; the extra copying naturally doesn't help
either, but that is not a bad allocation pattern so much as a single
(FWICT) avoidable allocation for the small buffer. As said, I'm not 100%
sure, as the code is rather over-engineered... Anyhow, I tried to capture
these findings, including the remaining uncertainty, in the commit
message to provide some better background.
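
To make that concrete, a rough sketch of the two paths (illustrative
names, assuming the zstd / zstd_safe crate APIs, not the actual
data_blob.rs code): the old path streams through zstd's Encoder and thus
through its internal ~32 KiB buffer, while the new path hands the whole
chunk to zstd_safe in a single call that writes directly into a pre-sized
output buffer:

    use std::io::Write;

    // old path: stream through zstd's Encoder, which copies all data
    // through its internal ~32 KiB writer buffer
    fn compress_streaming(data: &[u8]) -> std::io::Result<Vec<u8>> {
        let mut encoder = zstd::stream::Encoder::new(Vec::new(), 1)?;
        encoder.write_all(data)?;
        encoder.finish()
    }

    // new path: one zstd_safe call, writing into the spare capacity of
    // a pre-sized Vec, so no per-32-KiB chunking or extra copies
    fn compress_oneshot(data: &[u8]) -> Result<Vec<u8>, zstd_safe::ErrorCode> {
        let mut out = Vec::with_capacity(zstd_safe::compress_bound(data.len()));
        zstd_safe::compress(&mut out, data, 1)?;
        Ok(out)
    }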

I know you could have done this in a v4, but it felt faster to just
amend the changes myself, especially since I have a few days off coming
up and would have to recreate the mental context afterwards anyway.



