[pbs-devel] [PATCH v3 proxmox-backup 40/58] client: chunk stream: add dynamic entries injection queues

Fabian Grünbichler f.gruenbichler at proxmox.com
Tue Apr 9 09:19:40 CEST 2024


On April 8, 2024 3:54 pm, Christian Ebner wrote:
> On 4/4/24 16:52, Fabian Grünbichler wrote:
>> once more I am wondering here whether for the payload stream, a vastly
>> simplified chunker that just picks the boundaries based on re-use and
>> payload size(s) (to avoid the one file == one chunk pathological case
>> for lots of small files) wouldn't improve performance :)
> 
> Do you suggest having 2 chunker implementations, and for the payload
> stream, instead of chunking with the statistical sliding window
> approach, providing the chunk boundaries through some interface? As you
> mentioned in response to Dietmar on patch 49 of this patch series
> version?

yes - I think it would be interesting to evaluate. but only if such an
experiment is not a week-long effort :)

the two main questions would be:
- is a metadata-informed chunker faster than the sliding window one, and
  if so, by how much?
- how does the dedup rate compare for some common scenarios?

so maybe it would make sense to have a "change based" test corpus
first (which we IMHO want anyway), and then compare the two.
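
to make the idea a bit more concrete, a rough sketch of what such a
size-based boundary picker could look like. this is purely illustrative
and not code from the series: the function name, the `target`/`max`
parameters and the "list of file payload sizes" input are all
assumptions, and it only covers the payload size part, not the re-use
part. it cuts on file boundaries once enough payload has accumulated
(so many small files share a chunk), and splits at fixed `max` steps
when a chunk would otherwise grow too large:

```rust
/// Hypothetical sketch: derive chunk boundaries from file payload sizes
/// alone, without running a sliding-window hash over the data.
/// Returns the end offset of every chunk in the payload stream.
pub fn payload_boundaries(file_sizes: &[u64], target: u64, max: u64) -> Vec<u64> {
    assert!(target > 0 && max >= target);

    let mut boundaries = Vec::new();
    let mut chunk_start = 0u64; // offset where the current chunk begins
    let mut offset = 0u64; // running end offset of the files seen so far

    for &size in file_sizes {
        let file_end = offset + size;

        // The current chunk would exceed `max`: cut at fixed steps.
        // These cuts always land inside the current file.
        while file_end - chunk_start > max {
            chunk_start += max;
            boundaries.push(chunk_start);
        }

        offset = file_end;

        // Enough payload accumulated: cut on the file boundary, so that
        // small files get grouped instead of one file == one chunk.
        if offset - chunk_start >= target {
            boundaries.push(offset);
            chunk_start = offset;
        }
    }

    // Flush whatever is left as a final (possibly small) chunk.
    if offset > chunk_start {
        boundaries.push(offset);
    }

    boundaries
}
```

with files of 10, 20, 30, 200 and 5 bytes, `target` 50 and `max` 100,
the three small files end up in one chunk, the 200 byte file is split in
two, and the trailing 5 byte file forms the last chunk. whether
something like this is actually faster, and what it does to the dedup
rate, is exactly what the comparison above would need to show.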



