[pbs-devel] [PATCH v3 proxmox-backup 40/58] client: chunk stream: add dynamic entries injection queues
Fabian Grünbichler
f.gruenbichler at proxmox.com
Tue Apr 9 09:19:40 CEST 2024
On April 8, 2024 3:54 pm, Christian Ebner wrote:
> On 4/4/24 16:52, Fabian Grünbichler wrote:
>> once more I am wondering here whether for the payload stream, a vastly
>> simplified chunker that just picks the boundaries based on re-use and
>> payload size(s) (to avoid the one file == one chunk pathological case
>> for lots of small files) wouldn't improve performance :)
>
> Do you suggest having two chunker implementations, where the payload
> stream does not use the statistical sliding window approach but is
> instead given its chunk boundaries via some interface? As you
> mentioned in response to Dietmar on patch 49 of this version of the
> patch series?
yes - I think it would be interesting to evaluate. but only if such an
experiment is not a week-long effort :)
the two main questions would be:
- is a metadata-informed chunker faster than the sliding window one
(and if so, by how much)?
- how does the dedup rate compare for some common scenarios?
so maybe it would make sense to have a "change based" test corpus
first (which we IMHO want anyway), and then compare the two.
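
to make the idea a bit more concrete, here is a minimal sketch of what
such a metadata-informed boundary picker could look like. everything in
it is an assumption for illustration only: `FileEntry`,
`metadata_boundaries`, `min_chunk` and `max_chunk` are hypothetical
names and not part of proxmox-backup, and the cut rules (cut at a file
end once the chunk is big enough, cut where the re-use state flips,
split anything larger than `max_chunk`) are just one possible policy.

```rust
/// Hypothetical per-file metadata for the payload stream.
/// This is NOT a type from proxmox-backup, just an illustration.
#[derive(Debug, Clone, Copy)]
pub struct FileEntry {
    /// Payload size of the file in bytes.
    pub size: u64,
    /// Assumed flag: file is unchanged and its old chunks could be re-used.
    pub reusable: bool,
}

/// Returns the end offsets of the chunks in the payload stream.
///
/// No payload bytes are inspected; boundaries are derived purely from
/// file sizes and the re-use flag (in contrast to a sliding window chunker).
pub fn metadata_boundaries(files: &[FileEntry], min_chunk: u64, max_chunk: u64) -> Vec<u64> {
    assert!(min_chunk > 0 && min_chunk <= max_chunk);

    let mut boundaries = Vec::new();
    let mut chunk_start = 0u64; // offset where the current chunk begins
    let mut offset = 0u64; // end offset of the payload consumed so far

    for (i, file) in files.iter().enumerate() {
        offset += file.size;

        // Split oversized runs so that no chunk exceeds max_chunk.
        while offset - chunk_start > max_chunk {
            chunk_start += max_chunk;
            boundaries.push(chunk_start);
        }

        let is_last = i + 1 == files.len();
        // Cut where re-usable and changed files meet, so that runs of
        // unchanged files do not share a chunk with changed ones.
        let reuse_flips = files
            .get(i + 1)
            .map_or(false, |next| next.reusable != file.reusable);
        // Only cut at a file end once enough payload has accumulated; this
        // avoids the "one file == one chunk" case for lots of small files.
        let big_enough = offset - chunk_start >= min_chunk;

        if offset > chunk_start && (is_last || reuse_flips || big_enough) {
            boundaries.push(offset);
            chunk_start = offset;
        }
    }

    boundaries
}
```

with ten 100-byte files and `min_chunk = 250` this groups three files
per chunk instead of producing ten tiny chunks. note that cutting on
every re-use flip can still yield small chunks when changed and
unchanged files alternate, which is exactly the kind of effect a
"change based" test corpus would have to quantify.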