[pbs-devel] [PATCH v3 proxmox-backup 15/33] fix #3044: server: implement push support for sync operations

Fabian Grünbichler f.gruenbichler at proxmox.com
Mon Oct 14 11:41:06 CEST 2024


> Christian Ebner <c.ebner at proxmox.com> hat am 14.10.2024 11:32 CEST geschrieben:
> On 10/10/24 16:48, Fabian Grünbichler wrote:
> >> +// Read fixed or dynamic index and push to target by uploading via the backup writer instance
> >> +//
> >> +// For fixed indexes, the size must be provided as given by the index reader.
> >> +#[allow(clippy::too_many_arguments)]
> >> +async fn push_index<'a>(
> >> +    filename: &'a str,
> >> +    index: impl IndexFile + Send + 'static,
> >> +    chunk_reader: Arc<dyn AsyncReadChunk>,
> >> +    backup_writer: &BackupWriter,
> >> +    manifest: &mut BackupManifest,
> >> +    crypt_mode: CryptMode,
> >> +    size: Option<u64>,
> >> +    known_chunks: &Arc<Mutex<HashSet<[u8; 32]>>>,
> >> +) -> Result<SyncStats, Error> {
> >> +    let (upload_channel_tx, upload_channel_rx) = mpsc::channel(20);
> >> +    let mut chunk_infos =
> >> +        stream::iter(0..index.index_count()).map(move |pos| index.chunk_info(pos).unwrap());
> > 
> > so this iterates over all the chunks in the index..
> > 
> >> +
> >> +    tokio::spawn(async move {
> >> +        while let Some(chunk_info) = chunk_infos.next().await {
> >> +            let chunk_info = chunk_reader
> >> +                .read_raw_chunk(&chunk_info.digest)
> > 
> > and this reads them
> > 
> >> +                .await
> >> +                .map(|chunk| ChunkInfo {
> >> +                    chunk,
> >> +                    digest: chunk_info.digest,
> >> +                    chunk_len: chunk_info.size(),
> >> +                    offset: chunk_info.range.start,
> >> +                });
> >> +            let _ = upload_channel_tx.send(chunk_info).await;
> > 
> > and sends them further along to the upload code.. which will then (in
> > many cases) throw away all that data we just read because it's already
> > on the target and we know that because of the previous manifest..
> > 
> > wouldn't it be better to deduplicate here already, and instead of
> > reading known chunks over and over again, just tell the server to
> > re-register them? or am I missing something here? :)
> 
> Good catch, this is indeed a possible huge performance bottleneck!
> 
> Did fix this by moving the known chunks check here (as suggested) and 
> stream a `MergedChunkInfo` instead of `ChunkInfo`, which allows to only 
> send the chunk's digest and size over to the backup writer. By this 
> known chunk are never read.

this would be one of the few cases btw where re-uploading a chunk that is missing on the server side (for whatever reason) would be possible - not sure how easy it would be to integrate though ;)




More information about the pbs-devel mailing list