[pbs-devel] [PATCH v3 proxmox-backup 15/33] fix #3044: server: implement push support for sync operations
Fabian Grünbichler
f.gruenbichler at proxmox.com
Mon Oct 14 11:41:06 CEST 2024
> Christian Ebner <c.ebner at proxmox.com> hat am 14.10.2024 11:32 CEST geschrieben:
> On 10/10/24 16:48, Fabian Grünbichler wrote:
> >> +// Read fixed or dynamic index and push to target by uploading via the backup writer instance
> >> +//
> >> +// For fixed indexes, the size must be provided as given by the index reader.
> >> +#[allow(clippy::too_many_arguments)]
> >> +async fn push_index<'a>(
> >> + filename: &'a str,
> >> + index: impl IndexFile + Send + 'static,
> >> + chunk_reader: Arc<dyn AsyncReadChunk>,
> >> + backup_writer: &BackupWriter,
> >> + manifest: &mut BackupManifest,
> >> + crypt_mode: CryptMode,
> >> + size: Option<u64>,
> >> + known_chunks: &Arc<Mutex<HashSet<[u8; 32]>>>,
> >> +) -> Result<SyncStats, Error> {
> >> + let (upload_channel_tx, upload_channel_rx) = mpsc::channel(20);
> >> + let mut chunk_infos =
> >> + stream::iter(0..index.index_count()).map(move |pos| index.chunk_info(pos).unwrap());
> >
> > so this iterates over all the chunks in the index..
> >
> >> +
> >> + tokio::spawn(async move {
> >> + while let Some(chunk_info) = chunk_infos.next().await {
> >> + let chunk_info = chunk_reader
> >> + .read_raw_chunk(&chunk_info.digest)
> >
> > and this reads them
> >
> >> + .await
> >> + .map(|chunk| ChunkInfo {
> >> + chunk,
> >> + digest: chunk_info.digest,
> >> + chunk_len: chunk_info.size(),
> >> + offset: chunk_info.range.start,
> >> + });
> >> + let _ = upload_channel_tx.send(chunk_info).await;
> >
> > and sends them further along to the upload code.. which will then (in
> > many cases) throw away all that data we just read because it's already
> > on the target and we know that because of the previous manifest..
> >
> > wouldn't it be better to deduplicate here already, and instead of
> > reading known chunks over and over again, just tell the server to
> > re-register them? or am I missing something here? :)
>
> Good catch, this is indeed a potentially huge performance bottleneck!
>
> Fixed this by moving the known-chunks check here (as suggested) and
> streaming a `MergedChunkInfo` instead of a `ChunkInfo`, which allows
> sending only the chunk's digest and size over to the backup writer.
> This way, known chunks are never read.
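The dedup step described above can be sketched as follows. This is a std-only, synchronous sketch with simplified stand-in types: the `MergedChunkInfo` variants, the `deduplicate` helper, and its signature are assumptions for illustration, not the actual pbs-client definitions (which stream asynchronously and batch known chunks).

```rust
use std::collections::HashSet;

// Simplified stand-ins for the patch's types (names and shapes are
// assumptions, not the real pbs-client definitions).
#[derive(Debug, PartialEq)]
enum MergedChunkInfo {
    // digest + size is all the server needs to re-register a known chunk
    Known(Vec<([u8; 32], u64)>),
    // new chunks still have to be read via the chunk reader and uploaded
    New([u8; 32]),
}

/// Walk the index entries and split them into already-known chunks
/// (digest/size only, no read) and new chunks (which would trigger a
/// read_raw_chunk() and a full upload).
fn deduplicate(
    index_digests: &[([u8; 32], u64)],
    known_chunks: &HashSet<[u8; 32]>,
) -> (Vec<MergedChunkInfo>, usize) {
    let mut out = Vec::new();
    let mut reads = 0; // chunks that would actually be read from the source
    for &(digest, size) in index_digests {
        if known_chunks.contains(&digest) {
            out.push(MergedChunkInfo::Known(vec![(digest, size)]));
        } else {
            reads += 1;
            out.push(MergedChunkInfo::New(digest));
        }
    }
    (out, reads)
}

fn main() {
    let known: HashSet<[u8; 32]> = [[1u8; 32]].into_iter().collect();
    let index = vec![([1u8; 32], 4096), ([2u8; 32], 4096)];
    let (merged, reads) = deduplicate(&index, &known);
    assert_eq!(reads, 1); // the known chunk is never read
    assert!(matches!(merged[0], MergedChunkInfo::Known(_)));
    println!("reads: {reads}");
}
```

In the real code the known-chunks set is seeded from the previous snapshot's manifest, so an incremental push only reads chunks that are genuinely new.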
This would, by the way, be one of the few cases where re-uploading a chunk that is missing on the server side (for whatever reason) would be possible - not sure how easy that would be to integrate though ;)