[pbs-devel] [PATCH v3 proxmox-backup 15/33] fix #3044: server: implement push support for sync operations
Christian Ebner
c.ebner at proxmox.com
Mon Oct 14 12:15:45 CEST 2024
On 10/14/24 12:01, Fabian Grünbichler wrote:
>
>> Christian Ebner <c.ebner at proxmox.com> wrote on 14.10.2024 11:53 CEST:
>>
>>
>> On 10/14/24 11:41, Fabian Grünbichler wrote:
>>>
>>>> Christian Ebner <c.ebner at proxmox.com> wrote on 14.10.2024 11:32 CEST:
>>>> On 10/10/24 16:48, Fabian Grünbichler wrote:
>>>>>> +// Read fixed or dynamic index and push to target by uploading via the backup writer instance
>>>>>> +//
>>>>>> +// For fixed indexes, the size must be provided as given by the index reader.
>>>>>> +#[allow(clippy::too_many_arguments)]
>>>>>> +async fn push_index<'a>(
>>>>>> + filename: &'a str,
>>>>>> + index: impl IndexFile + Send + 'static,
>>>>>> + chunk_reader: Arc<dyn AsyncReadChunk>,
>>>>>> + backup_writer: &BackupWriter,
>>>>>> + manifest: &mut BackupManifest,
>>>>>> + crypt_mode: CryptMode,
>>>>>> + size: Option<u64>,
>>>>>> + known_chunks: &Arc<Mutex<HashSet<[u8; 32]>>>,
>>>>>> +) -> Result<SyncStats, Error> {
>>>>>> + let (upload_channel_tx, upload_channel_rx) = mpsc::channel(20);
>>>>>> + let mut chunk_infos =
>>>>>> + stream::iter(0..index.index_count()).map(move |pos| index.chunk_info(pos).unwrap());
>>>>>
>>>>> so this iterates over all the chunks in the index..
>>>>>
>>>>>> +
>>>>>> + tokio::spawn(async move {
>>>>>> + while let Some(chunk_info) = chunk_infos.next().await {
>>>>>> + let chunk_info = chunk_reader
>>>>>> + .read_raw_chunk(&chunk_info.digest)
>>>>>
>>>>> and this reads them
>>>>>
>>>>>> + .await
>>>>>> + .map(|chunk| ChunkInfo {
>>>>>> + chunk,
>>>>>> + digest: chunk_info.digest,
>>>>>> + chunk_len: chunk_info.size(),
>>>>>> + offset: chunk_info.range.start,
>>>>>> + });
>>>>>> + let _ = upload_channel_tx.send(chunk_info).await;
>>>>>
>>>>> and sends them further along to the upload code.. which will then (in
>>>>> many cases) throw away all that data we just read because it's already
>>>>> on the target and we know that because of the previous manifest..
>>>>>
>>>>> wouldn't it be better to deduplicate here already, and instead of
>>>>> reading known chunks over and over again, just tell the server to
>>>>> re-register them? or am I missing something here? :)
>>>>
>>>> Good catch, this is indeed a potentially huge performance bottleneck!
>>>>
>>>> Fixed this by moving the known-chunks check here (as suggested) and
>>>> streaming a `MergedChunkInfo` instead of a `ChunkInfo`, which allows
>>>> sending only the chunk's digest and size over to the backup writer.
>>>> This way, known chunks are never read.
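The deduplication step described above can be sketched roughly like this. The type shapes here are simplified, illustrative stand-ins, not the real proxmox-backup `MergedChunkInfo` or digest types:

```rust
use std::collections::HashSet;

// Simplified stand-in; the real digest type is a SHA-256 checksum.
type Digest = [u8; 32];

#[derive(Debug, PartialEq)]
enum MergedChunkInfo {
    // Already present on the target: forward only size and digest,
    // the chunk data is never read on the source side.
    Known(Vec<(u64, Digest)>),
    // Must be read from the source and uploaded in full
    // (chunk payload elided in this sketch).
    New(Digest),
}

// Filter the index's chunk list against the known-chunks set *before*
// any chunk data is read, coalescing runs of known chunks into a
// single message for the backup writer.
fn merge_known_chunks(
    chunks: &[(u64, Digest)],
    known_chunks: &HashSet<Digest>,
) -> Vec<MergedChunkInfo> {
    let mut merged = Vec::new();
    for &(size, digest) in chunks {
        if known_chunks.contains(&digest) {
            // Extend a preceding run of known chunks if possible.
            if let Some(MergedChunkInfo::Known(list)) = merged.last_mut() {
                list.push((size, digest));
                continue;
            }
            merged.push(MergedChunkInfo::Known(vec![(size, digest)]));
        } else {
            merged.push(MergedChunkInfo::New(digest));
        }
    }
    merged
}
```

Only the `New` entries then go through the chunk reader and upload path; `Known` entries cost a digest registration on the server instead of a full read and transfer.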
>>>
>>> this would be one of the few cases btw where re-uploading a chunk that is missing on the server side (for whatever reason) would be possible - not sure how easy it would be to integrate though ;)
>>
>> Well, unfortunately not directly. Here the chunk is only sent via the
>> channel to the backup writer, which itself further processes the stream
>> and transforms it into futures for the upload requests. So the actual
>> failure will happen there...
>>
>> In order to upload corrupt/missing chunks I think a different,
>> completely decoupled approach might be better (at least for the sync).
>>
>> E.g. the backup writer could get access to a chunk buffer, which keeps
>> the chunks in memory until the upload has succeeded. This could also
>> include lookup mechanisms for the local/remote chunk store, or
>> re-generating required chunks by reading from the block device in case
>> of dirty bitmap tracking.
>>
>> The re-upload in case of reused payload chunks for `ppxar` archives is
>> trickier...
>
> I mostly meant in the sense of - the source chunk doesn't disappear while we are uploading (as opposed to a regular backup, where the input data might have changed in the meantime), so we can go back and retrieve it without the need to always read it and keep it in memory. it would definitely require propagating the information which chunks need to be uploaded despite being "known" back to the client in some fashion.
Yeah, I think ideally there would be a generic interface which allows
the client to re-generate a missing/corrupt chunk as reported by the
server (on known chunk upload).
The details of how to re-generate or buffer such chunks would then be
handled differently depending on the chunk source (backup of
streams, backup of filesystems, backup of ...., buffered chunks, syncs,
...).
If that is not possible, the backup should simply fail...
Something like a
```
trait RegenerateKnownChunk {
    fn regenerate_known_chunk(&self, digest: [u8; 32]) -> Result<ChunkInfo, Error>;
}
```
which can be implemented as required?
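As a rough sketch of how such a trait could be implemented for a sync source backed by a local chunk store. All names and type shapes here (`RegenerateKnownChunk`, `LocalStoreSource`, the simplified `ChunkInfo` and `Error`) are illustrative assumptions, not the crate's real types:

```rust
use std::collections::HashMap;

type Digest = [u8; 32];

// Simplified stand-ins for the real proxmox-backup types.
#[derive(Debug, Clone, PartialEq)]
struct ChunkInfo {
    digest: Digest,
    data: Vec<u8>,
}

#[derive(Debug)]
struct Error(String);

// The generic interface from the discussion: any chunk source that
// can recover a chunk by digest implements this.
trait RegenerateKnownChunk {
    fn regenerate_known_chunk(&self, digest: Digest) -> Result<ChunkInfo, Error>;
}

// Hypothetical implementor for the sync case: a known chunk reported
// missing or corrupt by the server is re-read from the local chunk
// store, modelled here as a plain digest -> data map.
struct LocalStoreSource {
    store: HashMap<Digest, Vec<u8>>,
}

impl RegenerateKnownChunk for LocalStoreSource {
    fn regenerate_known_chunk(&self, digest: Digest) -> Result<ChunkInfo, Error> {
        self.store
            .get(&digest)
            .map(|data| ChunkInfo { digest, data: data.clone() })
            // If the chunk cannot be recovered, the backup should fail.
            .ok_or_else(|| Error("cannot regenerate chunk".to_string()))
    }
}
```

Other sources (streams, filesystems, buffered chunks) would plug in their own recovery strategy behind the same method.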
But definitely out of scope for this patch series :)