[pbs-devel] [PATCH v4 proxmox-backup 4/7] fix #4182: server: sync: allow pulling groups concurrently

Fabian Grünbichler f.gruenbichler at proxmox.com
Mon Apr 7 09:21:04 CEST 2025


On April 4, 2025 8:02 pm, Max Carrara wrote:
> On Fri Apr 4, 2025 at 3:49 PM CEST, Christian Ebner wrote:
>> Currently, a sync job pulls the backup groups and the snapshots
>> contained within them sequentially, and is therefore limited in
>> download speed by the single http2 connection of the source reader
>> instance in case of remote syncs. High-latency networks suffer
>> especially from this limitation.
>>
>> Improve the throughput by pulling up to a configured number of
>> backup groups concurrently, creating tasks that connect to and
>> pull from the remote source in parallel.
>>
>> Make the error handling and accounting logic for each group pull
>> reusable by moving it into its own helper function, returning the
>> future.
>>
>> The store progress is placed behind an atomic reference counted mutex
>> to allow for concurrent access of status updates.
> 
> Yeah, so... I've got some thoughts about this:
> 
> First of all, I think that that's *fine* here, as the
> `Arc<Mutex<StoreProgress>>` probably isn't going to face that much lock
> contention or something anyway. So to get that out of the way, IMO we
> can keep that here as it is right now.
> 
> But in the future I do think that we should check the locations where we
> have that kind of concurrent data access / modification, because I feel
> the number of mutexes is only going to keep growing. That's not a
> bad thing per se, but it does come with a few risks / drawbacks (e.g. a
> higher risk of deadlocks).
> 
> Without going too deep into the whole "how to avoid deadlocks" discussion
> and other things, here's an alternative I want to propose that could
> perhaps be done in (or as part of) a different series, since it's a bit
> out of scope for this one here. (Though, if you do wanna do it in this
> one, I certainly won't complain!)
> 
> First, since `StoreProgress` only contains four `usize`s, it should be
> fairly easy to convert the ones being modified into `AtomicUsize`s and
> perhaps add helper methods to increase their respective values;
> something like this:
> 
>     use std::sync::atomic::{AtomicUsize, Ordering};
>     
>     /// Tracker for progress of operations iterating over `Datastore` contents.
>     #[derive(Debug, Default)]
>     pub struct StoreProgress {
>         /// Completed groups
>         pub done_groups: AtomicUsize,
>         /// Total groups
>         pub total_groups: u64,
>         /// Completed snapshots within current group
>         pub done_snapshots: AtomicUsize,
>         /// Total snapshots in current group
>         pub group_snapshots: u64,
>     }
>     
>     // [...]
>     
>     impl StoreProgress {
>         pub fn add_done_group(&self) {
>             let _ = self.done_groups.fetch_add(1, Ordering::Relaxed);
>         }
>     
>         // [...]
>     }
> 
> (of course, what it all should look like in detail is up to bikeshedding :P)
> 
> Something like that would probably be nicer here, because:
> 
> - You won't need to wrap `StoreProgress` within an `Arc<Mutex<T>>`
>   anymore -- a shared reference is enough, since ...
> - Operations on atomics take &self (that's the whole point of them ofc ;p )
> 
> This means that:
> - Cloning an `Arc<T>` is no longer necessary
>   --> saves approx. two atomic ops per `Arc` used
>       (one on create/clone and one on drop for each `Arc`)
> - Locking the `Mutex<T>` is also no longer necessary
>   --> saves two atomic ops for each call to `.lock()`
>       (acquire and release)
> 
> In turn, this is replaced by a single atomic call with
> `Ordering::Relaxed` (which is fine for counters [0]). So, something like
> 
>     progress.lock().unwrap().done_groups += 1;
> 
> would just become
> 
>     progress.add_done_group();
> 
> which is also quite neat.
> 
> Note however that we might have to split that struct into a "local" and
> "shared" version or whatever in order to adapt it all to the current
> code (a locally-used struct ofc doesn't need atomics).
> 
> Again, I think what you're doing here is perfectly fine; I just think
> that we should have a look at all of those concurrent data accesses and
> see whether we can slim some stuff down or perhaps have some kind of
> statically enforced mutex ordering for deadlock prevention [1].
> 
> [0]: https://doc.rust-lang.org/nomicon/atomics.html#relaxed
> [1]: https://www.youtube.com/watch?v=Ba7fajt4l1M
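[Editor's note: the atomic-counter proposal quoted above can be exercised end to end. The sketch below uses scoped threads to show that a plain shared reference is enough (no `Arc`, no `Mutex`); the `run` harness and the 4x2 worker split are illustrative only, not from the patch.]

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Progress counters; only the concurrently-updated field needs to be atomic,
/// the totals are set once up front.
#[derive(Debug, Default)]
pub struct StoreProgress {
    pub done_groups: AtomicUsize,
    pub total_groups: u64,
}

impl StoreProgress {
    /// Takes &self -- callers only need a shared reference.
    pub fn add_done_group(&self) {
        let _ = self.done_groups.fetch_add(1, Ordering::Relaxed);
    }
}

/// Simulate 4 workers finishing 2 groups each against a *borrowed* progress.
pub fn run() -> usize {
    let progress = StoreProgress {
        total_groups: 8,
        ..Default::default()
    };
    // Scoped threads can borrow `progress` directly: no Arc clone/drop
    // traffic, no Mutex acquire/release -- just one relaxed fetch_add
    // per completed group.
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..2 {
                    progress.add_done_group();
                }
            });
        }
    });
    progress.done_groups.load(Ordering::Relaxed)
}

fn main() {
    println!("done groups: {}", run());
}
```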

we do have a similar construct in the client (although still with Arc<>,
maybe we could eliminate it there? ;)):

https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=pbs-client/src/backup_stats.rs;h=f0563a0011b1117e32027a8a88f9d6f65db591f0;hb=HEAD#l45
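[Editor's note: for context, the general shape of such an Arc-wrapped atomic counter — a generic illustration of the pattern, not the actual contents of `backup_stats.rs` — is roughly the following. Each worker gets its own `Arc` clone because the tasks may outlive the scope that created the counter.]

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawn 4 detachable workers that each bump a shared counter once.
pub fn run() -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            // Each spawned thread owns its own Arc clone (two extra atomic
            // ops per clone/drop pair -- the cost discussed above).
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                counter.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", run());
}
```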
