[pdm-devel] [PATCH proxmox-datacenter-manager 01/16] server: add locked sdn client helpers

Dominik Csapak d.csapak at proxmox.com
Wed Aug 27 15:10:31 CEST 2025


[snip]
>>> +
>>> +        if errors {
>>> +            let mut rollback_futures = FuturesUnordered::new();
>>> +
>>> +            for (client, ctx) in self.clients {
>>> +                let ctx = Arc::new(ctx);
>>> +                let err_ctx = ctx.clone();
>>> +
>>> +                rollback_futures.push(
>>> +                    client
>>> +                        .rollback_and_release()
>>> +                        .map_ok(|_| ctx)
>>> +                        .map_err(|err| (err, err_ctx)),
>>> +                );
>>> +            }
>>> +
>>> +            while let Some(result) = rollback_futures.next().await {
>>> +                match result {
>>> +                    Ok(ctx) => {
>>> +                        proxmox_log::info!(
>>> +                            "successfully rolled back configuration for remote {}",
>>> +                            ctx.remote_id()
>>> +                        )
>>> +                    }
>>> +                    Err((_, ctx)) => {
>>> +                        proxmox_log::error!(
>>> +                            "could not rollback configuration for remote {}",
>>> +                            ctx.remote_id()
>>> +                        )
>>> +                    }
>>> +                }
>>> +            }
>>
>>
>> if we could not rollback_and_release, should we maybe call release for
>> those? The admin has to clean up anyway, but is it possible to release
>> such a lock (via the GUI?). I'm not very sure about the lock semantics
>> here.
> 
> I thought about it, but I think it is better to leave the remote locked
> on failure since admins then get an error on the respective PVE remote
> if they want to change some settings there.
> 
> They can then either explicitly rollback the changes manually or fix the
> issues and apply the configuration anyway.
> 
> The most likely scenario for why this would fail is networking issues
> imo. I think if the rollback_and_release call fails, there is a decent
> chance that a subsequent release call will fail as well. This would
> introduce a third level of requests + error handling, which I'm not sure
> is worth the potential gain.
> 
> It is also analogous to e.g. failures when snapshotting a VM where the
> lock needs to be forcibly released iirc.
> 
> I think the error message could be improved by indicating that the
> configuration is left untouched and that admins need to rollback /
> release manually, potentially even with an indicator / pointer on how
> to do it.
> 
> 
>> also, using FuturesUnordered can make sense here, but it's not really
>> parallel, since it only does work during await.
>>
>> does it make a big difference (for many remotes) instead of just
>> awaiting the futures directly? or joining them ?
>>
>> i.e. having a large number of remotes unordered in the logs does not
>> make it easier to read.
>>
>> I'd either expect the remotes to be sorted, or the failed remotes to be
>> separated from the successful ones (e.g. only logging the failed ones
>> individually and just a count for the successful ones?)
> 
> I think I had some issues with just using a Vec of futures + join_all
> (ownership + borrowing with clients / ctx), and using FuturesUnordered
> resolved them. I'll look into the root cause more thoroughly and report
> back. I wanted to improve how the requests are made in the client
> anyway, but didn't get around to it yet. We might also just want to
> spawn blocking tasks in tokio instead.
> 
> Using FuturesOrdered would be fine if collecting into a Vec doesn't
> work, but then we should maybe sort the clients in the constructor
> (alphabetically by remote name?) to ensure deterministic/consistent
> ordering any time any function in this collection is called.
> 
> Anyway, I'll look into improving this in the next version (most likely
> by spawning blocking tasks in tokio itself)!
> 
> 
i don't think the tasks have to be blocking (since the code is async
anyway?)

but you could do something similar to what Lukas did, e.g. a JoinSet +
a semaphore to make (limited) parallel requests

if we have a clearer picture of what we want/need from such a helper
in the future, we could abstract that



