[pdm-devel] [PATCH proxmox-datacenter-manager 01/16] server: add locked sdn client helpers

Stefan Hanreich s.hanreich at proxmox.com
Wed Aug 27 15:22:13 CEST 2025


On 8/27/25 3:10 PM, Dominik Csapak wrote:
> [snip]
>>>> +
>>>> +        if errors {
>>>> +            let mut rollback_futures = FuturesUnordered::new();
>>>> +
>>>> +            for (client, ctx) in self.clients {
>>>> +                let ctx = Arc::new(ctx);
>>>> +                let err_ctx = ctx.clone();
>>>> +
>>>> +                rollback_futures.push(
>>>> +                    client
>>>> +                        .rollback_and_release()
>>>> +                        .map_ok(|_| ctx)
>>>> +                        .map_err(|err| (err, err_ctx)),
>>>> +                );
>>>> +            }
>>>> +
>>>> +            while let Some(result) = rollback_futures.next().await {
>>>> +                match result {
>>>> +                    Ok(ctx) => {
>>>> +                        proxmox_log::info!(
>>>> +                            "successfully rolled back configuration for remote {}",
>>>> +                            ctx.remote_id()
>>>> +                        )
>>>> +                    }
>>>> +                    Err((_, ctx)) => {
>>>> +                        proxmox_log::error!(
>>>> +                            "could not rollback configuration for remote {}",
>>>> +                            ctx.remote_id()
>>>> +                        )
>>>> +                    }
>>>> +                }
>>>> +            }
>>>
>>>
>>> if we could not rollback_and_release, should we maybe call release for
>>> those? The admin has to clean up anyway, but is it possible for one to
>>> release such a lock (via the GUI?). Not very sure about the lock
>>> semantics here.
>>
>> I thought about it, but I think it is better to leave the remote locked
>> on failure since admins then get an error on the respective PVE remote
>> if they want to change some settings there.
>>
>> They can then either roll back the changes manually or fix the
>> issues and apply the configuration anyway.
>>
>> The most likely reason for this to fail is networking issues, imo. I
>> think if the rollback_and_release call fails, there is a decent chance
>> that a subsequent release call will fail as well. This would introduce
>> a third level of requests + error handling, which I'm not sure is worth
>> the potential gain.
>>
>> It is also analogous to e.g. failures when snapshotting a VM where the
>> lock needs to be forcibly released iirc.
>>
>> I think the error message could be improved by indicating that the
>> configuration is left untouched and admins need to roll back / release
>> manually, potentially even with an indicator / pointer on how to do it.
>>
>>
>>> also, using FuturesUnordered can make sense here, but it's not really
>>> parallel, since it only does work during await.
>>>
>>> does it make a big difference (for many remotes) compared to just
>>> awaiting the futures directly? or joining them?
>>>
>>> i.e. having a large number of remotes unordered in the logs does not
>>> make it easier to read.
>>>
>>> I'd either expect the remotes to be sorted, or the failed remotes to be
>>> separated from the successful ones (e.g. logging the failed ones
>>> separately and only a count for the successful ones?)
>>
>> I think I had some issues with just using a Vec of futures + join_all
>> (ownership + borrowing with clients / ctx), and using FuturesUnordered
>> resolved them. I'll check the root cause again more thoroughly and
>> report back. I wanted to improve how the requests are made in the
>> client anyway, but didn't get around to it yet. We might also just want
>> to spawn blocking tasks in tokio instead.
>>
>> Using FuturesOrdered would be fine if collecting into a Vec doesn't
>> work, but then we should maybe sort the clients in the constructor
>> (alphabetically by remote name?) to ensure deterministic/consistent
>> ordering any time any function in this collection is called.
>>
>> Anyway, I'll look into improving this in the next version (most likely
>> by spawning blocking tasks in tokio itself)!
>>
>>
> i don't think the tasks have to be blocking (since the code is async
> anyway?)

yeah, you're right.
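
For reference, awaiting the futures directly (without any blocking tasks)
could look roughly like the sketch below - just to illustrate the shape,
assuming self.clients is a Vec of (client, ctx) pairs as in the patch
above; moving ctx into the async block should also sidestep the borrowing
issue mentioned earlier:

    use futures::future::join_all;

    // sketch only: assumes rollback_and_release() is an async method and
    // that self.clients can be consumed here, as in the current patch
    let rollback_futures: Vec<_> = self
        .clients
        .into_iter()
        .map(|(client, ctx)| async move {
            // ctx is moved into the future, so no Arc/clone dance is needed
            (client.rollback_and_release().await, ctx)
        })
        .collect();

    // join_all preserves the input order, so sorting the clients by remote
    // name in the constructor would also give deterministic log output
    for (result, ctx) in join_all(rollback_futures).await {
        match result {
            Ok(_) => proxmox_log::info!(
                "successfully rolled back configuration for remote {}",
                ctx.remote_id()
            ),
            Err(err) => proxmox_log::error!(
                "could not rollback configuration for remote {}: {}",
                ctx.remote_id(),
                err
            ),
        }
    }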

> but you could do something similar to lukas with e.g.
> joinset + a semaphore to make (limited) parallel requests
> 
> if we have a clearer picture of the things we want/need
> from such a thing in the future, we could abstract that

I'll talk to Lukas and check out his implementation. We talked briefly
about it last week, but I only took a cursory glance at it.
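
Just so we're on the same page, something along these lines is what I'd
imagine for the JoinSet + semaphore approach (purely illustrative, not
based on Lukas' actual code; the concurrency limit of 4 and the Send +
'static bounds on client/ctx are assumptions):

    use std::sync::Arc;
    use tokio::{sync::Semaphore, task::JoinSet};

    // allow at most 4 rollback requests in flight at the same time
    let semaphore = Arc::new(Semaphore::new(4));
    let mut join_set = JoinSet::new();

    for (client, ctx) in self.clients {
        let semaphore = Arc::clone(&semaphore);

        join_set.spawn(async move {
            // hold a permit for the duration of the request
            let _permit = semaphore
                .acquire_owned()
                .await
                .expect("semaphore closed");

            (client.rollback_and_release().await, ctx)
        });
    }

    while let Some(joined) = join_set.join_next().await {
        match joined {
            Ok((Ok(_), ctx)) => proxmox_log::info!(
                "successfully rolled back configuration for remote {}",
                ctx.remote_id()
            ),
            Ok((Err(err), ctx)) => proxmox_log::error!(
                "could not rollback configuration for remote {}: {}",
                ctx.remote_id(),
                err
            ),
            Err(err) => proxmox_log::error!("rollback task failed: {}", err),
        }
    }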

We might be able to convert this to wrap Lukas' implementation at some
point in the future. We could potentially do this in a follow-up to avoid
doing the same work twice / duplicating the logic, since imo the current
implementation is good enough for starting out - what do you think?
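
Regarding the log readability point above: independent of how the futures
are driven, the result handling could also just summarize the successes
and list the failures sorted by remote name, roughly like this (sketch
only; `results` stands in for the collected (result, ctx) pairs and the
error type is assumed to implement Display):

    // count successes, collect failures with their remote id
    let mut failed = Vec::new();
    let mut succeeded = 0usize;

    for (result, ctx) in results {
        match result {
            Ok(_) => succeeded += 1,
            Err(err) => failed.push((ctx.remote_id().to_string(), err)),
        }
    }

    proxmox_log::info!(
        "successfully rolled back configuration for {} remote(s)",
        succeeded
    );

    // log failed remotes separately, in a deterministic order
    failed.sort_by(|(a, _), (b, _)| a.cmp(b));

    for (remote, err) in failed {
        proxmox_log::error!(
            "could not rollback configuration for remote {}: {}",
            remote,
            err
        );
    }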



