[pdm-devel] [PATCH proxmox-datacenter-manager 01/16] server: add locked sdn client helpers
Dominik Csapak
d.csapak at proxmox.com
Wed Aug 27 15:29:07 CEST 2025
On 8/27/25 3:22 PM, Stefan Hanreich wrote:
> On 8/27/25 3:10 PM, Dominik Csapak wrote:
>> [snip]
>>>>> +
>>>>> + if errors {
>>>>> + let mut rollback_futures = FuturesUnordered::new();
>>>>> +
>>>>> + for (client, ctx) in self.clients {
>>>>> + let ctx = Arc::new(ctx);
>>>>> + let err_ctx = ctx.clone();
>>>>> +
>>>>> + rollback_futures.push(
>>>>> + client
>>>>> + .rollback_and_release()
>>>>> + .map_ok(|_| ctx)
>>>>> + .map_err(|err| (err, err_ctx)),
>>>>> + );
>>>>> + }
>>>>> +
>>>>> + while let Some(result) = rollback_futures.next().await {
>>>>> + match result {
>>>>> + Ok(ctx) => {
>>>>> + proxmox_log::info!(
>>>>> + "successfully rolled back configuration
>>>>> for remote {}",
>>>>> + ctx.remote_id()
>>>>> + )
>>>>> + }
>>>>> + Err((_, ctx)) => {
>>>>> + proxmox_log::error!(
>>>>> + "could not rollback configuration for
>>>>> remote {}",
>>>>> + ctx.remote_id()
>>>>> + )
>>>>> + }
>>>>> + }
>>>>> + }
>>>>
>>>>
>>>> if we could not rollback_and_release, should we maybe call release for
>>>> those? The admin has to clean up anyway, but is it possible for one to
>>>> release such a lock (via the GUI?). I'm not very sure about the lock
>>>> semantics here.
>>>
>>> I thought about it, but I think it is better to leave the remote locked
>>> on failure since admins then get an error on the respective PVE remote
>>> if they want to change some settings there.
>>>
>>> They can then either explicitly rollback the changes manually or fix the
>>> issues and apply the configuration anyway.
>>>
>>> The most likely reason for this to fail is networking issues, imo. I
>>> think if the rollback_and_release call fails, there is a decent
>>> chance that a subsequent release call will fail as well. This would
>>> introduce a third level of requests + error handling, which I'm not sure
>>> is worth the potential gain.
>>>
>>> It is also analogous to e.g. failures when snapshotting a VM where the
>>> lock needs to be forcibly released iirc.
>>>
>>> I think the error message could be improved by indicating that the
>>> configuration is left untouched and that admins need to rollback /
>>> release manually, potentially even with a pointer on how to do it.
>>>
>>>
>>>> also, using FuturesUnordered can make sense here, but it's not
>>>> really parallel, since it only does work during await.
>>>>
>>>> does it make a big difference (for many remotes) instead of just
>>>> awaiting the futures directly? or joining them?
>>>>
>>>> i.e. having a large number of remotes appear unordered in the logs does
>>>> not make them easier to read.
>>>>
>>>> I'd either expect the remotes to be sorted, or the failed remotes to be
>>>> separated from the successful ones (e.g. only logging the error ones
>>>> separately and only a number for the successful ones?)
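For illustration, a minimal sketch of that approach; Client, Ctx and
rollback_all below are hypothetical stand-ins, not the names from the patch.
The rollback futures are awaited via join_all, so results come back in the
order of the client list, failures are logged individually, and successes are
only counted:

    use futures::future::join_all;

    // hypothetical stand-ins for the types used in the patch
    struct Ctx { remote: String }
    impl Ctx { fn remote_id(&self) -> &str { &self.remote } }

    struct Client;
    impl Client {
        async fn rollback_and_release(&self) -> Result<(), anyhow::Error> {
            Ok(()) // placeholder for the real request
        }
    }

    async fn rollback_all(clients: Vec<(Client, Ctx)>) {
        // one future per remote; join_all polls them concurrently but
        // yields the results in the order of `clients`
        let futures = clients.iter().map(|(client, ctx)| async move {
            client.rollback_and_release().await.map_err(|err| (err, ctx))
        });

        let mut succeeded = 0;
        for result in join_all(futures).await {
            match result {
                Ok(()) => succeeded += 1,
                Err((err, ctx)) => proxmox_log::error!(
                    "could not rollback configuration for remote {}: {}",
                    ctx.remote_id(),
                    err
                ),
            }
        }
        proxmox_log::info!(
            "successfully rolled back configuration on {} remotes",
            succeeded
        );
    }

Since join_all drives all futures concurrently on one task, this keeps the
requests concurrent while producing deterministic log output.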
>>>
>>> I think I had some issues with just using a Vec of futures + join_all
>>> (ownership + borrowing with clients / ctx) and using FuturesUnordered
>>> resolved the issue. I'll check the root cause again more thoroughly
>>> and report back. I wanted to improve how the requests are made in the
>>> client anyway, but haven't gotten around to it yet. We might also just want
>>> to spawn blocking tasks in tokio instead.
>>>
>>> Using FuturesOrdered would be fine if collecting into a Vec doesn't
>>> work, but then we should maybe sort the clients in the constructor
>>> (alphabetically by remote name?) to ensure deterministic/consistent
>>> ordering any time any function in this collection is called.
>>>
>>> Anyway, I'll look into improving this in the next version (most likely
>>> by spawning blocking tasks in tokio itself)!
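A minimal sketch of the sorting idea; LockedClients, Client and Ctx are
placeholder names, not necessarily the ones used in the patch:

    // hypothetical stand-ins; the real types in the patch may differ
    struct Client;
    struct Ctx { remote: String }
    impl Ctx { fn remote_id(&self) -> &str { &self.remote } }

    // collection of locked per-remote SDN clients
    struct LockedClients {
        clients: Vec<(Client, Ctx)>,
    }

    impl LockedClients {
        fn new(mut clients: Vec<(Client, Ctx)>) -> Self {
            // sort once here so every later operation (apply, rollback,
            // logging) walks the remotes in the same deterministic order
            clients.sort_by(|(_, a), (_, b)| a.remote_id().cmp(b.remote_id()));
            Self { clients }
        }
    }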
>>>
>>>
>> I don't think the tasks have to be blocking (since the code is async
>> anyway?)
>
> yeah, you're right.
>
>> but you could do something similar to Lukas with e.g.
>> a JoinSet + a semaphore to make (limited) parallel requests
>>
>> if we have a clearer picture of the things we want/need
>> from such a thing in the future, we could abstract that
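A minimal sketch of that pattern, assuming tokio's JoinSet and Semaphore; the
concurrency limit and the per-remote request are placeholders:

    use std::sync::Arc;
    use tokio::sync::Semaphore;
    use tokio::task::JoinSet;

    async fn run_limited(remotes: Vec<String>) {
        // allow at most 4 requests in flight at once (arbitrary limit)
        let semaphore = Arc::new(Semaphore::new(4));
        let mut set = JoinSet::new();

        for remote in remotes {
            let semaphore = Arc::clone(&semaphore);
            set.spawn(async move {
                // hold the permit for the duration of the request
                let _permit = semaphore.acquire_owned().await.expect("semaphore closed");
                // ... perform the per-remote request here ...
                remote
            });
        }

        // collect the results as the tasks finish
        while let Some(result) = set.join_next().await {
            match result {
                Ok(remote) => proxmox_log::info!("finished request for remote {}", remote),
                Err(err) => proxmox_log::error!("task failed: {}", err),
            }
        }
    }

The semaphore caps how many requests run at once, while the JoinSet collects
the results as the spawned tasks complete.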
>
> I'll talk to Lukas and check out his implementation. We talked briefly
> about it last week, but I only took a cursory glance at it.
>
> We might be able to convert this to wrap Lukas' implementation at some
> point in the future. Potentially we could do this in a follow-up as well
> to avoid doing the same work twice / duplicating the logic, since imo for
> starting out the current implementation is good enough - what do you think?
>
sounds good to me. We should just refactor it sooner rather than later,
otherwise this will very likely be forgotten ;)