[pdm-devel] [PATCH proxmox-datacenter-manager 01/16] server: add locked sdn client helpers

Stefan Hanreich s.hanreich at proxmox.com
Wed Aug 27 15:34:52 CEST 2025


On 8/27/25 3:29 PM, Dominik Csapak wrote:
> On 8/27/25 3:22 PM, Stefan Hanreich wrote:
>> On 8/27/25 3:10 PM, Dominik Csapak wrote:
>>> [snip]
>>>>>> +
>>>>>> +        if errors {
>>>>>> +            let mut rollback_futures = FuturesUnordered::new();
>>>>>> +
>>>>>> +            for (client, ctx) in self.clients {
>>>>>> +                let ctx = Arc::new(ctx);
>>>>>> +                let err_ctx = ctx.clone();
>>>>>> +
>>>>>> +                rollback_futures.push(
>>>>>> +                    client
>>>>>> +                        .rollback_and_release()
>>>>>> +                        .map_ok(|_| ctx)
>>>>>> +                        .map_err(|err| (err, err_ctx)),
>>>>>> +                );
>>>>>> +            }
>>>>>> +
>>>>>> +            while let Some(result) = rollback_futures.next().await {
>>>>>> +                match result {
>>>>>> +                    Ok(ctx) => {
>>>>>> +                        proxmox_log::info!(
>>>>>> +                            "successfully rolled back configuration for remote {}",
>>>>>> +                            ctx.remote_id()
>>>>>> +                        )
>>>>>> +                    }
>>>>>> +                    Err((_, ctx)) => {
>>>>>> +                        proxmox_log::error!(
>>>>>> +                            "could not rollback configuration for remote {}",
>>>>>> +                            ctx.remote_id()
>>>>>> +                        )
>>>>>> +                    }
>>>>>> +                }
>>>>>> +            }
>>>>>
>>>>>
>>>>> if we could not rollback_and_release, should we maybe call release for
>>>>> those? The admin has to clean up anyway, but is it possible for one to
>>>>> release such a lock (via the GUI?). Not very sure about the lock
>>>>> semantics here.
>>>>
>>>> I thought about it, but I think it is better to leave the remote locked
>>>> on failure since admins then get an error on the respective PVE remote
>>>> if they want to change some settings there.
>>>>
>>>> They can then either explicitly roll back the changes manually or fix
>>>> the issues and apply the configuration anyway.
>>>>
>>>> The most likely reason for this to fail is networking issues, imo. I
>>>> think if the rollback_and_release call fails, there is a decent chance
>>>> that a subsequent release call will fail as well. This would introduce
>>>> a third level of requests + error handling, which I'm not sure is
>>>> worth the potential gain.
>>>>
>>>> It is also analogous to e.g. failures when snapshotting a VM where the
>>>> lock needs to be forcibly released iirc.
>>>>
>>>> I think the error message could be improved by indicating that the
>>>> configuration is left untouched and admins need to roll back / release
>>>> manually, potentially even with a pointer on how to do it.
>>>>
>>>>
>>>>> also, using FuturesUnordered can make sense here, but it's not
>>>>> really parallel, since it only does work during await.
>>>>>
>>>>> does it make a big difference (for many remotes) compared to just
>>>>> awaiting the futures directly, or joining them?
>>>>>
>>>>> i.e. having a large number of remotes unordered in the logs does not
>>>>> make it easier to read.
>>>>>
>>>>> I'd either expect the remotes to be sorted, or the failed remotes to
>>>>> be separated from the successful ones (e.g. only logging the failed
>>>>> ones individually and just a count for the successful ones?)
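>>>>>
>>>>> e.g. roughly like this (untested sketch, reusing the types from the
>>>>> patch) - join_all polls everything concurrently but yields the
>>>>> results in input order, so the log output would be deterministic:
>>>>>
>>>>>     let results = futures::future::join_all(self.clients.into_iter().map(
>>>>>         |(client, ctx)| async move {
>>>>>             // keep the context next to the result so we can log per remote
>>>>>             (ctx, client.rollback_and_release().await)
>>>>>         },
>>>>>     ))
>>>>>     .await;
>>>>>
>>>>>     for (ctx, result) in results {
>>>>>         match result {
>>>>>             Ok(_) => proxmox_log::info!(
>>>>>                 "successfully rolled back configuration for remote {}",
>>>>>                 ctx.remote_id()
>>>>>             ),
>>>>>             Err(err) => proxmox_log::error!(
>>>>>                 "could not rollback configuration for remote {}: {}",
>>>>>                 ctx.remote_id(),
>>>>>                 err
>>>>>             ),
>>>>>         }
>>>>>     }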
>>>>
>>>> I think I had some issues with just using a Vec of futures + join_all
>>>> (ownership + borrowing with clients / ctx), which using FuturesUnordered
>>>> resolved. I'll look into the root cause more thoroughly and report
>>>> back. I wanted to improve how the requests are made in the client
>>>> anyway, but haven't gotten around to it yet. We might also just want
>>>> to spawn blocking tasks in tokio instead.
>>>>
>>>> Using FuturesOrdered would be fine if collecting into a Vec doesn't
>>>> work, but then we should maybe sort the clients in the constructor
>>>> (alphabetically by remote name?) to ensure deterministic/consistent
>>>> ordering any time any function in this collection is called.
>>>>
>>>> Anyway, I'll look into improving this in the next version (most likely
>>>> by spawning blocking tasks in tokio itself)!
>>>>
>>>>
>>> i don't think the tasks have to be blocking (since the code is async
>>> anyway?)
>>
>> yeah, you're right.
>>
>>> but you could do something similar to lukas with e.g.
>>> joinset + a semaphore to make (limited) parallel requests
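>>>
>>> e.g. something like this (untested sketch, the concurrency limit of 4
>>> is arbitrary; note that join_next yields in completion order, not
>>> spawn order):
>>>
>>>     let semaphore = std::sync::Arc::new(tokio::sync::Semaphore::new(4));
>>>     let mut set = tokio::task::JoinSet::new();
>>>
>>>     for (client, ctx) in clients {
>>>         // acquire_owned ties the permit to the spawned task, so at
>>>         // most 4 rollback requests are in flight at once
>>>         let permit = semaphore.clone().acquire_owned().await?;
>>>         set.spawn(async move {
>>>             let _permit = permit; // released when the task finishes
>>>             (ctx, client.rollback_and_release().await)
>>>         });
>>>     }
>>>
>>>     while let Some(result) = set.join_next().await {
>>>         // log success/failure per remote here, like in the current version
>>>     }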
>>>
>>> if we have a clearer picture of the things we want/need
>>> from such a thing in the future, we could abstract that
>>
>> I'll talk to Lukas and check out his implementation. We talked briefly
>> about it last week, but I only took a cursory glance at his code.
>>
>> We might be able to convert this to wrap Lukas' implementation at some
>> point in the future. Potentially we could do this in a follow-up as
>> well, to avoid doing the same work twice / duplicating the logic, since
>> imo the current implementation is good enough for starting out - what
>> do you think?
>>
> 
> sounds good to me. We should just refactor it sooner rather than later,
> otherwise this will very likely be forgotten ;)

I'll keep it in mind and add a TODO as well. Will switch to FuturesOrdered
and add sorting in the constructor for the next version, so we at least
get ordered output!
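
Roughly what I have in mind for the constructor (untested sketch,
assuming the clients are kept in a Vec of (client, context) pairs and
remote_id() returns something comparable, like &str):

    // sort once on construction, so apply/rollback/logging all iterate
    // the remotes in the same deterministic order
    clients.sort_unstable_by(|(_, a), (_, b)| a.remote_id().cmp(b.remote_id()));

With that in place, FuturesOrdered (or collecting the results in order)
gives us logs sorted by remote name.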



