[pbs-devel] [PATCH proxmox 1/2] rest-server: handle failure in worker task setup correctly

Mon Dec 2 10:14:45 CET 2024

On November 29, 2024 2:34 pm, Thomas Lamprecht wrote:
> On 29.11.24 at 14:13, Fabian Grünbichler wrote:
>> if setting up a new worker fails after it has been inserted into the
>> WORKER_TASK_LIST, we need to clean it up instead of bubbling up the error right
>> away, else we "leak" the worker task and it never finishes..
>> 
>> Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
>> ---
>> we probably want to optimize update_active_workers as well to reduce the lock
>> contention there that triggers this issue in the first place..
>> 
>>  proxmox-rest-server/src/worker_task.rs | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>> 
>> diff --git a/proxmox-rest-server/src/worker_task.rs b/proxmox-rest-server/src/worker_task.rs
>> index 6e76c2ca..3ca93965 100644
>> --- a/proxmox-rest-server/src/worker_task.rs
>> +++ b/proxmox-rest-server/src/worker_task.rs
>> @@ -923,7 +923,12 @@ impl WorkerTask {
>>              set_worker_count(hash.len());
>>          }
>>  
>> -        setup.update_active_workers(Some(&upid))?;
>> +        let res = setup.update_active_workers(Some(&upid));
>> +        if res.is_err() {
>> +            // needed to undo the insertion into WORKER_TASK_LIST above
>> +            worker.log_result(&res);
>> +            res?
>> +        }
> 
> Seems OK from a quick look, need a bit more time for a proper review.
> 
> What the quick look can give, though, is a style nit: IMO this is a bit
> unidiomatic for our code.
> 
> Would prefer one of:
> 
> Combined return path through matching
> 
> match setup.update_active_workers(Some(&upid)) {
>    Err(err) => {
>         // needed to undo the insertion into the active WORKER_TASK_LIST above
>         worker.log_result(&res);
>         Err(err)
>    }
>    Ok(_) => Ok((worker, logger))
> }
> 
> or similar to yours, but avoiding the outer variable:
> 
> if let Err(err) = setup.update_active_workers(Some(&upid)) {
>     // needed to undo the insertion into the active WORKER_TASK_LIST above
>     worker.log_result(&res);
>     return Err(err);
> }
> 
> IMO both fit slightly (!) better for how errors are commonly dealt with in rust and
> are thus a bit easier to understand correctly on reading.

neither of those works though, since both the log_result call and the
return value need the Err(err), and err is not Clone.. maybe there is a
way to make it work, but I didn't find one quickly last week and wanted
to hand over something for Dominik to work with ;) maybe I am missing
some easy way out though..
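
For illustration only, here is a minimal sketch of one possible way around
the ownership problem: rewrap the error into a Result, borrow that for
logging, then move the error back out for the early return. Everything in
it is a hypothetical stand-in (TaskError for a non-Clone error type, Worker,
setup_worker, the `fail` flag, the u32 return value), and log_result is
only assumed to take a `&Result<(), E>` as the call in the patch suggests.
It is not the actual worker_task.rs code.

```rust
use std::fmt;

// Hypothetical stand-in for a non-Clone error type.
#[derive(Debug)]
struct TaskError(String);

impl fmt::Display for TaskError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

// Hypothetical stand-in for the worker; it only records what was logged.
struct Worker {
    logged: Vec<String>,
}

impl Worker {
    // Assumption: takes the whole Result by reference, like the call in the patch.
    fn log_result(&mut self, res: &Result<(), TaskError>) {
        match res {
            Ok(()) => self.logged.push("TASK OK".to_string()),
            Err(err) => self.logged.push(format!("TASK ERROR: {err}")),
        }
    }
}

// Hypothetical stand-in for setup.update_active_workers(..).
fn update_active_workers(fail: bool) -> Result<(), TaskError> {
    if fail {
        Err(TaskError("lock timeout".to_string()))
    } else {
        Ok(())
    }
}

fn setup_worker(worker: &mut Worker, fail: bool) -> Result<u32, TaskError> {
    if let Err(err) = update_active_workers(fail) {
        // Rewrap the error so the whole Result can be borrowed for logging ...
        let res: Result<(), TaskError> = Err(err);
        worker.log_result(&res);
        // ... then move the error back out by value; no Clone is needed.
        return Err(res.unwrap_err());
    }
    Ok(42)
}
```

Whether this reads better than keeping the outer `res` variable and using
`res?` is a matter of taste; it mainly shows that the error can be logged by
reference and still returned by value.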

> 
>>  
>>          Ok((worker, logger))
>>      }
> 
>