[pbs-devel] [PATCH proxmox 2/2] rest-server: close race window when updating worker task count

Dominik Csapak d.csapak at proxmox.com
Fri Nov 29 15:20:58 CET 2024


On 11/29/24 14:27, Thomas Lamprecht wrote:
> Am 29.11.24 um 14:13 schrieb Fabian Grünbichler:
>> this mimics how the count is updated when spawning a new task - the lock scope
>> needs to cover the count update itself, else there's a race when multiple
>> worker's log their result at the same time..
>>
>> Co-developed-by: Dominik Csapak <d.csapak at proxmox.com>
>> Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
>> ---
>>   proxmox-rest-server/src/worker_task.rs | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/proxmox-rest-server/src/worker_task.rs b/proxmox-rest-server/src/worker_task.rs
>> index 3ca93965..018d18c0 100644
>> --- a/proxmox-rest-server/src/worker_task.rs
>> +++ b/proxmox-rest-server/src/worker_task.rs
>> @@ -1023,7 +1023,8 @@ impl WorkerTask {
>>   
>>           WORKER_TASK_LIST.lock().unwrap().remove(&self.upid.task_id);
>>           let _ = self.setup.update_active_workers(None);
>> -        set_worker_count(WORKER_TASK_LIST.lock().unwrap().len());
>> +        let lock = WORKER_TASK_LIST.lock().unwrap();
> 
> why not use this also for the remove operation above? I.e. something like:
> 
> let locked_worker_tasks = WORKER_TASK_LIST.lock().unwrap();
> 
> locked_worker_tasks.remove(&self.upid.task_id);
> 
> set_worker_count(locked_worker_tasks.len())
> 
> If there are technical reason speaking against this, which I hope not, then a
> comment would be definitively warranted, otherwise using a single lock would
> IMO make this a bit clearer and locking twice isn't exactly cheaper.

here the reason of the split lock is that the 'self.setup.update_active_workers` internally
can take a lock to the WORKER_TASK_LIST, so we can't hold one over that call

not super sure if can reorder these, so that we reduce the count before updating
though. From what i understand though we want to remove ourselves from the list
of actives tasks before reducing that counter.

as fabian indicated in the other patch, we should probably split up
the 'update_active_workers' into seperate methods to
  * add one worker
  * remove one worker
  * housekeeping for leftover workers

then we could design the removal in a way that does not rely on the WORKER_TASK_LIST
in the first place thus we could remove it from the active list before removing it
from the internal hashmap (and could take a lock around both, the list and the count)

> 
> Looks OK besides that, but would still want to take a closer look.
> 
>> +        set_worker_count(lock.len());
>>       }
>>   
>>       /// Log a message.
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel at lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel





More information about the pbs-devel mailing list