[pbs-devel] [PATCH proxmox 2/2] rest-server: close race window when updating worker task count
Fabian Grünbichler
f.gruenbichler at proxmox.com
Mon Dec 2 10:14:44 CET 2024
On November 29, 2024 3:20 pm, Dominik Csapak wrote:
> On 11/29/24 14:27, Thomas Lamprecht wrote:
>> Am 29.11.24 um 14:13 schrieb Fabian Grünbichler:
>>> this mimics how the count is updated when spawning a new task - the lock scope
>>> needs to cover the count update itself, else there's a race when multiple
>>> worker's log their result at the same time..
>>>
>>> Co-developed-by: Dominik Csapak <d.csapak at proxmox.com>
>>> Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
>>> ---
>>> proxmox-rest-server/src/worker_task.rs | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/proxmox-rest-server/src/worker_task.rs b/proxmox-rest-server/src/worker_task.rs
>>> index 3ca93965..018d18c0 100644
>>> --- a/proxmox-rest-server/src/worker_task.rs
>>> +++ b/proxmox-rest-server/src/worker_task.rs
>>> @@ -1023,7 +1023,8 @@ impl WorkerTask {
>>>
>>> WORKER_TASK_LIST.lock().unwrap().remove(&self.upid.task_id);
>>> let _ = self.setup.update_active_workers(None);
>>> - set_worker_count(WORKER_TASK_LIST.lock().unwrap().len());
>>> + let lock = WORKER_TASK_LIST.lock().unwrap();
>>
>> why not use this also for the remove operation above? I.e. something like:
>>
>> let locked_worker_tasks = WORKER_TASK_LIST.lock().unwrap();
>>
>> locked_worker_tasks.remove(&self.upid.task_id);
>>
>> set_worker_count(locked_worker_tasks.len())
>>
>> If there are technical reason speaking against this, which I hope not, then a
>> comment would be definitively warranted, otherwise using a single lock would
>> IMO make this a bit clearer and locking twice isn't exactly cheaper.
>
> here the reason of the split lock is that the 'self.setup.update_active_workers` internally
> can take a lock to the WORKER_TASK_LIST, so we can't hold one over that call
>
> not super sure if can reorder these, so that we reduce the count before updating
> though. From what i understand though we want to remove ourselves from the list
> of actives tasks before reducing that counter.
>
> as fabian indicated in the other patch, we should probably split up
> the 'update_active_workers' into seperate methods to
> * add one worker
> * remove one worker
> * housekeeping for leftover workers
>
> then we could design the removal in a way that does not rely on the WORKER_TASK_LIST
> in the first place thus we could remove it from the active list before removing it
> from the internal hashmap (and could take a lock around both, the list and the count)
yes to all of the above. and a comment why the lock is obtained twice
probably is a good ideal for the stop-gap fix.
>
>>
>> Looks OK besides that, but would still want to take a closer look.
>>
>>> + set_worker_count(lock.len());
>>> }
>>>
>>> /// Log a message.
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel at lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
>
More information about the pbs-devel
mailing list