[pve-devel] [RFC container 2/4] fix #4474: lxc api: add overrule-shutdown parameter to stop endpoint

Friedrich Weber f.weber at proxmox.com
Fri Dec 1 10:57:14 CET 2023


Thanks for looking into this!

On 17/11/2023 14:09, Wolfgang Bumiller wrote:
[...]
>>  	    return PVE::LXC::Config->lock_config($vmid, $lockcmd);
> 
> ^ Here we lock first, then fork the worker, then do `vm_stop` with the
> config lock inherited.
> 
> This means that creating multiple shutdown tasks before using one with
> override=true could cause the override task to cancel the *first* ongoing
> shutdown task, then move on to the `lock_config` call - in the meantime
> a second shutdown task acquires this very lock and performs another
> long-running shutdown, causing the `override` parameter to be
> ineffective.

Just to make sure I understand correctly, the scenario is (please
correct me if I'm wrong):

* shutdown task #1 has the lock and starts long-running shutdown
* stop API handler with override kills shutdown task #1, but does not
acquire the lock yet
* shutdown task #2 starts, acquires the lock and starts long-running
shutdown
* stop task waits for the lock => override flag was ineffective
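
To make the ordering concrete, here is a toy replay of the four steps above. It is only a sketch: `ConfigLock` and `replay_scenario` are hypothetical names, not PVE code, and it assumes that killing a shutdown task frees the lock that task was holding.

```python
class ConfigLock:
    """Toy stand-in for the per-container config lock (hypothetical model)."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, task):
        # Non-blocking acquire: succeeds only if nobody holds the lock.
        if self.holder is None:
            self.holder = task
            return True
        return False

    def release(self, task):
        # Only the current holder can release.
        if self.holder == task:
            self.holder = None


def replay_scenario():
    lock = ConfigLock()
    log = []

    # 1. shutdown task #1 has the lock and starts a long-running shutdown
    assert lock.try_acquire("shutdown-1")
    log.append("shutdown-1 holds the lock")

    # 2. stop handler with override kills shutdown task #1
    #    (assumption: the killed task's lock is released), but the
    #    stop handler has not acquired the lock itself yet
    lock.release("shutdown-1")
    log.append("stop(override) killed shutdown-1, lock is free")

    # 3. shutdown task #2 slips in and acquires the lock
    assert lock.try_acquire("shutdown-2")
    log.append("shutdown-2 holds the lock")

    # 4. the stop task now tries to lock and would have to wait
    stop_got_lock = lock.try_acquire("stop")
    log.append("stop got lock: %s" % stop_got_lock)

    return stop_got_lock, lock.holder, log


if __name__ == "__main__":
    for line in replay_scenario()[2]:
        print(line)
```

In this replay the stop task ends up behind shutdown task #2, which is the case where the override flag has no effect.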

> We should switch the ordering here: first fork the worker, then lock.
> (¹ And your new chunk would go into the worker as well)
> 
> Unless I'm missing something, but AFAICT the current ordering there is
> rather ... bad :-)

Would this actually prevent the scenario above? We cannot put my new
chunk into the locked section (because then it couldn't kill an active
shutdown task that has the lock), but if we put it into the worker
before the locked section, couldn't the same race still happen? That
is, the stop task with override kills the running shutdown tasks but
doesn't hold the lock yet, a new shutdown task acquires the lock and
makes the stop task wait for it, and the override flag is ineffective
just the same.
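
As a rough way to compare the two orderings, the sketch below enumerates every point at which a second shutdown task could try to take the lock, and records the points where it ends up blocking the stop task. The step names and `defeat_windows` are made up for illustration and are not PVE functions. The model is simplified: a rival that finds the lock held just gives up instead of waiting, and killing a shutdown task is assumed to free its lock.

```python
# Step sequences of the stop task (hypothetical labels).
CURRENT_ORDER = ["kill_shutdown_tasks", "lock_config", "fork_worker", "vm_stop"]
PROPOSED_ORDER = ["fork_worker", "kill_shutdown_tasks", "lock_config", "vm_stop"]


def defeat_windows(stop_steps):
    """Return the (previous step, next step) pairs between which a rival
    shutdown task can grab the lock and make the stop task wait."""
    windows = []
    for slot in range(len(stop_steps) + 1):
        # Insert the rival's lock attempt at position `slot`.
        schedule = stop_steps[:slot] + ["rival_lock"] + stop_steps[slot:]
        holder = "shutdown-1"  # a long-running shutdown already holds the lock
        for step in schedule:
            if step == "kill_shutdown_tasks":
                # Assumption: killing a shutdown task frees its lock.
                if holder in ("shutdown-1", "shutdown-2"):
                    holder = None
            elif step == "rival_lock":
                if holder is None:
                    holder = "shutdown-2"
            elif step == "lock_config":
                if holder is None:
                    holder = "stop"
                elif holder == "shutdown-2":
                    # The rival got in after the kill: override is defeated.
                    before = stop_steps[slot - 1] if slot > 0 else None
                    after = stop_steps[slot] if slot < len(stop_steps) else None
                    windows.append((before, after))
                    break
                else:
                    break  # still blocked by the original shutdown task
    return windows


if __name__ == "__main__":
    print(defeat_windows(CURRENT_ORDER))
    print(defeat_windows(PROPOSED_ORDER))
```

Under these assumptions both orderings leave the same gap between killing the shutdown tasks and taking the lock, which is the concern raised above.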



