[pve-devel] applied: [PATCH ha-manager] lrm: fix getting stuck on restart

Thomas Lamprecht t.lamprecht at proxmox.com
Wed Apr 27 14:00:12 CEST 2022


On 27.04.22 12:19, Fabian Grünbichler wrote:
> run_workers is responsible for updating the state after workers have
> exited. if the current LRM state is 'active' but a shutdown_request was
> issued in 'restart' mode (as happens on package upgrades), this call is
> the only one made in the LRM work() loop.
> 
> skipping it if there are active services means the following sequence of
> events effectively keeps the LRM from restarting or making any progress:
> 
> - start HA migration on node A
> - reload LRM on node A while migration is still running
> 
> even once the migration is finished, the service count is still >= 1,
> since the LRM never calls run_workers (directly or via
> manage_resources) and therefore never notices that the service has been
> migrated away.
> 
> maintenance mode (i.e., rebooting the node with shutdown policy migrate)
> does call manage_resources and thus run_workers, and will proceed once
> the last worker has exited.
> 
> reported by a user:
> 
> https://forum.proxmox.com/threads/lrm-hangs-when-updating-while-migration-is-running.108628
> 
> Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
> ---
> better viewed with -w ;)
> 
>  src/PVE/HA/LRM.pm | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
> 
>
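The following is a minimal Python sketch of a toy model that illustrates the deadlock described in the commit message. It is hypothetical: LrmModel, work_once and iterations_until_restart are illustrative names, not part of PVE::HA::LRM (which is Perl). The model assumes, per the description above, that the old code reached run_workers only when the active service count was zero. It also assumes the count can drop only after run_workers reaps the exited migration worker.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LrmModel:
    """Toy model of the LRM 'active' state with a restart-mode shutdown pending.

    Hypothetical: names mirror the mail (run_workers, active service count),
    but this is not the PVE::HA::LRM implementation.
    """
    fixed: bool
    # service id -> loop iterations until its migration worker exits
    workers: dict[str, int] = field(default_factory=dict)
    # services this node still counts as active (e.g. still migrating away)
    active_services: set[str] = field(default_factory=set)

    def run_workers(self) -> int:
        """Advance workers, reap exited ones and update service state."""
        for sid in list(self.workers):
            self.workers[sid] -= 1
            if self.workers[sid] <= 0:
                del self.workers[sid]
                # migration finished: the service now lives on another node
                self.active_services.discard(sid)
        return len(self.workers)

    def work_once(self) -> bool:
        """One iteration of work(); returns True once the LRM may restart."""
        if self.fixed:
            # fixed: always reap workers first, then check what is left
            running = self.run_workers()
            return not self.active_services and running == 0
        # old behaviour: run_workers is only reached with zero active
        # services, but the count can only drop *via* run_workers
        if not self.active_services:
            return self.run_workers() == 0
        return False


def iterations_until_restart(fixed: bool, max_iter: int = 100) -> Optional[int]:
    """Migrate vm:100 (takes 3 iterations) while a restart is requested."""
    lrm = LrmModel(fixed=fixed, workers={"vm:100": 3},
                   active_services={"vm:100"})
    for i in range(1, max_iter + 1):
        if lrm.work_once():
            return i
    return None  # still stuck after max_iter iterations


print(iterations_until_restart(fixed=False))  # None
print(iterations_until_restart(fixed=True))   # 3
```

With the old guard, the loop never makes progress. When run_workers is always called, the restart proceeds as soon as the migration worker exits. In this model, maintenance mode corresponds to the fixed path, since it goes through manage_resources and thus run_workers.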

good fix!

applied, thanks!