[pve-devel] applied: [PATCH ha-manager] lrm: fix getting stuck on restart
Thomas Lamprecht
t.lamprecht at proxmox.com
Wed Apr 27 14:00:12 CEST 2022
On 27.04.22 12:19, Fabian Grünbichler wrote:
> run_workers is responsible for updating the state after workers have
> exited. if the current LRM state is 'active', but a shutdown_request was
> issued in 'restart' mode (like on package upgrades), this call is the
> only one made in the LRM work() loop.
>
> skipping it if there are active services means the following sequence of
> events effectively keeps the LRM from restarting or making any progress:
>
> - start HA migration on node A
> - reload LRM on node A while migration is still running
>
> even once the migration has finished, the service count is still >= 1,
> since the LRM never calls run_workers (directly or via
> manage_resources) and therefore never notices that the service has been
> migrated away.
>
> maintenance mode (i.e., rebooting the node with shutdown policy migrate)
> does call manage_resources and thus run_workers, and will proceed once
> the last worker has exited.
>
> reported by a user:
>
> https://forum.proxmox.com/threads/lrm-hangs-when-updating-while-migration-is-running.108628
>
> Signed-off-by: Fabian Grünbichler <f.gruenbichler at proxmox.com>
> ---
> better viewed with -w ;)
>
> src/PVE/HA/LRM.pm | 17 ++++++++---------
> 1 file changed, 8 insertions(+), 9 deletions(-)
>
>
good fix!
applied, thanks!
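
For readers following along without the diff, the ordering problem described in the commit message can be sketched as a toy model. This is only an illustration in Python, not the actual Perl code in src/PVE/HA/LRM.pm: the class ToyLRM, its fields, the tick() helper and the service ID "vm:100" are all hypothetical, and only the names run_workers and work() come from the message above. It assumes a single migration worker that finishes after two loop iterations.

```python
class ToyLRM:
    """Toy model of the restart path; NOT the real PVE::HA::LRM."""

    def __init__(self, always_run_workers):
        # True models the fixed ordering, False the old ordering.
        self.always_run_workers = always_run_workers
        # Hypothetical: service ID -> loop iterations until its worker exits.
        self.workers = {"vm:100": 2}
        # Services this node still considers active.
        self.active_services = {"vm:100"}
        self.shutdown = False

    def run_workers(self):
        """Reap exited workers, update state, return how many still run."""
        for sid in list(self.workers):
            if self.workers[sid] == 0:
                del self.workers[sid]
                # The migration finished, so the service is gone from here.
                self.active_services.discard(sid)
        return len(self.workers)

    def tick(self):
        """Let every running worker make one step of progress."""
        for sid in self.workers:
            self.workers[sid] = max(0, self.workers[sid] - 1)

    def work(self):
        """One loop iteration with a pending shutdown request in restart mode."""
        if self.always_run_workers:
            # Fixed ordering: always reap workers first, then check services.
            if self.run_workers() == 0 and not self.active_services:
                self.shutdown = True
        else:
            # Old ordering: run_workers is skipped while services are active,
            # so the finished migration is never noticed.
            if not self.active_services:
                if self.run_workers() == 0:
                    self.shutdown = True


def simulate(always_run_workers, rounds=10):
    """Return True if the toy LRM manages to restart within `rounds`."""
    lrm = ToyLRM(always_run_workers)
    for _ in range(rounds):
        lrm.work()
        if lrm.shutdown:
            return True
        lrm.tick()
    return False


if __name__ == "__main__":
    print("old ordering restarts:", simulate(False))
    print("fixed ordering restarts:", simulate(True))
```

With the old ordering the toy LRM never restarts, no matter how many rounds it is given, because the service count can only drop inside run_workers. With the fixed ordering it restarts as soon as the worker has exited.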