[PVE-User] Automatic migration before reboot / shutdown? Migration to host in same group?

Thu Jul 6 14:15:00 CEST 2017

Hi,

On 07/06/2017 11:32 AM, Uwe Sauter wrote:
> Hi all,
>
> 1) I was wondering how a PVE (4.4) cluster will behave when one of the nodes is restarted / shut down either via the WebGUI or via the
> command line. Will hosted, HA-managed VMs be migrated to other hosts before shutting down or will they be stopped (and restarted on
> another host once HA recognizes them as gone)?

First: on any graceful shutdown, which triggers stopping the pve-ha-lrm service,
all HA-managed services will be queued to stop (graceful shutdown with timeout).
This is done to ensure consistency.

Whether an HA service then gets recovered to another node or "waits" until the
current node comes up again depends on whether you triggered a shutdown or a reboot.
On a shutdown, the service will be recovered after the node is seen as "dead"
(~2 minutes), but on a reboot we mark the service as frozen, so the HA stack does
not touch it.
The idea here is that if a user reboots the node without migrating a service
away, they expect the node to come up again quickly and start the service on its
own again.
Now, we know that this may not always be ideal, especially on really big machines
with hundreds of gigabytes of RAM and slow-as-hell firmware, where a boot may
take more than 10 minutes.

One idea is to make this behavior configurable and add two additional behaviors,
i.e. migrate away (live migration) and relocate away (stop the service, then
start it on another node).
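To make the current behavior and the proposed alternatives a bit more concrete,
here is a small Python model. It is only a simplified sketch, not the actual
pve-ha-manager logic; the `Policy` names and `on_graceful_stop` are hypothetical,
and MIGRATE / RELOCATE stand for the two proposed options, which do not exist yet.

```python
from enum import Enum


class Policy(Enum):
    """Hypothetical setting; only FREEZE_ON_REBOOT reflects today's behavior."""
    FREEZE_ON_REBOOT = "default"  # shutdown -> recover elsewhere, reboot -> freeze
    MIGRATE = "migrate"           # proposed: live-migrate services away first
    RELOCATE = "relocate"         # proposed: stop services, start them on another node


def on_graceful_stop(action: str, policy: Policy = Policy.FREEZE_ON_REBOOT) -> str:
    """Describe what happens to an HA-managed service when its node goes down gracefully."""
    if action not in ("shutdown", "reboot"):
        raise ValueError(f"unknown action: {action!r}")
    if policy is Policy.MIGRATE:
        return "live-migrate to another node, then take the node down"
    if policy is Policy.RELOCATE:
        return "stop, then start on another node"
    # Current behavior: stopping pve-ha-lrm first queues a graceful stop (with timeout).
    if action == "shutdown":
        return "stop; recover on another node once this node is seen as dead (~2 min)"
    return "stop and freeze; the service starts again when this node is back"


for action in ("shutdown", "reboot"):
    print(action, "->", on_graceful_stop(action))
```

The point of a single setting is that the operator decides once, instead of
having to remember to migrate services away before every reboot.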

> 2) Currently I run a cluster of four nodes that share the same 2U chassis:
>
> +-----+-----+
> |  A  |  B  |
> +-----+-----+
> |  C  |  D  |
> +-----+-----+
>
> (Please don't comment on whether this setup is ideal – I'm aware of the risks a single chassis brings…)
As long as the nodes share a continent, you're never safe anyway :-P
> I created several HA groups:
>
> - left  contains A & C
> - right contains B & D
> - upper contains A & B
> - lower contains C & D
> - all   contains all nodes
>
> and configured VMs to run inside one of the groups.
>
> For updates I usually follow the following steps:
> - migrate VMs from node via "bulk migrate" feature, selecting one of the other nodes
> - when no more VMs run, do an "apt-get dist-upgrade" and reboot
> - repeat till all nodes are up-to-date
>
> One issue I ran into with this procedure is that sometimes, while a VM is still being migrated to another host, already migrated VMs are
> migrated back onto the current node because the target that was selected for "bulk migrate" was not inside the same group as the
> current host.
This is expected: you told the ha-manager that a service should not or cannot
run there, so it tries to bring it back into an "OK" state.
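A simplified Python model of that check, using the groups from your mail. It
assumes, as a simplification, that a service gets moved as soon as it sits on a
node outside its group while at least one group member is online; node
priorities and the restricted/nofailback group options are ignored, and
`needs_move` is a hypothetical name.

```python
# HA groups from the question (nodes A-D in one chassis).
GROUPS = {
    "left": {"A", "C"},
    "right": {"B", "D"},
    "upper": {"A", "B"},
    "lower": {"C", "D"},
    "all": {"A", "B", "C", "D"},
}


def needs_move(group: str, current: str, online: set) -> bool:
    """True if the (simplified) HA stack would move a service of `group` off `current`.

    Simplification: a service is moved back as soon as it runs on a node
    outside its group while some member of that group is online.
    """
    members = GROUPS[group]
    return current not in members and bool(members & online)


online = {"A", "B", "C", "D"}            # C is still up during the bulk migration
print(needs_move("left", "D", online))   # a 'left' VM bulk-migrated to D  -> True
print(needs_move("lower", "D", online))  # a 'lower' VM bulk-migrated to D -> False
```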
> Practical example:
> - VM 101 is configured to run on the left side of the cluster
> - VM 102 is configured to run on the lower level of the cluster
> - node C shall be updated
> - I select "bulk migrate" to node D
> - VM 101 is migrated to D
> - VM 102 is migrated to D, but takes some time (a lot of RAM)
> - HA recognizes that VM 101 is not running in the correct group and schedules a migration back to node C
> - migration of VM 102 finishes and migration of VM 101 back to node C immediately starts
> - once migration of VM 101 has finished I manually need to initiate another migration (and after that need to be faster than HA to
> do a reboot)
>
>
> Would it be possible to implement another "bulk action" that will evacuate a host in a way that for every VM, the appropriate
> target node is selected, depending on HA group configuration? This might also temporarily disable that node in HA management for
> e.g. 10min or until next reboot so that maintenance work can be done…
> What do you think of that idea?
>

A maintenance mode, so to speak? I'm not opposed to it, but if such a thing
were implemented, it would only be a light wrapper around already existing
functionality.
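Such an "evacuate node" action could, for example, pick a target per VM from
that VM's own group instead of using one bulk target for all. A minimal Python
sketch, assuming your group layout and a naive "fewest services first"
tie-break; `evacuation_targets` is a hypothetical helper, not an existing PVE API.

```python
GROUPS = {
    "left": {"A", "C"},
    "right": {"B", "D"},
    "upper": {"A", "B"},
    "lower": {"C", "D"},
    "all": {"A", "B", "C", "D"},
}


def evacuation_targets(vms: dict, node: str, online: set, load: dict) -> dict:
    """Map each VM ID to a target node inside its own HA group.

    vms:    VM ID -> HA group name
    node:   node to be emptied for maintenance
    online: nodes currently online
    load:   node -> number of services already running there
    A VM whose group has no other online member maps to None (manual handling).
    """
    load = dict(load)  # don't mutate the caller's dict
    targets = {}
    for vmid, group in sorted(vms.items()):
        candidates = [n for n in GROUPS[group] if n != node and n in online]
        if not candidates:
            targets[vmid] = None
            continue
        # Least-loaded candidate first; the node name breaks ties deterministically.
        target = min(candidates, key=lambda n: (load.get(n, 0), n))
        targets[vmid] = target
        load[target] = load.get(target, 0) + 1
    return targets


# The practical example: empty node C while VM 101 (left) and VM 102 (lower) run on it.
print(evacuation_targets({101: "left", 102: "lower"}, "C", {"A", "B", "C", "D"}, {}))
# -> {101: 'A', 102: 'D'}
```

Here VM 101 lands on A and thus stays inside its group, so the HA stack has no
reason to move it back to C.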

May I ask what the reason for your group setup is?
I assume that all VMs may run on all nodes, but you want to "pin" some VMs to
specific nodes for load reasons?

If this is the case, I'd suggest changing the group configuration,
i.e. each node gets its own group: A, B, C and D. Each group has its respective
node with priority 2 and all other nodes with priority 1.
When doing a system upgrade on node A, you would edit group A and set node A's
priority to 0; now all its services should migrate away from this node, with the
HA stack trying to balance the service count over all other nodes.
You do not need to trigger a bulk action, at least not for the HA-managed VMs.
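The effect of the priorities can be sketched in Python. The sketch assumes that
services of a group run on the online members with the highest priority, and
uses round-robin as a simplified stand-in for "balance the service count"; the
function names are hypothetical, and the node lists use the `node:priority`
syntax of HA group definitions.

```python
def parse_nodes(spec: str) -> dict:
    """Parse an HA group node list like 'A:2,B:1' into {'A': 2, 'B': 1}."""
    result = {}
    for entry in spec.split(","):
        name, _, prio = entry.strip().partition(":")
        result[name] = int(prio) if prio else 0  # assumption: missing priority counts as 0
    return result


def preferred_nodes(spec: str, online: set) -> list:
    """Online members with the highest priority, i.e. where the group's services should run."""
    prios = {n: p for n, p in parse_nodes(spec).items() if n in online}
    if not prios:
        return []
    best = max(prios.values())
    return sorted(n for n, p in prios.items() if p == best)


def spread(services: list, nodes: list) -> dict:
    """Simplified balancing: hand out services round-robin over the preferred nodes."""
    if not nodes:
        raise ValueError("no preferred node online")
    return {svc: nodes[i % len(nodes)] for i, svc in enumerate(services)}


online = {"A", "B", "C", "D"}
normal = "A:2,B:1,C:1,D:1"       # group A in normal operation
maintenance = "A:0,B:1,C:1,D:1"  # node A's priority lowered to 0 for the upgrade

print(preferred_nodes(normal, online))       # -> ['A']
print(preferred_nodes(maintenance, online))  # -> ['B', 'C', 'D']
print(spread(["vm:101", "vm:102", "vm:103"], preferred_nodes(maintenance, online)))
# -> {'vm:101': 'B', 'vm:102': 'C', 'vm:103': 'D'}
```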

After everything has migrated, execute the upgrade and reboot.
Then reconfigure group A so that node A again has the highest priority, i.e. 2,
and the respective services migrate back to it.
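The whole cycle only touches one node list. A Python sketch that builds the four
per-node groups and produces the list to enter before and after the upgrade;
`make_groups` and `set_priority` are hypothetical helpers for illustration, and
their output is what you would enter in the group dialog, not something the
sketch applies to the cluster itself.

```python
def make_groups(nodes: list, preferred: int = 2, other: int = 1) -> dict:
    """One HA group per node: its own node at `preferred`, all others at `other`."""
    return {
        owner: ",".join(f"{n}:{preferred if n == owner else other}" for n in nodes)
        for owner in nodes
    }


def set_priority(spec: str, node: str, prio: int) -> str:
    """Return a copy of a 'node:priority,...' list with `node` set to `prio`."""
    entries = []
    for entry in spec.split(","):
        name = entry.split(":", 1)[0]
        entries.append(f"{name}:{prio}" if name == node else entry)
    return ",".join(entries)


groups = make_groups(["A", "B", "C", "D"])
print(groups["A"])                          # -> A:2,B:1,C:1,D:1
before = set_priority(groups["A"], "A", 0)  # step 1: drain node A
print(before)                               # -> A:0,B:1,C:1,D:1
# step 2: upgrade and reboot node A
after = set_priority(before, "A", 2)        # step 3: give node A its services back
print(after == groups["A"])                 # -> True
```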

This should be quite fast to do after the initial setup; you just need to open
the group configuration dialog and lower/raise the priority of one node.

You could also use a similar procedure with your current group configuration.
The main difference is that you need to edit two groups to free a node.
The advantage of my method is that the services get distributed over all other
nodes instead of just being moved to a single one.

If anything is unclear or does not apply to your situation, feel free to ask.

cheers,
Thomas

PS: if you haven't read it already, please also see:
<https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#ha_manager_groups>