[pve-devel] HA Migration on shutdown/reboot

Thomas Lamprecht t.lamprecht at proxmox.com
Tue May 28 08:02:30 CEST 2019


Hi,

On 5/28/19 2:40 AM, Bastian Sebode wrote:
> Hello Proxmox Team,
> 
> I'm wondering why there is no option "migrate" for the shutdown_policy
> available.

Something similar is planned:
https://bugzilla.proxmox.com/show_bug.cgi?id=2181

> 
> I looked a bit through the code and found create_migrate_worker() in
> PVE/API2/Nodes.pm, which is used in mass migration. Can't this be used
> in PVE/HA/LRM.pm when shutdown_policy = "migrate"?
> 
> You know that servers take from 1 to 10 minutes to reboot. So the
> interruption caused by freezing, shutting down and starting on another
> node in an HA setup does not seem logical to me, and it takes a long
> time to recover the service.

This _really_ depends on the setup. Some users even do *not* want live
migration, as they have VMs with hundreds of GB of memory, and for them
it's much faster to shut down the VM and just restart it on another node.
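
To make that trade-off concrete, here is a rough back-of-the-envelope
sketch in Python. It is not anything Proxmox VE computes; the function
name, the link speed and the efficiency factor are all assumptions for
illustration. It only estimates how long a single pass over the guest
memory would take, and ignores pages that get dirtied again during the
migration, so it is a lower bound.

```python
def live_migration_seconds(memory_gib, link_gbit_per_s, efficiency=0.8):
    """Lower-bound estimate for copying guest RAM once over the network.

    memory_gib       -- guest memory in GiB (hypothetical example value)
    link_gbit_per_s  -- raw migration link speed in Gbit/s (assumption)
    efficiency       -- usable fraction of the raw link speed (assumption)
    """
    total_bytes = memory_gib * 2**30
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * efficiency
    return total_bytes / bytes_per_second


if __name__ == "__main__":
    # A 512 GiB guest over a 10 Gbit/s link at 80% efficiency.
    seconds = live_migration_seconds(512, 10)
    print(f"at least {seconds / 60:.1f} minutes")
```

With these assumed numbers a single memory pass already takes about nine
minutes, which is in the same range as the reboot times mentioned above,
so a plain shutdown and restart on another node can indeed win for very
large guests.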

> -> shutdown of databases, mounts, guest os
> -> shutdown of host
> -> start on other host
> 
> Migration would keep the service running all the time - I guess in HA
> activated Environments online migration is mostly possible - and would
> also accelerate the shutdown of the host, because the VMs don't have to
> shut down on that host.
> 
> Defining the Target Node could rely on the HA Groups priority and if not
> in HA you could still freeze and shutdown a VM or migrate to least used
> node - I remember Thomas already wrote about this and the problems
> behind - or even ask where to migrate.

By changing the HA groups you can even trigger a migration of all VMs,
but for our planned "maintenance" mode this is not enough: users surely
want to either migrate non-HA VMs as well or, if those use local disks,
suspend or shut them down. That brings a few problems with it, but they
should all be solvable.
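
As a minimal sketch of the "target node from HA group priority" idea
discussed above, the following Python picks a migration target from a
group's node list. It is illustrative only and not the actual CRM logic:
the helper names are hypothetical, the "name:priority" string mirrors how
HA group node lists are written, and both the default priority of 0 and
the alphabetical tie-break are assumptions.

```python
def parse_group_nodes(spec):
    """Parse 'node1:2,node2:1,node3' into {'node1': 2, 'node2': 1, 'node3': 0}."""
    priorities = {}
    for entry in spec.split(","):
        name, _, prio = entry.strip().partition(":")
        # Assumption: a node without an explicit priority counts as 0.
        priorities[name] = int(prio) if prio else 0
    return priorities


def pick_target(spec, leaving_node, online_nodes):
    """Return the highest-priority online group member other than the
    node that is about to shut down, or None if there is no candidate."""
    candidates = {
        name: prio
        for name, prio in parse_group_nodes(spec).items()
        if name != leaving_node and name in online_nodes
    }
    if not candidates:
        return None
    # Highest priority first; ties broken by name (an arbitrary choice here).
    return min(candidates, key=lambda name: (-candidates[name], name))


if __name__ == "__main__":
    group = "node1:2,node2:1,node3:1"
    print(pick_target(group, "node1", {"node1", "node2", "node3"}))
```

A None result corresponds to the case where no other group member is
available, in which a fallback such as freezing or shutting down the
guest would still be needed.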

> 
> Right now the only way to keep services in HA online all the time
> through a node reboot, is to change the HA Group, so the service gets
> correctly migrated by crm. But that also has to be undone after the reboot.
> 
> Okay, right now LXC came to my mind... I know there's no online
> migration, but am speaking for KVM now. ;-) Freeze is also applicable here.
> 
> It's probably not as easy to implement as I imagine, but are you
> already thinking about a feature like this? Or is there any other way
> to update my HA-enabled cluster without service downtime and without
> changing the HA group?

Yes, we are thinking about it, and it's really the next thing on my TODO
list after getting our software stack up and fully ready for the upcoming
Buster release, which is naturally a bit of work for us.

But your well-thought-out request is appreciated, and I'll try to
finally kick-start this.

cheers,
Thomas
