[pve-devel] [POC qemu-server] fix 3303: allow "live" upgrade of qemu version

Wed Jun 23 19:56:53 CEST 2021

On Thu, 2021-04-08 at 18:44 +0200, Thomas Lamprecht wrote:
> On 08.04.21 12:33, Fabian Ebner wrote:
> > The code is in a very early state, I'm just sending this to discuss
> > the idea.
> > I didn't do a whole lot of testing yet, but it does seem to work.
> > 
> > The idea is rather simple:
> > 1. save the state to ramfs
> > 2. stop the VM
> > 3. start the VM loading the state
> 
> For the record, as we (Dietmar, you and I) discussed this a bit off-list:
> 
> The issue we see here is that one temporarily requires a potentially big
> chunk of free memory, i.e., once more the amount the guest is assigned. So
> tens to hundreds of GiB, which (educated guess) > 90 % of our users just do
> not have available, at least for the bigger VMs of theirs.
> 
> So, it would be nicer if we could make this more QEMU internal, e.g., just
> save the state out (as that one may not be compatible 1:1 for reuse with the
> new QEMU version) and re-use the guest memory directly, e.g., start new QEMU
> process, migrate state and map over the guest-memory, then pause old one,
> cont new one and be done (very condensed).
> That may have its own difficulties/edge-cases, but it would not require
> having so much extra memory freely available...

Hi,

I'm wondering how much KSM would help reduce the extra memory
requirement during a same-host migration.

Maybe there is a sweet spot: make KSM more aggressive just before
starting the migration and slow the migration down with the bandwidth
control parameter, so that all new pages created by the migration
process end up shared quickly? ksmtuned would then be returned to its
defaults once the migration is done.
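
To make the KSM part concrete, here is a minimal sketch, not tested
against a real migration. It assumes the standard KSM sysfs files
(run, pages_to_scan, sleep_millisecs under /sys/kernel/mm/ksm) are
written directly, as root; the function name and the "aggressive"
values are just placeholders I picked. If ksmtuned is running it may
overwrite these values, so it would probably have to be paused
meanwhile.

```python
import contextlib
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")  # standard KSM sysfs directory on Linux


@contextlib.contextmanager
def aggressive_ksm(pages_to_scan=5000, sleep_millisecs=10, ksm_dir=KSM_DIR):
    """Temporarily make KSM scan faster; restore the previous values on exit.

    The default numbers are illustrative guesses, not tuned recommendations.
    """
    ksm_dir = Path(ksm_dir)
    wanted = {
        "pages_to_scan": pages_to_scan,
        "sleep_millisecs": sleep_millisecs,
        "run": 1,  # 1 = KSM scanning enabled
    }
    # Remember the current settings so they can be put back afterwards.
    saved = {name: (ksm_dir / name).read_text().strip() for name in wanted}
    try:
        for name, value in wanted.items():
            (ksm_dir / name).write_text(f"{value}\n")
        yield saved
    finally:
        for name, value in saved.items():
            (ksm_dir / name).write_text(f"{value}\n")
```

The throttled same-host migration would then run inside a
"with aggressive_ksm():" block, so the old settings come back even if
the migration fails.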

Or maybe lowering the migration bandwidth alone is enough, with the KSM
settings unchanged (it still has to be faster than the rate at which the
guest mutates its pages, though, so it can't be too low).
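
As a back-of-the-envelope check of that lower bound, here is a small
sketch. It assumes a very simplified pre-copy model: constant
bandwidth, constant dirty ("mutation") rate, and no final stop-and-copy
cutoff. Under those assumptions the copy rounds form a geometric series
and the total time is memory / (bandwidth - dirty rate). The function
name and units are my own choice.

```python
def estimated_precopy_seconds(guest_mem_mib, bandwidth_mib_s, dirty_rate_mib_s):
    """Rough pre-copy duration in seconds, or None if it would never converge.

    Simplified model: each round re-sends what was dirtied during the
    previous round, so the total is guest_mem / (bandwidth - dirty_rate).
    """
    if bandwidth_mib_s <= dirty_rate_mib_s:
        # The guest dirties memory at least as fast as we can copy it.
        return None
    return guest_mem_mib / (bandwidth_mib_s - dirty_rate_mib_s)


# 8 GiB guest, 100 MiB/s limit, guest dirtying 36 MiB/s
print(estimated_precopy_seconds(8192, 100, 36))  # 128.0
# Limit equal to the dirty rate: never finishes
print(estimated_precopy_seconds(8192, 50, 50))   # None
```

So the closer the limit gets to the dirty rate, the longer both copies
of the guest memory coexist, which is the window KSM would have to
cover.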

I assume that for most users a slow same-host migration is fine, since
it will not consume network resources, just a bit more CPU.

Sincerely,

Laurent

PS: thanks to Stefan_R for pointing out this thread:
https://forum.proxmox.com/threads/upgrade-of-pve-qemu-kvm-and-running-vm.91236/

> > 
> > This approach solves the problem that our stack is (currently) not
> > designed to have multiple instances with the same VM ID running. To do
> > so, we'd need to handle config locking, sockets, pid file, passthrough
> > resources?, etc.
> > 
> > Another nice feature of this approach is that it doesn't require
> > touching the vm_start or migration code at all, avoiding further
> > bloating.
> > 
> > 
> > Thanks to Fabian G. and Stefan for inspiring this idea:
> > 
> > Fabian G. suggested using the suspend to disk + start route if the
> > required changes to our stack would turn out to be infeasible.
> > 
> > Stefan suggested migrating to a dummy VM (outside our stack) which just
> > holds the state and migrating back right away. It seems that dummy VM is
> > in fact not even needed ;) If we really really care about smallest
> > possible downtime, this approach might still be the best, and we'd need
> > to start the dummy VM while the backwards migration runs (resulting in
> > two times the migration downtime). But it does have more moving parts
> > and requires some migration/startup changes.
> > 
> > 
> > Fabian Ebner (6):
> >   create vmstate_size helper
> >   create savevm_monitor helper
> >   draft of upgrade_qemu function
> >   draft of qemuupgrade API call
> >   add timing for testing
> >   add usleep parameter to savevm_monitor
> > 
> >  PVE/API2/Qemu.pm  |  60 ++++++++++++++++++++++
> >  PVE/QemuConfig.pm |  10 +---
> >  PVE/QemuServer.pm | 125 +++++++++++++++++++++++++++++++++++++++-------
> >  3 files changed, 170 insertions(+), 25 deletions(-)
> > 