[pve-devel] [PATCH v2 qemu-server] fix #4501: TCP migration: start vm: move port reservation and usage closer together

Thomas Lamprecht t.lamprecht at proxmox.com
Wed Dec 20 13:32:14 CET 2023

On 19/12/2023 14:44, Fiona Ebner wrote:
> Currently, volume activation, PCI reservation and resetting systemd
> scope happen in between, so the 5 second expiretime used for port
> reservation is not always enough.
> It's possible to defer telling QEMU where it should listen for
> migration and do so after it has been started via QMP. Therefore, the
> port reservation can be moved very close to the actual usage.
> Mentioned here for completeness and can still be done as an additional
> change later if desired: next_migrate_port could be modified to
> optionally return the open socket and it should be possible to pass
> the file descriptor directly to QEMU, but that would require accepting
> the connection before on the Perl side (otherwise leads to ENOTCONN
> 107). While it would avoid any races, it's not the most elegant
> and the change at hand should be enough in all practical situations.
> Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
> ---
> Discussion for v1:
> https://lists.proxmox.com/pipermail/pve-devel/2023-November/060149.html
> Changes in v2:
>     * move reservation+usage much closer together than was done in v1
>       of the qemu-server patch
>     * drop other partial fix attempts for pve-common

I find this approach more than just an OK'ish stop-gap; it should
fix most such issues we can hit in practice.

If you can get someone to additionally test this, it's fine to apply
as is IMO.
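
For readers following along, here is a minimal sketch of the deferred
listen step the commit message describes. It assumes QEMU was started
with "-incoming defer" and is told where to listen afterwards via the
QMP command "migrate-incoming"; whether the patch uses exactly this
shape is an assumption, and build_migrate_incoming is a hypothetical
helper name, not something from qemu-server. It is written in Python
only for illustration.

```python
import json


def build_migrate_incoming(host: str, port: int) -> str:
    """Build the QMP message that tells an already running QEMU
    where to listen for an incoming TCP migration.

    Hypothetical helper for illustration; assumes an IPv4 address or
    hostname (an IPv6 literal would need brackets in the URI).
    """
    command = {
        "execute": "migrate-incoming",
        "arguments": {"uri": f"tcp:{host}:{port}"},
    }
    return json.dumps(command)


# The port would be reserved right before this call, so that the
# reservation window only has to cover a single QMP round trip.
message = build_migrate_incoming("192.0.2.10", 60000)
```

The point of the ordering is that the slow steps (volume activation,
PCI reservation, systemd scope reset) all happen before the port is
reserved, not in between reservation and use.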

The one thing that might be worse compared to FD passing (I didn't
check the reservation logic) is the case where no migration ports are
available, as we would then have already spent slightly more time and
resources on starting the VM. We could side-step this a bit by
retrying the port reservation in a loop for a few seconds.
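
A rough sketch of what such a retry loop could look like, in Python
for illustration only. The try_reserve callable is a hypothetical
stand-in for the real reservation helper (next_migrate_port in the
Perl code) and is assumed to return a port number, or None when no
port is currently free; the timeout and interval values are
placeholders, not taken from the patch.

```python
import time
from typing import Callable, Optional


def reserve_with_retry(
    try_reserve: Callable[[], Optional[int]],
    timeout: float = 5.0,
    interval: float = 0.25,
) -> int:
    """Call try_reserve until it yields a port or the timeout expires.

    try_reserve is a hypothetical stand-in for the actual reservation
    helper and must return None when no port is available right now.
    """
    deadline = time.monotonic() + timeout
    while True:
        port = try_reserve()
        if port is not None:
            return port
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no free migration port within {timeout} s")
        # Wait briefly so that other reservations get a chance to expire.
        time.sleep(interval)
```

This only softens the problem: if the port range stays exhausted for
longer than the timeout, the already started VM still has to be
stopped again.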

But IMO it's not highly likely to run out of such ports, most actions
that can spawn multiple migrations (like HA) are capped by default.

So once a few general migration scenarios have been tested, consider this:

Acked-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
