[pve-devel] [PATCH v2 qemu-server] fix #4501: TCP migration: start vm: move port reservation and usage closer together
Hannes Dürr
h.duerr at proxmox.com
Wed Dec 27 18:07:46 CET 2023
I live-migrated 300 VMs with:
migration: insecure
max_workers: 30
and 10 parallel workers
(as described here
https://forum.proxmox.com/threads/live-migration.127355/#post-557181)
I had zero issues with the patch applied;
without the patch I had ~30 errors.
Tested-by: Hannes Duerr <h.duerr at proxmox.com>
On 12/20/23 13:32, Thomas Lamprecht wrote:
> On 19/12/2023 14:44, Fiona Ebner wrote:
>> Currently, volume activation, PCI reservation and resetting systemd
>> scope happen in between, so the 5 second expiretime used for port
>> reservation is not always enough.
>>
>> It's possible to defer telling QEMU where it should listen for
>> migration and do so after it has been started via QMP. Therefore, the
>> port reservation can be moved very close to the actual usage.
>>
>> Mentioned here for completeness and can still be done as an additional
>> change later if desired: next_migrate_port could be modified to
>> optionally return the open socket and it should be possible to pass
>> the file descriptor directly to QEMU, but that would require accepting
>> the connection before on the Perl side (otherwise leads to ENOTCONN
>> 107). While it would avoid any races, it's not the most elegant
>> and the change at hand should be enough in all practical situations.
>>
>> Signed-off-by: Fiona Ebner <f.ebner at proxmox.com>
>> ---
>>
>> Discussion for v1:
>> https://lists.proxmox.com/pipermail/pve-devel/2023-November/060149.html
>>
>> Changes in v2:
>> * move reservation+usage much closer together than was done in v1
>> of the qemu-server patch
>> * drop other partial fix attempts for pve-common
> I find this approach more than just an OK'ish stop-gap, this should
> fix most such issues we can have in practice.
>
> If you can get someone to additionally test this it's fine to apply as
> is IMO.
>
> The one thing that might be worse (didn't check reservation logic)
> compared to FD passing is when there would be no migration ports
> available, as then we would have already spent slightly more time and
> resources by having the VM already started. We could side-step this a
> bit by looping for requesting a reserved port for a few seconds.
>
> But IMO it's not highly likely to run out of such ports, most actions
> that can spawn multiple migrations (like HA) are capped by default.
>
> So once tested a few general migration situations, consider this:
>
> Acked-by: Thomas Lamprecht <t.lamprecht at proxmox.com>
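
For illustration, the idea mentioned above of looping for a few seconds
when requesting a port could look roughly like the sketch below. It is
written in Python purely as an example and is not the Perl code from
qemu-server or pve-common. The names reserve_port and try_bind, the
port range, the timeout and the interval are all assumptions made for
this sketch. It only probes whether a port can be bound and does not
model the reservation expiry discussed in the commit message.

```python
import socket
import time


def try_bind(port, host="127.0.0.1"):
    """Return True if the port can be bound right now (it looks free)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def reserve_port(ports, timeout=5.0, interval=0.2,
                 probe=try_bind, sleep=time.sleep, clock=time.monotonic):
    """Scan the candidate ports repeatedly until one is free or time is up.

    Hypothetical helper: 'probe' decides whether a port is usable, so a
    different check can be swapped in without changing the retry loop.
    """
    deadline = clock() + timeout
    while True:
        for port in ports:
            if probe(port):
                return port
        # Every candidate was busy in this pass; give up once the
        # deadline has passed, otherwise wait a bit and scan again.
        if clock() >= deadline:
            raise RuntimeError("no free migration port found")
        sleep(interval)


if __name__ == "__main__":
    # Assumed example range, not taken from the patch.
    print(reserve_port(range(60000, 60051)))
```

With such a loop, a short burst of parallel migrations that temporarily
occupies all candidate ports would delay the start slightly instead of
failing it right away.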