[pve-devel] [PATCH v2 qemu-server++ 0/15] remote migration

Fabian Ebner f.ebner at proxmox.com
Tue Nov 30 15:06:14 CET 2021


On 11.11.21 at 15:07, Fabian Grünbichler wrote:
> this series adds remote migration for VMs.
> 
> both live and offline migration including NBD and storage-migrated disks
> should work.
> 

Played around with it for a while. The biggest issue is that migration fails 
if there is no 'meta' property in the config. Most of the other things I wish 
for concern better error handling, but otherwise it seems to be in good shape!


The error "storage does not exist" is reported when the real issue is 
missing access rights. The same error also appears when access to 
/cluster/resources is missing or when the target node does not exist.


For the 'config' command, 'Sys.Modify' seems to be required:
     failed to handle 'config' command - 403 Permission check failed (/, Sys.Modify)
but it does create an empty configuration file, leading to
     target_vmid: Guest with ID '5678' already exists on remote cluster
on the next attempt.
It also already allocates the disks, but doesn't clean them up, because 
it gets the wrong lock (since the config is empty) and aborts the 'quit' 
command.
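A minimal sketch of the ordering that would avoid the leftover empty config, assuming a handler that receives the caller's privileges as a plain set (all names here are hypothetical, this is not the qemu-server code): check privileges before creating anything, and remove a partially written file on failure so the guest ID stays free for a retry.

```python
import os


class PermissionDenied(Exception):
    """Hypothetical error for a failed privilege check."""


def handle_config_command(path: str, content: str, granted: set, required: set) -> None:
    """Write a new guest config, checking privileges before any side effect.

    Sketch only: 'granted' and 'required' are plain sets of privilege names.
    """
    missing = required - granted
    if missing:
        # Fail before touching the filesystem, so a retry is not blocked
        # by a leftover empty config file.
        raise PermissionDenied(f"missing privileges: {sorted(missing)}")

    created = False
    try:
        # Mode 'x' refuses to overwrite an existing file (guest ID collision).
        with open(path, "x") as fh:
            created = True
            fh.write(content)
    except BaseException:
        if created:
            # Roll back the partially written file so the ID stays free.
            os.unlink(path)
        raise
```

The same principle (validate first, roll back what was created) would apply to the already allocated disks.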


If the config is not recent enough to have a 'meta' property:
     failed to handle 'config' command - unable to parse value of 'meta' - got undefined value
Same issue with disk+config cleanup as above.
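A minimal sketch of tolerant parsing, assuming the property is a comma-separated key=value string such as 'creation-qemu=6.1.0,ctime=1637830000' (a simplified parser, not the PVE property-string implementation): an absent value yields an empty result instead of an error, so older configs would still pass.

```python
from typing import Dict, Optional


def parse_meta(value: Optional[str]) -> Dict[str, str]:
    """Parse an optional 'meta' property string into a dict.

    None or an empty string (config created before 'meta' existed)
    yields an empty dict rather than an error.
    """
    if not value:
        return {}
    result: Dict[str, str] = {}
    for item in value.split(","):
        key, sep, val = item.partition("=")
        if not sep:
            # A present but malformed value is still rejected.
            raise ValueError(f"invalid meta item: {item!r}")
        result[key.strip()] = val.strip()
    return result
```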


The local VM stays locked with 'migrate'. Is that how it should be?
Also, the __migration__ snapshot stays around, resulting in an error 
when trying to migrate again.


For live migration, I always got a (cosmetic?) "WS closed unexpectedly" error:
tunnel: -> sending command "quit" to remote
tunnel: <- got reply
tunnel: Tunnel to https://192.168.20.142:8006/api2/json/nodes/rob2/qemu/5678/mtunnelwebsocket?ticket=PVETUNNEL%3A<SNIP>&socket=%2Frun%2Fqemu-server%2F5678.mtunnel failed - WS closed unexpectedly
2021-11-30 13:49:39 migration finished successfully (duration 00:01:02)
UPID:pve701:0000D8AD:000CB782:61A61DA5:qmigrate:111:root@pam:


Fun fact: the identity storage mapping is used for storages that don't 
appear in the explicit mapping. E.g., it's possible to migrate a VM that 
only has disks on storeA with --target-storage storeB:storeB (if storeA 
exists on the target, of course). But specifying the identity mapping 
explicitly is prohibited.
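A minimal sketch of a lookup with an identity fallback that reproduces the behaviour described above. The 'src:dst[,src:dst...]' syntax and both function names are hypothetical; this is not the series' map_id helper.

```python
from typing import Dict, Optional


def parse_id_map(spec: str) -> Dict[str, str]:
    """Parse 'src:dst[,src:dst...]' into a dict (hypothetical syntax)."""
    entries: Dict[str, str] = {}
    for pair in spec.split(","):
        src, sep, dst = pair.partition(":")
        if not sep or not src or not dst:
            raise ValueError(f"invalid mapping entry: {pair!r}")
        entries[src] = dst
    return entries


def lookup_target_id(entries: Dict[str, str], source: str,
                     default: Optional[str] = None) -> str:
    """Explicit entry first, then an optional catch-all default,
    then the identity mapping for anything not mentioned."""
    if source in entries:
        return entries[source]
    if default is not None:
        return default
    return source  # implicit identity fallback
```

With 'storeB:storeB', an unmapped storeA silently resolves to storeA, so rejecting the explicit identity form is inconsistent with what the fallback already does.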


When a target bridge is not present (should that be detected ahead of 
starting the migration?), and likely for any other startup failure, the 
only error in the log is:
2021-11-30 14:43:10 ERROR: online migrate failure - error - tunnel command '{"cmd":"star<SNIP>
failed to handle 'start' command - start failed: QEMU exited with code 1
For non-remote migration, we are more verbose in this case and log the 
QEMU output.
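A minimal sketch of such a precheck, assuming netN values of the form 'virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,firewall=1' and a target node that exposes interfaces under /sys/class/net (function names are hypothetical). It only checks that an interface of that name exists, not that it is a bridge.

```python
import os
import re
from typing import Dict, List

NET_KEY = re.compile(r"net\d+")
BRIDGE_OPT = re.compile(r"(?:^|,)bridge=([^,]+)")


def bridges_in_config(config: Dict[str, str]) -> List[str]:
    """Collect the bridge names referenced by netN entries."""
    found = set()
    for key, value in config.items():
        if NET_KEY.fullmatch(key):
            match = BRIDGE_OPT.search(value)
            if match:
                found.add(match.group(1))
    return sorted(found)


def missing_bridges(config: Dict[str, str], sysfs: str = "/sys/class/net") -> List[str]:
    """Return configured bridges with no network interface of that name."""
    return [name for name in bridges_in_config(config)
            if not os.path.exists(os.path.join(sysfs, name))]
```

Running this against the target-side config before the 'start' command would turn the bare "QEMU exited with code 1" into a specific error.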


Can/should an interrupt be handled more gracefully, so that remote 
cleanup still happens?
^CCMD websocket tunnel died: command 'proxmox-websocket-tunnel' failed: interrupted by signal

2021-11-30 14:39:07 ERROR: interrupted by signal
2021-11-30 14:39:07 aborting phase 1 - cleanup resources
2021-11-30 14:39:08 ERROR: writing to tunnel failed: broken pipe
2021-11-30 14:39:08 ERROR: migration aborted (duration 00:00:10): interrupted by signal


> besides lots of rebases, implemented todos and fixed issues the main
> difference to the previous RFC is that we no longer define remote
> entries in a config file, but just expect the caller/client to give us
> all the required information to connect to the remote cluster.
> 
> new in v2: dropped parts already applied, incorporated Fabian's and
> Dominik's feedback (thanks!)
> 
> overview over affected repos and changes, see individual patches for
> more details.
> 
> proxmox-websocket-tunnel:
> 
> new tunnel helper tool for forwarding commands and data over websocket
> connections, required by qemu-server on source side
> 
> pve-access-control:
> 
> new ticket type, required by qemu-server on target side
> 
> pve-guest-common:
> 
> handle remote migration (no SSH) in AbstractMigrate,
> required by qemu-server
> 
> pve-storage:
> 
> extend 'pvesm import' to allow import from UNIX socket, required on
> target node by qemu-server
> 
> qemu-server:
> 
> some refactoring, new mtunnel endpoints, new remote_migration endpoints
> TODO: handle pending changes and snapshots
> TODO: proper CLI for remote migration
> potential TODO: precond endpoint?
> 
> pve-http-server:
> 
> fix for handling unflushed proxy streams
> 
> as usual, some of the patches are best viewed with '-w', especially in
> qemu-server..
> 
> required dependencies are noted, qemu-server also requires a build-dep
> on patched pve-common since the required options/formats would be
> missing otherwise..
> 
> proxmox-websocket-tunnel
> 
> Fabian Grünbichler (4):
>    initial commit
>    add tunnel implementation
>    add fingerprint validation
>    add packaging
> 
> pve-access-control
> 
> Fabian Grünbichler (2):
>    tickets: add tunnel ticket
>    ticket: normalize path for verification
> 
>   src/PVE/AccessControl.pm | 52 ++++++++++++++++++++++++++++++----------
>   1 file changed, 40 insertions(+), 12 deletions(-)
> 
> pve-http-server
> 
> Fabian Grünbichler (1):
>    webproxy: handle unflushed write buffer
> 
>   src/PVE/APIServer/AnyEvent.pm | 10 ++++++----
>   1 file changed, 6 insertions(+), 4 deletions(-)
> 
> qemu-server
> 
> Fabian Grünbichler (8):
>    refactor map_storage to map_id
>    schema: use pve-bridge-id
>    update_vm: allow simultaneous setting of boot-order and dev
>    nbd alloc helper: allow passing in explicit format
>    mtunnel: add API endpoints
>    migrate: refactor remote VM/tunnel start
>    migrate: add remote migration handling
>    api: add remote migrate endpoint
> 
>   PVE/API2/Qemu.pm   | 826 ++++++++++++++++++++++++++++++++++++++++++++-
>   PVE/QemuMigrate.pm | 813 ++++++++++++++++++++++++++++++++++++--------
>   PVE/QemuServer.pm  |  80 +++--
>   debian/control     |   2 +
>   4 files changed, 1539 insertions(+), 182 deletions(-)
> 




