[pve-devel] [PATCH qemu-server 4/4] implement PoC migration to remote cluster/node
Fabian Grünbichler
f.gruenbichler at proxmox.com
Wed Mar 11 09:48:29 CET 2020
On March 11, 2020 8:55 am, Alexandre DERUMIER wrote:
> Hi,
>
> Thinking about cross-cluster migration,
>
> is there any plan to share storage across different cluster?
> It's not uncommon to have multiple clusters, as we are currently limited by corosync, and a shared storage with different pools.
not yet. I guess you are thinking about Ceph? if you have a pool per
cluster, you could simply configure them (with appropriate names to
avoid misuse) on all clusters, and we could add a third level to our
proposed targetstorage map. e.g., for a migration from cluster a to
cluster b:
local_ceph:ceph_cluster_a:1
where the '1' indicates that these two storages are 'shared' in the PVE
sense of having identical content on both clusters. in that case,
instead of allocating a new volume and starting an NBD mirror, we'd just
need to rewrite the volid and that drive would be done.
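to make that concrete, a hypothetical storage.cfg on cluster b (all
names made up) could define its own pool plus cluster a's pool under a
distinct name:

    rbd: local_ceph
        pool cluster_b_pool
        content images

    rbd: ceph_cluster_a
        pool cluster_a_pool
        monhost <cluster a mons>
        content images

with the mapping local_ceph:ceph_cluster_a:1, an incoming migration
would only rewrite a volid like local_ceph:vm-100-disk-0 to
ceph_cluster_a:vm-100-disk-0, without copying any data.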
we could extend the storage config schema to add a 'guests' property
similar to the 'nodes' one, to limit the view to certain vmids. a remote
incoming migration could then add the vmid to that list on the target
cluster, a remote outgoing migration could remove it on the source
cluster. alternatively, we could have such a list/list of ranges on the
datacenter level instead of the storage level (to signify: these VMs
belong to this cluster), and simply ignore anything else. I think I need
to think more about this ;)
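a rough sketch of that idea (the 'guests' property is made up, it does
not exist yet):

    rbd: ceph_cluster_a
        pool cluster_a_pool
        content images
        guests 100,101,200-299

volumes owned by any other vmid would then simply not be visible or
usable on this cluster.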
> It could avoid data copy and allow simple live migration (through the new websocket)
yes, and to do a final 'cleanup' you could then do a move disk, or you
could keep the disk where it is forever.
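e.g., something along these lines (assuming the existing move disk
semantics, with --delete dropping the source volume afterwards):

    qm move_disk 100 scsi0 local_ceph --delete 1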
> The main problem is unique ids. A GUID could be a solution, but if it is only implemented for disk ids,
> we couldn't track orphaned disks' associations with their vmid.
the problem is that we'd like to have the vmid in the volid to track
ownership. replacing that with a random ID does not solve the problem,
as we'd still need to track ownership somehow. as soon as we put that
somewhere, we are back to square one.
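currently ownership is encoded directly in the volid, e.g.:

    local-lvm:vm-100-disk-0
    ceph_cluster_a:vm-100-disk-1

the 'vm-100' part ties both volumes to VMID 100, which is what allows
matching orphaned volumes to their guest. a random GUID in its place
would require an extra (synchronized) GUID-to-owner mapping somewhere
else.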
> and if we used GUIDs for vmids too, we couldn't keep the current tap/veth naming scheme, as interface names are limited to 15 bytes
> (GUIDs are 128 bits, so 16 bytes, plus an extra byte for the NIC number).
they are also way less readable / memorable. generating a short ID for
the NICs is not the issue I think.
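for reference, Linux limits interface names to 15 usable characters
(IFNAMSIZ is 16, including the terminating NUL byte). the current
scheme fits comfortably, a hex-encoded GUID does not:

    tap100i0              8 characters, fine
    tap<32 hex chars>i0  37 characters, more than twice the limit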
> Of course, this should be optional, but if a GUID is detected, we could allow cross-cluster live migration without disk migration.
as long as you somehow ensure that VMIDs don't conflict between
clusters[1] a sort of shared storage migration would easily be possible.
you could even avoid giving any user 'allocate space' permission on the
storages that are owned by other clusters. it kind of conflicts with
another idea that I have been toying with though, which is 'targetvmid'
to enable cross-cluster migration where the VMID does have a conflict ;)
1: e.g., by having a naming scheme where the VMID is prefixed by some
sort of short cluster ID
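e.g. (made-up scheme): cluster 1 only allocates VMIDs in
1000000-1999999, cluster 2 only in 2000000-2999999, so a volume like
vm-1000123-disk-0 can never clash with anything allocated by cluster 2.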