[pve-devel] RFC: vm migration+storage to external/remote proxmox cluster

Fri Mar 10 08:12:11 CET 2017

On Fri, Mar 10, 2017 at 06:35:23AM +0100, Alexandre DERUMIER wrote:
> Hi,
> 
> >>First, thanks for your work! 
> 
> And thanks for your review ! :)
> 
> 
> >>I did some shallow review of most patches, I hope most of it is somewhat 
> >>constructive (I had to include a few nit picks, sorry :)) 
> 
> I'll try to take the time to reply to each patch comment.
> 
> 
> >>Some general methods, like doing a lot manually with rm and echo > pipe, still
> >>look hacky, but as it's an RFC I'm thinking this wasn't your concern yet :)
> >>As such methods can be dangerous, I would favor the ones where the target
> >>cluster does this stuff, i.e. each cluster touches only its own stuff if
> >>possible.
> 
> yes, currently it's more of a proof of concept without cleanup. (I coded it in 1 night)
> I needed it quickly to move a customer to a new cluster/storage in a remote datacenter.
> It's really hacky ;)  (but it works, I have migrated around 50 VMs with it without problems)
> 
> >>
> >>A general flow I could imagine to work well with our stack and which 
> >>then would allow us to move 
> >>or implement this easier in a project like Proxmox Datacenter Manager 
> >>would be: 
> Can't wait for Proxmox Datacenter Manager :)  (I now have 3 clusters with 16 nodes, each with around 700 VMs)
> 
> >>1) external_migrate command gets executed
> 
> >>2) Access to the other cluster gets made over API, this access can be
> >>kept for the whole process
> Great! I was not sure about it, as in the current migration code we only use the qm command through an ssh tunnel.
> Doing it with the API allows us to do more things.
> 
> For authentication, I don't know what is better.
> If we use the GUI, we could reuse the client ticket.
> But for the command line? Maybe generate the ticket through an ssh tunnel, then use the API?
> 
> Do we already have a Perl client implementation somewhere in the code?

we have https://git.proxmox.com/?p=pve-apiclient.git ;)

there were/are plans of moving the cluster join operation to connect
over the API (and let the user verify the TLS certificate fingerprint to
create the initial trusted link) instead of the current SSH-based
implementation. once that is done, extending this base "connect to
external node/cluster, establish trust, login and call some API paths"
should be rather easy. that would offer the possibility of implementing
a lot of cool features (like this patch series, a cross-cluster manager
/ GUI in general, ..), without relying on ugly hacks.

IIRC there was even a patch series on pve-devel for this, of which only
parts were applied (and the rest probably needs some rebasing with all
the refactoring that happened since).
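
to illustrate the "let the user verify the TLS certificate fingerprint"
part, here is a rough sketch in Python (not Perl, and not taken from
pve-apiclient or any existing patch). all function names are made up for
this mail, and the default port 8006 is only assumed to be the API port
of the remote node. the idea is simply: fetch the remote certificate
once without trusting it, show its SHA-256 fingerprint to the user, and
only continue if it matches what the user confirmed.

```python
import hashlib
import hmac
import ssl


def cert_fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 fingerprint of a DER certificate as AA:BB:...

    Hypothetical helper, not part of any PVE library.
    """
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fingerprint_matches(der_cert: bytes, expected: str) -> bool:
    """Compare a certificate against the fingerprint the user confirmed."""
    actual = cert_fingerprint(der_cert).encode("ascii")
    wanted = expected.strip().upper().encode("ascii", errors="replace")
    # Constant-time comparison; case and surrounding whitespace are ignored.
    return hmac.compare_digest(actual, wanted)


def fetch_der_cert(host: str, port: int = 8006) -> bytes:
    """Fetch the remote certificate WITHOUT validating it.

    The result is only trustworthy after the user has confirmed its
    fingerprint out of band. The port is an assumption for this sketch.
    """
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)
```

once the fingerprint is confirmed it could be stored and checked on
every later connection, so the login and API calls only ever go to the
node the user actually approved.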

> >>3) do some error checks on the remote side, is the target storage
> >>available, ...
> Yes. Currently this stops when the remote VM creates the disk. It works, but the disks then need to be cleaned up.
> If we can do the check early, it's better.
> 
> >>4) get a VMID from the remote side, with /cluster/nextid (I plan to 
> >>revive my "reserve VMID" patch so that this can be done in a nice manner) 
> 
> I was not sure about the "reserve VMID". Does it work currently?
> (Is the nextid reserved for some seconds between 2 API calls?)
> 
> >>5) Create the target VM skeleton from its source config, via API call,
> >>this should probably be done in phase1
> >>6) Sync disks now, as we have a VMID which currently still belongs to us; we
> >>mustn't do anything before we have this VM on the remote cluster
> >>7) Start the VM on the target node, maybe add an "external_migration"
> >>parameter so that we can allow incoming but distinguish between external or not.
> 
> Ok, got it. 
> 
> >>8) do the migration as always
> 
> >>9) cleanup locally (maybe with option to keep the VM on the old cluster 
> >>(as mentioned in reviews: can be potentially dangerous)) 
> 
> At least, it should be optional.  (we also have the "disconnect" option on VM NICs)

alternatively, we could mark/lock the source VM after a successful
migration, to prevent accidental starts or other operations? I am not
really sure about this...
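
to make the flow discussed above a bit more concrete, here is a small
in-memory simulation in Python (again not Perl, and nothing in it talks
to a real cluster). FakeCluster, external_migrate, keep_source and the
"migrated" lock name are all invented for this sketch; only the idea of
asking the target for a free VMID comes from the /cluster/nextid step
quoted above. it shows the ordering: check the target storage first,
then get a VMID, then create the skeleton, and at the end either lock or
remove the source VM.

```python
from dataclasses import dataclass, field


class MigrationError(Exception):
    """Raised when a precondition for the migration is not met."""


@dataclass
class FakeCluster:
    """In-memory stand-in for a cluster; not a real API client."""
    name: str
    storages: set
    vms: dict = field(default_factory=dict)    # vmid -> config dict
    locks: dict = field(default_factory=dict)  # vmid -> lock name

    def next_id(self):
        # Stand-in for GET /cluster/nextid: lowest free VMID >= 100.
        vmid = 100
        while vmid in self.vms:
            vmid += 1
        return vmid

    def create_skeleton(self, vmid, config):
        if vmid in self.vms:
            raise MigrationError(f"VMID {vmid} already in use on {self.name}")
        self.vms[vmid] = dict(config)


def external_migrate(source, target, vmid, target_storage, keep_source=True):
    config = source.vms[vmid]
    # Step 3: fail early, before anything is created on the target.
    if target_storage not in target.storages:
        raise MigrationError(
            f"storage {target_storage!r} not available on {target.name}")
    # Steps 4 and 5: get a free VMID and create the VM skeleton there.
    new_vmid = target.next_id()
    target.create_skeleton(new_vmid, {**config, "storage": target_storage})
    # Steps 6 to 8 (disk sync, start on target, migration) are left out.
    # Step 9: lock the source VM instead of deleting it, or remove it.
    if keep_source:
        source.locks[vmid] = "migrated"
    else:
        del source.vms[vmid]
    return new_vmid
```

with keep_source=True the old VM stays around but carries a lock, which
is the "prevent accidental starts" variant; whether such a lock is the
right mechanism is exactly the open question.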