[PVE-User] HA migration behaviour vs. failures
Dhaussy Alexandre
ADhaussy at voyages-sncf.com
Thu Jul 24 11:16:45 CEST 2014
@Proxmox devs:
Sorry for being a bit cheeky here, but why did you change the way the
RedHat cluster stack behaves on online migration? (There, a migration
failure is a non-critical error.)
I guess there must be a good reason, but I'm still interested to know
why. :$
Regards,
--
Alexandre DHAUSSY
On 22/07/2014 18:30, Dhaussy Alexandre wrote:
> Greetings,
>
> I've been "playing" with the latest version of Proxmox (3-node cluster + GlusterFS) for a couple of months.
> My goal is to replace 3 RedHat 5 KVM servers (no HA) hosting ~100 VMs on NAS storage.
>
> But I have some annoying issues with live migrations..
> Sometimes it works, but sometimes, for no apparent reason, it doesn't.
> When it fails (migration aborted), I try again and then it works! :(
>
> Jul 11 14:48:49 starting ssh migration tunnel
> Jul 11 14:48:50 starting online/live migration on localhost:60000
> Jul 11 14:48:50 migrate_set_speed: 8589934592
> Jul 11 14:48:50 migrate_set_downtime: 0.1
> Jul 11 14:48:52 ERROR: online migrate failure - aborting
> Jul 11 14:48:52 aborting phase 2 - cleanup resources
> Jul 11 14:48:52 migrate_cancel
> Jul 11 14:58:52 ERROR: migration finished with problems (duration 00:10:05)
> TASK ERROR: migration problems
>
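> (A side note for anyone hitting the same thing: the task log above never shows the underlying QEMU error. One rough way to dig deeper on the source node, assuming the PVE 3.x Perl API, is to ask the VM's QEMU monitor directly right after a failed attempt. The script below is only a debugging sketch, nothing official:)
>
> #!/usr/bin/perl
> # Debugging sketch: dump QEMU's own view of the last migration attempt.
> # Assumes PVE 3.x, where PVE::QemuServer::vm_mon_cmd() wraps QMP commands.
> use strict;
> use warnings;
> use Data::Dumper;
> use PVE::QemuServer;
>
> my $vmid = shift @ARGV or die "usage: $0 <vmid>\n";
>
> # 'query-migrate' reports the status of the most recent migration
> # ('active', 'completed', 'failed', ...) plus transfer statistics.
> my $res = PVE::QemuServer::vm_mon_cmd($vmid, 'query-migrate');
> print Dumper($res);
>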
> I tried to:
> - disable SPICE.
> - set the CPU type to 'default' (kvm64) instead of 'host'.
> - use 'Directory' shared storage (FUSE mount) instead of 'GlusterFS'.
> But no luck, still random failures.
>
> My problem with that comes when the VMs are added to the HA cluster... because Proxmox seems to stop the service when a live migration fails.
> I can't see why anyone would want to stop an HA VM just because a live migration failed, since the VM is still running on the source node.
>
> I have another cluster here (a 2-node RedHat 6 KVM cluster with HA-managed VMs), and when an HA migration fails there, the VMs stay running on the original node.
> I thought it should therefore be possible to achieve the same behaviour with Proxmox?
>
> Having the VMs stopped in an HA cluster is a no-go, so I ended up making some nasty changes in the code.
> I'm still interested in a better solution; so far this seems to do what I need..
>
> +++ /usr/share/cluster/pvevm 2014-07-22 15:22:29.703424516 +0200
> @@ -28,6 +28,7 @@
> use constant OCF_NOT_RUNNING => 7;
> use constant OCF_RUNNING_MASTER => 8;
> use constant OCF_FAILED_MASTER => 9;
> +use constant OCF_ERR_MIGRATE => 150;
>
> $ENV{'PATH'} = '/sbin:/bin:/usr/sbin:/usr/bin';
>
> @@ -358,6 +359,9 @@
>
> upid_wait($upid);
>
> + check_running($status);
> + exit(OCF_ERR_MIGRATE) if $status->{running};
> +
> # something went wrong if old config file is still there
> exit((-f $oldconfig) ? OCF_ERR_GENERIC : OCF_SUCCESS);
>
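> (Since the hunk above carries no surrounding code, here is roughly what the migrate path of /usr/share/cluster/pvevm ends up doing with my change applied. This is a hand-made reconstruction around the hunk, so the exact surrounding lines may differ from the shipped agent; upid_wait(), check_running(), $status and $oldconfig all come from the original script:)
>
> # wait for the PVE migration task identified by $upid to finish
> upid_wait($upid);
>
> # refresh $status->{running} for this VM
> check_running($status);
>
> # added: if the VM is still running here, the live migration failed but
> # the guest itself is untouched, so exit with a distinct code (150)
> # instead of falling through to the generic error below.
> exit(OCF_ERR_MIGRATE) if $status->{running};
>
> # original logic: something went wrong if the old config file is still there
> exit((-f $oldconfig) ? OCF_ERR_GENERIC : OCF_SUCCESS);
>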
> +++ /usr/share/perl5/PVE/API2/Qemu.pm 2014-07-22 15:51:31.909558803 +0200
> @@ -1634,7 +1634,7 @@
>
> my $storecfg = PVE::Storage::config();
>
> - if (&$vm_is_ha_managed($vmid) && $rpcenv->{type} ne 'ha') {
> + if (&$vm_is_ha_managed($vmid) && $rpcenv->{type} ne 'ha' && !defined($migratedfrom)) {
>
> my $hacmd = sub {
> my $upid = shift;
>
>
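> (About the second hunk: the condition it touches is the guard in the API handler that hands actions on an HA-managed VM over to the HA stack (the $hacmd wrapper) instead of acting on the VM directly. Adding !defined($migratedfrom) makes the handler skip that redirection when the request is flagged as part of a migration, i.e. when the parameter names the node the VM is migrating from, so only "normal" user requests still go through HA. Roughly, as a commented sketch of the intent:)
>
> # Hand the request over to the HA stack only when:
> #   - the VM is HA-managed,
> #   - the request does not already come from the HA stack itself,
> #   - and it is not an internal call made as part of a migration.
> if (&$vm_is_ha_managed($vmid)
>     && $rpcenv->{type} ne 'ha'
>     && !defined($migratedfrom)) {
>     # ... wrap the operation in $hacmd as before ...
> }
>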
> Regards,
>