[pve-devel] qemu ha migration : race between move file and resume vm

Alexandre DERUMIER aderumier at odiso.com
Wed Oct 14 07:40:48 CEST 2015


Hi,
Two users have reported a migration problem when HA is enabled:
http://forum.proxmox.com/threads/23848-PVE-4-KVM-live-migration-problem

I'm also able to reproduce it.

task log
---------
task started by HA resource agent
Oct 14 07:27:48 starting migration of VM 125 to node 'kvmtest2' (10.3.94.47)
Oct 14 07:27:48 copying disk images
Oct 14 07:27:48 starting VM 125 on remote node 'kvmtest2'
Oct 14 07:27:49 starting ssh migration tunnel
Oct 14 07:27:51 starting online/live migration on 10.3.94.47:60000
Oct 14 07:27:51 migrate_set_speed: 8589934592
Oct 14 07:27:51 migrate_set_downtime: 0.1
Oct 14 07:27:53 migration speed: 64.00 MB/s - downtime 7 ms
Oct 14 07:27:53 migration status: completed
Oct 14 07:27:54 ERROR: unable to find configuration file for VM 125 - no such machine
Oct 14 07:27:54 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@10.3.94.47 qm resume 125 --skiplock' failed: exit code 2
Oct 14 07:27:57 ERROR: migration finished with problems (duration 00:00:09)
TASK ERROR: migration problems



The problem is in QemuMigrate.pm, in the phase3 cleanup:


    die "Failed to move config to node '$self->{node}' - rename failed: $!\n"
        if !rename($conffile, $newconffile);

    if ($self->{livemigration}) {
        # now that config file is move, we can resume vm on target if livemigrate
        my $cmd = [@{$self->{rem_ssh}}, 'qm', 'resume', $vmid, '--skiplock'];
        eval{ PVE::Tools::run_command($cmd, outfunc => sub {},
                errfunc => sub {
                    my $line = shift;
                    $self->log('err', $line);
                });
        };
        if (my $err = $@) {
            $self->log('err', $err);
            $self->{errors} = 1;
        }
    }



The config file is moved on the source node,
but the target node doesn't see the moved file until about 3 seconds later, so the resume fails.
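
One possible workaround, shown only as a rough sketch and not as a tested fix, would be to retry the resume a few times with a short pause, so the target node has time to see the moved config file. The helper name retry_with_delay and its parameters are hypothetical; nothing like it exists in QemuMigrate.pm as far as I know. It only hides the delay and does not explain why the file shows up late.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Hypothetical helper (not part of PVE): run $code up to $max_tries times,
# sleeping $delay seconds between failed attempts.
# Returns the number of the attempt that succeeded, dies with the last
# error if every attempt failed.
sub retry_with_delay {
    my ($code, $max_tries, $delay) = @_;

    my $err = "no attempts made\n";
    for my $try (1 .. $max_tries) {
        # eval returns 1 only if $code did not die
        return $try if eval { $code->(); 1 };
        $err = $@ || "unknown error\n";
        sleep($delay) if $try < $max_tries;
    }
    die "still failing after $max_tries tries: $err";
}
```

In the code quoted above, the existing PVE::Tools::run_command($cmd, ...) call could be passed as the code reference, for example with 5 tries and a 1 second delay, which would cover the roughly 3 seconds seen here. Those numbers are guesses.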


I don't know how HA is related here. Maybe some kind of file lock?


