[pve-devel] qemu ha migration : race between move file and resume vm
Alexandre DERUMIER
aderumier at odiso.com
Wed Oct 14 10:20:35 CEST 2015
>>Restart the pve-ha-lrm service, this should use the new code as it
>>executes the resource managers, i.e. makes the API call.
>>
>>> systemctl restart pve-ha-lrm.service
Perfect! Thanks!
Off-topic:
With systemd, do we still need the /etc/init.d/pve-* init scripts?
I haven't done a fresh Proxmox 4 install yet, only upgrades from Proxmox 3,
and the old init scripts are still there.
----- Original message -----
From: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday 14 October 2015 08:53:48
Subject: Re: [pve-devel] qemu ha migration : race between move file and resume vm
Restart the pve-ha-lrm service, this should use the new code as it
executes the resource managers, i.e. makes the API call.
> systemctl restart pve-ha-lrm.service
On 10/14/2015 08:32 AM, Alexandre DERUMIER wrote:
> Also,
>
> I need to debug this quickly, but I don't know how to reload the Proxmox code in HA
> when I change the Perl code.
>
> (For example, if I add some logging to QemuMigrate.pm, it doesn't show up in the migrate task log
> when the migration is launched through HA. I have to reboot the server to get the new code running.
> I have tried restarting pvedaemon and pve-cluster, but that doesn't work with HA.)
>
>
> ----- Original message -----
> From: "aderumier" <aderumier at odiso.com>
> To: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Wednesday 14 October 2015 08:16:16
> Subject: Re: [pve-devel] qemu ha migration : race between move file and resume vm
>
>>> The HA manager moves the config only when the VM is offline and gets a
>>> migrate command, which shouldn't be the case here :)
> Ok, thanks.
> I'll do more tests without HA to be sure.
>
>
> ----- Original message -----
> From: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
> To: "pve-devel" <pve-devel at pve.proxmox.com>
> Sent: Wednesday 14 October 2015 08:03:02
> Subject: Re: [pve-devel] qemu ha migration : race between move file and resume vm
>
> On 10/14/2015 07:40 AM, Alexandre DERUMIER wrote:
>> Hi,
>> Two users have reported a migration problem when HA is enabled:
>> http://forum.proxmox.com/threads/23848-PVE-4-KVM-live-migration-problem
>>
>> I'm also able to reproduce it.
>>
>> task log
>> ---------
>> task started by HA resource agent
>> Oct 14 07:27:48 starting migration of VM 125 to node 'kvmtest2' (10.3.94.47)
>> Oct 14 07:27:48 copying disk images
>> Oct 14 07:27:48 starting VM 125 on remote node 'kvmtest2'
>> Oct 14 07:27:49 starting ssh migration tunnel
>> Oct 14 07:27:51 starting online/live migration on 10.3.94.47:60000
>> Oct 14 07:27:51 migrate_set_speed: 8589934592
>> Oct 14 07:27:51 migrate_set_downtime: 0.1
>> Oct 14 07:27:53 migration speed: 64.00 MB/s - downtime 7 ms
>> Oct 14 07:27:53 migration status: completed
>> Oct 14 07:27:54 ERROR: unable to find configuration file for VM 125 - no such machine
>> Oct 14 07:27:54 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@10.3.94.47 qm resume 125 --skiplock' failed: exit code 2
>> Oct 14 07:27:57 ERROR: migration finished with problems (duration 00:00:09)
>> TASK ERROR: migration problems
>>
>>
>>
>> The problem is in QemuMigrate.pm,
>> in the phase3 cleanup:
>>
>>
>> die "Failed to move config to node '$self->{node}' - rename failed: $!\n"
>>     if !rename($conffile, $newconffile);
>>
>> if ($self->{livemigration}) {
>>     # now that the config file is moved, we can resume the VM on the target (live migration)
>>     my $cmd = [@{$self->{rem_ssh}}, 'qm', 'resume', $vmid, '--skiplock'];
>>     eval {
>>         PVE::Tools::run_command($cmd, outfunc => sub {},
>>             errfunc => sub {
>>                 my $line = shift;
>>                 $self->log('err', $line);
>>             });
>>     };
>>     if (my $err = $@) {
>>         $self->log('err', $err);
>>         $self->{errors} = 1;
>>     }
>> }
>>
>>
>>
>> The file move is done on the source node,
>> but the target node doesn't see the moved file for about 3 s, so the resume fails.
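One way to check whether this is only a visibility delay would be to poll for the
config file on the target node before calling qm resume. The following is a minimal
shell sketch under stated assumptions, not a proposed patch: wait_for_file is a
hypothetical helper, and the retry count and delay are arbitrary values.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): wait until a file becomes
# visible on this node, or give up after a fixed number of attempts.
# Usage: wait_for_file PATH TRIES DELAY_SECONDS
wait_for_file() {
    path=$1
    tries=$2
    delay=$3
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Succeed as soon as the file can be seen.
        if [ -e "$path" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    # The file never appeared within TRIES attempts.
    return 1
}
```

On the target node this could be used as
wait_for_file "$conf" 10 1 && qm resume 125 --skiplock
where $conf stands for the VM's config path as seen from that node (an assumption
to fill in for the setup being tested).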
>>
>>
>> I don't know how HA is related here. Maybe some kind of file lock?
> No, HA does not lock the config file, it more or less makes an API call
> to Qemu->migrate, like:
>
>> my $upid = PVE::API2::Qemu->migrate_vm($params);
>> $haenv->upid_wait($upid);
> with the params:
>> my $params = {
>>     node => $nodename,
>>     vmid => $vmid,
>>     target => $target,
>>     online => 1,
>> };
> This happens in a forked process which then waits until completion of
> the task.
>
> The HA manager moves the config only when the VM is offline and gets a
> migrate command, which shouldn't be the case here :)
>
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel at pve.proxmox.com
>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel