[pve-devel] qemu ha migration : race between move file and resume vm

Alexandre DERUMIER aderumier at odiso.com
Wed Oct 14 10:20:35 CEST 2015


>>Restart the pve-ha-lrm service, this should use the new code as it 
>>executes the resource managers, i.e. makes the API call. 
>>
>>> systemctl restart pve-ha-lrm.service 

Perfect! Thanks!


Off topic:

About systemd: do we still need the /etc/init.d/pve-* init scripts?
I haven't installed a fresh Proxmox 4 yet, only upgrades from Proxmox 3,
and the old init scripts are still there.




----- Original Message -----
From: "Thomas Lamprecht" <t.lamprecht at proxmox.com>
To: "pve-devel" <pve-devel at pve.proxmox.com>
Sent: Wednesday, 14 October 2015 08:53:48
Subject: Re: [pve-devel] qemu ha migration : race between move file and resume vm

Restart the pve-ha-lrm service, this should use the new code as it 
executes the resource managers, i.e. makes the API call. 

> systemctl restart pve-ha-lrm.service 


On 10/14/2015 08:32 AM, Alexandre DERUMIER wrote: 
> Also, 
> 
> I need to debug this quickly, but I don't know how to reload the Proxmox code in HA 
> after I change the Perl code. 
> 
> (For example, if I add some logging in QemuMigrate.pm, I don't see it in the migrate task log 
> when the migration is launched through HA. I need to reboot the server to get the new code working. 
> I have tried restarting pvedaemon and pve-cluster, but that doesn't work with HA.) 
> 
> 
> ----- Original Message ----- 
> From: "aderumier" <aderumier at odiso.com> 
> To: "pve-devel" <pve-devel at pve.proxmox.com> 
> Sent: Wednesday, 14 October 2015 08:16:16 
> Subject: Re: [pve-devel] qemu ha migration : race between move file and resume vm 
> 
>>> The HA manager moves the config only when the VM is offline and gets a 
>>> migrate command, which shouldn't be the case here :) 
> Ok, thanks. 
> I'll do more tests without HA to be sure. 
> 
> 
> ----- Original Message ----- 
> From: "Thomas Lamprecht" <t.lamprecht at proxmox.com> 
> To: "pve-devel" <pve-devel at pve.proxmox.com> 
> Sent: Wednesday, 14 October 2015 08:03:02 
> Subject: Re: [pve-devel] qemu ha migration : race between move file and resume vm 
> 
> On 10/14/2015 07:40 AM, Alexandre DERUMIER wrote: 
>> Hi, 
>> Two users have reported a migration problem when HA is enabled: 
>> http://forum.proxmox.com/threads/23848-PVE-4-KVM-live-migration-problem 
>> 
>> I'm also able to reproduce it. 
>> 
>> task log 
>> --------- 
>> task started by HA resource agent 
>> Oct 14 07:27:48 starting migration of VM 125 to node 'kvmtest2' (10.3.94.47) 
>> Oct 14 07:27:48 copying disk images 
>> Oct 14 07:27:48 starting VM 125 on remote node 'kvmtest2' 
>> Oct 14 07:27:49 starting ssh migration tunnel 
>> Oct 14 07:27:51 starting online/live migration on 10.3.94.47:60000 
>> Oct 14 07:27:51 migrate_set_speed: 8589934592 
>> Oct 14 07:27:51 migrate_set_downtime: 0.1 
>> Oct 14 07:27:53 migration speed: 64.00 MB/s - downtime 7 ms 
>> Oct 14 07:27:53 migration status: completed 
>> Oct 14 07:27:54 ERROR: unable to find configuration file for VM 125 - no such machine 
>> Oct 14 07:27:54 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@10.3.94.47 qm resume 125 --skiplock' failed: exit code 2 
>> Oct 14 07:27:57 ERROR: migration finished with problems (duration 00:00:09) 
>> TASK ERROR: migration problems 
>> 
>> 
>> 
>> The problem is in QemuMigrate.pm, 
>> in the phase3 cleanup: 
>> 
>> 
>> die "Failed to move config to node '$self->{node}' - rename failed: $!\n" 
>>     if !rename($conffile, $newconffile); 
>> 
>> if ($self->{livemigration}) { 
>>     # now that the config file has been moved, we can resume the VM on the target (live migration only) 
>>     my $cmd = [@{$self->{rem_ssh}}, 'qm', 'resume', $vmid, '--skiplock']; 
>>     eval { 
>>         PVE::Tools::run_command($cmd, outfunc => sub {}, 
>>             errfunc => sub { 
>>                 my $line = shift; 
>>                 $self->log('err', $line); 
>>             }); 
>>     }; 
>>     if (my $err = $@) { 
>>         $self->log('err', $err); 
>>         $self->{errors} = 1; 
>>     } 
>> } 
>> 
>> 
>> 
>> The file move is done on the source node, 
>> but the target node doesn't see the moved file for about 3 s, so the resume fails. 
>> 
>> 
>> I don't know how HA is related here. Maybe some kind of file lock? 
> No, HA does not lock the config file, it more or less makes an API call 
> to Qemu->migrate, like: 
> 
>> my $upid = PVE::API2::Qemu->migrate_vm($params); 
>> $haenv->upid_wait($upid); 
> with the params: 
>> my $params = { 
>>     node => $nodename, 
>>     vmid => $vmid, 
>>     target => $target, 
>>     online => 1, 
>> }; 
> This happens in a forked process, which then waits until the task 
> completes. 
> 
> The HA manager moves the config only when the VM is offline and gets a 
> migrate command, which shouldn't be the case here :) 
> 
>> _______________________________________________ 
>> pve-devel mailing list 
>> pve-devel at pve.proxmox.com 
>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 
>> 
> 

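For the race described in the quoted report above (the `qm resume` on the target runs before the moved config file is visible there), one possible stopgap would be to retry the resume a few times instead of failing on the first error. The following is only a minimal sketch under that assumption: the `retry` helper and the `$fake_resume` stand-in are hypothetical names, not part of PVE, and the stand-in merely simulates a command that fails until the config shows up.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Hypothetical helper (not part of PVE): call $op until it stops dying,
# or give up after $max_tries attempts, sleeping $delay seconds in between.
sub retry {
    my ($op, $max_tries, $delay) = @_;
    my $err = '';
    for my $try (1 .. $max_tries) {
        my $result;
        return $result if eval { $result = $op->(); 1 };
        $err = $@;
        sleep($delay) if $try < $max_tries;
    }
    die "giving up after $max_tries tries: $err";
}

# Stand-in for the remote 'qm resume': fails twice, as if the target node
# did not see the moved config file yet, then succeeds.
my $calls = 0;
my $fake_resume = sub {
    $calls++;
    die "unable to find configuration file for VM 125 - no such machine\n"
        if $calls < 3;
    return 'resumed';
};

my $status = retry($fake_resume, 5, 0.01);
print "$status after $calls tries\n";
```

In QemuMigrate.pm the operation passed to such a helper would presumably wrap the existing `PVE::Tools::run_command($cmd, ...)` call. Note that this would only hide the delay, it would not explain why the target node sees the file late.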

_______________________________________________ 
pve-devel mailing list 
pve-devel at pve.proxmox.com 
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 