[pve-devel] successfull migration but failed resume

Alexandre DERUMIER aderumier at odiso.com
Fri Aug 29 16:14:09 CEST 2014


>>I might be able to do some tests, but I have to take this E5-2640 server out of the production cluster and create a new test cluster. It will take a few days to rearrange things. If that’s fine, I’m okay with it.
>>Does this mean I have to reinstall Proxmox 3.1 on both cluster nodes?

If you remove a node from a cluster, yes, it's better to reinstall it before joining a new cluster.

(BTW: it's Proxmox 3.2, right? Not 3.1?)


It would be great to test with the current 3.10 kernel.
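
As a side note, here is a minimal sketch for comparing the CPU flags of two hosts, for example to confirm the xsave difference discussed in the quoted messages below. It assumes you have saved the output of `cat /proc/cpuinfo` from each host as text. The function names `parse_flags` and `flag_diff` and the file names in the usage comment are made up for illustration and are not part of any Proxmox tool.

```python
import sys


def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line found.

    Accepts raw /proc/cpuinfo text; a leading '> ' mail quote prefix
    on the line is tolerated.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Strip quote markers and whitespace around the key.
        if sep and key.lstrip("> ").strip() == "flags":
            return set(value.split())
    return set()


def flag_diff(source_text, target_text):
    """Return (only_on_source, only_on_target) as sorted lists."""
    src = parse_flags(source_text)
    dst = parse_flags(target_text)
    return sorted(src - dst), sorted(dst - src)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 flagdiff.py source_cpuinfo.txt target_cpuinfo.txt
    with open(sys.argv[1]) as f_src, open(sys.argv[2]) as f_dst:
        only_src, only_dst = flag_diff(f_src.read(), f_dst.read())
    print("only on source:", " ".join(only_src))
    print("only on target:", " ".join(only_dst))
```

Flags that appear only on the source host are the ones a guest may have been using before migration and that the target cannot provide, so those are the ones worth looking at first.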



----- Original Message -----

From: "Christian Tari" <christian at zaark.com>
To: "Alexandre DERUMIER" <aderumier at odiso.com>
Sent: Friday 29 August 2014 15:19:10
Subject: Re: [pve-devel] successfull migration but failed resume

Good. At least we are on track. 

I might be able to do some tests, but I have to take this E5-2640 server out of the production cluster and create a new test cluster. It will take a few days to rearrange things. If that’s fine, I’m okay with it.
Does this mean I have to reinstall Proxmox 3.1 on both cluster nodes?

//Christian 


On 29 Aug 2014, at 15:08, Alexandre DERUMIER <aderumier at odiso.com> wrote: 

>>> Can it lead to issues if we migrate between two different architectures? BTW, the former is an HP DL360G8, the latter an HP DL380G7.
> 
> I have the same bug with AMD Opteron 63XX -> 61XX.
>
> I think it is caused by a KVM bug involving the "xsave" CPU flag, which exists on 63XX but not on 61XX.
> https://lkml.org/lkml/2014/2/22/58 
> 
> 
> It seems to be your case too, with
> 
> E5-2640 0 @ 2.50GHz : xsave 
> CPU E5645 @ 2.40GHz : no xsave. 
> 
> 
> Does migration in the reverse direction work?
>
>
> I have a kernel 3.10 patch for this xsave bug, but I haven't tested it yet.
> I don't know if you could test it?
> 
> 
> 
> 
> ----- Original Message -----
>
> From: "Christian Tari" <christian at zaark.com>
> To: "Alexandre DERUMIER" <aderumier at odiso.com>
> Sent: Friday 29 August 2014 14:16:59
> Subject: Re: [pve-devel] successfull migration but failed resume
> 
> Yes, the default, kvm64. 
> Can it lead to issues if we migrate between two different architectures? BTW, the former is an HP DL360G8, the latter an HP DL380G7.
> The strange thing is that it doesn’t happen every time. In particular, after a failed migration the subsequent migrations always work. It often happens with instances that have relatively high memory usage (6-18GB). Could it be a timeout while the memory content is being transferred?
> Aug 29 11:37:42 ERROR: migration finished with problems (duration 00:04:23) 
> 
> 
> 
> 
> //Christian 
> 
> 
> 
> On 29 Aug 2014, at 14:08, Alexandre DERUMIER < aderumier at odiso.com > wrote: 
> 
> 
> And your guest CPU is kvm64?
> 
> 
> ----- Original Message -----
>
> From: "Christian Tari" < christian at zaark.com >
> To: "Alexandre DERUMIER" < aderumier at odiso.com >
> Sent: Friday 29 August 2014 13:02:15
> Subject: Re: [pve-devel] successfull migration but failed resume
> 
> Source host: 
> processor : 11 
> vendor_id : GenuineIntel 
> cpu family : 6 
> model : 45 
> model name : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz 
> stepping : 7 
> cpu MHz : 2493.793 
> cache size : 15360 KB 
> physical id : 0 
> siblings : 12 
> core id : 5 
> cpu cores : 6 
> apicid : 11 
> initial apicid : 11 
> fpu : yes 
> fpu_exception : yes 
> cpuid level : 13 
> wp : yes 
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid 
> bogomips : 4987.58 
> clflush size : 64 
> cache_alignment : 64 
> address sizes : 46 bits physical, 48 bits virtual 
> power management: 
> 
> # pveversion 
> pve-manager/3.2-1/1933730b (running kernel: 2.6.32-27-pve) 
> 
> Target host: 
> processor : 11 
> vendor_id : GenuineIntel 
> cpu family : 6 
> model : 44 
> model name : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz 
> stepping : 2 
> cpu MHz : 2399.404 
> cache size : 12288 KB 
> physical id : 1 
> siblings : 12 
> core id : 9 
> cpu cores : 6 
> apicid : 50 
> initial apicid : 50 
> fpu : yes 
> fpu_exception : yes 
> cpuid level : 11 
> wp : yes 
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid 
> bogomips : 4798.17 
> clflush size : 64 
> cache_alignment : 64 
> address sizes : 40 bits physical, 48 bits virtual 
> power management: 
> 
> # pveversion 
> pve-manager/3.2-4/e24a91c1 (running kernel: 2.6.32-29-pve) 
> 
> //Christian 
> 
> 
> On 29 Aug 2014, at 12:56, Alexandre DERUMIER < aderumier at odiso.com > wrote: 
> 
> 
>  
> 
> <blockquote> 
> 
> <blockquote> 
> Aug 29 11:37:39 ERROR: VM 711 not running 
> 
> 
> 
>  
> 
> It seems that the kvm process crashed just after the migration, which is why the resume failed.
>
> What are the source and target host processors?
> 
> 
> 
> ----- Original Message -----
>
> From: "Christian Tari" < christian at zaark.com >
> To: aderumier at odiso.com
> Sent: Friday 29 August 2014 12:31:10
> Subject: [pve-devel] successfull migration but failed resume
> 
> Hi, 
> 
> I know this isn’t the proper way to get support, but we are having exactly the same issue as described in the mail thread.
> I’ve been browsing the forum for a while now but can’t find any similar case. 
> 
> Aug 29 11:37:39 migration speed: 31.50 MB/s - downtime 150 ms 
> Aug 29 11:37:39 migration status: completed 
> Aug 29 11:37:39 ERROR: VM 711 not running 
> Aug 29 11:37:39 ERROR: command '/usr/bin/ssh -o 'BatchMode=yes' root@1.1.1.1 qm resume 711 --skiplock' failed: exit code 2
> 
> pve-manager/3.2-4/e24a91c1 (running kernel: 2.6.32-29-pve) 
> 
> Is there anything we can do, any hints you can give? 
> 
> Thanks, 
> Christian Tari 
> 
> </blockquote> 
> 
> </blockquote> 


