[PVE-User] VMs hung after live migration - Intel CPU

Eneko Lacunza elacunza at binovo.es
Tue Nov 8 18:18:44 CET 2022


Hi Jan,

I had some time to re-test this.

I tried live migration with KVM64 CPU between 2 nodes:

node-ryzen1700 - kernel 5.19.7-1-pve
node-ryzen5900x - kernel 5.19.7-1-pve

I bulk-migrated 9 VMs (8 Debian 9/10/11 and 1 Windows 2008r2).
This works OK in both directions.

Then I downgraded a node to 5.13:
node-ryzen1700 - kernel 5.19.7-1-pve
node-ryzen5900x - kernel 5.13.19-6-pve

Migration of those 9 VMs worked well from node-ryzen1700 -> node-ryzen5900x.

But migration of those 9 VMs back, node-ryzen5900x -> node-ryzen1700, was 
a disaster: all 8 Debian VMs hung with 50/100% CPU use. Windows 2008r2 
does not seem to be affected by the issue at all.

3 other Debian/Windows VMs on node-ryzen1700 were not affected.

After downgrading node-ryzen1700 as well, so both nodes run kernel 5.13:

node-ryzen1700 - kernel 5.13.19-6-pve
node-ryzen5900x - kernel 5.13.19-6-pve

Migration of those 9 VMs node-ryzen5900x -> node-ryzen1700 works as 
intended :)
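
As a rough summary of the runs above, here is a small Python sketch that 
records each migration as a row and picks out the failing combination. 
The names Run, RUNS and failing_runs are made up for illustration; the 
data is only what was observed in this test, not a general rule.

```python
from typing import NamedTuple

class Run(NamedTuple):
    src_node: str
    src_kernel: str
    dst_node: str
    dst_kernel: str
    debian_ok: bool  # True if the 8 Debian VMs kept running after migration

K519 = "5.19.7-1-pve"
K513 = "5.13.19-6-pve"
A = "node-ryzen1700"
B = "node-ryzen5900x"

# One row per migration run described in this mail.
RUNS = [
    Run(A, K519, B, K519, True),
    Run(B, K519, A, K519, True),
    Run(A, K519, B, K513, True),
    Run(B, K513, A, K519, False),  # all 8 Debian VMs hung
    Run(B, K513, A, K513, True),
]

def failing_runs(runs):
    """Return the runs in which the Debian VMs hung."""
    return [r for r in runs if not r.debian_ok]

for r in failing_runs(RUNS):
    print(f"{r.src_node} ({r.src_kernel}) -> {r.dst_node} ({r.dst_kernel}): Debian VMs hung")
```

In this data the only failing run is the one from the 5.13 node to the 
5.19 node.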

Cheers



On 8/11/22 at 9:40, Eneko Lacunza via pve-user wrote:
> Hi Jan,
>
> Yes, there's no issue if CPUs are the same.
>
> VMs hang when the CPUs are of sufficiently different generations, even 
> if they are of the same brand and the VMs use a KVM64 vCPU.
>
> On 7/11/22 at 22:59, Jan Vlach wrote:
>> Hi,
>>
>> For what it's worth, live VM migration with Linux VMs with various 
>> Debian versions works here just fine. I'm using virtio for networking 
>> and virtio scsi for disks. (The only version where I had problems was 
>> Debian 6, where the kernel does not support virtio scsi and megaraid 
>> sas 8708EM2 needs to be used. I get a kernel panic in mpt_sas on thaw 
>> after migration.)
>>
>> We're running 5.15.60-1-pve on a three-node cluster with AMD EPYC 7551P 
>> 32-Core Processors. These are Supermicros with the latest BIOS (latest 
>> microcode?) and BMC.
>>
>> Storage is a local ZFS pool, backed by SSDs in striped mirrors (4 
>> devices on each node). Migration has a dedicated 2x 10GigE LACP and a 
>> dedicated VLAN on the switch stack.
>>
>> I have more nodes with EPYC3/Milan on the way, so I’ll test those 
>> later as well.
>>
>> What does your cluster look like hardware-wise? What are the problems you 
>> experienced with VM migration on 5.13->5.19?
>>
>> Thanks,
>> JV 

Eneko Lacunza
Technical Director
Binovo IT Human Project

Tel. +34 943 569 206 |https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/

