[pve-devel] [RFC kernel] revert problematic TSC multiplier commit

Eneko Lacunza elacunza at binovo.es
Fri Sep 2 09:59:34 CEST 2022


Hi,

On 2/9/22 at 9:47, Fiona Ebner wrote:
> On 02.09.22 at 09:22, Eneko Lacunza wrote:
>> Hi Fiona,
>>
>> Does this patch correspond to kernels linked in this forum thread?
>>
>> https://forum.proxmox.com/threads/proxmox-7-2-3-ceph-16-2-7-migrating-vms-hangs-them-kernel-panic-on-linux-freeze-on-windows.109645/page-2#post-488479
>>
> No, there is no public build with the below patch yet.
Ok, thanks for the clarification.

>
> Did you already test the kernel with the FPU patches that's mentioned in
> that forum post?

No, I was waiting for a good time window in our prod cluster to test it :)
Seems it will be today.

>
>> If so I can test them and see if that helps with bugzilla entry #4073:
>> https://bugzilla.proxmox.com/show_bug.cgi?id=4073
>>
> I don't think these issues are related, as there, the VM that's been
> migrated hangs, and here other VMs on the node were affected.

Yes, that's true, but I have also seen other VMs on the nodes being
affected (though less frequently). Maybe we are hit by both issues :)

>
>>>> which might be responsible for several issues reported in the
>>>> community forum[0][1].
>>>>
>>>> In my case, loading a VM snapshot that originally was taken on
>>>> a CPU from a different vendor often caused problems in other VMs(!).
>>>> In particular, it often led to RCU stalls (with similar messages as in
>>>> [1]) or slowdowns, and sometimes clock jumps far into the future (like
>>>> in [0]). With this revert applied, everything seems to run smoothly
>>>> even after loading the "bad" snapshot 10 times.
>>>>
>>>> [0] https://forum.proxmox.com/threads/112756/
>>>> [1] https://forum.proxmox.com/threads/111494/
> The fix 11d39e8cc43e1c6737af19ca9372e590061b5ad2 is only for AMD/SVM, so
> most likely [1], where people with Intel N5105 are affected, is not
> related either. RCU stall messages can happen for different reasons of
> course ;)
>

Our cluster has AMD CPUs.

I'll report back the results of our tests if I can finally try the test
kernel today.

Thanks

Eneko Lacunza
Technical Director
Binovo IT Human Project

Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/


