[PVE-User] Server freezing randomly with Proxmox 6.2-4 on AMD Ryzen system

Eneko Lacunza elacunza at binovo.es
Tue Nov 17 09:33:37 CET 2020


Hi Hermann,

Glad to read this. My Ryzen 2200G home desktop hangs from time to time 
during ordinary desktop (non-gaming) use, so it's quite clear there's 
some problem with the integrated graphics.

It's nice to know the main board is stable, though :)

Cheers

On 16/11/20 at 18:21, Hermann Himmelbauer wrote:
> Hi,
> In case someone is interested, the problem is now solved; the system 
> seems to be rock solid after ~2 months of testing:
>
> I changed the AMD Ryzen 3 3200G to an AMD Ryzen 5 3600 on one node and 
> to an AMD Ryzen 3 3100 on the two other nodes, and now the problem is 
> gone.
>
> I don't really know why, but I can think of two possible reasons:
>
> 1) The 3200G does not support ECC, but I use ECC RAM. Maybe this led 
> to errors (although intensive memory testing with memtest86 did not 
> report anything).
> 2) The new CPUs do not have integrated graphics. I noticed that the 
> two onboard 10GBit Ethernet adapters now have different PCI addresses 
> with the new CPUs. And with the old CPUs there were problems with 
> these 10G adapters malfunctioning.
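
To compare the PCI addresses of the network adapters before and after a CPU swap, the mapping can be read from sysfs. The following is only a minimal sketch, assuming a Linux host where `/sys/class/net/<iface>/device` is a symlink to the backing device directory; the function name `nic_pci_addresses` is made up for illustration. For NICs that are not PCI-attached (USB, virtio), the reported name will not be a PCI address.

```python
import os


def nic_pci_addresses(sys_net="/sys/class/net"):
    """Return {interface name: name of the backing device directory}.

    For PCI NICs that directory name is the PCI address (for example
    "0000:2b:00.0"). Interfaces without a "device" symlink, such as
    loopback or bridges, are skipped.
    """
    result = {}
    for iface in sorted(os.listdir(sys_net)):
        dev_link = os.path.join(sys_net, iface, "device")
        if os.path.islink(dev_link):
            # Resolve the symlink and keep only the last path component.
            result[iface] = os.path.basename(os.path.realpath(dev_link))
    return result
```

Saving the result of `nic_pci_addresses()` on each node before and after a hardware change and comparing the two dictionaries would show which adapters moved.
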
>
> Many thanks for your input and your help.
>
> The ASRock Rack X470D4U2-2T is definitely stable now.
>
> Best Regards,
> Hermann
>
> On 04.09.20 at 16:45, Hermann Himmelbauer wrote:
>> Dear Proxmox users,
>>
>> I'm trying to install a 3-node cluster (latest Proxmox/Ceph) and am
>> experiencing random freezes. A node can either be completely frozen (no
>> blinking cursor on console, no ping) or become partially blocked or
>> slow.
>>
>> This happens most often on node 2 (approx. 3-4 times / day), node 3
>> never got stuck within 14 days runtime, node 1 once.
>>
>> Unfortunately I did not find any way to trigger this behaviour. However,
>> I *think* that it happens most often if I stress the machine in some
>> way (performance test within a virtual machine) and then let the
>> machine idle.
>>
>> When the machine freezes completely, there is no logfile. However, if it
>> is partially frozen, some info can be acquired via dmesg (see attached
>> file). "device=2b:00.0" is an Intel 10GBit Ethernet adapter (X550T), so
>> perhaps there is some driver issue regarding this Ethernet adapter?
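
When only some lines of a long kernel log concern one device, filtering by its PCI address makes them easier to review. This is a rough sketch, assuming the log was saved as text (for example from `dmesg`); the function name `lines_mentioning` is hypothetical, and the match is a plain case-insensitive substring test, nothing more.

```python
def lines_mentioning(log_text, pci_addr):
    """Return the log lines that contain the given PCI address.

    The comparison is a case-insensitive substring match, so "2b:00.0"
    also matches the long form "0000:2b:00.0".
    """
    needle = pci_addr.lower()
    return [line for line in log_text.splitlines() if needle in line.lower()]
```

With a saved log, `lines_mentioning(open("dmesg.txt").read(), "2b:00.0")` would return only the lines that reference the adapter mentioned above (the file name is just an example).
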
>>
>> The system consists of the following components:
>>
>> - AMD Ryzen 3 3200G, 4x 3.60GHz, boxed (YD3200C5FHBOX)
>> - ASRock Rack X470D4U2-2T (Mainboard)
>> - Samsung SSD 970 EVO Plus 250GB, M.2 (MZ-V7S250BW) (builtin SSD for OS)
>> - 2 * Kingston Server Premier DIMM 16GB, DDR4-2666, CL19-19-19, ECC (BOM
>> Number: 9965745-002.A00G, Part Number: KSM26ED8/16ME)
>> - be quiet! Pure Power 11 CM 400W ATX 2.4 (BN296) (Power supply)
>> - 2 * Micron 5300 PRO - Read Intensive 960GB, SATA
>> (MTFDDAK960TDS-1AW1Z6) (SSD for Ceph)
>> - LogiLink PC0075, 2x RJ-45, PCIe 2.0 x1 (second NIC with two ports)
>>
>> The system is Linux Debian 10.4 (Proxmox 6.2-4) with kernel 5.4.34-1-pve
>> #1 SMP PVE 5.4.34-2 (Thu, 07 May 2020 10:02:02 +0200) x86_64 GNU/Linux.
>>
>> What I did so far (without success):
>>
>> - Disabled C6 as I read that this CPU state can lead to unstable systems
>> (via "python zenstates.py --c6-disable" -> still errors).
>> - Updated my BIOS to the latest version (3.30)
>> - Checked that the CPU + RAM are compatible with the mainboard (they are
>> listed as compatible on the ASRock website)
>> - Checked logs in IPMI (undervoltage, temperature etc., nothing is 
>> logged)
>> - Memory test (memtest86, no errors)
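
In addition to an offline memtest86 run, the Linux EDAC subsystem can report corrected and uncorrected memory errors at runtime. The sketch below assumes that an EDAC driver is loaded and exposes `ce_count` and `ue_count` under `/sys/devices/system/edac/mc/mc*/`; on a CPU without ECC support this directory may simply contain no memory controllers. The function name `edac_counts` is made up for illustration.

```python
import glob
import os


def edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}.

    ce_count is the number of corrected errors, ue_count the number of
    uncorrected errors. A value of None means the file was missing or
    unreadable. An empty dict means no memory controller was found.
    """
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                with open(os.path.join(mc_dir, name)) as fh:
                    entry[name] = int(fh.read().strip())
            except (OSError, ValueError):
                entry[name] = None
        counts[os.path.basename(mc_dir)] = entry
    return counts
```

An empty result from `edac_counts()` would suggest that ECC reporting is not active on that node, while non-zero counters would point towards memory problems that a single memtest86 pass did not catch.
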
>>
>> Do you have any clue what could be the reason for these freezes? Should
>> I think of some hardware error? Or is this some known Linux bug that can
>> be fixed?
>>
>> Best Regards,
>> Hermann
>>
>>
>> _______________________________________________
>> pve-user mailing list
>> pve-user at lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>>
>


-- 
Eneko Lacunza                | +34 943 569 206
                              | elacunza at binovo.es
Zuzendari teknikoa           | https://www.binovo.es
Director técnico             | Astigarragako Bidea, 2 - 2º izda.
BINOVO IT HUMAN PROJECT S.L  | oficina 10-11, 20180 Oiartzun




