[PVE-User] Server freezing randomly with Proxmox 6.2-4 on AMD Ryzen system

Hermann Himmelbauer hermann at qwer.tk
Mon Sep 7 20:44:43 CEST 2020


Dear Wolfgang,
Thank you for your reply. Glad to hear that the board is stable for you.

My BIOS is at its default values, so no overclocking or the like. Did you
change any settings? Did you disable C6 in some way?
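
On the C6 question, a minimal sketch of how the idle states known to the
kernel could be listed. It assumes the Linux cpuidle sysfs interface under
/sys/devices/system/cpu/cpu0/cpuidle is present; the function name
list_cpuidle_states is made up for illustration. It only shows the kernel's
view of the idle states and does not prove whether C6 was disabled in the
BIOS or by zenstates.py.

```python
from pathlib import Path


def list_cpuidle_states(cpu_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return the idle states the kernel exposes for one CPU (hypothetical helper).

    Each entry holds the sysfs directory name plus the contents of the
    'name', 'disable' and 'usage' attributes (None if an attribute is missing).
    """
    root = Path(cpu_dir)
    if not root.is_dir():
        # No cpuidle directory: interface not available on this system.
        return []

    # Keep only directories of the form state<N> and sort them numerically.
    state_dirs = [p for p in root.glob("state[0-9]*") if p.name[5:].isdigit()]
    state_dirs.sort(key=lambda p: int(p.name[5:]))

    states = []
    for state in state_dirs:
        entry = {"state": state.name}
        for attr in ("name", "disable", "usage"):
            f = state / attr
            entry[attr] = f.read_text().strip() if f.is_file() else None
        states.append(entry)
    return states


if __name__ == "__main__":
    for entry in list_cpuidle_states():
        print(entry)
```

A value of "1" in the disable field means that idle state has been switched
off through sysfs for that CPU.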

Maybe this really is a hardware defect (mainboard, RAM, CPU, power
supply...): since my posting I managed to crash node 2, whereas node 1
and node 3 have been stable.

BTW - did you manage to get ECC running? I do have ECC memory but it
does not seem to be detected. Maybe this is due to the AMD Ryzen 3 3200G
- I read somewhere that the CPUs with integrated graphics do not report ECC?
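
One way to check whether the kernel sees ECC at all is to look at the EDAC
subsystem. Below is a minimal sketch, assuming the standard EDAC sysfs
directory /sys/devices/system/edac/mc; the function name
read_edac_controllers is made up for illustration. If no memory controller
is registered there, ECC error reporting is most likely not active, although
that alone does not say whether the cause is the CPU, the BIOS or a missing
driver.

```python
from pathlib import Path


def read_edac_controllers(edac_root="/sys/devices/system/edac/mc"):
    """Return one dict per memory controller registered with EDAC (hypothetical helper).

    Each dict holds the controller directory name and the contents of
    'mc_name', 'ce_count' (corrected errors) and 'ue_count' (uncorrected
    errors), or None where an attribute file is missing.
    """
    root = Path(edac_root)
    if not root.is_dir():
        # EDAC not available or no driver loaded.
        return []

    controllers = []
    for mc in sorted(root.glob("mc[0-9]*")):
        info = {"controller": mc.name}
        for attr in ("mc_name", "ce_count", "ue_count"):
            f = mc / attr
            info[attr] = f.read_text().strip() if f.is_file() else None
        controllers.append(info)
    return controllers


if __name__ == "__main__":
    found = read_edac_controllers()
    if not found:
        print("No EDAC memory controller registered.")
    for info in found:
        print(info)
```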

Can you perhaps send me a list of the other components of your system?

The board itself + the AMD CPUs are a very cost-efficient combination.
The onboard 10GBit ethernet is great for ceph, I get quite good I/O
speeds. If things get stable, it's a perfect combination for a
cost-efficient HA cluster, I think.

Best Regards,
Hermann

Am 07.09.20 um 13:21 schrieb Wolfgang Link:
> Hi Hermann,
> 
> This board with this BIOS version and a Ryzen 9 3900X has been running perfectly for over 4 months, even under very high load in the VM.
> 
> What have you set in the BIOS?
> 
> Regards
> 
> Wolfgang
>> On 09/04/2020 4:45 PM Hermann Himmelbauer <hermann at qwer.tk> wrote:
>>
>>  
>> Dear Proxmox users,
>>
>> I'm trying to install a 3-node cluster (latest proxmox/ceph) and am
>> experiencing random freezes. The node can either be completely frozen (no
>> blinking cursor on console, no ping) or can get somewhat blocked / slow etc.
>>
>> This happens most often on node 2 (approx. 3-4 times / day), node 3
>> never got stuck within 14 days runtime, node 1 once.
>>
>> Unfortunately I did not find any way to trigger this behaviour, however,
>> I *think* that this happens most often if I stress the machine in some
>> way (performance test within a virtual machine) and then let the machine idle.
>>
>> When the machine freezes completely, there is no logfile. However, if it
>> is partially frozen, some info can be acquired via dmesg (see attached
>> file). "device=2b:00.0" is an Intel 10GBit ethernet adapter (X550T), so
>> perhaps there is some driver issue regarding this ethernet adapter?
>>
>> The system consists of the following components:
>>
>> - AMD Ryzen 3 3200G, 4x 3.60GHz, boxed (YD3200C5FHBOX)
>> - ASRock Rack X470D4U2-2T (Mainboard)
>> - Samsung SSD 970 EVO Plus 250GB, M.2 (MZ-V7S250BW) (builtin SSD for OS)
>> - 2 * Kingston Server Premier DIMM 16GB, DDR4-2666, CL19-19-19, ECC (BOM
>> Number: 9965745-002.A00G, Part Number: KSM26ED8/16ME)
>> - be quiet! Pure Power 11 CM 400W ATX 2.4 (BN296) (Power supply)
>> - 2 * Micron 5300 PRO - Read Intensive 960GB, SATA
>> (MTFDDAK960TDS-1AW1Z6) (SSD for Ceph)
>> - LogiLink PC0075, 2x RJ-45, PCIe 2.0 x1 (second NIC with two ports)
>>
>> The system is Linux Debian 10.4 (Proxmox 6.2-4) with kernel 5.4.34-1-pve
>> #1 SMP PVE 5.4.34-2 (Thu, 07 May 2020 10:02:02 +0200) x86_64 GNU/Linux.
>>
>> What I did so far (without success):
>>
>> - Disabled C6 as I read that this CPU-state can lead to unstable systems
>> (via "python zenstates.py --c6-disable" -> still errors).
>> - Updated my BIOS to the latest version (3.30)
>> - Checked that the CPU + RAM are compatible with the mainboard (they are
>> listed as compatible on the ASRock website)
>> - Checked logs in IPMI (undervoltage, temperature etc., nothing is logged)
>> - Memory test (memtest86, no errors)
>>
>> Do you have any clue what could be the reason for these freezes? Should
>> I think of some hardware error? Or is this some known Linux bug that can
>> be fixed?
>>
>> Best Regards,
>> Hermann
>>
>> -- 
>> hermann at qwer.tk
>> PGP/GPG: 299893C7 (on keyservers)
>> _______________________________________________
>> pve-user mailing list
>> pve-user at lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> 

-- 
hermann at qwer.tk
PGP/GPG: 299893C7 (on keyservers)



