[PVE-User] Server freezing randomly with Proxmox 6.2-4 on AMD Ryzen system

Chris Sutcliff chris at itg.uy
Mon Sep 7 13:29:19 CEST 2020


Hi,

I'm using the 10G LAN variant of this board with a 3700X and haven't had any issues.

There is a "beta" BIOS version available from ASRock which updates the AGESA version to 1.0.0.6 (https://download.asrock.com/BIOS/Server/X470D4U(L3.37)ROM.zip), which might be worth trying. I'm using the equivalent version on my board.
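
If it helps, here is a rough Python sketch for checking whether a node is still on an older BIOS than the beta before flashing. It reads the version string from /sys/class/dmi/id/bios_version (the usual Linux sysfs location for DMI data). The helper names are made up for illustration, and it assumes the version strings look like "3.30" or "L3.37" and can be compared as (major, minor) number pairs, which may not hold for every ASRock release.

```python
import re
from pathlib import Path

# Usual sysfs location of the firmware version string on Linux (DMI data).
BIOS_VERSION_PATH = Path("/sys/class/dmi/id/bios_version")


def parse_bios_version(text):
    """Extract (major, minor) from strings such as '3.30' or 'L3.37'.

    Assumption: the first 'digits.digits' group in the string is the version.
    """
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_older(installed, candidate):
    """Return True if 'installed' is a lower version than 'candidate'."""
    return parse_bios_version(installed) < parse_bios_version(candidate)


def installed_bios_version(path=BIOS_VERSION_PATH):
    """Read the BIOS version string reported by the running system."""
    return path.read_text().strip()
```

On a node you would call is_older(installed_bios_version(), "L3.37"); with the 3.30 BIOS mentioned below, is_older("3.30", "L3.37") is True.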


Kind Regards

Chris Sutcliff
Sutcliff Limited

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, September 4, 2020 3:45 PM, Hermann Himmelbauer <hermann at qwer.tk> wrote:

> Dear Proxmox users,
>
> I'm trying to install a 3-node cluster (latest Proxmox/Ceph) and am
> experiencing random freezes. A node either freezes completely (no
> blinking cursor on the console, no ping) or becomes partially blocked or slow.
>
> This happens most often on node 2 (approx. 3-4 times per day); node 3
> has never frozen in 14 days of runtime, and node 1 froze once.
>
> Unfortunately I have not found a way to trigger this behaviour reliably.
> However, I think it happens most often when I stress the machine in some
> way (e.g. a performance test within a virtual machine) and then let it idle.
>
> When the machine freezes completely, nothing is logged. However, if it
> is only partially frozen, some info can be acquired via dmesg (see
> attached file). "device=2b:00.0" is an Intel 10GBit Ethernet adapter
> (X550T), so perhaps there is a driver issue with this adapter?
>
> The system consists of the following components:
>
> -   AMD Ryzen 3 3200G, 4x 3.60GHz, boxed (YD3200C5FHBOX)
>
> -   ASRock Rack X470D4U2-2T (Mainboard)
>
> -   Samsung SSD 970 EVO Plus 250GB, M.2 (MZ-V7S250BW) (builtin SSD for OS)
>
> -   2 * Kingston Server Premier DIMM 16GB, DDR4-2666, CL19-19-19, ECC (BOM
>     Number: 9965745-002.A00G, Part Number: KSM26ED8/16ME)
>
> -   be quiet! Pure Power 11 CM 400W ATX 2.4 (BN296) (Power supply)
>
> -   2 * Micron 5300 PRO - Read Intensive 960GB, SATA
>     (MTFDDAK960TDS-1AW1Z6) (SSD for Ceph)
>
> -   LogiLink PC0075, 2x RJ-45, PCIe 2.0 x1 (second NIC with two ports)
>
> The system is Linux Debian 10.4 (Proxmox 6.2-4) with kernel 5.4.34-1-pve
> #1 SMP PVE 5.4.34-2 (Thu, 07 May 2020 10:02:02 +0200) x86_64 GNU/Linux.
>
> What I did so far (without success):
>
> -   Disabled C6 as I read that this CPU-state can lead to unstable systems
>     (via "python zenstates.py --c6-disable" -> still errors).
>
> -   Updated my BIOS to the latest version (3.30)
>
> -   Checked that the CPU + RAM are compatible with the mainboard (they are
>     listed as compatible on the ASRock website)
>
> -   Checked logs in IPMI (undervoltage, temperature etc., nothing is logged)
>
> -   Memory test (memtest86, no errors)
>
> Do you have any clue what could be the reason for these freezes? Should
> I think of some hardware error? Or is this some known Linux bug that can
> be fixed?
>
> Best Regards,
> Hermann
>
> --
> hermann at qwer.tk
> PGP/GPG: 299893C7 (on keyservers)
>
>
> pve-user mailing list
> pve-user at lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user





