[PVE-User] Server freezing randomly with Proxmox 6.2-4 on AMD Ryzen system

Wolfgang Link w.link at proxmox.com
Tue Sep 8 06:25:42 CEST 2020


> On 09/07/2020 8:44 PM Hermann Himmelbauer <hermann at qwer.tk> wrote:
> 
>  
> Dear Wolfgang,
> Thank you for your reply. Glad to hear that the board is stable for you.
> 
> My BIOS has the default values, so no overclocking or the like. Did you
> do any alterations? Did you in some way disable C6?

As you noticed, you should turn off all power-saving features. They are somewhat hidden in the AMD BIOS.
You can find them in the advanced section "AMD CBS". Don't be put off by the warnings in some of the submenus. You can ignore them; they make little sense here, because options like ECO mode are hidden there.
I also disable boost to reduce time drift in KVM.
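
To see from the running system what the kernel currently reports for boost and idle states, something like the following minimal sketch may help. It assumes the standard Linux sysfs layout under /sys/devices/system/cpu; the function name read_power_settings is only illustrative, the boost file is exposed only by some cpufreq drivers, and the idle state names are whatever the idle driver reports, so they may not map one-to-one onto the BIOS options.

```python
from pathlib import Path


def read_power_settings(sysfs_root="/sys/devices/system/cpu"):
    """Return (boost flag, {idle state name: disabled?}) as exposed by sysfs.

    Illustrative helper, not part of Proxmox or any library.
    """
    root = Path(sysfs_root)
    boost_file = root / "cpufreq" / "boost"
    # The boost file only exists with some cpufreq drivers (e.g. acpi-cpufreq).
    boost = boost_file.read_text().strip() if boost_file.exists() else None

    states = {}
    # cpu0 is used as a representative core; other cores normally match.
    for state_dir in sorted((root / "cpu0" / "cpuidle").glob("state*")):
        name_file = state_dir / "name"
        disable_file = state_dir / "disable"
        if not (name_file.exists() and disable_file.exists()):
            continue
        states[name_file.read_text().strip()] = disable_file.read_text().strip() == "1"
    return boost, states


if __name__ == "__main__":
    boost, states = read_power_settings()
    print("boost:", {"1": "enabled", "0": "disabled"}.get(boost, "not exposed"))
    for name, disabled in states.items():
        print(f"idle state {name}: {'disabled' if disabled else 'enabled'}")
```

This only reads values, so it is safe to run on a production node. Comparing the output of a stable node with that of the node that freezes may show whether the BIOS settings really differ.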
> 
> Maybe this is really some defect (mainboard, RAM, cpu, power supply...)
> - since my posting I managed to crash node 2, however, node 1 + node 3
> are stable.
> 
> BTW - did you manage to get ECC running? I do have ECC memory but it
> does not seem to be detected. Maybe this is due to the AMD Ryzen 3 3200G
> - I read somewhere that the CPUs with integrated graphic do not report ECC?

The Ryzen 3 3200G does not support ECC, according to its WikiChip entry:
https://en.wikichip.org/wiki/amd/ryzen_3/3200g
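
To check on the running node whether the kernel has registered an ECC-capable memory controller at all, here is a minimal sketch that assumes the usual Linux EDAC sysfs layout under /sys/devices/system/edac/mc. The function name edac_controllers is only illustrative. An empty result means no EDAC driver registered a controller, which is consistent with ECC being inactive but is not proof on its own.

```python
from pathlib import Path


def edac_controllers(edac_root="/sys/devices/system/edac/mc"):
    """Map each EDAC memory controller to its corrected/uncorrected error counts.

    Illustrative helper; returns an empty dict if no controller is registered.
    """
    result = {}
    for mc in sorted(Path(edac_root).glob("mc[0-9]*")):
        counts = {}
        for key in ("ce_count", "ue_count"):
            counter = mc / key
            # A missing counter file is reported as None rather than guessed.
            counts[key] = int(counter.read_text()) if counter.exists() else None
        result[mc.name] = counts
    return result


if __name__ == "__main__":
    controllers = edac_controllers()
    if not controllers:
        print("no EDAC memory controller registered")
    for name, counts in controllers.items():
        print(name, counts)
```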

Why didn't you buy a Ryzen 3 3100? It costs nearly the same, is 30% faster, and supports SMT and ECC.

> 
> Can you perhaps send me the other components of your system?

4 x Samsung M391A4G43MB1-CTDQ 32GB DIMM at 2666
Lenovo 430-8i HBA

> 
> The board itself + the AMD CPUs are a very price-efficient combination.
> The onboard 10GBit ethernet is great for ceph, I get quite good I/O
> speeds. If things get stable, it's a perfect combination for a cost
> efficient HA cluster, I think.
> 
> Best Regards,
> Hermann
> 
> Am 07.09.20 um 13:21 schrieb Wolfgang Link:
> > Hi Hermann,
> > 
> > This board with this BIOS version and a Ryzen 9 3900X has been running perfectly for over 4 months, even with very high load in the VM.
> > 
> > What have you set in the BIOS?
> > 
> > Regards
> > 
> > Wolfgang
> >> On 09/04/2020 4:45 PM Hermann Himmelbauer <hermann at qwer.tk> wrote:
> >>
> >>  
> >> Dear Proxmox users,
> >>
> >> I'm trying to install a 3-node cluster (latest proxmox/ceph) and
> >> experience random freezes. The node can either be completely frozen (no
> >> blinking cursor on console, no ping) or can get somewhat blocked / slow etc.
> >>
> >> This happens most often on node 2 (approx. 3-4 times / day), node 3
> >> never got stuck within 14 days runtime, node 1 once.
> >>
> >> Unfortunately I did not find any way to trigger this behaviour. However,
> >> I *think* that this happens most often if I stress the machine in some
> >> way (performance test within a virtual machine) and then let it idle.
> >>
> >> When the machine freezes completely, there is no logfile. However, if it
> >> is partially frozen, some info can be acquired via dmesg. (See attached
> >> file). ("device=2b:00.0" is an Intel 10GBit ethernet adapter (X550T). So
> >> perhaps there is some driver issue regarding this ethernet adapter?)
> >>
> >> The system consists of the following components:
> >>
> >> - AMD Ryzen 3 3200G, 4x 3.60GHz, boxed (YD3200C5FHBOX)
> >> - ASRock Rack X470D4U2-2T (Mainboard)
> >> - Samsung SSD 970 EVO Plus 250GB, M.2 (MZ-V7S250BW) (builtin SSD for OS)
> >> - 2 * Kingston Server Premier DIMM 16GB, DDR4-2666, CL19-19-19, ECC (BOM
> >> Number: 9965745-002.A00G, Part Number: KSM26ED8/16ME)
> >> - be quiet! Pure Power 11 CM 400W ATX 2.4 (BN296) (Power supply)
> >> - 2 * Micron 5300 PRO - Read Intensive 960GB, SATA
> >> (MTFDDAK960TDS-1AW1Z6) (SSD for Ceph)
> >> - LogiLink PC0075, 2x RJ-45, PCIe 2.0 x1 (second NIC with two ports)
> >>
> >> The system is Linux Debian 10.4 (Proxmox 6.2-4) with kernel 5.4.34-1-pve
> >> #1 SMP PVE 5.4.34-2 (Thu, 07 May 2020 10:02:02 +0200) x86_64 GNU/Linux.
> >>
> >> What I did so far (without success):
> >>
> >> - Disabled C6 as I read that this CPU-state can lead to unstable systems
> >> (via "python zenstates.py --c6-disable" -> still errors).
> >> - Updated my BIOS to the latest version (3.30)
> >> - Checked that the CPU + RAM are compatible with the mainboard (they are
> >> listed as compatible on the ASRock website)
> >> - Checked logs in IPMI (undervoltage, temperature etc., nothing is logged)
> >> - Memory test (memtest86, no errors)
> >>
> >> Do you have any clue what could be the reason for these freezes? Should
> >> I think of some hardware error? Or is this some known Linux bug that can
> >> be fixed?
> >>
> >> Best Regards,
> >> Hermann
> >>
> >> -- 
> >> hermann at qwer.tk
> >> PGP/GPG: 299893C7 (on keyservers)
> >> _______________________________________________
> >> pve-user mailing list
> >> pve-user at lists.proxmox.com
> >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> > 
> 
> -- 
> hermann at qwer.tk
> PGP/GPG: 299893C7 (on keyservers)



