[PVE-User] Dell R350, Proxmox VE 8.2.2, sas-megaraid error and system hang

Andrea Casati casati at kona.it
Fri Aug 9 13:30:22 CEST 2024


Hello

Dell R350 with PERC H755.
Tried with kernel 6.8.4, 6.8.8 and 6.5.13.
The system hangs every day during compressed backups, and sometimes during
normal VM usage; the machine then has to be physically powered off and on.

Log with kernel 6.8.4:
Jul 15 19:04:45 r350ve kernel: megaraid_sas 0000:01:00.0: Adapter is OPERATIONAL for scsi:0
Jul 15 19:04:45 r350ve kernel: megaraid_sas 0000:01:00.0: Snap dump wait time    : 15
Jul 15 19:04:45 r350ve kernel: megaraid_sas 0000:01:00.0: Reset successful for scsi0.
Jul 15 19:04:45 r350ve kernel: megaraid_sas 0000:01:00.0: 3296 (774378251s/0x0020/DEAD) - Fatal firmware error: Line 188 in fw\raid\utils.c
Jul 15 19:04:45 r350ve kernel: megaraid_sas 0000:01:00.0: 3300 (boot + 5s/0x0020/CRIT) - Controller encountered an error and was reset

Errors on console with kernel 6.5.13:
kvm_intel: kvm [2225]: vcpu0, guest rIP: 0xfffff80277d68f93 Unhandled WRMSR(0x1d9) = 0x1
megaraid_sas 0000:01:00.0: FW in FAULT state Fault code:0x10000 subcode:0x0 func:megasas_wait_for_outstanding_fusion


iDRAC reports no errors, and Dell support reports no problems.

Has anyone seen something like this before?


Thank you.



