PVE 6.2 e1000e driver hang and node fence

Eneko Lacunza elacunza at binovo.es
Thu Jul 23 13:14:25 CEST 2020


Hi all,

On a recently (8 days ago) updated PVE 6.2 node, the e1000e driver has 
hung, and the node was fenced and rebooted. Syslog contained several 
instances of the following:


Jul 23 13:02:21 proxmox2 kernel: [694027.049891] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jul 23 13:02:21 proxmox2 kernel: [694027.049891] TDH                  <0>
Jul 23 13:02:21 proxmox2 kernel: [694027.049891] TDT                  <1>
Jul 23 13:02:21 proxmox2 kernel: [694027.049891] next_to_use          <1>
Jul 23 13:02:21 proxmox2 kernel: [694027.049891] next_to_clean        <0>
Jul 23 13:02:21 proxmox2 kernel: [694027.049891] buffer_info[next_to_clean]:
Jul 23 13:02:21 proxmox2 kernel: [694027.049891] time_stamp           <10a5668a3>
Jul 23 13:02:21 proxmox2 kernel: [694027.049891] next_to_watch        <0>
Jul 23 13:02:21 proxmox2 kernel: [694027.049891] jiffies              <10a566a38>
Jul 23 13:02:21 proxmox2 kernel: [694027.049891] next_to_watch.status <0>
Jul 23 13:02:21 proxmox2 kernel: [694027.049891] MAC Status             <80083>
Jul 23 13:02:21 proxmox2 kernel: [694027.049891] PHY Status             <796d>
Jul 23 13:02:21 proxmox2 kernel: [694027.049891] PHY 1000BASE-T Status  <3800>
Jul 23 13:02:21 proxmox2 kernel: [694027.049891] PHY Extended Status    <3000>
Jul 23 13:02:21 proxmox2 kernel: [694027.049891] PCI Status             <10>
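
A workaround often suggested for these e1000e "Detected Hardware Unit 
Hang" messages is disabling segmentation offloads on the NIC. I have not 
tried it here yet, so take this only as a sketch (the interface name 
enp0s31f6 is taken from the log above):

# one-off, does not survive reboot
ethtool -K enp0s31f6 tso off gso off

# to persist it, something like this in /etc/network/interfaces
# (the bridge name vmbr0 is an assumption about the local config):
iface vmbr0 inet static
        ...
        post-up /sbin/ethtool -K enp0s31f6 tso off gso off

If anyone can confirm whether this actually avoids the hang on this 
hardware, that would be helpful.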


root at proxmox2:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-2-pve)
pve-manager: 6.2-10 (running version: 6.2-10/a20769ed)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph: 14.2.9-pve1
ceph-fuse: 14.2.9-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-5
libpve-guest-common-perl: 3.1-1
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-9
pve-cluster: 6.1-8
pve-container: 3.1-11
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-11
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-10
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1

Has anyone experienced this? The cluster has 3 almost identical nodes, 
and only this one has been affected so far...

Thanks
Eneko

-- 
Eneko Lacunza                   | Tel.  943 569 206
                                 | Email elacunza at binovo.es
Director Técnico                | Site. https://www.binovo.es
BINOVO IT HUMAN PROJECT S.L     | Dir.  Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
