[PVE-User] Serious error with e1000e driver with Intel Corporation 82574L NIC

Dietmar Maurer dietmar at proxmox.com
Wed Sep 25 12:07:26 CEST 2013

Maybe you can try the newer kernel: pve-kernel-2.6.32-24-pve_2.6.32-111_amd64.deb

I found that with the recent pve kernel the network controller just crashes and cannot be brought back to life without powering the system off and on.

This occurs on an interface with almost no traffic, after 1-2 days.

I found the following conversations on the net, which describe exactly the same situation:


Background info:

  *   latest proxmox 3.1, fresh install, up to date
  *   no heavy traffic, almost nothing
  *   6 x gigabit LAN cards:
07:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
Subsystem: Intel Corporation Device 0000
Flags: bus master, fast devsel, latency 0, IRQ 40
Memory at e8500000 (32-bit, non-prefetchable) [size=128K]
I/O ports at 3000 [size=32]
Memory at e8520000 (32-bit, non-prefetchable) [size=16K]
Expansion ROM at e8d00000 [disabled] [size=2K]
Capabilities: [c8] Power Management version 2
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [e0] Express Endpoint, MSI 00
Capabilities: [a0] MSI-X: Enable- Count=1 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number 00-03-1d-ff-ff-0b-8a-e7
Kernel driver in use: e1000e
  *   ethtool -i eth1
driver: e1000e
version: 2.4.14-NAPI
firmware-version: 3.1-1
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
  *   ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:03:1d:0b:8a:e3
inet6 addr: fe80::203:1dff:fe0b:8ae3/64 Scope:Link
RX packets:70200 errors:354515190463890 dropped:59085865077315 overruns:0 frame:236343460309260
TX packets:13254 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:28083621 (26.7 MiB) TX bytes:2211516 (2.1 MiB)
Interrupt:17 Memory:e8900000-e8920000
  *   Kernel Command line:
BOOT_IMAGE=/vmlinuz-2.6.32-23-pve root=UUID=dd4c475c-b71d-4497-aeb1-c36a06a8c46f ro quiet
  *   dmesg report:
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:267 dev_watchdog+0x28a/0x2a0() (Tainted: P --------------- )
Hardware name: HuronRiver Platform
NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
Modules linked in: fuse vzethdev vznetdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables vhost_net tun macvtap macvlan kvm_intel kvm vzevent ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc ipv6 ext3 jbd snd_hda_codec_realtek i915 snd_pcsp snd_hda_intel snd_hda_codec snd_hwdep zfs(P) zunicode(P) snd_pcm snd_page_alloc drm_kms_helper zavl(P) zcommon(P) parport_pc i2c_i801 video snd_timer iTCO_wdt iTCO_vendor_support drm i2c_algo_bit serio_raw parport shpchp snd i2c_core output soundcore ext4 jbd2 mbcache znvpair(P) spl zlib_deflate sg e1000e ahci [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper veid: 0 Tainted: P --------------- 2.6.32-23-pve #1
Call Trace:
<IRQ> [<ffffffff8106f667>] ? warn_slowpath_common+0x87/0xe0
[<ffffffff8106f776>] ? warn_slowpath_fmt+0x46/0x50
[<ffffffff8148a08a>] ? dev_watchdog+0x28a/0x2a0
[<ffffffff8108065b>] ? internal_add_timer+0xcb/0x130
[<ffffffff81489e00>] ? dev_watchdog+0x0/0x2a0
[<ffffffff81083be6>] ? run_timer_softirq+0x176/0x370
[<ffffffff81033755>] ? native_apic_msr_write+0x35/0x40
[<ffffffff810793cb>] ? __do_softirq+0x11b/0x260
[<ffffffff810ac015>] ? tick_dev_program_event+0x65/0xc0
[<ffffffff810ac09a>] ? tick_program_event+0x2a/0x30
[<ffffffff8100c32c>] ? call_softirq+0x1c/0x30
[<ffffffff8100de95>] ? do_softirq+0x75/0xb0
[<ffffffff810796a5>] ? irq_exit+0xc5/0xd0
[<ffffffff81549a10>] ? smp_apic_timer_interrupt+0x70/0x9b
[<ffffffff8100bcd3>] ? apic_timer_interrupt+0x13/0x20
<EOI> [<ffffffff812db91b>] ? intel_idle+0xdb/0x160
[<ffffffff812db8f9>] ? intel_idle+0xb9/0x160
[<ffffffff814352a4>] ? cpuidle_idle_call+0x94/0x130
[<ffffffff81009219>] ? cpu_idle+0xa9/0x100
[<ffffffff8151bd21>] ? rest_init+0x85/0x94
[<ffffffff81c34cd6>] ? start_kernel+0x40b/0x417
[<ffffffff81c3433b>] ? x86_64_start_reservations+0x126/0x12a
[<ffffffff81c34436>] ? x86_64_start_kernel+0xf7/0x106
---[ end trace c5f8a6b8504af481 ]---
e1000e 0000:03:00.0: eth1: Reset adapter unexpectedly
e1000e 0000:03:00.0: eth1: Timesync Tx Control register not set as expected
e1000e 0000:03:00.0: eth1: Error reading PHY register
e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
e1000e 0000:03:00.0: eth1: Reset adapter unexpectedly
e1000e 0000:03:00.0: eth1: Timesync Tx Control register not set as expected
vmbr1: port 1(eth1) entering disabled state
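
For anyone who wants to catch this failure on a similar box, here is a minimal Python sketch that counts the failure signatures from the dmesg excerpt above, per interface. The message fragments are copied from that excerpt; the function name `scan_dmesg` and the labels are made up for illustration and are not part of any Proxmox or e1000e tooling.

```python
import re
from collections import Counter

# Message fragments taken from the dmesg excerpt in this report.
# The label names are hypothetical, chosen only for this sketch.
SIGNATURES = {
    "watchdog_timeout": re.compile(
        r"NETDEV WATCHDOG: (\S+) \(e1000e\): transmit queue \d+ timed out"
    ),
    "unexpected_reset": re.compile(
        r"e1000e \S+: (\S+): Reset adapter unexpectedly"
    ),
    "phy_read_error": re.compile(
        r"e1000e \S+: (\S+): Error reading PHY register"
    ),
}


def scan_dmesg(text):
    """Count e1000e failure signatures per (interface, label) in kernel log text."""
    hits = Counter()
    for line in text.splitlines():
        for label, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                # group(1) is the interface name, e.g. "eth1"
                hits[(match.group(1), label)] += 1
    return hits


# A few lines from the report above, used as sample input.
sample = """\
NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
e1000e 0000:03:00.0: eth1: Reset adapter unexpectedly
e1000e 0000:03:00.0: eth1: Error reading PHY register
e1000e 0000:03:00.0: eth1: Reset adapter unexpectedly
"""

for (iface, label), count in sorted(scan_dmesg(sample).items()):
    print(f"{iface}: {label} x{count}")
```

The same function could be fed the saved output of `dmesg` instead of the sample string. It only reports that the signatures occurred; it does not try to diagnose the cause.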

Any idea?

Thanks, István

