Debian

kernel BUG at drivers/net/ethernet/intel/e1000e/netdev.c:3804!

  • January 26, 2021

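Our server has started crashing almost every day during peak hours, when traffic is high. The syslog always fills up with eth0 reset messages, then networking goes down completely and the machine has to be rebooted before it can be reached remotely again.

Does this error mean the NIC is dead, or is it just a software problem?

Running kernel: 4.19.0-10-amd64
OS: Debian 10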

Jan 25 18:00:41 Debian-83-jessie-64-minimal kernel: [161879.702795] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
   Jan 25 18:00:45 Debian-83-jessie-64-minimal kernel: [161883.545928] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
   Jan 25 18:04:41 Debian-83-jessie-64-minimal kernel: [162119.835193] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
   Jan 25 18:04:45 Debian-83-jessie-64-minimal kernel: [162123.214074] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
   Jan 25 18:05:50 Debian-83-jessie-64-minimal kernel: [162188.695254] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
   Jan 25 18:05:54 Debian-83-jessie-64-minimal kernel: [162192.610229] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
   Jan 25 18:06:14 Debian-83-jessie-64-minimal kernel: [162212.759251] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
   Jan 25 18:06:18 Debian-83-jessie-64-minimal kernel: [162216.990139] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
   Jan 25 18:07:27 Debian-83-jessie-64-minimal kernel: [162285.975361] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
   Jan 25 18:07:31 Debian-83-jessie-64-minimal kernel: [162289.814340] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
   Jan 25 18:07:47 Debian-83-jessie-64-minimal kernel: [162305.687558] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
   Jan 25 18:07:51 Debian-83-jessie-64-minimal kernel: [162309.506389] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
   Jan 25 18:07:59 Debian-83-jessie-64-minimal systemd[1]: session-247.scope: Succeeded.
   Jan 25 18:08:48 Debian-83-jessie-64-minimal kernel: [162366.871583] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
   Jan 25 18:08:52 Debian-83-jessie-64-minimal kernel: [162370.734613] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
   Jan 25 18:09:01 Debian-83-jessie-64-minimal CRON[27975]: (root) CMD (  [ -x /usr/lib/php5/sessionclean ] && /usr/lib/php5/sessionclean)
   Jan 25 18:09:01 Debian-83-jessie-64-minimal CRON[27974]: (root) CMD (  [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
   Jan 25 18:09:01 Debian-83-jessie-64-minimal systemd[1]: Starting Clean php session files...
   Jan 25 18:09:01 Debian-83-jessie-64-minimal systemd[1]: phpsessionclean.service: Succeeded.
   Jan 25 18:09:01 Debian-83-jessie-64-minimal systemd[1]: Started Clean php session files.
   Jan 25 18:09:42 Debian-83-jessie-64-minimal kernel: [162420.891568] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
   Jan 25 18:09:46 Debian-83-jessie-64-minimal kernel: [162424.734698] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
   Jan 25 18:10:57 Debian-83-jessie-64-minimal kernel: [162495.895693] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
   Jan 25 18:11:01 Debian-83-jessie-64-minimal kernel: [162499.750608] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.895786] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.915877] ------------[ cut here ]------------
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.915964] kernel BUG at drivers/net/ethernet/intel/e1000e/netdev.c:3804!
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916048] invalid opcode: 0000 [#1] SMP PTI
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916126] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G        W         4.19.0-10-amd64 #1 Debian 4.19.132-1
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916222] Hardware name: FUJITSU D3401-H1/D3401-H1, BIOS V5.0.0.11 R1.7.0.SR.2 for D3401-H1x                11/25/2015
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916328] Workqueue: events e1000_reset_task [e1000e]
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916410] RIP: 0010:e1000_flush_desc_rings+0x2a9/0x2f0 [e1000e]
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916486] Code: ff ff 31 c0 31 ed 66 41 89 45 20 e9 a8 fe ff ff 4c 89 e7 e8 89 f3 ff ff e9 af fe ff ff 4c 89 e7 e8 7c f3 ff ff e9 30 fe ff ff <0f> 0b 4c 89 e7 e8 6d f3 ff ff eb ac 4c 89 e7 e8 63 f3 ff ff e9 68
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916615] RSP: 0018:ffffaf708629fde0 EFLAGS: 00010202
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916689] RAX: 0000000000000067 RBX: ffff9043211f48c0 RCX: 000000000000007d
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916780] RDX: 0000000000000067 RSI: 0000000000000246 RDI: 0000000000000246
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916872] RBP: 000000003103f0fa R08: 0000000000000002 R09: ffffaf708629fdc4
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.916963] R10: 00000000000000fe R11: 0000000000000000 R12: ffff9043211f4e38
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917055] R13: ffff90432a33f800 R14: 0000000004008000 R15: ffff9043211f4940
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917147] FS:  0000000000000000(0000) GS:ffff904331200000(0000) knlGS:0000000000000000
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917240] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917315] CR2: 00007f31849487f8 CR3: 00000005a660a003 CR4: 00000000003606f0
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917406] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917497] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917588] Call Trace:
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917663]  e1000e_reset+0x574/0x790 [e1000e]
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917743]  e1000e_down+0x1cf/0x200 [e1000e]
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917819]  e1000e_reinit_locked+0x46/0x60 [e1000e]
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917898]  process_one_work+0x1a7/0x3a0
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.917974]  worker_thread+0x30/0x390
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918046]  ? create_worker+0x1a0/0x1a0
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918118]  kthread+0x112/0x130
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918188]  ? kthread_bind+0x30/0x30
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918260]  ret_from_fork+0x35/0x40
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918331] Modules linked in: unix_diag ip6t_rpfilter ipt_rpfilter binfmt_misc veth ip6t_MASQUERADE ipt_MASQUERADE xt_CHECKSUM xt_comment xt_tcpudp bridge stp llc dm_mod ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat nf_nat_ipv6 ip6table_filter ip6_tables iptable_raw iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter nf_tables nfnetlink cpufreq_conservative cpufreq_userspace cpufreq_powersave fuse intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul evdev crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore squashfs iTCO_wdt pcc_cpufreq sg iTCO_vendor_support intel_pch_thermal intel_rapl_perf fujitsu_laptop wmi loop sparse_keymap video acpi_pad button ip_tables x_tables autofs4
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918698]  ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1 md_mod sd_mod crc32c_intel ahci xhci_pci libahci xhci_hcd libata aesni_intel e1000e usbcore scsi_mod aes_x86_64 crypto_simd cryptd glue_helper i2c_i801 usb_common thermal fan
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918920] ---[ end trace fc8f12793b39335d ]---
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.918998] RIP: 0010:e1000_flush_desc_rings+0x2a9/0x2f0 [e1000e]
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919078] Code: ff ff 31 c0 31 ed 66 41 89 45 20 e9 a8 fe ff ff 4c 89 e7 e8 89 f3 ff ff e9 af fe ff ff 4c 89 e7 e8 7c f3 ff ff e9 30 fe ff ff <0f> 0b 4c 89 e7 e8 6d f3 ff ff eb ac 4c 89 e7 e8 63 f3 ff ff e9 68
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919206] RSP: 0018:ffffaf708629fde0 EFLAGS: 00010202
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919281] RAX: 0000000000000067 RBX: ffff9043211f48c0 RCX: 000000000000007d
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919372] RDX: 0000000000000067 RSI: 0000000000000246 RDI: 0000000000000246
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919464] RBP: 000000003103f0fa R08: 0000000000000002 R09: ffffaf708629fdc4
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919555] R10: 00000000000000fe R11: 0000000000000000 R12: ffff9043211f4e38
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919647] R13: ffff90432a33f800 R14: 0000000004008000 R15: ffff9043211f4940
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919739] FS:  0000000000000000(0000) GS:ffff904331200000(0000) knlGS:0000000000000000
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.919937] CR2: 00007f31849487f8 CR3: 00000005a660a003 CR4: 00000000003606f0
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.920030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   Jan 25 18:12:01 Debian-83-jessie-64-minimal kernel: [162559.920123] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel BUG at drivers/net/ethernet/intel/e1000e/netdev.c:3804!
kernel: [162559.916048] invalid opcode: 0000 [#1] SMP PTI
kernel: [162559.916126] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G        W         4.19.0-10-amd64 #1 Debian 4.19.132-1
kernel: [162559.916222] Hardware name: FUJITSU D3401-H1/D3401-H1, BIOS V5.0.0.11 R1.7.0.SR.2 for D3401-H1x                11/25/2015
kernel: [162559.916328] Workqueue: events e1000_reset_task [e1000e]
kernel: [162559.916410] RIP: 0010:e1000_flush_desc_rings+0x2a9/0x2f0 [e1000e]

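You are running Debian's distribution kernel, which applies some patches on top of the upstream source, so my quick analysis may not be entirely accurate. But looking at line 3804 of drivers/net/ethernet/intel/e1000e/netdev.c in the upstream 4.19.170 source, we find this line: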

BUG_ON(tdt != tx_ring->next_to_use);

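If the specified condition is true, this triggers the "kernel BUG at..." message, complete with a stack trace and everything.

That line is in the function e1000_flush_tx_ring(), which is called by the function e1000_flush_desc_rings(), which in turn is named as the instruction pointer location in the error message: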

RIP: 0010:e1000_flush_desc_rings+0x2a9/0x2f0 [e1000e]

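Perhaps the compiler has inlined or otherwise optimized the e1000_flush_tx_ring() function, so that it does not show up as a recognizable symbol on the RIP: line. But it seems to match: the call trace strongly suggests that the driver is in the middle of resetting the NIC, and flushing the TX ring is evidently part of that process.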
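One more detail supports reading the RIP as a BUG_ON site. On x86, BUG() is implemented with the ud2 instruction (bytes 0f 0b), which is why the oops is reported as "invalid opcode". The Code: line marks the byte at the faulting instruction pointer with <..>. As a minimal sketch, the helper below (a hypothetical name, not part of any tool) pulls out that byte and the one after it from a saved log:

```shell
# Hypothetical helper: read kernel log text on stdin and print the byte
# marked with <..> on each "Code:" line, plus the byte that follows it.
# Identical results from repeated dumps are collapsed by sort -u.
faulting_bytes() {
    sed -n 's/.*Code:.*<\([0-9a-f]*\)> \([0-9a-f]*\).*/\1 \2/p' | sort -u
}

# Usage, assuming the oops was saved to a file named oops.log:
#   faulting_bytes < oops.log
```

For the log in the question this yields the pair 0f 0b, matching ud2.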

But what made the reset necessary in the first place? It turns out that Intel has published a specification update for the I218/I219 NICs:

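5. I219 buffer overrun while processing DMA transactions

Problem: Intel® 100/200 Series Chipset platforms reduced the round-trip latency for the LAN controller's DMA accesses, which in some high-performance cases causes a buffer overrun while the I219 LAN Connected Device is processing the DMA transactions.

Implication: I219LM and I219V devices can fall into an unrecoverable Tx hang under very stressful UDP traffic and multiple reconnections of the Ethernet cable. This Tx hang of the LAN controller is only recoverable after a system reboot.

Workaround: Slightly slow down DMA access by reducing the number of outstanding requests. This workaround could have an impact on TCP traffic performance and could reduce performance by 5% to 15% (depending on the platform). Disabling TSO eliminates the performance loss for TCP traffic without a noticeable impact on CPU performance.

Status: Intel® 100/200 Series Chipset - NoFix

Intel® 300 Series Chipset - Fixed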

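So the root cause appears to be a hardware (or possibly NIC firmware) bug. The driver finds that the structure of the TX ring buffer has been corrupted and assumes that the cause is a fault in the driver itself. But in this case, the fault seems to be in the NIC.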
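Since the erratum ties the hang to heavy traffic, it may be worth checking whether the resets really cluster around peak hours. The following is only a sketch: resets_per_hour is a hypothetical helper, and it assumes the traditional syslog timestamp layout seen in the question (month, day, HH:MM:SS).

```shell
# Hypothetical helper: read syslog text on stdin and print, for each hour,
# how many "Reset adapter unexpectedly" messages were logged.
# Output lines have the form "Mon DD HH:00 count", sorted lexically.
resets_per_hour() {
    awk '/Reset adapter unexpectedly/ {
             key = $1 " " $2 " " substr($3, 1, 2) ":00"
             count[key]++
         }
         END { for (key in count) print key, count[key] }' | sort
}

# Usage, assuming the messages end up in /var/log/syslog:
#   resets_per_hour < /var/log/syslog
```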

The suggested workaround is to disable the NIC's TCP segmentation offload feature (tso):

ethtool -K eth0 tso off
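A setting made with ethtool -K does not survive a reboot, so it helps to verify the state and reapply it at boot. The sketch below assumes the interface is called eth0, as in the log, and that it is managed by ifupdown; tso_state is a hypothetical helper that only parses the output of ethtool -k.

```shell
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and print
# the state of the tcp-segmentation-offload feature ("on" or "off",
# possibly followed by "[fixed]" if the driver does not allow changes).
tso_state() {
    awk -F': ' '$1 == "tcp-segmentation-offload" { print $2 }'
}

# Usage (as root):
#   ethtool -k eth0 | tso_state        # state before the change
#   ethtool -K eth0 tso off
#   ethtool -k eth0 | tso_state        # state after the change
#
# To reapply at boot, assuming ifupdown manages the interface, add this
# line to the eth0 stanza in /etc/network/interfaces:
#   post-up /sbin/ethtool -K eth0 tso off
```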

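The Fujitsu D3401-H1 appears to carry an Intel Core i7-6700 processor, which belongs to the Skylake generation... so I would expect an Intel 100 Series chipset to go with it. It looks like there is no fix available for that chipset, so you will probably need to apply the workaround.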

Source: https://unix.stackexchange.com/questions/630941