Understanding machine check exceptions (MCE)
While trying to debug frequent freezes on a new laptop (Kaby Lake architecture) running Ubuntu 16.04, I stumbled upon the following entry in kern.log:
kernel: [ 0.041634] mce: [Hardware Error]: Machine check events logged
Since then I have installed mcelog, but I don't know what to make of its log. The contents of /var/log/mcelog are:
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 0 CPU 0 BANK 6 MISC 3880018086 ADDR fef1cf00 TIME 1479298799 Wed Nov 16 13:19:59 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 1 CPU 0 BANK 7 MISC 43880018086 ADDR fef1ff00 TIME 1479298799 Wed Nov 16 13:19:59 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 0 CPU 0 BANK 6 MISC 3880018086 ADDR fef1cf00 TIME 1479321645 Wed Nov 16 19:40:45 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 1 CPU 0 BANK 7 MISC 43880018086 ADDR fef1ff00 TIME 1479321645 Wed Nov 16 19:40:45 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 0 CPU 0 BANK 6 MISC 43880000086 ADDR fef1db80 TIME 1479328438 Wed Nov 16 21:33:58 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 1 CPU 0 BANK 7 MISC 13880000086 ADDR fef1dc00 TIME 1479328438 Wed Nov 16 21:33:58 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 0 CPU 0 BANK 6 MISC 43880000086 ADDR fef1db80 TIME 1479333991 Wed Nov 16 23:06:31 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 1 CPU 0 BANK 7 MISC 13880000086 ADDR fef1dc00 TIME 1479333991 Wed Nov 16 23:06:31 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 0 CPU 0 BANK 6 MISC 43880000086 ADDR fef1db80 TIME 1479373350 Thu Nov 17 10:02:30 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 1 CPU 0 BANK 7 MISC 13880000086 ADDR fef1dc00 TIME 1479373350 Thu Nov 17 10:02:30 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 0 CPU 0 BANK 6 MISC 3880018086 ADDR fef1cf00 TIME 1479373810 Thu Nov 17 10:10:10 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee0000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 1 CPU 0 BANK 7 MISC 43880018086 ADDR fef1ff00 TIME 1479373810 Thu Nov 17 10:10:10 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee0000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 0 CPU 0 BANK 6 MISC 3880018086 ADDR fef1cf00 TIME 1479375712 Thu Nov 17 10:41:52 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 1 CPU 0 BANK 7 MISC 43880018086 ADDR fef1ff00 TIME 1479375712 Thu Nov 17 10:41:52 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 0 CPU 0 BANK 6 MISC 3880018086 ADDR fef1cf00 TIME 1479385932 Thu Nov 17 13:32:12 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 1 CPU 0 BANK 7 MISC 43880018086 ADDR fef1ff00 TIME 1479385932 Thu Nov 17 13:32:12 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 0 CPU 0 BANK 6 MISC 3880018086 ADDR fef1cf00 TIME 1479387666 Thu Nov 17 14:01:06 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 1 CPU 0 BANK 7 MISC 43880018086 ADDR fef1ff00 TIME 1479387666 Thu Nov 17 14:01:06 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 0 CPU 0 BANK 6 MISC 43880000086 ADDR fef1db80 TIME 1479456710 Fri Nov 18 09:11:50 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 1 CPU 0 BANK 7 MISC 13880000086 ADDR fef1dc00 TIME 1479456710 Fri Nov 18 09:11:50 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 0 CPU 0 BANK 6 MISC 43880000086 ADDR fef1db80 TIME 1479459374 Fri Nov 18 09:56:14 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
mcelog: Family 6 Model 8e CPU: only decoding architectural errors Hardware event. This is not a software error. MCE 1 CPU 0 BANK 7 MISC 13880000086 ADDR fef1dc00 TIME 1479459374 Fri Nov 18 09:56:14 2016 MCG status: MCi status: Error overflow Uncorrected error MCi_MISC register valid MCi_ADDR register valid Processor context corrupt MCA: corrected filtering (some unreported errors in same region) Generic CACHE Level-2 Generic Error STATUS ee2000000040110a MCGSTATUS 0 MCGCAP c08 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 142
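Some observations (please correct me if any of them are wrong):
- Almost all errors appear to occur within the same page (ADDR fef1xxx).
- Only banks 6 and 7 appear to be affected.
- All entries contain "Error overflow" and "Uncorrected error".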
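The observations above can be checked mechanically. The following is a minimal Python sketch, not part of the original post, that tallies banks and pages from mcelog records. The regular expression, the `summarize` helper, the 4 KiB page size, and the abridged `SAMPLE` records (only the BANK, MISC, ADDR, TIME and STATUS fields of four entries from the log) are assumptions made for illustration.

```python
import re
from collections import Counter

PAGE_SHIFT = 12  # assumption: 4 KiB pages

# Matches the fields of interest in one mcelog record. Illustrative only;
# it is not an official description of the mcelog output format.
RECORD_RE = re.compile(
    r"BANK (\d+) MISC ([0-9a-f]+) ADDR ([0-9a-f]+) TIME (\d+).*?\bSTATUS ([0-9a-f]+)",
    re.S,
)

# Four records from the log, abridged to the fields the regex needs.
# In practice the text would be read from /var/log/mcelog instead.
SAMPLE = """\
BANK 6 MISC 3880018086 ADDR fef1cf00 TIME 1479298799 STATUS ee2000000040110a
BANK 7 MISC 43880018086 ADDR fef1ff00 TIME 1479298799 STATUS ee2000000040110a
BANK 6 MISC 43880000086 ADDR fef1db80 TIME 1479328438 STATUS ee2000000040110a
BANK 7 MISC 13880000086 ADDR fef1dc00 TIME 1479328438 STATUS ee2000000040110a
"""


def summarize(text):
    """Count records per bank and per page number (ADDR >> PAGE_SHIFT)."""
    banks = Counter()
    pages = Counter()
    for bank, _misc, addr, _time, _status in RECORD_RE.findall(text):
        banks[int(bank)] += 1
        pages[int(addr, 16) >> PAGE_SHIFT] += 1
    return banks, pages


banks, pages = summarize(SAMPLE)
print("banks:", dict(banks))
print("4 KiB pages:", {hex(page): count for page, count in pages.items()})
```

On these four records the addresses share the fef1 prefix but land in three different 4 KiB pages (0xfef1c, 0xfef1d and 0xfef1f), so "same 16 KiB region" may describe them better than "same page", assuming ADDR is a byte address.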
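The mcelog FAQ mentions that "a low rate of corrected memory errors is expected and does not require hardware replacement or other action". My log entries, however, contain the phrase "Uncorrected error", which suggests that I actually should do something.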
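To see where the "Uncorrected error" text comes from, the raw STATUS value can be decoded by hand. The sketch below is not from the original post; the bit positions reflect my understanding of the architectural IA32_MCi_STATUS layout in the Intel SDM and should be treated as an assumption to verify against the manual. The `decode_status` helper and the `FLAGS` table are hypothetical names.

```python
# Assumed bit positions of the architectural IA32_MCi_STATUS flags
# (check against the Intel SDM before relying on them).
FLAGS = {
    63: "VAL",    # register contents valid
    62: "OVER",   # error overflow
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}


def decode_status(status):
    """Split a 64-bit STATUS value into flag names and error-code fields."""
    flags = [
        name
        for bit, name in sorted(FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


# The two distinct STATUS values that appear in the log.
for raw in ("ee2000000040110a", "ee0000000040110a"):
    decoded = decode_status(int(raw, 16))
    print(raw, decoded["flags"],
          hex(decoded["mca_error_code"]),
          hex(decoded["model_specific_code"]))
```

Under these assumptions both values decode to VAL, OVER, UC, MISCV, ADDRV and PCC set with EN clear, which matches the flag lines mcelog printed; the two values differ only in bit 53.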
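My questions are:
- What do these errors mean, and should I be worried about them?
- Could these hardware errors be causing the whole system to freeze?
- Should I ask the manufacturer to replace the laptop (or a component)?
- Is there anything else I should do?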
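First of all, I'm afraid I can't answer your questions very well. I also own a Dell XPS 13 (9360) and see the same MCE messages. Because of them, I have been in contact with Dell support. They replaced the motherboard, but that did not help: the same messages still appear in the log. At some point they concluded that these might be false positives. They don't know what is causing them, though (an mcelog, kernel, or Intel problem?). The correspondence with support is still ongoing.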
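<rant>
By the way, talking to Dell support has been a very unpleasant experience. They seem to suggest only "standard" fixes such as resetting the firmware, running the self-test, and so on. I did not get the impression that I was talking to anyone with real technical insight.</rant>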
To add some more detail: I see the same problem on Fedora 24, so it does not appear to be specific to Ubuntu.
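Regarding your questions:
What do these errors mean, and should I be worried about them?
I don't know. Dell support believes they are false positives.
Could these hardware errors be causing the whole system to freeze?
Apart from the messages, my system works fine. My guess is that the freezes are a separate issue.
Should I ask the manufacturer to replace the laptop (or a component)?
Replacing the motherboard did not fix the MCE messages. It might help with the freezes, although those seem to have been fixed by a kernel update.
Is there anything else I should do?
If you have not contacted support yet, please do. Once they see that this affects more customers, maybe they will come up with a real solution.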