Crash
Debugging a kernel panic - Watchdog detected hard LOCKUP on cpu 9?
On a freshly installed SLES 11.4 machine, we found this dmesg in /var/crash:

    <7>[ 48.600847] storage: no IPv6 routers present
    <6>[ 63.725477] BIOS EDD facility v0.16 2004-Jun-25, 1 devices found
    <6>[ 310.226578] [Hardware Error]: Machine check events logged
    <6>[ 3536.417543] lp: driver loaded but no devices found
    <6>[ 3536.417582] ppdev: user-space parallel port driver
    <6>[ 3536.983736] lp: driver loaded but no devices found
    <6>[ 3537.005660] Uniform Multi-Platform E-IDE driver
    <6>[ 3537.011756] ide-cd driver 5.00
    <6>[ 3537.033960] st: Version 20101219, fixed bufsize 32768, s/g segs 256
    <0>[ 3691.340041] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 9
    <4>[ 3691.447069] Pid: 0, comm: kworker/0:1 Tainted: G X 3.0.101-107-default #1
    <4>[ 3691.554690] Call Trace:
    <4>[ 3691.590254] [<ffffffff81004b35>] dump_trace+0x75/0x300
    <4>[ 3691.664599] [<ffffffff81467873>] dump_stack+0x69/0x6f
    <4>[ 3691.738878] [<ffffffff8146792f>] panic+0xb6/0x224
    <4>[ 3691.804367] [<ffffffff810c900c>] watchdog_overflow_callback+0xdc/0xe0
    <4>[ 3691.896736] [<ffffffff810f55fa>] __perf_event_overflow+0xaa/0x230
    <4>[ 3691.980294] [<ffffffff81018808>] intel_pmu_handle_irq+0x1a8/0x370
    <4>[ 3692.069469] [<ffffffff8146c8f1>] perf_event_nmi_handler+0x31/0xa0
    <4>[ 3692.156027] [<ffffffff8146ea47>] notifier_call_chain+0x37/0x70
    <4>[ 3692.239630] [<ffffffff8146ea8d>] __atomic_notifier_call_chain+0xd/0x20
    <4>[ 3692.334749] [<ffffffff8146eadd>] notify_die+0x2d/0x40
    <4>[ 3692.409254] [<ffffffff8146c073>] default_do_nmi+0x33/0xc0
    <4>[ 3692.489610] [<ffffffff8146c168>] do_nmi+0x68/0x80
    <4>[ 3692.558033] [<ffffffff8146b595>] restart_nmi+0x1e/0x2e

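We reinstalled the system to check whether this is a hardware or a software problem, but it still crashes when we run a DSA log collection (which was started after roughly 3500 seconds of normal uptime).
**Question:** From this dmesg (or any other information), can we determine what caused the crash? A fault on cpu 9? Or a driver bug?
It looks like upgrading the megaraid firmware fixed the crash during DSA (there was also a RAM DIMM problem!).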
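For anyone triaging a similar dump: the log above already contains a `[Hardware Error]: Machine check events logged` line at about 310 seconds, long before the panic. The following is only a rough sketch, not a diagnostic tool, of how one might pull such records out of a saved dmesg text. The function name `scan_dmesg` and the record regex are my own assumptions; the two message patterns are copied from the log in this post. A hit does not prove the cause, it only suggests looking at the machine check log and hardware before suspecting a driver.

```python
import re

# Message patterns copied from the dmesg quoted in this post.
LOCKUP_RE = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")
MCE_RE = re.compile(r"\[Hardware Error\]: Machine check events logged")

# Assumed record layout: "<level>[ seconds.micros] message", with records
# separated by whitespace or newlines (as in the dump above).
RECORD_RE = re.compile(
    r"<(\d)>\[\s*(\d+\.\d+)\]\s*(.*?)(?=\s*<\d>\[|\Z)", re.S
)


def scan_dmesg(text):
    """Return (timestamp, kind, detail) tuples for lockup and MCE records."""
    findings = []
    for _level, stamp, message in RECORD_RE.findall(text):
        lockup = LOCKUP_RE.search(message)
        if lockup:
            findings.append((float(stamp), "hard_lockup", "cpu " + lockup.group(1)))
        elif MCE_RE.search(message):
            findings.append((float(stamp), "machine_check", message.strip()))
    return findings


if __name__ == "__main__":
    # Three records taken from the dump above.
    excerpt = (
        "<6>[ 310.226578] [Hardware Error]: Machine check events logged "
        "<6>[ 3537.011756] ide-cd driver 5.00 "
        "<0>[ 3691.340041] Kernel panic - not syncing: "
        "Watchdog detected hard LOCKUP on cpu 9"
    )
    for stamp, kind, detail in scan_dmesg(excerpt):
        print(stamp, kind, detail)
```

In this case the early machine check entry fits what was eventually found (a bad DIMM plus the megaraid firmware), but the script itself cannot tell hardware and driver causes apart.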