啟動失敗後如何閱讀這些日誌條目?
我的系統今天多次啟動失敗。在完全斷開電源後,我設法將其重新打開,幸運的是,但我想了解發生了什麼。
從使用者的角度來看,它是這樣的:我打開 PC,風扇比平時更吵,啟動過程有時會卡在啟動螢幕上,有時會在 GRUB 之後,有時會在很短的時間內出現顯示我的顯卡。
我查了日記,發現以下幾行以前沒有出現過。這是其中一隻失敗靴子的完整日記條目。其他條目看起來相似。
-- Boot 4c3326651829453c89c08358e88b8071 -- Jan 22 01:42:53 hostname kernel: Linux version 5.10.9-arch1-1 (linux@archlinux) (gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.35.1) #1 SMP PREEMPT Tue, 19 Jan 2021 22:06:06 +0000 Jan 22 01:42:53 hostname kernel: Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=d825234f-4397-494f-9c61-e719a008ecbd rw loglevel=3 quiet Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input15 Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input16 Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input17 Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input18 Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input19 Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input20 Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input21 Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Front Mic as /devices/pci0000:00/0000:00:14.2/sound/card0/input22 Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Rear Mic as /devices/pci0000:00/0000:00:14.2/sound/card0/input23 Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Line as /devices/pci0000:00/0000:00:14.2/sound/card0/input24 Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Line Out as /devices/pci0000:00/0000:00:14.2/sound/card0/input25 Jan 22 01:38:09 hostname audit[425]: NETFILTER_CFG table=filter family=10 entries=144 op=xt_replace pid=425 comm="ip6tables-resto" Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Front Headphone as /devices/pci0000:00/0000:00:14.2/sound/card0/input26 Jan 22 01:38:09 hostname kernel: EDAC amd64: F15h detected (node 0). Jan 22 01:38:09 hostname kernel: EDAC amd64: Error: F1 not found: device 0x1601 (broken BIOS?) Jan 22 01:38:09 hostname audit[426]: NETFILTER_CFG table=filter family=10 entries=160 op=xt_replace pid=426 comm="ip6tables-resto" Jan 22 01:38:09 hostname systemd-udevd[258]: controlC0: Process '/usr/bin/alsactl restore 0' failed with exit code 99. Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in First Boot Wizard being skipped. Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in File System Check on Root Device being skipped. Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped. Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in Repartition Root Disk being skipped. Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in Create System Users being skipped. Jan 22 01:38:09 hostname audit[435]: NETFILTER_CFG table=filter family=10 entries=168 op=xt_replace pid=435 comm="ip6tables-resto" Jan 22 01:38:09 hostname kernel: EDAC amd64: F15h detected (node 0). Jan 22 01:38:09 hostname kernel: EDAC amd64: Error: F1 not found: device 0x1601 (broken BIOS?) Jan 22 01:38:09 hostname systemd[1]: Finished CLI Netfilter Manager. Jan 22 01:38:09 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=ufw comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Jan 22 01:38:09 hostname kernel: EDAC amd64: F15h detected (node 0). Jan 22 01:38:09 hostname kernel: EDAC amd64: Error: F1 not found: device 0x1601 (broken BIOS?) Jan 22 01:38:10 hostname kernel: nvidia: module license 'NVIDIA' taints kernel. Jan 22 01:38:10 hostname kernel: Disabling lock debugging due to kernel taint Jan 22 01:38:10 hostname kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 237 Jan 22 01:38:10 hostname kernel: Jan 22 01:38:10 hostname kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem Jan 22 01:38:10 hostname kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 460.32.03 Sun Dec 27 19:00:34 UTC 2020 Jan 22 01:38:10 hostname systemd-udevd[267]: controlC1: Process '/usr/bin/alsactl restore 1' failed with exit code 99. Jan 22 01:38:10 hostname systemd[1]: Condition check resulted in First Boot Wizard being skipped. Jan 22 01:38:10 hostname systemd[1]: Condition check resulted in File System Check on Root Device being skipped. Jan 22 01:38:10 hostname systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped. Jan 22 01:38:10 hostname systemd[1]: Condition check resulted in Repartition Root Disk being skipped. Jan 22 01:38:10 hostname systemd[1]: Condition check resulted in Create System Users being skipped. Jan 22 01:38:10 hostname kernel: random: crng init done Jan 22 01:38:10 hostname kernel: random: 7 urandom warning(s) missed due to ratelimiting Jan 22 01:38:10 hostname systemd[1]: Finished Load/Save Random Seed. Jan 22 01:38:10 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-random-seed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Jan 22 01:38:10 hostname systemd[1]: Condition check resulted in First Boot Complete being skipped. Jan 22 01:38:10 hostname kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 460.32.03 Sun Dec 27 18:51:11 UTC 2020 Jan 22 01:38:10 hostname kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver Jan 22 01:38:10 hostname kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0 Jan 22 01:38:10 hostname systemd[1]: Created slice system-systemd\x2dbacklight.slice. Jan 22 01:38:10 hostname systemd[1]: Starting Load/Save Screen Backlight Brightness of backlight:acpi_video0... Jan 22 01:38:10 hostname systemd[1]: Finished Load/Save Screen Backlight Brightness of backlight:acpi_video0. Jan 22 01:38:10 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-backlight@backlight:acpi_video0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? termi> Jan 22 01:38:18 hostname kernel: ata2: softreset failed (1st FIS failed) Jan 22 01:38:28 hostname kernel: ata2: softreset failed (1st FIS failed)
注意時間戳。這些回到過去的跳躍發生了好幾次。這是正常行為還是已經表明存在問題?我了解到它們應該始終按照正確的時間順序排列。當我關閉並重新打開日記時,日記條目的排序是正確的,關於它們的時間戳。
這個失敗系列的第一個引導條目要長得多(可能是由於軟重置)。除了上述之外,我還注意到以下條目:
Jan 22 01:38:08 hostname kernel: ACPI BIOS Warning (bug): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x0000000000000000/0x1 (20200925/tbfadt-615) ... Jan 22 01:38:08 hostname kernel: ata2.00: exception Emask 0x0 SAct 0x30000 SErr 0x0 action 0x6 Jan 22 01:38:08 hostname kernel: ata2.00: irq_stat 0x40000008 Jan 22 01:38:08 hostname kernel: ata2.00: failed command: READ FPDMA QUEUED Jan 22 01:38:08 hostname kernel: ata2.00: cmd 60/78:80:88:00:00/00:00:00:00:00/40 tag 16 ncq dma 61440 in res 43/84:78:f0:00:00/00:00:00:00:00/00 Emask 0x410 (ATA bus error) <F> Jan 22 01:38:08 hostname kernel: ata2.00: status: { DRDY SENSE ERR } Jan 22 01:38:08 hostname kernel: ata2.00: error: { ICRC ABRT } Jan 22 01:38:08 hostname kernel: ata2: hard resetting link Jan 22 01:38:08 hostname kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Jan 22 01:38:08 hostname kernel: ata2.00: NCQ Send/Recv Log not supported Jan 22 01:38:08 hostname kernel: ata2.00: NCQ Send/Recv Log not supported Jan 22 01:38:08 hostname kernel: ata2.00: configured for UDMA/133 Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=1s Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 Sense Key : Aborted Command [current] Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 Add. Sense: Scsi parity error Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 CDB: Read(10) 28 00 00 00 00 88 00 00 78 00 Jan 22 01:38:08 hostname kernel: blk_update_request: I/O error, dev sdb, sector 136 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
我懷疑電源線是罪魁禍首,因為我記得在此之前我可以通過重新連接電源線來解決啟動問題。這裡有沒有人可以幫助我更好地理解這些日誌。
我對引用 BIOS 的條目特別好奇。在不了解情況的情況下,您可以從中提取什麼資訊?
您擁有現代多核處理器,並且您的發行版使用
systemd
. 結果,在啟動時,許多事情會並行發生,有時沒有固定的順序。如果某些日誌消息使用不同的路由(本機systemd
日誌記錄與核心的審計子系統與 syslog 系統呼叫),它們之間的順序可能會略有不同。我將按大部分順序瀏覽這些消息,但將一些相似的消息組合在一起。
Jan 22 01:42:53 hostname kernel: Linux version 5.10.9-arch1-1 (linux@archlinux) (gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.35.1) #1 SMP PREEMPT Tue, 19 Jan 2021 22:06:06 +0000 Jan 22 01:42:53 hostname kernel: Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=d825234f-4397-494f-9c61-e719a008ecbd rw loglevel=3 quiet
這些通常是 Linux 核心在開始執行後立即輸出的第一行。此時,系統時鐘只是使用系統韌體將其初始化為的時間值,該時間值通常來源於電池供電的時鐘晶片。請注意
loglevel=3 quiet
核心選項:這些選項將使許多早期啟動消息靜音,因此在此之後和下一條消息之前可能會發生很多事情。Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input15 Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input16 Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input17 Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input18 Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input19 Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input20 Jan 22 01:38:09 hostname kernel: input: HDA NVidia HDMI/DP,pcm=12 as /devices/pci0000:00/0000:00:02.1/0000:01:00.1/sound/card1/input21 Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Front Mic as /devices/pci0000:00/0000:00:14.2/sound/card0/input22 Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Rear Mic as /devices/pci0000:00/0000:00:14.2/sound/card0/input23 Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Line as /devices/pci0000:00/0000:00:14.2/sound/card0/input24 Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Line Out as /devices/pci0000:00/0000:00:14.2/sound/card0/input25 Jan 22 01:38:09 hostname kernel: input: HD-Audio Generic Front Headphone as /devices/pci0000:00/0000:00:14.2/sound/card0/input26
這些消息來自將各種音頻連接器上的插頭檢測電路註冊為輸入的輸入子系統。這意味著此時已經載入了實際的聲音驅動程序。聲音晶片通常不被認為是“基本硬體”,因此它們的驅動程序通常不內置在核心中,甚至不總是包含在 initramfs 中,所以在這一點上,我們可能在引導過程中走得更遠。
引導過程可能已經啟動了網路介面並從 NTP 伺服器獲取了更準確的時間資訊,這可能解釋了明顯的時間向後跳轉。或者,如果沒有可用的電池支持時鐘並且還無法訪問 NTP 伺服器,則引導過程可能會將最後一次磁碟寫入操作的時間作為目前時間的最佳猜測。
Jan 22 01:38:09 hostname audit[425]: NETFILTER_CFG table=filter family=10 entries=144 op=xt_replace pid=425 comm="ip6tables-resto" Jan 22 01:38:09 hostname audit[426]: NETFILTER_CFG table=filter family=10 entries=160 op=xt_replace pid=426 comm="ip6tables-resto" Jan 22 01:38:09 hostname audit[435]: NETFILTER_CFG table=filter family=10 entries=168 op=xt_replace pid=435 comm="ip6tables-resto"
某些東西(可能
ip6tables-restore
)導致添加 IPv6 網路過濾器規則。Jan 22 01:38:09 hostname kernel: EDAC amd64: F15h detected (node 0). Jan 22 01:38:09 hostname kernel: EDAC amd64: Error: F1 not found: device 0x1601 (broken BIOS?)
這些消息來自 EDAC(錯誤檢測和糾正)子系統。它在僅具有 ECC 糾錯記憶體的系統(即伺服器和可能的高級工作站)上最有用。您的處理器似乎具有使用 ECC 記憶體所需的功能,但您的系統韌體顯然沒有 EDAC 子系統將記憶體/匯流排錯誤映射到物理組件(例如記憶體模組插槽)的必要資訊。也許您的主機板無法使用 ECC 記憶體?如果這是真的,那麼 EDAC 子系統可能對您沒有用處,您可以將 EDAC 模組列入黑名單以完全跳過載入它們。
Jan 22 01:38:09 hostname systemd-udevd[258]: controlC0: Process '/usr/bin/alsactl restore 0' failed with exit code 99. Jan 22 01:38:10 hostname systemd-udevd[267]: controlC1: Process '/usr/bin/alsactl restore 1' failed with exit code 99.
出於某種原因,主機板的集成聲音晶片和 GPU 的 HDMI/DP 連接器都無法恢復音效卡音量設置。也許是因為
alsactl store
從未用於持久儲存目前設置?僅當您使用原始 ALSA 和/或文本模式時,這可能很重要:大多數 GUI 桌面環境和/或 Pulseaudio 通常會通過使用者特定的設置覆蓋 ALSA 音量設置。Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in First Boot Wizard being skipped. Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in File System Check on Root Device being skipped. Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped. Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in Repartition Root Disk being skipped. Jan 22 01:38:09 hostname systemd[1]: Condition check resulted in Create System Users being skipped.
systemd
正在跳過一些有條件的引導項,因為不滿足必要的條件。Jan 22 01:38:10 hostname kernel: nvidia: module license 'NVIDIA' taints kernel. Jan 22 01:38:10 hostname kernel: Disabling lock debugging due to kernel taint Jan 22 01:38:10 hostname kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 237 Jan 22 01:38:10 hostname kernel: Jan 22 01:38:10 hostname kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem Jan 22 01:38:10 hostname kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 460.32.03 Sun Dec 27 19:00:34 UTC 2020
正在載入專有的 NVIDIA GPU 驅動程序。
Jan 22 01:38:10 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-random-seed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
這看起來像是掛載了根文件系統,並且審計子系統確認了“真正的”systemd 啟動。在此之前,引導過程由 initramfs 中的“mini-systemd”處理。這也許可以解釋為什麼某些消息似乎重複了:initramfs mini-systemd 進行了一次這些檢查,而現在真正的 systemd 再次進行了檢查。
Jan 22 01:38:10 hostname kernel: random: crng init done Jan 22 01:38:10 hostname kernel: random: 7 urandom warning(s) missed due to ratelimiting Jan 22 01:38:10 hostname systemd[1]: Finished Load/Save Random Seed.
systemd
已恢復上次啟動或有序關閉的隨機種子。Jan 22 01:38:10 hostname kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 460.32.03 Sun Dec 27 18:51:11 UTC 2020 Jan 22 01:38:10 hostname kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver Jan 22 01:38:10 hostname kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
正在啟動專有 NVIDIA GPU 驅動程序的更多組件。
Jan 22 01:38:10 hostname systemd[1]: Created slice system-systemd\x2dbacklight.slice. Jan 22 01:38:10 hostname systemd[1]: Starting Load/Save Screen Backlight Brightness of backlight:acpi_video0... Jan 22 01:38:10 hostname systemd[1]: Finished Load/Save Screen Backlight Brightness of backlight:acpi_video0. Jan 22 01:38:10 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-backlight@backlight:acpi_video0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? termi>
韌體 ACPI 表包括一個背光控制介面,並且
systemd
正在啟動一項服務,該服務會記住從一次引導到下一次引導的目前背光設置。Jan 22 01:38:18 hostname kernel: ata2: softreset failed (1st FIS failed) Jan 22 01:38:28 hostname kernel: ata2: softreset failed (1st FIS failed)
這表明第二個 SATA 設備存在問題。您的第二個片段似乎有更多關於此的資訊。
從第二個片段:
Jan 22 01:38:08 hostname kernel: ACPI BIOS Warning (bug): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x0000000000000000/0x1 (20200925/tbfadt-615)
您的 ACPI 系統韌體的電源管理資訊不完整。BIOS 更新可能會解決這個問題,但由於它只是一個警告,核心可以解決這個問題。
Jan 22 01:38:08 hostname kernel: ata2.00: exception Emask 0x0 SAct 0x30000 SErr 0x0 action 0x6 Jan 22 01:38:08 hostname kernel: ata2.00: irq_stat 0x40000008 Jan 22 01:38:08 hostname kernel: ata2.00: failed command: READ FPDMA QUEUED Jan 22 01:38:08 hostname kernel: ata2.00: cmd 60/78:80:88:00:00/00:00:00:00:00/40 tag 16 ncq dma 61440 in res 43/84:78:f0:00:00/00:00:00:00:00/00 Emask 0x410 (ATA bus error) <F> Jan 22 01:38:08 hostname kernel: ata2.00: status: { DRDY SENSE ERR } Jan 22 01:38:08 hostname kernel: ata2.00: error: { ICRC ABRT } Jan 22 01:38:08 hostname kernel: ata2: hard resetting link Jan 22 01:38:08 hostname kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Jan 22 01:38:08 hostname kernel: ata2.00: NCQ Send/Recv Log not supported Jan 22 01:38:08 hostname kernel: ata2.00: NCQ Send/Recv Log not supported Jan 22 01:38:08 hostname kernel: ata2.00: configured for UDMA/133 Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=1s Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 Sense Key : Aborted Command [current] Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 Add. Sense: Scsi parity error Jan 22 01:38:08 hostname kernel: sd 1:0:0:0: [sdb] tag#16 CDB: Read(10) 28 00 00 00 00 88 00 00 78 00 Jan 22 01:38:08 hostname kernel: blk_update_request: I/O error, dev sdb, sector 136 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
這表明您的
/dev/sdb
磁碟出現數據傳輸錯誤。這可能是由於 SATA 電纜或連接器損壞造成的,因此您可以先嘗試將磁碟移動到其他 SATA 埠或先更換電纜。如果這些事情沒有幫助,則可能需要更換磁碟。如果故障似乎來來去去,請盡快對該磁碟進行額外備份。