Linux

HDD I/O errors in kernel messages: is this definitely an HDD failure?

  • January 3, 2022

On our RHEL servers (RHEL version 7.2), we see many dmesg lines like the ones below.

Example for the sdb disk (a hard disk drive):

[Thu Dec 30 13:07:48 2021] EXT4-fs (sdb): error count since last fsck: 1329
[Thu Dec 30 13:07:48 2021] EXT4-fs (sdb): initial error at time 1614482941: ext4_find_entry:1312: inode 67240512
[Thu Dec 30 13:07:48 2021] EXT4-fs (sdb): last error at time 1640670898: ext4_find_entry:1312: inode 67240512
[Thu Dec 30 13:12:19 2021] sd 0:0:1:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Dec 30 13:12:19 2021] sd 0:0:1:0: [sdb] tag#0 Sense Key : Medium Error [current]
[Thu Dec 30 13:12:19 2021] sd 0:0:1:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[Thu Dec 30 13:12:19 2021] sd 0:0:1:0: [sdb] tag#0 CDB: Read(10) 28 00 80 41 13 38 00 00 08 00
[Thu Dec 30 13:12:19 2021] blk_update_request: critical medium error, dev sdb, sector 2151748408



[Thu Dec 30 13:14:38 2021] EXT4-fs warning (device sdb): __ext4_read_dirblock:902: error reading directory block (ino 67240512, block 0)
[Thu Dec 30 13:17:05 2021] NOHZ: local_softirq_pending 08
[Thu Dec 30 13:21:26 2021] NOHZ: local_softirq_pending 08
[Thu Dec 30 13:21:59 2021] sd 0:0:1:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Dec 30 13:21:59 2021] sd 0:0:1:0: [sdb] tag#0 Sense Key : Medium Error [current]
[Thu Dec 30 13:21:59 2021] sd 0:0:1:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[Thu Dec 30 13:21:59 2021] sd 0:0:1:0: [sdb] tag#0 CDB: Read(10) 28 00 80 41 13 38 00 00 08 00
[Thu Dec 30 13:21:59 2021] blk_update_request: critical medium error, dev sdb, sector 2151748408
[Thu Dec 30 13:21:59 2021] EXT4-fs warning (device sdb): __ext4_read_dirblock:902: error reading directory block (ino 67240512, block 0)
[Thu Dec 30 13:25:32 2021] NOHZ: local_softirq_pending 08
[Thu Dec 30 13:27:19 2021] NOHZ: local_softirq_pending 08
[Thu Dec 30 13:29:14 2021] NOHZ: local_softirq_pending 08

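Based on the messages above, our questions are:

Is the most likely cause that the hard drive is dying of old age?

If so, what should we do: replace the disk(s)?

Reference: https://access.redhat.com/solutions/35465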

"Dying of old age" implies that the drive is old, and that is something we cannot determine from the information in the logs.
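The logs do not give the drive's age, but the EXT4-fs lines do carry two Unix timestamps (the "initial error" and "last error" times) that show how long the filesystem has been recording errors. A minimal sketch of converting them, assuming they are seconds since the epoch in UTC; the function name is only illustrative:

```python
from datetime import datetime, timezone


def ext4_error_time(epoch_seconds: int) -> datetime:
    """Convert an EXT4-fs 'error at time N' value to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Values copied from the dmesg lines in the question.
initial = ext4_error_time(1614482941)
last = ext4_error_time(1640670898)

print("initial error:", initial.isoformat())
print("last error:   ", last.isoformat())
print("days between: ", (last - initial).days)
```

With the values from the question, the first recorded error falls on 2021-02-28 and the last on 2021-12-28, so the filesystem has been logging errors for roughly ten months. That says nothing about the drive's age, only that the problem is not new.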

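However, I assume this is a professional environment; if so, in my opinion any disk medium error should trigger a disk replacement. The "critical medium error" message indicates that this is an error on the disk itself, unrelated to a fault between the disk and the system (such as a faulty cable). The logs in your question show only one failing sector (2151748408, reported twice), so it is probably a localized fault, but if you rely on the stored data, it is not worth the risk.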
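One way to check that both failures point at the same location is to decode the CDB bytes that the kernel prints. The sketch below assumes the standard READ(10) layout (opcode 0x28, a 4-byte big-endian logical block address in bytes 2 to 5, a 2-byte transfer length in bytes 7 to 8); the function name is hypothetical:

```python
from typing import Tuple


def decode_read10(cdb_hex: str) -> Tuple[int, int]:
    """Return (lba, transfer_length_in_blocks) from a READ(10) CDB hex string."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: number of blocks
    return lba, length


# CDB copied from the dmesg output in the question.
lba, length = decode_read10("28 00 80 41 13 38 00 00 08 00")
print(lba, length)
```

For the CDB in the question this gives LBA 2151748408, which matches the sector in the blk_update_request line, and a length of 8 blocks. With 512-byte sectors (an assumption) that is a single 4 KiB read, which is consistent with one filesystem block failing repeatedly.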

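If there are only one or a few bad sectors, you can try to get them remapped so that you can keep using the drive (temporarily); see, for example, "smartctl retest bad sector".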
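As a rough illustration of the retest step, the sketch below only builds the arguments for a smartctl selective self-test over the failing range and computes the matching filesystem block; it does not run anything. It assumes 512-byte logical sectors, a 4096-byte ext4 block size, a filesystem that starts at sector 0 of sdb (the log says "EXT4-fs (sdb)", with no partition number), and a drive that supports ATA selective self-tests. A disk behind a SCSI or RAID controller may need different options. The helper names are hypothetical:

```python
SECTOR_BYTES = 512      # assumption: logical sector size
FS_BLOCK_BYTES = 4096   # assumption: ext4 block size


def fs_block_for_sector(sector: int, partition_start: int = 0) -> int:
    """Map a disk sector number to the filesystem block that contains it."""
    return (sector - partition_start) * SECTOR_BYTES // FS_BLOCK_BYTES


def selective_test_command(device: str, first_lba: int, count: int) -> list:
    """Build (but do not run) a smartctl selective self-test command."""
    last_lba = first_lba + count - 1
    return ["smartctl", "-t", f"select,{first_lba}-{last_lba}", device]


# Sector and length taken from the dmesg output in the question.
print(fs_block_for_sector(2151748408))
print(" ".join(selective_test_command("/dev/sdb", 2151748408, 8)))
```

A selective self-test only re-reads the range and reports the result. Whether the drive actually reallocates the sector depends on its firmware, and reallocation normally happens on a write to that sector, which destroys whatever data it held.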

Quoted from: https://unix.stackexchange.com/questions/684808