Is BTRFS telling me my hard drive is dead?
I noticed my HP N54L was churning away and found that dmesg was reporting this:

[ 81.945530] btrfs read error corrected: ino 1 off 16685977600 (dev /dev/sdb sector 2636776)
[ 82.010023] btrfs read error corrected: ino 1 off 16637734912 (dev /dev/sdb sector 2589656)
[ 85.927604] verify_parent_transid: 43 callbacks suppressed
[ 85.927615] parent transid verify failed on 16956989440 wanted 13182 found 12799
[ 85.974600] parent transid verify failed on 16585043968 wanted 13145 found 12357
[ 89.903548] repair_io_failure: 26 callbacks suppressed
[ 89.903560] btrfs read error corrected: ino 1 off 16875483136 (dev /dev/sdb sector 2821816)
[ 115.951579] parent transid verify failed on 16963846144 wanted 13184 found 12802
[ 115.976830] btrfs read error corrected: ino 1 off 16963846144 (dev /dev/sdb sector 2908128)
[ 115.988907] parent transid verify failed on 16978874368 wanted 13187 found 12815
[ 543.848294] btrfs: device fsid e8f8fc09-3aae-4fce-85ca-fcf7665b9f02 devid 2 transid 13199 /dev/sdb
[ 1120.854825] verify_parent_transid: 5 callbacks suppressed
[ 1120.854838] parent transid verify failed on 16956600320 wanted 13184 found 12799
[ 1120.891229] repair_io_failure: 6 callbacks suppressed
[ 1120.891243] btrfs read error corrected: ino 1 off 16956600320 (dev /dev/sdb sector 2901016)
[ 1124.851937] parent transid verify failed on 16977842176 wanted 13187 found 12814
[ 1124.885429] btrfs read error corrected: ino 1 off 16977842176 (dev /dev/sdb sector 2921768)
Here is my BTRFS setup, RAID10 across 4x 3TB drives:
$ sudo btrfs filesystem df /mnt/btrfs
Data, RAID10: total=136.00GiB, used=134.70GiB
System, RAID10: total=64.00MiB, used=20.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID10: total=1.00GiB, used=363.21MiB

$ sudo btrfs filesystem show /mnt/btrfs
Label: none  uuid: <UUID>
        Total devices 4 FS bytes used 135.05GiB
        devid 1 size 2.73TiB used 68.54GiB path /dev/sda
        devid 2 size 2.73TiB used 68.53GiB path /dev/sdb
        devid 3 size 2.73TiB used 68.53GiB path /dev/sdc
        devid 4 size 2.73TiB used 68.53GiB path /dev/sdd
I noticed that the device stats coming out of BTRFS were... odd...:
$ sudo btrfs device stats /mnt/btrfs
[/dev/sda].write_io_errs   0
[/dev/sda].read_io_errs    0
[/dev/sda].flush_io_errs   0
[/dev/sda].corruption_errs 0
[/dev/sda].generation_errs 0
[/dev/sdb].write_io_errs   207275
[/dev/sdb].read_io_errs    127287
[/dev/sdb].flush_io_errs   0
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0
[/dev/sdc].write_io_errs   0
[/dev/sdc].read_io_errs    0
[/dev/sdc].flush_io_errs   0
[/dev/sdc].corruption_errs 0
[/dev/sdc].generation_errs 0
[/dev/sdd].write_io_errs   0
[/dev/sdd].read_io_errs    0
[/dev/sdd].flush_io_errs   0
[/dev/sdd].corruption_errs 0
[/dev/sdd].generation_errs 0
I've ordered myself a spare 3TB drive just in case, but can I safely assume that /dev/sdb is dead? I just find it a bit odd that BTRFS reports [/dev/sdb].corruption_errs 0. Is there a generally accepted way to prove that a drive has failed in a BTRFS RAID array?
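For reference, a minimal sketch of how I could double-check this myself (assuming the /mnt/btrfs mount point and device names above; not something I have run yet):

# Hedged sketch: reset the per-device counters, scrub, and compare with SMART's own verdict.
sudo btrfs device stats -z /mnt/btrfs    # print and zero the current error counters
sudo btrfs scrub start -B /mnt/btrfs     # -B runs the scrub in the foreground and prints a summary
sudo btrfs device stats /mnt/btrfs       # see which device accumulated new errors during the scrub
sudo smartctl -H /dev/sdb                # the drive's own overall SMART health assessment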
I've seen similar fall-out on my server at home (running RAID-6 with Btrfs on top of it), and on three occasions it has pointed the finger at one of the drives.
The first thing I do is run smartctl against each drive. Then, for the failing drives, I note the raw error counts:

smartctl -x /dev/sdf | fgrep Raw
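A minimal sketch of that per-drive pass (the /dev/sda through /dev/sdd device names match the question's array and are otherwise just an assumption):

#!/bin/sh
# Hedged sketch: print the raw SMART counters for each drive in turn.
# Adjust the device glob to your own drives.
for d in /dev/sd[a-d]; do
    echo "== $d =="
    smartctl -x "$d" | fgrep Raw
done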
Keep track of these. I had one drive that showed a few errors at one point, but it has been stable for the past 9 months since I reseated the cabling. I don't know why, but I do consider that one "not dead yet".
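To actually track the counts over time, one simple approach (a sketch; the log path and device name are assumptions) is to append timestamped samples and compare them later:

# Sketch: append a timestamped sample of the raw SMART counters to a log file.
( date; smartctl -A /dev/sdf | fgrep Raw ) >> /var/log/smart-raw.log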
If the error count increases again, I'll pull the drive and replace it (I can live with the risk of having one of the two redundant drives in my RAID-6 offline for half a day).
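The replacement step depends on the setup. For the native Btrfs RAID10 in the question (rather than my RAID-6), the usual tool is btrfs replace; a sketch, assuming the spare shows up as /dev/sde and the failing disk is devid 2 (/dev/sdb per the filesystem show output above):

# Hedged sketch for the asker's Btrfs RAID10; /dev/sde is an assumed name for the new spare.
sudo btrfs replace start 2 /dev/sde /mnt/btrfs    # rebuild devid 2's data onto the spare
sudo btrfs replace status /mnt/btrfs              # watch the rebuild progress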