Hard-Disk

Manufacturer's tool found bad blocks, but smartctl doesn't show any

  • April 29, 2015

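My problem takes a while to describe, so I'll start with a short summary and then describe the situation precisely.

Short summary: the manufacturer's diagnostic tool found and repaired some errors on my hard disk. As far as I understand the tool's manual, those errors were bad blocks. However, smartctl (the Linux tool for querying a disk's SMART data) doesn't show any reallocated sectors and says the disk is healthy. First question: how is that possible? Repairing a bad block means reallocating the sector, right? So why doesn't smartctl report any reallocated sectors? Second question: I bought this disk a few months ago and it is still under warranty. Should I ask the seller to replace it, or is the disk fine and safe to keep using?

Now the precise description:

I have a Western Digital hard disk, model WDC WD5000AAKX-001CA0. Recently I noticed that my computer sometimes hangs for a short while (roughly a minute). After such a hang, dmesg shows errors like these: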

knoppix@Microknoppix:~$ dmesg
(...)
[  504.003363] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  504.003374] ata1.00: failed command: READ DMA EXT
[  504.003383] ata1.00: cmd 25/00:00:80:07:01/00:02:00:00:00/e0 tag 0 dma 262144 in
[  504.003385]          res 40/00:00:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  504.003389] ata1.00: status: { DRDY }
[  509.016652] ata1: link is slow to respond, please be patient (ready=0)
[  514.030002] ata1: soft resetting link
[  514.200386] ata1.00: configured for UDMA/133
[  514.200420] ata1: EH complete
[  546.003333] ata1: lost interrupt (Status 0x50)
[  546.003364] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  546.003371] ata1.00: failed command: READ DMA EXT
[  546.003380] ata1.00: cmd 25/00:00:80:15:06/00:02:00:00:00/e0 tag 0 dma 262144 in
[  546.003381]          res 40/00:00:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  546.003386] ata1.00: status: { DRDY }
[  546.003401] ata1: soft resetting link
[  546.181205] ata1.00: configured for UDMA/133
[  546.181234] ata1: EH complete
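
The failed READ DMA EXT lines in the dmesg output above carry the command's register values, so the starting LBA and transfer size of each timed-out read can be recovered from them. The following is a minimal sketch, assuming the field order that libata uses when it logs a "cmd" line (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and a 512-byte sector; the function name `decode_taskfile` is made up for this illustration.

```python
import re

# Assumed layout of a libata "cmd" line:
#   cmd CMD/FEATURE:NSECT:LBAL:LBAM:LBAH/HOB_FEATURE:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEVICE
_BYTE = r"[0-9a-f]{2}"
_GROUP = rf"{_BYTE}(?::{_BYTE}){{4}}"
CMD_RE = re.compile(rf"cmd\s+({_BYTE})/({_GROUP})/({_GROUP})/({_BYTE})")


def decode_taskfile(line, sector_size=512):
    """Decode the 48-bit LBA and sector count from one libata 'cmd' log line."""
    match = CMD_RE.search(line)
    if match is None:
        raise ValueError("no libata 'cmd' taskfile found in line")
    command = int(match.group(1), 16)
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in match.group(2).split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in match.group(3).split(":")
    )
    # The "hob" (high-order byte) registers hold the upper half of the 48-bit values.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    sectors = (hob_nsect << 8) | nsect
    if sectors == 0:
        sectors = 65536  # in 48-bit commands a count of 0 stands for 65536 sectors
    return {
        "command": command,
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * sector_size,
    }


if __name__ == "__main__":
    lines = [
        "[  504.003383] ata1.00: cmd 25/00:00:80:07:01/00:02:00:00:00/e0 tag 0 dma 262144 in",
        "[  546.003380] ata1.00: cmd 25/00:00:80:15:06/00:02:00:00:00/e0 tag 0 dma 262144 in",
    ]
    for line in lines:
        info = decode_taskfile(line)
        print(f"command 0x{info['command']:02x}: LBA {info['lba']}, "
              f"{info['sectors']} sectors, {info['bytes']} bytes")
```

For both lines the decoded size is 512 sectors of 512 bytes, which agrees with the "dma 262144" figure that the kernel prints on the same line, so the assumed field order is at least consistent with this log.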

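However, smartctl says "SMART overall-health self-assessment test result: PASSED" (I paste the full smartctl output a few paragraphs below). Whenever I tried to run a smartctl self-test (with smartctl -t short or smartctl -t long), the test was reported as aborted by host. So I downloaded the manufacturer's bootable-CD diagnostic tool for my hard disk, this one: http://support.wdc.com/product/download.asp?groupid=606&sid=2&lang=en

First I ran the tool's quick test, and it reported an error (unfortunately I don't remember the error code). As I understand it, this test only runs the SMART quick self-test (http://wdc.custhelp.com/app/answers/detail/search/1/a_id/940/c/130/p/227,295 says "Quick Test - performs a SMART drive quick self-test to gather and verify the Data Lifeguard information contained on the drive."). Then I ran the extended test. As I understand it, the extended test looks for bad sectors (http://wdc.custhelp.com/app/answers/detail/search/1/a_id/940/c/130/p/227,295 says "Extended Test - performs a full media scan to detect bad sectors."). After some time, the tool reported that it had found and repaired some errors.

Then I booted the machine with Knoppix and ran "smartctl --all". Here is its output: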

root@Microknoppix:/home/knoppix# smartctl --all /dev/sda
smartctl 5.43 2012-06-05 r3561 [i686-linux-3.4.9] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Blue Serial ATA
Device Model:     WDC WD5000AAKX-001CA0
Serial Number:    WD-WMAYUW952768
LU WWN Device Id: 5 0014ee 6ad1d9ef1
Firmware Version: 15.01H15
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Dec 12 03:34:39 2012 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                   was completed without error.
                   Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                   without error or no self-test has ever 
                   been run.
Total time to complete Offline 
data collection:        ( 8160) seconds.
Offline data collection
capabilities:            (0x7b) SMART execute Offline immediate.
                   Auto Offline data collection on/off support.
                   Suspend Offline collection upon new
                   command.
                   Offline surface scan supported.
                   Self-test supported.
                   Conveyance Self-test supported.
                   Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                   power-saving mode.
                   Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                   General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    (  83) minutes.
Conveyance self-test routine
recommended polling time:    (   5) minutes.
SCT capabilities:          (0x3037) SCT Status supported.
                   SCT Feature Control supported.
                   SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       486
 3 Spin_Up_Time            0x0027   189   141   021    Pre-fail  Always       -       1525
 4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       587
 5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
 7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
 9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1553
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       578
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       173
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       413
194 Temperature_Celsius     0x0022   097   093   000    Old_age   Always       -       46
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       5
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       5

SMART Error Log Version: 1
ATA Error Count: 2
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 1548 hours (64 days + 12 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 04 51 01 30 4f c2 a0  Error: ABRT

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 b0 d6 01 be 4f c2 a0 02      00:02:58.316  SMART WRITE LOG
 b0 da 01 00 4f c2 a0 02      00:02:58.259  SMART RETURN STATUS
 80 44 00 00 44 57 a0 02      00:02:58.259  [VENDOR SPECIFIC]
 b0 d6 01 be 4f c2 a0 02      00:02:58.241  SMART WRITE LOG
 80 45 00 01 44 57 a0 02      00:02:58.241  [VENDOR SPECIFIC]

Error 1 occurred at disk power-on lifetime: 1515 hours (63 days + 3 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 04 51 01 30 4f c2 a0  Error: ABRT

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 b0 d6 01 be 4f c2 a0 02      00:02:21.841  SMART WRITE LOG
 b0 da 01 00 4f c2 a0 02      00:02:21.784  SMART RETURN STATUS
 80 44 00 00 44 57 a0 02      00:02:21.784  [VENDOR SPECIFIC]
 b0 d6 01 be 4f c2 a0 02      00:02:21.768  SMART WRITE LOG
 80 45 00 01 44 57 a0 02      00:02:21.768  [VENDOR SPECIFIC]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Conveyance offline  Completed without error       00%      1552         -
# 2  Conveyance offline  Completed: read failure       90%      1548         787927349
# 3  Conveyance offline  Completed: read failure       90%      1515         883391611
# 4  Short offline       Completed without error       00%      1503         -
# 5  Short offline       Completed without error       00%      1503         -
# 6  Short offline       Aborted by host               80%      1502         -
# 7  Extended offline    Completed without error       00%         9         -
# 8  Short offline       Completed without error       00%         6         -
# 9  Short offline       Aborted by host               90%         6         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
   1        0        0  Not_testing
   2        0        0  Not_testing
   3        0        0  Not_testing
   4        0        0  Not_testing
   5        0        0  Not_testing
Selective self-test flags (0x0):
 After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

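As you can see, on the one hand the conveyance offline self-tests completed with read failures. On the other hand, all the attributes look fine; for example, Reallocated_Sector_Ct is 0.

I also tried once more to cat the whole disk to /dev/null, and again got errors in dmesg: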
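
Reading the attribute table by eye makes it easy to miss a nonzero counter further down the list. As a rough sketch, the table can be parsed and the sector-related counters checked programmatically. The set of watched IDs (5, 196, 197, 198) is a choice made for this illustration rather than an official list, and the function names are made up; the sample rows are copied from the smartctl output above.

```python
# IDs picked for illustration: the sector-health counters discussed in this thread.
WATCHED_IDS = (5, 196, 197, 198)


def parse_attributes(smartctl_text):
    """Return {id: (name, raw_value)} from the attribute table printed by smartctl."""
    attributes = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) < 10 or not fields[0].isdigit() or not fields[2].startswith("0x"):
            continue
        try:
            raw = int(fields[9])
        except ValueError:
            continue  # skip rows whose raw value is not a plain integer
        attributes[int(fields[0])] = (fields[1], raw)
    return attributes


def nonzero_watched(attributes, watched=WATCHED_IDS):
    """List (id, name, raw) for every watched attribute whose raw value is above zero."""
    return [
        (attr_id, attributes[attr_id][0], attributes[attr_id][1])
        for attr_id in watched
        if attr_id in attributes and attributes[attr_id][1] > 0
    ]


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       5
"""

if __name__ == "__main__":
    for attr_id, name, raw in nonzero_watched(parse_attributes(SAMPLE)):
        print(f"attribute {attr_id} ({name}) has raw value {raw}")
```

On the rows quoted here, the only watched counter above zero is 198 Offline_Uncorrectable with a raw value of 5, even though Reallocated_Sector_Ct is 0.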

root@Microknoppix:/home/knoppix# nice -n 20 ionice -c 3 cat /dev/sda > /dev/null
While this cat was running, dmesg showed errors like these:
knoppix@Microknoppix:~$ dmesg
(...)
[  504.003363] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  504.003374] ata1.00: failed command: READ DMA EXT
[  504.003383] ata1.00: cmd 25/00:00:80:07:01/00:02:00:00:00/e0 tag 0 dma 262144 in
[  504.003385]          res 40/00:00:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  504.003389] ata1.00: status: { DRDY }
[  509.016652] ata1: link is slow to respond, please be patient (ready=0)
[  514.030002] ata1: soft resetting link
[  514.200386] ata1.00: configured for UDMA/133
[  514.200420] ata1: EH complete
[  546.003333] ata1: lost interrupt (Status 0x50)
[  546.003364] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  546.003371] ata1.00: failed command: READ DMA EXT
[  546.003380] ata1.00: cmd 25/00:00:80:15:06/00:02:00:00:00/e0 tag 0 dma 262144 in
[  546.003381]          res 40/00:00:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  546.003386] ata1.00: status: { DRDY }
[  546.003401] ata1: soft resetting link
[  546.181205] ata1.00: configured for UDMA/133
[  546.181234] ata1: EH complete

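I thought the fault might lie with the motherboard or with the cable connecting the disk to the motherboard. So I connected another disk to the motherboard using the same cable and the same port, and cat'ed it to /dev/null. That succeeded, and dmesg showed no errors.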
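
A plain cat to /dev/null does not say where on the disk a read failed. A minimal sketch of the same sequential read test that records the offsets of failing chunks might look like the following. It assumes a Unix-like system (os.pread), and the function name `scan_reads` and its parameters are made up for this illustration. A read that only times out and then succeeds after the kernel resets the link will not show up as a failure here, so dmesg still needs to be checked afterwards.

```python
import os


def scan_reads(path, chunk_size=1024 * 1024, max_consecutive_errors=16):
    """Read `path` sequentially; return (bytes_read_ok, offsets_of_failed_chunks)."""
    good_bytes = 0
    failed_offsets = []
    consecutive_errors = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Give up after too many errors in a row (for example, the device vanished).
        while consecutive_errors < max_consecutive_errors:
            try:
                data = os.pread(fd, chunk_size, offset)
            except OSError:
                # Remember where the read failed and skip over this chunk.
                failed_offsets.append(offset)
                consecutive_errors += 1
                offset += chunk_size
                continue
            if not data:
                break  # end of file or device
            consecutive_errors = 0
            good_bytes += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return good_bytes, failed_offsets


if __name__ == "__main__":
    import sys

    # Example: python scan.py /dev/sda   (reading a raw device usually needs root)
    ok, failed = scan_reads(sys.argv[1])
    print(f"{ok} bytes read without error, {len(failed)} failed chunks")
    for off in failed:
        print(f"read error at byte offset {off}")
```

Running it once against the suspect disk and once against the known-good disk on the same cable and port gives the same comparison as the two cat runs, with the failing offsets listed.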

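There are no reallocated sectors because the drive failed to reallocate them. Your drive shows 5 Offline_Uncorrectable sectors, which is what you get when the automatic repair fails. There are clear read failures in the dmesg output, errors in the SMART error log, and read failures in the SMART self-tests. As you mention in your question, there are ways to repair these sectors, but in my experience that is a very short-term fix.

Replace the drive.

Source: https://unix.stackexchange.com/questions/58255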
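
The read failures in the SMART self-test log can be pulled out of the smartctl output together with the byte offset of each failing LBA, which is useful evidence to attach to a warranty claim. This is a rough sketch that assumes the column layout shown in the self-test log above and the 512-byte sector size that smartctl reports for this drive; the function name `failed_self_tests` is made up for the illustration.

```python
import re

# One row of the self-test log:
#   "# N  Test_Description  Status  Remaining%  LifeTime(hours)  LBA_of_first_error"
ROW_RE = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def failed_self_tests(log_text, sector_size=512):
    """Return one dict per self-test row that reports an LBA of first error."""
    failures = []
    for line in log_text.splitlines():
        match = ROW_RE.match(line.strip())
        if match is None:
            continue
        number, description, status, _remaining, hours, lba = match.groups()
        if not lba.isdigit():
            continue  # "-" means the test recorded no failing LBA
        failures.append({
            "number": int(number),
            "description": description,
            "status": status,
            "hours": int(hours),
            "lba": int(lba),
            "byte_offset": int(lba) * sector_size,
        })
    return failures


SAMPLE_LOG = """\
# 1  Conveyance offline  Completed without error       00%      1552         -
# 2  Conveyance offline  Completed: read failure       90%      1548         787927349
# 3  Conveyance offline  Completed: read failure       90%      1515         883391611
# 4  Short offline       Completed without error       00%      1503         -
"""

if __name__ == "__main__":
    for entry in failed_self_tests(SAMPLE_LOG):
        print(f"test #{entry['number']} ({entry['description']}) at {entry['hours']} h: "
              f"{entry['status']}, LBA {entry['lba']}, byte offset {entry['byte_offset']}")
```

On the log quoted in the question this reports two conveyance tests with read failures, at LBA 787927349 and LBA 883391611, both inside the drive's 500,107,862,016-byte capacity.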