Disk

SSD won't mount: bad superblock but no bad blocks: write errors

  • May 29, 2022

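Just noticed I was writing SDD instead of SSD. Corrected.

I need help interpreting this situation. /dev/sda is a data disk that is backed up and holds reproducible data, so this is not system critical, but I would like to avoid the work of restoring/rebuilding the data, some of which would be very time consuming.

Is recovery/repair possible?

If so, how? And if I wipe the disk to reuse it, how reliable will it be?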

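Summary (detailed reports follow):

  • won't mount: bad superblock
  • badblocks finds no bad blocks
  • smartctl reports no errors
  • fsck cannot set superblock flags
  • fdisk shows a clean partition
  • dmesg shows write errors
  • parted shows 792 GB free on the 1 TB drive

Mounting the SSD fails as shown: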

[stephen@meer ~]$ sudo mount /dev/sda1 /mnt/sda
mount: /mnt/sda: can't read superblock on /dev/sda1.
       dmesg(1) may have more information after failed mount system call.
[stephen@meer ~]$ 

But badblocks finds no bad blocks:

[root@meer stephen]# badblocks -v /dev/sda1              
Checking blocks 0 to 976760831
Checking for bad blocks (read-only test): done                                                 
Pass completed, 0 bad blocks found. (0/0/0 errors)

And smartctl finds no errors:

[root@meer stephen]# smartctl -a /dev/sda 
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.17.9-arch1-1] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     WD Blue / Red / Green SSDs
Device Model:     WDC  WDS100T2B0A-00SM50
Serial Number:    213159800516
LU WWN Device Id: 5 001b44 8bc4fdc6e
Firmware Version: 415020WD
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Tue May 24 16:06:23 2022 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                   was never started.
                   Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                   without error or no self-test has ever 
                   been run.
Total time to complete Offline 
data collection:       (    0) seconds.
Offline data collection
capabilities:           (0x11) SMART execute Offline immediate.
                   No Auto Offline data collection support.
                   Suspend Offline collection upon new
                   command.
                   No Offline surface scan supported.
                   Self-test supported.
                   No Conveyance Self-test supported.
                   No Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                   power-saving mode.
                   Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                   General Purpose Logging supported.
Short self-test routine 
recommended polling time:   (   2) minutes.
Extended self-test routine
recommended polling time:   (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       124
  9 Power_On_Hours          0x0032   100   100   ---    Old_age   Always       -       1470
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       134
165 Block_Erase_Count       0x0032   100   100   ---    Old_age   Always       -       4312400063
166 Minimum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       1
167 Max_Bad_Blocks_per_Die  0x0032   100   100   ---    Old_age   Always       -       65
168 Maximum_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       14
169 Total_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       630
170 Grown_Bad_Blocks        0x0032   100   100   ---    Old_age   Always       -       124
171 Program_Fail_Count      0x0032   100   100   ---    Old_age   Always       -       128
172 Erase_Fail_Count        0x0032   100   100   ---    Old_age   Always       -       0
173 Average_PE_Cycles_TLC   0x0032   100   100   ---    Old_age   Always       -       2
174 Unexpected_Power_Loss   0x0032   100   100   ---    Old_age   Always       -       90
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       64
194 Temperature_Celsius     0x0022   070   053   ---    Old_age   Always       -       30 (Min/Max 18/53)
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
230 Media_Wearout_Indicator 0x0032   001   001   ---    Old_age   Always       -       0x002600140026
232 Available_Reservd_Space 0x0033   097   097   004    Pre-fail  Always       -       97
233 NAND_GB_Written_TLC     0x0032   100   100   ---    Old_age   Always       -       2703
234 NAND_GB_Written_SLC     0x0032   100   100   ---    Old_age   Always       -       2842
241 Host_Writes_GiB         0x0030   253   253   ---    Old_age   Offline      -       466
242 Host_Reads_GiB          0x0030   253   253   ---    Old_age   Offline      -       622
244 Temp_Throttle_Status    0x0032   000   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1470         -

Selective Self-tests/Logging not supported

And fsck fails:

[root@meer ~]# e2fsck -cfpv /dev/sda1
/dev/sda1: recovering journal
e2fsck: Input/output error while recovering journal of /dev/sda1
e2fsck: unable to set superblock flags on /dev/sda1


/dev/sda1: ********** WARNING: Filesystem still has errors **********





May 24 15:38:29 meer kernel: I/O error, dev sda, sector 121899008 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
May 24 15:38:29 meer kernel: sd 2:0:0:0: [sda] tag#31 CDB: Write(10) 2a 00 07 44 08 00 00 00 08 00
May 24 15:38:29 meer kernel: sd 2:0:0:0: [sda] tag#31 Add. Sense: Unaligned write command
May 24 15:38:29 meer kernel: sd 2:0:0:0: [sda] tag#31 Sense Key : Illegal Request [current] 
May 24 15:38:29 meer kernel: sd 2:0:0:0: [sda] tag#31 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
May 24 15:38:29 meer kernel: ata3.00: configured for UDMA/33
May 24 15:38:29 meer kernel: ata3.00: error: { ABRT }
May 24 15:38:29 meer kernel: ata3.00: status: { DRDY ERR }
May 24 15:38:29 meer kernel: ata3.00: cmd ca/00:08:00:08:44/00:00:00:00:00/e7 tag 31 dma 4096 out
                                      res 51/04:08:00:08:44/00:00:07:00:00/e7 Emask 0x1 (device error)
May 24 15:38:29 meer kernel: ata3.00: failed command: WRITE DMA
May 24 15:38:29 meer kernel: ata3.00: irq_stat 0x40000001
May 24 15:38:29 meer kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
May 24 15:38:29 meer kernel: ata3: EH complete
May 24 15:38:29 meer kernel: ata3.00: configured for UDMA/33
May 24 15:38:29 meer kernel: ata3.00: error: { ABRT }
May 24 15:38:29 meer kernel: ata3.00: status: { DRDY ERR }
May 24 15:38:29 meer kernel: ata3.00: cmd ca/00:08:00:08:44/00:00:00:00:00/e7 tag 6 dma 4096 out
                                      res 51/04:08:00:08:44/00:00:07:00:00/e7 Emask 0x1 (device error)
May 24 15:38:29 meer kernel: ata3.00: failed command: WRITE DMA
May 24 15:38:29 meer kernel: ata3.00: irq_stat 0x40000001
May 24 15:38:29 meer kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
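
As a reading aid, the Write(10) CDB in the kernel log above can be decoded by hand to confirm which sector and how much data the failed write covered. This is a minimal sketch, not part of the original post; it assumes the standard 10-byte WRITE(10) layout (opcode, flags, 4-byte big-endian LBA, group number, 2-byte transfer length, control) and the 512-byte logical sector size that smartctl reports.

```python
# Decode the WRITE(10) CDB copied from the kernel log line:
#   CDB: Write(10) 2a 00 07 44 08 00 00 00 08 00
SECTOR_BYTES = 512  # logical sector size reported by smartctl

cdb = bytes.fromhex("2a 00 07 44 08 00 00 00 08 00")

opcode = cdb[0]                            # 0x2a is WRITE(10)
lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5: logical block address
blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7..8: transfer length in sectors
nbytes = blocks * SECTOR_BYTES

print(f"opcode=0x{opcode:02x} lba={lba} sectors={blocks} bytes={nbytes}")
# prints: opcode=0x2a lba=121899008 sectors=8 bytes=4096
```

The decoded LBA matches the "sector 121899008" in the I/O error line, and the 4096-byte length matches the "dma 4096 out" in the ata3.00 lines, so every retry in the log is the same single 4 KiB write being rejected by the device.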

The partition as seen by fdisk:

Disk /dev/sda: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: WDC  WDS100T2B0A
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 3F701164-2CF8-6D48-A94E-478634C140BE

Device     Start        End    Sectors   Size Type
/dev/sda1   2048 1953523711 1953521664 931.5G Linux filesystem
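
As a rough cross-check (not part of the original post), the failing sector from the kernel I/O error (121899008) can be placed relative to the partition that fdisk lists, and the badblocks range can be compared with the partition size. The 4 KiB ext4 block size is an assumption, since the post does not state it; the 1024-byte unit is the badblocks default when no -b option is given.

```python
SECTOR_BYTES = 512          # logical sector size (fdisk)
PART_START = 2048           # first sector of /dev/sda1 (fdisk)
PART_SECTORS = 1953521664   # size of /dev/sda1 in sectors (fdisk)
FAILED_SECTOR = 121899008   # whole-disk sector from the kernel I/O error
FS_BLOCK_BYTES = 4096       # ASSUMPTION: ext4 block size, not stated in the post
BADBLOCKS_UNIT = 1024       # badblocks default block size in bytes

# Position of the failed write relative to the start of the partition.
offset_in_part = FAILED_SECTOR - PART_START
inside_partition = 0 <= offset_in_part < PART_SECTORS

# Which (assumed 4 KiB) filesystem block that is, and whether it is aligned.
sectors_per_block = FS_BLOCK_BYTES // SECTOR_BYTES
fs_block, remainder = divmod(offset_in_part, sectors_per_block)

# Last block badblocks should scan if it covered the whole partition.
badblocks_last = PART_SECTORS * SECTOR_BYTES // BADBLOCKS_UNIT - 1

print(inside_partition, offset_in_part, fs_block, remainder, badblocks_last)
# prints: True 121896960 15237120 0 976760831
```

Under these assumptions the failed write lies inside /dev/sda1 on a 4 KiB boundary, and the read-only badblocks pass ("Checking blocks 0 to 976760831") did cover the whole partition, including that location. A read-only pass only exercises reads, whereas the errors in the logs are writes.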

From dmesg:

[ 5292.895300] ata3.00: configured for UDMA/33
[ 5292.895315] ata3: EH complete
[ 5293.021851] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 5293.021859] ata3.00: irq_stat 0x40000001
[ 5293.021864] ata3.00: failed command: WRITE DMA
[ 5293.021866] ata3.00: cmd ca/00:08:00:08:44/00:00:00:00:00/e7 tag 18 dma 4096 out
                        res 51/04:08:00:08:44/00:00:07:00:00/e7 Emask 0x1 (device error)
[ 5293.021874] ata3.00: status: { DRDY ERR }
[ 5293.021877] ata3.00: error: { ABRT }

parted:

[root@meer stephen]# parted /dev/sda
GNU Parted 3.5
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print free                                                       
Model: ATA WDC WDS100T2B0A (scsi)
Disk /dev/sda: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name  Flags
        17.4kB  1049kB  1031kB  Free Space
 1      1049kB  1000GB  1000GB  ext4
        1000GB  1000GB  729kB   Free Space

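I don't know what you have been doing with this disk, but these are crazy numbers! Looking at the output, the SSD has:

  • been powered on for 1470 hours (61 days)
  • performed 4312400063 (2.0GiB) block erases
  • 163210068006 (76TiB) of media writes.

That is a constant 16MiB per second of writes over 61 days.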
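
The arithmetic behind these figures can be sketched as follows. This is an illustration only, not the answerer's own calculation: 163210068006 is the decimal form of the Media_Wearout_Indicator raw value 0x002600140026 from the smartctl output, and the 76TiB figure is reproduced only if that value is treated as a count of 512-byte units. That unit is an assumption; the smartctl output does not state it. The same rate calculation on Host_Writes_GiB (attribute 241) is included for comparison.

```python
POWER_ON_HOURS = 1470          # attribute 9, Power_On_Hours
RAW_230 = "0x002600140026"     # attribute 230, Media_Wearout_Indicator raw value
ASSUMED_UNIT_BYTES = 512       # ASSUMPTION: unit of the raw value, not given by smartctl
HOST_WRITES_GIB = 466          # attribute 241, Host_Writes_GiB

seconds = POWER_ON_HOURS * 3600
days = POWER_ON_HOURS / 24

raw = int(RAW_230, 16)                               # decimal count quoted in the answer
written_bytes = raw * ASSUMED_UNIT_BYTES
written_tib = written_bytes / 2**40
rate_mib_s = written_bytes / seconds / 2**20

# Same calculation using the host-write counter reported by the drive.
host_rate_mib_s = HOST_WRITES_GIB * 2**30 / seconds / 2**20

print(f"{days:.2f} days, raw={raw}")
print(f"{written_tib:.1f} TiB at {rate_mib_s:.1f} MiB/s (assumed 512-byte units)")
print(f"host writes: {host_rate_mib_s:.2f} MiB/s")
```

With the 512-byte assumption this gives about 61 days, 76 TiB and roughly 15 MiB/s, in line with the figures quoted above. The rate derived from Host_Writes_GiB comes out far lower, so the conclusion depends heavily on how the vendor-specific raw value of attribute 230 is interpreted.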

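I think you have an internal NAND failure. You probably won't be able to get your data back.

For the future, I suggest your best solution is some form of RAID mirroring, to cushion against errors by using multiple disks.

Ideally, to spread the distribution of errors and failures across the disks, use two disks of different ages and/or from different production batches.

To clarify, I consider that an abnormally large volume of writes in a very short time. You need to factor that into the storage setup you use.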

Source: https://unix.stackexchange.com/questions/703733