Repairing a RAID5 array
I am trying to repair a RAID5 array made up of 3 × 2TB disks. After running flawlessly for a while, the machine (running Debian) suddenly stopped booting and now hangs at the GRUB prompt. I am fairly sure this is related to the RAID array.
Since it is hard to give a complete account of everything I have already tried, I will describe the current state instead.
mdadm --detail /dev/md0
Output:

/dev/md0:
        Version : 1.2
  Creation Time : Sun Mar 22 15:13:25 2015
     Raid Level : raid5
  Used Dev Size : 1953381888 (1862.89 GiB 2000.26 GB)
   Raid Devices : 3
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sun Mar 22 16:18:56 2015
          State : active, degraded, Not Started
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : ubuntu:0  (local to host ubuntu)
           UUID : ae2b72c0:60444678:25797b77:3695130a
         Events : 57

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
mdadm --examine /dev/sda1
gives: mdadm: No md superblock detected on /dev/sda1.
That makes sense, because I reformatted this partition, thinking it was the faulty one.
mdadm --examine /dev/sdb1
gives:

/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ae2b72c0:60444678:25797b77:3695130a
           Name : ubuntu:0  (local to host ubuntu)
  Creation Time : Sun Mar 22 15:13:25 2015
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
     Array Size : 3906763776 (3725.78 GiB 4000.53 GB)
  Used Dev Size : 3906763776 (1862.89 GiB 2000.26 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f1817af9:1d964693:774d5d63:bfa69e3d

    Update Time : Sun Mar 22 16:18:56 2015
       Checksum : ab7c79ae - correct
         Events : 57

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 1
    Array State : .AA ('A' == active, '.' == missing)
mdadm --examine /dev/sdc1
gives:

/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ae2b72c0:60444678:25797b77:3695130a
           Name : ubuntu:0  (local to host ubuntu)
  Creation Time : Sun Mar 22 15:13:25 2015
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 3906764800 (1862.89 GiB 2000.26 GB)
     Array Size : 3906763776 (3725.78 GiB 4000.53 GB)
  Used Dev Size : 3906763776 (1862.89 GiB 2000.26 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f076b568:007e3f9b:71a19ea2:474e5fe9

    Update Time : Sun Mar 22 16:18:56 2015
       Checksum : db25214 - correct
         Events : 57

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 2
    Array State : .AA ('A' == active, '.' == missing)
cat /proc/mdstat
gives:

Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdb1[1] sdc1[2]
      3906764800 blocks super 1.2

unused devices: <none>
fdisk -l
gives:

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
81 heads, 63 sectors/track, 765633 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x000d84fa

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048  3907029167  1953513560   fd  Linux raid autodetect

Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x000802d9

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *        2048  3907028991  1953513472   fd  Linux raid autodetect

Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x000a8dca

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048  3907028991  1953513472   fd  Linux raid autodetect

Disk /dev/sdd: 7756 MB, 7756087296 bytes
255 heads, 63 sectors/track, 942 cylinders, total 15148608 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x128faec9

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *        2048    15148607     7573280    c  W95 FAT32 (LBA)
Of course, I have already tried adding /dev/sda1 back:

mdadm --manage /dev/md0 --add /dev/sda1
gives: mdadm: add new device failed for /dev/sda1 as 3: Invalid argument
Once the RAID is repaired, I will probably also need to get GRUB working again so that it can detect the RAID/LVM and boot once more.
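For reference, the way I expect to restore GRUB from a rescue system, once the array and LVM are back, is roughly the sketch below (the LV name and the target disk are placeholders, not necessarily my actual setup):

mount /dev/vg0/root /mnt                 # placeholder root LV
mount --bind /dev  /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys  /mnt/sys
chroot /mnt grub-install /dev/sdb        # reinstall GRUB to a disk the BIOS actually boots from
chroot /mnt update-grub                  # regenerate grub.cfg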
Edit (added smartctl test results)
Output of the smartctl test (smartctl -a /dev/sda):
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.16.0-30-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD20EZRX-00D8PB0
Serial Number:    WD-WMC4M0760056
LU WWN Device Id: 5 0014ee 003a4a444
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Mar 24 22:07:08 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (26280) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 266) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x7035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       3401
  3 Spin_Up_Time            0x0027   172   172   021    Pre-fail  Always       -       4375
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       59
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       9697
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       59
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       51
193 Load_Cycle_Count        0x0032   115   115   000    Old_age   Always       -       255276
194 Temperature_Celsius     0x0022   119   106   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       12
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%       9692        2057

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
You are missing one of the three drives of your /dev/md0 RAID5 array, so mdadm will assemble the array but not run it.
-R, --run attempts to start the array even if fewer drives were given than were present the last time the array was active. Normally, if not all the expected drives are found and --scan is not used, the array will be assembled but not started; with --run an attempt will be made to start it anyway. So all you need to do is:
mdadm --run /dev/md0
If you are being cautious, you can instead try mdadm --run --readonly /dev/md0, followed by mount -o ro,norecover /dev/md0 /mnt, to check that everything looks OK. (The opposite of --readonly is, of course, --readwrite.) Once the array is running, you can add a new disk back in.
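If everything checks out, the follow-through might look roughly like the sketch below (assuming the replacement drive also shows up as /dev/sda; copying the partition table from /dev/sdb with sfdisk is just one way to lay it out):

mdadm --readwrite /dev/md0                 # leave read-only mode once the data looks sane
sfdisk -d /dev/sdb | sfdisk /dev/sda       # replicate the MBR partition layout onto the new disk
mdadm --manage /dev/md0 --add /dev/sda1    # add the new member; the rebuild starts automatically
cat /proc/mdstat                           # watch the recovery progress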
I would not recommend re-adding your existing disk, because it is showing SMART disk errors, as this recent self-test report demonstrates:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%       9692        2057
However, if you really do want to try re-adding the existing disk, then running --zero-superblock on that disk first would probably be a very good idea. I would still recommend replacing it, though.
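If you do insist on reusing it, the sequence would presumably be something like this (again assuming the old disk is still /dev/sda):

mdadm --zero-superblock /dev/sda1          # wipe any stale md metadata from the partition
mdadm --manage /dev/md0 --add /dev/sda1    # re-add it and let the array resync onto it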