ZFS
ZFS on Linux - unexpected behavior after a device failure
I maintain a Debian server with a ZFS storage pool (RAID-Z2). Recently ZFS reported that two disks had failed at the same time:
ZFS has detected that a device was removed.

 impact: Fault tolerance of the pool may be compromised.
    eid: 138
  class: statechange
  state: REMOVED
   host: serres-west-wing
   time: 2021-04-30 01:30:15+0300
  vpath: /dev/disk/by-vdev/d0-part1
  vguid: 0x6622AF6B1929E199
   pool: 0x0964CF6A3748D7A9

ZFS has detected that a device was removed.

 impact: Fault tolerance of the pool may be compromised.
    eid: 140
  class: statechange
  state: REMOVED
   host: serres-west-wing
   time: 2021-04-30 01:30:15+0300
  vpath: /dev/disk/by-vdev/d1-part1
  vguid: 0xD48BA6B066788199
   pool: 0x0964CF6A3748D7A9
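After these messages were generated, the hot spare kicked in and immediately started resilvering. This was the pool status once the resilver finished: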
ZFS has finished a resilver:

   eid: 167
 class: resilver_finish
  host: serres-west-wing
  time: 2021-04-30 02:15:03+0300
  pool: datapool
 state: ONLINE
  scan: resilvered 132G in 00:44:41 with 0 errors on Fri Apr 30 02:15:03 2021
config:

    NAME                STATE     READ WRITE CKSUM
    datapool            ONLINE       0     0     0
      raidz2-0          ONLINE       0     0     0
        spare-0         ONLINE       0     0     0
          d0-part1      ONLINE       0     0     0
          hs-d0-part1   ONLINE       0     0     0
        d1-part1        ONLINE       0     0     0
        d2-part1        ONLINE       0     0     0
        d3-part1        ONLINE       0     0     0
        d4-part1        ONLINE       0     0     0
    logs
      mirror-1          ONLINE       0     0     0
        zil-d0-part1    ONLINE       0     0     0
        zil-d1-part1    ONLINE       0     0     0
    cache
      l2arc-d0-part2    ONLINE       0     0     0
      l2arc-d1-part2    ONLINE       0     0     0
    spares
      hs-d0-part1       INUSE     currently in use

errors: No known data errors
Disks d0-part1 and d1-part1 appear to be connected and working normally.
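Was this a false alarm caused by something other than the disks actually degrading? It seems unlikely that two working disks would fail at the same moment. Is it safe to deactivate the hot spare?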
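It turned out that the disk disconnections were caused by a power problem. Since upgrading the machine's UPS I have not had any further issues. I deactivated the hot spare with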
zpool detach datapool hs-d0-part1
and then scrubbed the pool with
zpool scrub datapool
which brought the pool back to its original state.
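As a rough sketch of how the two commands above could be guarded, the script below only detaches the spare when both previously removed disks are listed as ONLINE with zero error counters. The helper name `device_is_clean` is made up for this illustration and is not part of ZFS; it only parses the text that `zpool status` prints, so it assumes the column layout shown in the question.

```shell
#!/bin/sh
# Sketch only: device_is_clean is a hypothetical helper, not a ZFS command.
# It reads `zpool status` text on stdin and succeeds only if the named
# device appears as ONLINE with zero READ, WRITE, and CKSUM counters.
device_is_clean() {
    awk -v dev="$1" '
        $1 == dev {
            seen = 1
            if ($2 != "ONLINE" || $3 != 0 || $4 != 0 || $5 != 0) bad = 1
        }
        END {
            if (seen && !bad) exit 0   # device found and healthy
            exit 1                     # device missing or reporting problems
        }
    '
}

# Intended use on the server (needs root and a real pool, so left commented out):
#   status=$(zpool status datapool)
#   if printf '%s\n' "$status" | device_is_clean d0-part1 &&
#      printf '%s\n' "$status" | device_is_clean d1-part1; then
#       zpool detach datapool hs-d0-part1   # return the hot spare to the spares list
#       zpool scrub datapool                # re-verify all data against its checksums
#   fi
```

A clean `zpool status` row says nothing about the underlying cause, so this check complements rather than replaces looking at the power supply, cabling, and kernel logs.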