Linux
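"kernel: Buffer I/O error on device": does my server have a hardware problem?
We have a Linux DB server running Red Hat 7.2.
We noticed many messages like the following in /var/log/messages, for all of the mounted disks.
We need to understand whether this behavior is related to a hardware problem.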
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4980*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4981*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4982*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4983*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4984*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4985*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4986*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4987*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4988*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4989*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4990*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4991*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4992*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4993*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4994*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4995*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4996*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4997*
We also see these messages:
Mar 27 09:18:08 server_DB smartd[1734]: Monitoring 0 ATA and 26 SCSI devices
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn> Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:02*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn> Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:02*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn> Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:01*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn> Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:01*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn> Couldn't find support for device at '/sys/devices/pci0000:80/0000:80*CO*/0000:81*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn> Couldn't find support for device at '/sys/devices/pci0000:80/0000:80*CO*/0000:81*CO*': not supported by any plugin
I also checked the disk:
smartctl -a -d megaraid,0 /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-327.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST600MM0238
Revision: BS04
User Capacity: 600,127,266,816 bytes [600 GB]
Logical block size: 512 bytes
Formatted with type 2 protection
Logical block provisioning type unreported, LBPME=0, LBPRZ=0
Rotation Rate: 10000 rpm
Form Factor: 2.5 inches
Logical Unit id: 0x5000c500a0f28343
Serial number: W0M0LYD2
Device type: disk
Transport protocol: SAS
Local Time is: Wed Mar 27 10:51:30 2019 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 24 C
Drive Trip Temperature: 60 C
Manufactured in week 45 of year 2017
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 50
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 177
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 412242328
  Blocks received from initiator = 3213595579
  Blocks read from cache and sent to initiator = 312462212
  Number of read and write commands whose size <= segment size = 31915885
  Number of read and write commands whose size > segment size = 0
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 3178.45
  number of minutes until next internal SMART test = 12
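This I/O error message is written to warn about a hardware error on sdb. For example, it could be related to the disk or to the cabling. If you have a large number of disks all showing errors at the same time, I think a fault in the disks themselves is unlikely :-). It could be a fault in the disk controller.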
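To see how many devices are affected, one option is to tally the "Buffer I/O error" lines per device. The following is only a minimal sketch, assuming the syslog line format shown in the question; the function names are made up for illustration and are not part of any tool.

```python
import re
from collections import Counter

# Matches the kernel message format shown in the question's log excerpt.
BUFFER_IO_RE = re.compile(r"kernel: Buffer I/O error on device (\S+?),")


def count_errors_by_device(lines):
    """Return a Counter mapping device name to its number of Buffer I/O errors."""
    counts = Counter()
    for line in lines:
        match = BUFFER_IO_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_errors_in_file(path="/var/log/messages"):
    """Tally errors in a log file (hypothetical helper; reading the log usually needs root)."""
    with open(path, errors="replace") as log:
        return count_errors_by_device(log)
```

Calling `count_errors_in_file().most_common()` lists the devices with the most errors first. If nearly every disk behind the same controller shows up, that points away from a single failing drive.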
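If you see "Buffer I/O error" without any specific messages about ATA or SCSI error codes, or about retry attempts in general, that might give some hint. But I don't really know :-).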
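A rough way to check for that is to look for lower-level kernel messages close to each "Buffer I/O error" line. This is a sketch only: the list of marker strings is my assumption about what block, SCSI, and ATA errors commonly look like in kernel logs, it is not exhaustive, and the 50-line window is arbitrary.

```python
import re

BUFFER_IO_RE = re.compile(r"kernel: Buffer I/O error on device")

# Assumed, non-exhaustive markers of lower-level block, SCSI, or ATA errors.
LOW_LEVEL_RE = re.compile(
    r"kernel: .*(?:blk_update_request: I/O error"
    r"|end_request: I/O error"
    r"|Sense Key"
    r"|hostbyte="
    r"|\bata\d+(?:\.\d+)?: )"
)


def split_by_context(lines, window=50):
    """Count Buffer I/O errors with and without a low-level message nearby.

    Returns (with_context, without_context). "Nearby" means within `window`
    log lines before or after the error, which is an arbitrary choice.
    """
    lines = list(lines)
    low_level = [i for i, line in enumerate(lines) if LOW_LEVEL_RE.search(line)]
    with_context = without_context = 0
    for i, line in enumerate(lines):
        if not BUFFER_IO_RE.search(line):
            continue
        if any(abs(i - j) <= window for j in low_level):
            with_context += 1
        else:
            without_context += 1
    return with_context, without_context
```

Passing the lines of /var/log/messages to `split_by_context` gives two counts. A large "without context" count only says that none of the assumed markers appeared nearby; it does not prove the hardware is fine.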
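Of course, a software bug could cause any message :-).
As an example of a software bug, although I know it is not the same bug: I saw a kernel bug that showed "Buffer I/O error" without any error messages about ATA or SCSI, and without retry attempts. Fedora bug 1553979.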
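The "Buffer" part just means that it happened during a request for file data that can be cached in the page cache. For historical reasons, people sometimes call these requests "buffered IO".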