
【CentOS7.1】What to do when a RAID array built with mdadm degrades

Posted at 2016-03-24

Environment

  • HDD: 500GB * 4
  • CentOS 7.1
  • RAID10 built with mdadm during installation

What happened?

After installing CentOS 7 and booting it, for some reason the arrays were in this state:

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid10]
md124 : active raid10 sdd1[3] sdc1[2] sda1[0]
      945133568 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

md125 : active raid10 sdd2[3] sdc2[2] sda2[0]
      26703872 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md126 : active raid10 sdd3[3] sdc3[2] sda3[0]
      513024 blocks super 1.0 512K chunks 2 near-copies [4/3] [U_UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid10 sdd5[3] sdc5[2] sda5[0]
      4136960 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]

Every array shows [4/3] [U_UU], and sdb is missing from all of them. Is the sdb HDD not being recognized!?
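In /proc/mdstat, [4/3] means the array has 4 member slots but only 3 are active, and each character of [U_UU] is one slot: U is up, _ is missing (here slot 1, which sdb occupies). Spotting this by eye is easy to miss, so here is a minimal sketch that flags degraded arrays automatically. The function name `degraded_arrays` is hypothetical, and it reads mdstat-format text on stdin so it can also be tried on saved output.

```shell
# Sketch: print the name of every md array whose member-status field
# (e.g. [U_UU]) contains "_", i.e. at least one member slot is missing.
# Reads /proc/mdstat-format text on stdin. Function name is hypothetical.
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { dev = $1 }               # header line: remember array name
    /\[[U_]*_[U_]*]/ && dev != "" {          # status line with a missing slot
      print dev
      dev = ""                               # report each array once
    }
  '
}

# On a live system:
#   degraded_arrays < /proc/mdstat
```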

First, check the log.

[root@localhost ~]# cat /var/log/messages | grep sdb
Mar 23 12:04:20 localhost kernel: md/raid10:md124: Disk failure on sdb1, disabling device.#012md/raid10:md124: Operation continuing on 3 devices.
Mar 23 12:04:20 localhost kernel: md/raid10:md126: Disk failure on sdb3, disabling device.#012md/raid10:md126: Operation continuing on 3 devices.
Mar 23 12:04:20 localhost kernel: md/raid10:md127: Disk failure on sdb5, disabling device.#012md/raid10:md127: Operation continuing on 3 devices.

The kernel marked sdb1, sdb3 and sdb5 as failed!?
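These kernel messages follow a fixed pattern, so the array/member pairs can be pulled out mechanically instead of reading each line. A sketch using sed; `failed_members` is a hypothetical name, and the pattern is written from the three raid10 lines above, so other kernel versions may word the message differently. On CentOS 7, `journalctl -k` is another place to find the same kernel messages.

```shell
# Sketch: turn md "Disk failure" kernel messages into "<array> <member>" pairs.
# The pattern is based on the raid10 log lines above; the wording may differ
# on other kernels. Function name is hypothetical.
failed_members() {
  sed -n 's|.*md/raid[0-9]*:\(md[0-9]*\): Disk failure on \([^,]*\),.*|\1 \2|p'
}

# On CentOS 7:
#   grep 'Disk failure' /var/log/messages | failed_members
```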

Check the drive's physical condition with SMART.

[root@localhost ~]# smartctl -a /dev/sdb
---
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0003   177   174   021    Pre-fail  Always       -       6150
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       267
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   037   037   000    Old_age   Always       -       46253
 10 Spin_Retry_Count        0x0012   100   100   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       261
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       274
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       289
194 Temperature_Celsius     0x0022   108   096   000    Old_age   Always       -       42
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0008   200   197   051    Old_age   Offline      -       0
---

Physically it looks OK? (No reallocated or pending sectors.)
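Rather than scanning the whole table, one can filter for a few counters whose RAW_VALUE should normally stay at 0. A minimal sketch; the choice of attributes is an assumption, not a standard, and `smart_warnings` is a hypothetical name. It relies on RAW_VALUE being the 10th whitespace-separated column of `smartctl -A` output. On the data above it would flag only UDMA_CRC_Error_Count=1, a counter commonly associated with the cable or SATA link rather than the platters. Also worth noting: Power_On_Hours is 46253, roughly 5.3 years of running time.

```shell
# Sketch: print selected SMART attributes whose RAW_VALUE (column 10) is not 0.
# The attribute list is an assumption; function name is hypothetical.
smart_warnings() {
  awk '
    $2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Reallocated_Event_Count|UDMA_CRC_Error_Count)$/ && $10 != "0" {
      print $2 "=" $10
    }
  '
}

# On a live system:
#   smartctl -A /dev/sdb | smart_warnings
```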

For now, I tried adding it back to the array.

[root@localhost ~]# mdadm -a /dev/md124 /dev/sdb1
mdadm: re-added /dev/sdb1

Let's check.

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid10]
md124 : active raid10 sdb1[1] sdd1[3] sdc1[2] sda1[0]
      945133568 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

md125 : active raid10 sdd2[3] sdc2[2] sda2[0]
      26703872 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md126 : active raid10 sdd3[3] sdc3[2] sda3[0]
      513024 blocks super 1.0 512K chunks 2 near-copies [4/3] [U_UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active raid10 sdd5[3] sdc5[2] sda5[0]
      4136960 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]

unused devices: <none>

md124 is back to [4/4] [UUUU] and seems fine, so I'll add the other partitions back too.

[root@localhost ~]# mdadm -a /dev/md125 /dev/sdb2
mdadm: re-added /dev/sdb2
[root@localhost ~]# mdadm -a /dev/md126 /dev/sdb3
mdadm: re-added /dev/sdb3
[root@localhost ~]# mdadm -a /dev/md127 /dev/sdb5
mdadm: added /dev/sdb5
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid10]
md124 : active raid10 sdb1[1] sdd1[3] sdc1[2] sda1[0]
      945133568 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

md125 : active raid10 sdb2[1] sdd2[3] sdc2[2] sda2[0]
      26703872 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
      [=================>...]  recovery = 87.1% (11635712/13351936) finish=0.1min speed=207966K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

md126 : active raid10 sdb3[1] sdd3[3] sdc3[2] sda3[0]
      513024 blocks super 1.0 512K chunks 2 near-copies [4/3] [U_UU]
        resync=DELAYED
      bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active raid10 sdb5[4] sdd5[3] sdc5[2] sda5[0]
      4136960 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
        resync=DELAYED

unused devices: <none>
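The individual `mdadm -a` calls (md124 earlier, then md125 to md127) can be written as one loop. A minimal sketch, assuming the same md/partition pairing as in this article; `readd_all` and the `MDADM` variable are hypothetical, the latter only so the loop can be dry-run with `MDADM=echo`. It tries `--re-add` first and falls back to `--add`, since in the output above md127 (the only array without a `bitmap:` line) reported `added` and got the new slot number [4] instead of being re-added.

```shell
# Sketch: re-attach each sdb partition to its array, trying --re-add first
# and falling back to --add. The md/partition pairs are this article's layout.
# MDADM is a hypothetical override so the loop can be dry-run (MDADM=echo).
readd_all() {
  for pair in md124:sdb1 md125:sdb2 md126:sdb3 md127:sdb5; do
    md=${pair%%:*}          # text before ":" -> array name
    part=${pair#*:}         # text after ":"  -> member partition
    ${MDADM:-mdadm} /dev/"$md" --re-add /dev/"$part" ||
      ${MDADM:-mdadm} /dev/"$md" --add /dev/"$part"
  done
}

# Dry run:           MDADM=echo readd_all
# Real run (root):   readd_all
```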

After waiting a few minutes...
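Instead of re-running `cat /proc/mdstat` by hand, the wait can be scripted. mdadm itself offers `mdadm --wait /dev/md125`, which returns once resync or recovery on that device finishes. The sketch below does the same by polling the array's `sync_action` file in sysfs, which reads `idle` when nothing is running. `wait_idle`, `SYS_MD_ROOT` and `INTERVAL` are hypothetical names; `SYS_MD_ROOT` exists only so the loop can be tried against a fake directory.

```shell
# Sketch: block until an array's sync_action reads "idle" (no resync,
# recovery or check running). SYS_MD_ROOT and INTERVAL are hypothetical
# overrides: the sysfs root (default /sys/block) and poll period in seconds.
wait_idle() {
  action_file="${SYS_MD_ROOT:-/sys/block}/$1/md/sync_action"
  while [ "$(cat "$action_file")" != "idle" ]; do
    sleep "${INTERVAL:-10}"
  done
  echo "$1: idle"
}

# Usage: wait_idle md125 && wait_idle md126 && wait_idle md127
```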

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid10]
md124 : active raid10 sdb1[1] sdd1[3] sdc1[2] sda1[0]
      945133568 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

md125 : active raid10 sdb2[1] sdd2[3] sdc2[2] sda2[0]
      26703872 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md126 : active raid10 sdb3[1] sdd3[3] sdc3[2] sda3[0]
      513024 blocks super 1.0 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid10 sdb5[4] sdd5[3] sdc5[2] sda5[0]
      4136960 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]

unused devices: <none>

All four arrays are [4/4] [UUUU] again. Fixed!

Just to be safe, run a consistency check on md124.

[root@localhost ~]# echo check > /sys/block/md124/md/sync_action
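The command above starts a check on one array by writing `check` to its `sync_action` file. To scrub every array through the same interface, a minimal sketch; `check_all` and `SYS_MD_ROOT` are hypothetical names, and it only starts a check on arrays that are currently `idle` so a running recovery is not interrupted.

```shell
# Sketch: start a "check" scrub on every idle md array under /sys/block,
# using the same sync_action interface as the command above.
# SYS_MD_ROOT is a hypothetical override for trying this on a fake tree.
check_all() {
  root=${SYS_MD_ROOT:-/sys/block}
  for action in "$root"/md*/md/sync_action; do
    [ -e "$action" ] || continue              # glob matched nothing
    if [ "$(cat "$action")" = "idle" ]; then  # skip arrays already syncing
      echo check > "$action"
      echo "started check: $action"
    fi
  done
}
```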

Check the progress.

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid10]
md124 : active raid10 sdb1[1] sdd1[3] sdc1[2] sda1[0]
      945133568 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      [=>...................]  check =  9.6% (91577856/945133568) finish=142.0min speed=100136K/sec
      bitmap: 0/8 pages [0KB], 65536KB chunk

md125 : active raid10 sdb2[1] sdd2[3] sdc2[2] sda2[0]
      26703872 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid10 sdb3[1] sdd3[3] sdc3[2] sda3[0]
      513024 blocks super 1.0 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid10 sdb5[4] sdd5[3] sdc5[2] sda5[0]
      4136960 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]

unused devices: <none>
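The `finish=` estimate follows from the other numbers on the progress line: (945133568 − 91577856) = 853555712 blocks of 1 KiB remain, and 853555712 ÷ 100136 K/sec ≈ 8524 s ≈ 142 min. A sketch that extracts the percentage and estimate per array from mdstat-format text; `sync_progress` is a hypothetical name, and lines such as `resync=DELAYED` (no percentage) are deliberately skipped.

```shell
# Sketch: print "<array> <percent> <finish estimate>" for every array with a
# running resync/recovery/check/repair/reshape. Reads mdstat text on stdin.
sync_progress() {
  awk '
    /^md[0-9]+ :/ { dev = $1 }                              # remember array name
    /(resync|recovery|check|repair|reshape) *= *[0-9.]+%/ {
      match($0, /[0-9.]+%/);          pct = substr($0, RSTART, RLENGTH)
      match($0, /finish=[0-9.]+min/); eta = substr($0, RSTART + 7, RLENGTH - 7)
      print dev, pct, eta
    }
  '
}

# On a live system:
#   sync_progress < /proc/mdstat
```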

This is going to take a long time (finish=142.0min)...
But it looks fine for now!
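When a check finishes, md reports the number of inconsistent sectors it found in `/sys/block/mdX/md/mismatch_cnt`, so that is the value to look at afterwards. Per the md documentation, on RAID1 and RAID10 a non-zero count does not necessarily mean data corruption, so treat it as a hint rather than a verdict. A minimal sketch; `report_mismatch` and `SYS_MD_ROOT` are hypothetical names.

```shell
# Sketch: print "<array> <sync_action> mismatch_cnt=<n>" for every md array.
# SYS_MD_ROOT is a hypothetical override for trying this on a fake tree.
report_mismatch() {
  root=${SYS_MD_ROOT:-/sys/block}
  for dir in "$root"/md*/md; do
    [ -d "$dir" ] || continue                  # glob matched nothing
    name=$(basename "$(dirname "$dir")")       # e.g. md124
    echo "$name $(cat "$dir/sync_action") mismatch_cnt=$(cat "$dir/mismatch_cnt")"
  done
}
```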

(Added 2016/04/12)
It was not fine at all: the HDD failed soon after this and I had to replace it, lol.
I've written a follow-up article about that.

References

うまいぼうぶろぐ
