Environment
- HDD: 500 GB * 4
- CentOS 7.1
- RAID10 built with mdadm at install time
What happened?
After installing CentOS 7, the first boot somehow came up in this state:
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid10]
md124 : active raid10 sdd1[3] sdc1[2] sda1[0]
945133568 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
bitmap: 0/8 pages [0KB], 65536KB chunk
md125 : active raid10 sdd2[3] sdc2[2] sda2[0]
26703872 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
bitmap: 1/1 pages [4KB], 65536KB chunk
md126 : active raid10 sdd3[3] sdc3[2] sda3[0]
513024 blocks super 1.0 512K chunks 2 near-copies [4/3] [U_UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md127 : active raid10 sdd5[3] sdc5[2] sda5[0]
4136960 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
sdb is missing from every array!?
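To see exactly which slot is empty and whether the member was marked faulty or simply removed, mdadm --detail gives a per-device view. A quick sketch of that kind of check (output omitted, this is not a transcript from the box above):
# Show the array state plus the role and state of each member device
mdadm --detail /dev/md124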
First, check the logs
[root@localhost ~]# cat /var/log/messages | grep sdb
Mar 23 12:04:20 localhost kernel: md/raid10:md124: Disk failure on sdb1, disabling device.#012md/raid10:md124: Operation continuing on 3 devices.
Mar 23 12:04:20 localhost kernel: md/raid10:md126: Disk failure on sdb3, disabling device.#012md/raid10:md126: Operation continuing on 3 devices.
Mar 23 12:04:20 localhost kernel: md/raid10:md127: Disk failure on sdb5, disabling device.#012md/raid10:md127: Operation continuing on 3 devices.
It got marked as failed!?
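On CentOS 7 the same kernel messages can also be pulled from the systemd journal or the kernel ring buffer, which scopes things a bit more tightly than grepping all of /var/log/messages (just an alternative way to look):
# Kernel messages via the systemd journal
journalctl -k | grep -i sdb
# Or the kernel ring buffer for the current boot
dmesg | grep -i sdb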
Check the drive's physical condition
[root@localhost ~]# smartctl -a /dev/sdb
---
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 200 200 051 Pre-fail Always - 1
3 Spin_Up_Time 0x0003 177 174 021 Pre-fail Always - 6150
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 267
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x000e 200 200 051 Old_age Always - 0
9 Power_On_Hours 0x0032 037 037 000 Old_age Always - 46253
10 Spin_Retry_Count 0x0012 100 100 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0012 100 100 051 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 261
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 274
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 289
194 Temperature_Celsius 0x0022 108 096 000 Old_age Always - 42
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 1
200 Multi_Zone_Error_Rate 0x0008 200 197 051 Old_age Offline - 0
---
Physically it looks okay? At least there are no reallocated or pending sectors.
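The attribute table is only passive counters, though. If in doubt, a SMART self-test actively exercises the drive; a minimal sketch (a long test reads the whole surface and takes much longer on a 500 GB disk):
# Start a short self-test in the background
smartctl -t short /dev/sdb
# A couple of minutes later, read the self-test log for the verdict
smartctl -l selftest /dev/sdb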
For now, try re-adding it
[root@localhost ~]# mdadm -a /dev/md124 /dev/sdb1
mdadm: re-added /dev/sdb1
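"re-added" (as opposed to "added") means the old md superblock on sdb1 was still intact, so with the write-intent bitmap only the blocks that changed while it was out need to be resynced. The metadata on the member can be inspected directly, roughly like this:
# Dump the md superblock on the partition: array UUID, device role, event count
mdadm --examine /dev/sdb1
# The write-intent bitmap on the member can be inspected as well
mdadm --examine-bitmap /dev/sdb1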
Check the result
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid10]
md124 : active raid10 sdb1[1] sdd1[3] sdc1[2] sda1[0]
945133568 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 0/8 pages [0KB], 65536KB chunk
md125 : active raid10 sdd2[3] sdc2[2] sda2[0]
26703872 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
bitmap: 1/1 pages [4KB], 65536KB chunk
md126 : active raid10 sdd3[3] sdc3[2] sda3[0]
513024 blocks super 1.0 512K chunks 2 near-copies [4/3] [U_UU]
bitmap: 1/1 pages [4KB], 65536KB chunk
md127 : active raid10 sdd5[3] sdc5[2] sda5[0]
4136960 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
unused devices: <none>
That looks fine, so re-add the other partitions too
[root@localhost ~]# mdadm -a /dev/md125 /dev/sdb2
mdadm: re-added /dev/sdb2
[root@localhost ~]# mdadm -a /dev/md126 /dev/sdb3
mdadm: re-added /dev/sdb3
[root@localhost ~]# mdadm -a /dev/md127 /dev/sdb5
mdadm: added /dev/sdb5
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid10]
md124 : active raid10 sdb1[1] sdd1[3] sdc1[2] sda1[0]
945133568 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 0/8 pages [0KB], 65536KB chunk
md125 : active raid10 sdb2[1] sdd2[3] sdc2[2] sda2[0]
26703872 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
[=================>...] recovery = 87.1% (11635712/13351936) finish=0.1min speed=207966K/sec
bitmap: 1/1 pages [4KB], 65536KB chunk
md126 : active raid10 sdb3[1] sdd3[3] sdc3[2] sda3[0]
513024 blocks super 1.0 512K chunks 2 near-copies [4/3] [U_UU]
resync=DELAYED
bitmap: 1/1 pages [4KB], 65536KB chunk
md127 : active raid10 sdb5[4] sdd5[3] sdc5[2] sda5[0]
4136960 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
resync=DELAYED
unused devices: <none>
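md124 came back clean almost immediately thanks to its bitmap, md125 only has to recover the dirty regions, and md126/md127 sit at resync=DELAYED because arrays sharing the same physical disks are rebuilt one at a time. Also note that sdb5 was "added" rather than "re-added" and got a new slot number [4], so md127 (which has no bitmap) gets a full rebuild. To keep an eye on it, or adjust the rebuild speed ceiling, something like this works:
# Refresh the status every 5 seconds
watch -n 5 cat /proc/mdstat
# Per-device rebuild speed floor/ceiling in KB/s
echo 50000  > /proc/sys/dev/raid/speed_limit_min
echo 500000 > /proc/sys/dev/raid/speed_limit_max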
A few minutes later...
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid10]
md124 : active raid10 sdb1[1] sdd1[3] sdc1[2] sda1[0]
945133568 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 0/8 pages [0KB], 65536KB chunk
md125 : active raid10 sdb2[1] sdd2[3] sdc2[2] sda2[0]
26703872 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 1/1 pages [4KB], 65536KB chunk
md126 : active raid10 sdb3[1] sdd3[3] sdc3[2] sda3[0]
513024 blocks super 1.0 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md127 : active raid10 sdb5[4] sdd5[3] sdc5[2] sda5[0]
4136960 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
unused devices: <none>
Fixed!
Run a consistency check just to be safe
[root@localhost ~]# echo check > /sys/block/md124/md/sync_action
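The same check can be queued on the remaining arrays too; since they share the same disks, md will run them one after another. (On CentOS 7 the mdadm package also seems to ship a raid-check cron job that does this on a schedule anyway.) A quick sketch:
# Queue a consistency check on the other arrays as well
for md in md125 md126 md127; do
    echo check > /sys/block/$md/md/sync_action
done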
Checking the progress
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid10]
md124 : active raid10 sdb1[1] sdd1[3] sdc1[2] sda1[0]
945133568 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
[=>...................] check = 9.6% (91577856/945133568) finish=142.0min speed=100136K/sec
bitmap: 0/8 pages [0KB], 65536KB chunk
md125 : active raid10 sdb2[1] sdd2[3] sdc2[2] sda2[0]
26703872 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md126 : active raid10 sdb3[1] sdd3[3] sdc3[2] sda3[0]
513024 blocks super 1.0 512K chunks 2 near-copies [4/4] [UUUU]
bitmap: 0/1 pages [0KB], 65536KB chunk
md127 : active raid10 sdb5[4] sdd5[3] sdc5[2] sda5[0]
4136960 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
unused devices: <none>
This is going to take quite a while...
But for now everything seems to be fine!
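Once the check finishes, the result lands in mismatch_cnt (0 means every mirrored copy agreed). It is also worth making sure something will actually alert on the next failure; a sketch, with a hypothetical mail address:
# Mismatching sectors found by the last check (0 is what you want)
cat /sys/block/md124/md/mismatch_cnt
# Have mdadm watch all arrays and send mail on failure/degraded events
mdadm --monitor --scan --daemonise --mail=root@localhost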
(Update, 2016/04/12)
It was not fine at all: the drive went on to fail for real afterward and had to be replaced lol
A follow-up article has been posted here.