Introduction
This article describes procedures for maintaining HDFS on CDH3.
Environment
- CentOS 6.5
- CDH3
- JDK 1.6
Hadoop cluster layout
Role | Hostname | IP address |
---|---|---|
master | hadoop-master | 192.168.121.11 |
slave | hadoop-slave | 192.168.121.21 |
slave | hadoop-slave2 | 192.168.121.22 |
client | hadoop-client | 192.168.121.101 |
Note: For how to build a Hadoop cluster on CDH3, see "CDH3でhadoopのクラスタを構築する" (Building a Hadoop cluster with CDH3).
HDFS maintenance
- Checking HDFS usage
Run the following as the hdfs user.
$ sudo su - hdfs
$ hadoop dfsadmin -report
Configured Capacity: 14184988672 (13.21 GB)
Present Capacity: 9819820032 (9.15 GB)
DFS Remaining: 9819365376 (9.14 GB)
DFS Used: 454656 (444 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)
Name: 192.168.121.21:50010
Decommission Status : Normal
Configured Capacity: 7092494336 (6.61 GB)
DFS Used: 413696 (404 KB)
Non DFS Used: 2182565888 (2.03 GB)
DFS Remaining: 4909514752(4.57 GB)
DFS Used%: 0.01%
DFS Remaining%: 69.22%
Last contact: Tue Sep 09 22:22:33 JST 2014
Name: 192.168.121.22:50010
Decommission Status : Normal
Configured Capacity: 7092494336 (6.61 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 2182602752 (2.03 GB)
DFS Remaining: 4909850624(4.57 GB)
DFS Used%: 0%
DFS Remaining%: 69.23%
Last contact: Tue Sep 09 22:22:33 JST 2014
Confirm that both "Blocks with corrupt replicas" and "Missing blocks" are 0.
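If this check needs to run unattended, the two counters can be pulled out of the report with a small script. A minimal sketch, assuming it runs as the hdfs user and that the labels match the CDH3 report output shown above (the script name is hypothetical):
#!/bin/sh
# check_hdfs_blocks.sh: exit non-zero when dfsadmin reports
# corrupt replicas or missing blocks.
REPORT=$(hadoop dfsadmin -report)
CORRUPT=$(echo "$REPORT" | awk -F': ' '/Blocks with corrupt replicas/ {print $2}')
MISSING=$(echo "$REPORT" | awk -F': ' '/Missing blocks/ {print $2}')
if [ "$CORRUPT" != "0" ] || [ "$MISSING" != "0" ]; then
  echo "HDFS unhealthy: corrupt=$CORRUPT missing=$MISSING" >&2
  exit 1
fi
echo "HDFS healthy"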
- Correcting uneven data distribution
Run as the hdfs user. The -threshold 5 option below treats the cluster as balanced once every DataNode's utilization is within 5 percentage points of the cluster-wide average.
$ sudo su - hdfs
$ hadoop balancer -threshold 5
14/09/09 23:51:15 INFO balancer.Balancer: Using a threshold of 5.0
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
14/09/09 23:51:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.121.21:50010
14/09/09 23:51:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.121.22:50010
14/09/09 23:51:16 INFO balancer.Balancer: 0 over utilized nodes:
14/09/09 23:51:16 INFO balancer.Balancer: 0 under utilized nodes:
The cluster is balanced. Exiting...
Balancing took 1.09 seconds
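If the cluster drifts out of balance regularly, the same command can simply be scheduled. A sketch as an /etc/cron.d entry (the file name and log path are hypothetical; data movement is throttled per DataNode by dfs.balance.bandwidthPerSec, which defaults to 1 MB/s in this Hadoop generation):
# /etc/cron.d/hdfs-balancer: rebalance nightly at 03:00 as the hdfs user
0 3 * * * hdfs /usr/bin/hadoop balancer -threshold 5 >> /var/log/hadoop/balancer.log 2>&1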
- Checking HDFS with fsck
Run on the NameNode (hadoop-master).
$ sudo su - hdfs
$ hadoop fsck /user/hdfs/test -files -blocks -racks
FSCK started by hdfs (auth:SIMPLE) from /192.168.121.11 for path /user/hdfs/test at Tue Sep 09 23:43:28 JST 2014
/user/hdfs/test <dir>
/user/hdfs/test/LICENSE.txt 13366 bytes, 1 block(s): OK
0. blk_2224989497314884607_1496 len=13366 repl=1 [/default-rack/192.168.121.21:50010]
/user/hdfs/test/README.txt 1366 bytes, 1 block(s): OK
0. blk_738116991491030545_1490 len=1366 repl=1 [/default-rack/192.168.121.21:50010]
/user/hdfs/test/hostname.txt 22 bytes, 1 block(s): OK
0. blk_3353161374104946145_1494 len=22 repl=1 [/default-rack/192.168.121.22:50010]
/user/hdfs/test/hostname.txt.gz 39 bytes, 1 block(s): OK
0. blk_7234611454969758918_1495 len=39 repl=1 [/default-rack/192.168.121.22:50010]
Status: HEALTHY
Total size: 14793 B
Total dirs: 1
Total files: 4
Total blocks (validated): 4 (avg. block size 3698 B)
Minimally replicated blocks: 4 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Tue Sep 09 23:43:28 JST 2014 in 1 milliseconds
The filesystem under path '/user/hdfs/test' is HEALTHY
Confirm that the output shows Status: HEALTHY.
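Should fsck ever report corrupt files, the 0.20-era fsck also accepts -move (relocate affected files to /lost+found on HDFS) and -delete (remove them). A sketch; use these only after investigating the damage, since -delete is destructive:
$ sudo su - hdfs
# Move files with corrupt/missing blocks to /lost+found on HDFS:
$ hadoop fsck /user/hdfs/test -move
# Or, once the data is confirmed expendable, delete them outright:
$ hadoop fsck /user/hdfs/test -delete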
- Backing up metadata
Running a SecondaryNameNode on a separate server keeps copies of the fsimage and edits files on that server as well.
Add a node (hadoop-master2) to the Hadoop cluster.
Layout after the addition
Role | Hostname | IP address |
---|---|---|
master | hadoop-master | 192.168.121.11 |
master | hadoop-master2 | 192.168.121.12 |
slave | hadoop-slave | 192.168.121.21 |
slave | hadoop-slave2 | 192.168.121.22 |
client | hadoop-client | 192.168.121.101 |
Add the SecondaryNameNode configuration.
(Add it on every node so that the configuration stays uniform across the cluster.)
<configuration>
..(snip)..
<property>
<name>fs.checkpoint.period</name>
<value>60</value>
</property>
<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/var/lib/hadoop/dfs/snn</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>/var/lib/hadoop/dfs/snn</value>
</property>
</configuration>
The default value of fs.checkpoint.period is 3600 (seconds); it is set to 60 here to make checkpoints easy to observe during testing. fs.checkpoint.size (64 MB here) additionally forces a checkpoint whenever the edits file reaches that size, regardless of the period.
Add an entry to /etc/hosts so that the new host name resolves.
(Add it on every node so that the configuration stays uniform across the cluster.)
192.168.121.12 hadoop-master2
Start the SecondaryNameNode.
$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start
After it starts, confirm that the metadata has been transferred.
$ ls -l /var/lib/hadoop/dfs/snn
total 16
-rw-r--r--. 1 hdfs hdfs 4 Sep 10 07:03 edits
-rw-r--r--. 1 hdfs hdfs 1979 Sep 10 07:03 fsimage
-rw-r--r--. 1 hdfs hdfs 8 Sep 10 07:03 fstime
-rw-r--r--. 1 hdfs hdfs 101 Sep 10 07:03 VERSION
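Rather than waiting out fs.checkpoint.period, Hadoop 0.20's secondarynamenode command should also accept a -checkpoint force argument to trigger a checkpoint on demand. A sketch (run as hdfs on hadoop-master2, with the init-script daemon stopped first so the two do not contend for the checkpoint directory):
$ sudo /etc/init.d/hadoop-0.20-secondarynamenode stop
$ sudo su - hdfs
$ hadoop secondarynamenode -checkpoint force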
- Restoring metadata
Restore the metadata by switching the server that runs the NameNode over to hadoop-master2.
Confirm that the fsimage exists on the SecondaryNameNode.
(hadoop-master2 only)
$ ls -l /var/lib/hadoop/dfs/snn/current
total 16
-rw-r--r--. 1 hdfs hdfs 4 Sep 10 14:32 edits
-rw-r--r--. 1 hdfs hdfs 2079 Sep 10 14:32 fsimage
-rw-r--r--. 1 hdfs hdfs 8 Sep 10 14:32 fstime
-rw-r--r--. 1 hdfs hdfs 101 Sep 10 14:32 VERSION
Stop the SecondaryNameNode.
(hadoop-master2 only)
$ sudo /etc/init.d/hadoop-0.20-secondarynamenode stop
Stop the NameNode.
(hadoop-master only)
$ sudo /etc/init.d/hadoop-0.20-namenode stop
Stop the DataNodes.
(hadoop-slave and hadoop-slave2)
$ sudo /etc/init.d/hadoop-0.20-datanode stop
Change the host that runs the NameNode from hadoop-master to hadoop-master2.
(every node in the cluster)
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop-master2:8020</value>
</property>
..(snip)..
</configuration>
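To avoid hand-editing every node, the switch can be scripted. A sketch, assuming core-site.xml lives at /etc/hadoop/conf/core-site.xml (adjust to wherever your CDH3 configuration actually lives) and contains exactly one fs.default.name value:
$ sudo sed -i 's|hdfs://hadoop-master:8020|hdfs://hadoop-master2:8020|' /etc/hadoop/conf/core-site.xml
$ grep -A 1 fs.default.name /etc/hadoop/conf/core-site.xml  # verify the new value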
Confirm that no metadata exists yet in the following directory on hadoop-master2.
(hadoop-master2 only)
$ ls -l /var/lib/hadoop/dfs/nn
total 0
Restore the metadata on hadoop-master2. The -importCheckpoint option loads the latest checkpoint from fs.checkpoint.dir and writes it into the empty dfs.name.dir.
(hadoop-master2 only)
$ sudo su - hdfs
$ hadoop namenode -importCheckpoint
14/09/10 14:53:00 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop-master2/192.168.121.12
STARTUP_MSG: args = [-importCheckpoint]
STARTUP_MSG: version = 0.20.2-cdh3u6
STARTUP_MSG: build = file:///data/1/tmp/topdir/BUILD/hadoop-0.20.2-cdh3u6 -r efb405d2aa54039bdf39e0733cd0bb9423a1eb0a; compiled by 'root' on Wed Mar 20 13:11:26 PDT 2013
************************************************************/
14/09/10 14:53:00 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
14/09/10 14:53:00 INFO metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
14/09/10 14:53:00 INFO util.GSet: VM type = 64-bit
14/09/10 14:53:00 INFO util.GSet: 2% max memory = 19.33375 MB
14/09/10 14:53:00 INFO util.GSet: capacity = 2^21 = 2097152 entries
14/09/10 14:53:00 INFO util.GSet: recommended=2097152, actual=2097152
14/09/10 14:53:00 INFO namenode.FSNamesystem: fsOwner=hdfs (auth:SIMPLE)
14/09/10 14:53:00 INFO namenode.FSNamesystem: supergroup=supergroup
14/09/10 14:53:00 INFO namenode.FSNamesystem: isPermissionEnabled=false
14/09/10 14:53:00 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=1000
14/09/10 14:53:00 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/09/10 14:53:01 INFO metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
14/09/10 14:53:01 INFO common.Storage: Storage directory /var/lib/hadoop/dfs/nn is not formatted.
14/09/10 14:53:01 INFO common.Storage: Formatting ...
14/09/10 14:53:01 INFO common.Storage: Number of files = 22
14/09/10 14:53:01 INFO common.Storage: Number of files under construction = 0
14/09/10 14:53:01 INFO common.Storage: Image file of size 2079 loaded in 0 seconds.
14/09/10 14:53:01 INFO common.Storage: Edits file /var/lib/hadoop/dfs/snn/current/edits of size 4 edits # 0 loaded in 0 seconds.
14/09/10 14:53:01 INFO common.Storage: Image file of size 2079 saved in 0 seconds.
14/09/10 14:53:01 INFO namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
14/09/10 14:53:01 INFO common.Storage: Image file of size 2079 saved in 0 seconds.
14/09/10 14:53:01 INFO namenode.FSNamesystem: Finished loading FSImage in 1258 msecs
14/09/10 14:53:01 INFO hdfs.StateChange: STATE* Safe mode ON.
The reported blocks 0 needs additional 6 blocks to reach the threshold 0.9990 of total blocks 7. Safe mode will be turned off automatically.
14/09/10 14:53:01 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
14/09/10 14:53:01 INFO metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
14/09/10 14:53:01 INFO ipc.Server: Starting Socket Reader #1 for port 8020
14/09/10 14:53:01 INFO metrics.RpcDetailedMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
14/09/10 14:53:01 INFO namenode.NameNode: Namenode up at: hadoop-master2/192.168.121.12:8020
14/09/10 14:53:02 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
14/09/10 14:53:02 INFO http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
14/09/10 14:53:02 INFO http.HttpServer: dfs.webhdfs.enabled = false
14/09/10 14:53:02 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
14/09/10 14:53:02 INFO http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
14/09/10 14:53:02 INFO http.HttpServer: Jetty bound to port 50070
14/09/10 14:53:02 INFO mortbay.log: jetty-6.1.26.cloudera.2
14/09/10 14:53:02 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50070
14/09/10 14:53:02 INFO namenode.NameNode: Web-server up at: 0.0.0.0:50070
14/09/10 14:53:02 INFO ipc.Server: IPC Server Responder: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server listener on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 0 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 1 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 2 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 3 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 4 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 5 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 6 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 7 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 8 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 9 on 8020: starting
14/09/10 14:54:30 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master2/192.168.121.12
************************************************************/
(-importCheckpoint runs the NameNode in the foreground, so once the import has completed, stop it with Ctrl + C.)
Confirm that the metadata has been restored.
(hadoop-master2 only)
$ ls -l /var/lib/hadoop/dfs/nn
total 12
drwxrwxr-x. 2 hdfs hdfs 4096 Sep 10 14:53 current
drwxrwxr-x. 2 hdfs hdfs 4096 Sep 10 14:53 image
drwxrwxr-x. 2 hdfs hdfs 4096 Sep 10 14:53 previous.checkpoint
$ ls -l /var/lib/hadoop/dfs/nn/current
total 16
-rw-rw-r--. 1 hdfs hdfs 4 Sep 10 14:53 edits
-rw-rw-r--. 1 hdfs hdfs 2079 Sep 10 14:53 fsimage
-rw-rw-r--. 1 hdfs hdfs 8 Sep 10 14:53 fstime
-rw-rw-r--. 1 hdfs hdfs 101 Sep 10 14:53 VERSION
Start the NameNode.
(hadoop-master2 only)
$ sudo /etc/init.d/hadoop-0.20-namenode start
Start the SecondaryNameNode.
(hadoop-master2 only)
$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start
Start the DataNodes.
(hadoop-slave and hadoop-slave2 only)
$ sudo /etc/init.d/hadoop-0.20-datanode start
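Before testing from the client, it may be worth confirming that the NameNode has left safe mode (the import log above showed "Safe mode ON" until enough blocks were reported); the command below should print "Safe mode is OFF":
$ sudo su - hdfs
$ hadoop dfsadmin -safemode get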
Confirm that HDFS can be read from and written to.
(hadoop-client)
$ hadoop fs -ls
Found 2 items
drwxr-xr-x - hdfs supergroup 0 2014-09-10 06:59 /user/hdfs/test
drwxr-xr-x - hdfs supergroup 0 2014-09-09 10:19 /user/hdfs/test2
$ hadoop fs -ls test/
Found 5 items
-rw-r--r-- 1 hdfs supergroup 13366 2014-09-09 22:31 /user/hdfs/test/LICENSE.txt
-rw-r--r-- 1 hdfs supergroup 101 2014-09-10 06:59 /user/hdfs/test/NOTICE.txt
-rw-r--r-- 1 hdfs supergroup 1366 2014-09-09 07:46 /user/hdfs/test/README.txt
-rw-r--r-- 1 hdfs supergroup 22 2014-09-09 10:29 /user/hdfs/test/hostname.txt
-rw-r--r-- 1 hdfs supergroup 39 2014-09-09 10:30 /user/hdfs/test/hostname.txt.gz
$ dmesg | hadoop fs -put - test/dmesg.txt
$ hadoop fs -ls test/
Found 6 items
-rw-r--r-- 1 hdfs supergroup 13366 2014-09-09 22:31 /user/hdfs/test/LICENSE.txt
-rw-r--r-- 1 hdfs supergroup 101 2014-09-10 06:59 /user/hdfs/test/NOTICE.txt
-rw-r--r-- 1 hdfs supergroup 1366 2014-09-09 07:46 /user/hdfs/test/README.txt
-rw-r--r-- 1 hdfs supergroup 21881 2014-09-10 15:05 /user/hdfs/test/dmesg.txt
-rw-r--r-- 1 hdfs supergroup 22 2014-09-09 10:29 /user/hdfs/test/hostname.txt
-rw-r--r-- 1 hdfs supergroup 39 2014-09-09 10:30 /user/hdfs/test/hostname.txt.gz
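As a final check, the health commands from the beginning of this article can be repeated against the restored NameNode (same commands as above; expect zero corrupt/missing blocks and Status: HEALTHY):
$ sudo su - hdfs
$ hadoop dfsadmin -report
$ hadoop fsck /user/hdfs/test -files -blocks -racks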