Maintaining HDFS on CDH3

Posted at 2014-09-10

Introduction

This article describes the procedure for maintaining HDFS on CDH3.

Environment

  • CentOS 6.5
  • CDH3
  • JDK 1.6

Hadoop cluster configuration

Role    Hostname        IP address
master  hadoop-master   192.168.121.11
slave   hadoop-slave    192.168.121.21
slave   hadoop-slave2   192.168.121.22
client  hadoop-client   192.168.121.101

※ For how to build a Hadoop cluster on CDH3, see the companion article CDH3でhadoopのクラスタを構築する.

HDFS maintenance

  • Checking HDFS usage

Run the following as the hdfs user.

$ sudo su - hdfs
$ hadoop dfsadmin -report
Configured Capacity: 14184988672 (13.21 GB)
Present Capacity: 9819820032 (9.15 GB)
DFS Remaining: 9819365376 (9.14 GB)
DFS Used: 454656 (444 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Name: 192.168.121.21:50010
Decommission Status : Normal
Configured Capacity: 7092494336 (6.61 GB)
DFS Used: 413696 (404 KB)
Non DFS Used: 2182565888 (2.03 GB)
DFS Remaining: 4909514752(4.57 GB)
DFS Used%: 0.01%
DFS Remaining%: 69.22%
Last contact: Tue Sep 09 22:22:33 JST 2014


Name: 192.168.121.22:50010
Decommission Status : Normal
Configured Capacity: 7092494336 (6.61 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 2182602752 (2.03 GB)
DFS Remaining: 4909850624(4.57 GB)
DFS Used%: 0%
DFS Remaining%: 69.23%
Last contact: Tue Sep 09 22:22:33 JST 2014

Confirm that Blocks with corrupt replicas and Missing blocks are both 0.
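
If you only want these health counters, a quick way is to filter the report with grep (a simple sketch):

$ hadoop dfsadmin -report | egrep 'Under replicated|corrupt replicas|Missing blocks'
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0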

  • Correcting data skew

Run the following as the hdfs user.

$ sudo su - hdfs
$ hadoop balancer -threshold 5
14/09/09 23:51:15 INFO balancer.Balancer: Using a threshold of 5.0
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
14/09/09 23:51:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.121.21:50010
14/09/09 23:51:16 INFO net.NetworkTopology: Adding a new node: /default-rack/192.168.121.22:50010
14/09/09 23:51:16 INFO balancer.Balancer: 0 over utilized nodes:
14/09/09 23:51:16 INFO balancer.Balancer: 0 under utilized nodes:
The cluster is balanced. Exiting...
Balancing took 1.09 seconds
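
The -threshold 5 argument tells the balancer to keep moving blocks until every DataNode's utilization is within 5 percentage points of the cluster-wide average. To eyeball the per-node utilization before and after balancing, you can filter the report (a simple sketch; the values below are the ones from the report above):

$ hadoop dfsadmin -report | grep 'DFS Used%'
DFS Used%: 0%
DFS Used%: 0.01%
DFS Used%: 0%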

  • Checking HDFS with fsck

Run this on the NameNode (hadoop-master).

$ sudo su - hdfs
$ hadoop fsck /user/hdfs/test -files -blocks -racks
FSCK started by hdfs (auth:SIMPLE) from /192.168.121.11 for path /user/hdfs/test at Tue Sep 09 23:43:28 JST 2014
/user/hdfs/test <dir>
/user/hdfs/test/LICENSE.txt 13366 bytes, 1 block(s):  OK
0. blk_2224989497314884607_1496 len=13366 repl=1 [/default-rack/192.168.121.21:50010]

/user/hdfs/test/README.txt 1366 bytes, 1 block(s):  OK
0. blk_738116991491030545_1490 len=1366 repl=1 [/default-rack/192.168.121.21:50010]

/user/hdfs/test/hostname.txt 22 bytes, 1 block(s):  OK
0. blk_3353161374104946145_1494 len=22 repl=1 [/default-rack/192.168.121.22:50010]

/user/hdfs/test/hostname.txt.gz 39 bytes, 1 block(s):  OK
0. blk_7234611454969758918_1495 len=39 repl=1 [/default-rack/192.168.121.22:50010]


Status: HEALTHY
 Total size:    14793 B
 Total dirs:    1
 Total files:   4
 Total blocks (validated):      4 (avg. block size 3698 B)
 Minimally replicated blocks:   4 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     1.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          2
 Number of racks:               1
FSCK ended at Tue Sep 09 23:43:28 JST 2014 in 1 milliseconds


The filesystem under path '/user/hdfs/test' is HEALTHY

Confirm that the report says Status: HEALTHY.
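
The example above checks only /user/hdfs/test. To audit the entire filesystem, run fsck against the root path; fsck prints a dot per healthy file, so filtering out the dot lines leaves just the summary and any problem entries (a sketch):

$ hadoop fsck / | grep -v '^\.'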

  • Backing up the metadata

By running a SecondaryNameNode on a separate server, the fsimage and edits files are stored on that server as well.

Add a new node (hadoop-master2) to the Hadoop cluster.

Configuration after the addition

Role    Hostname        IP address
master  hadoop-master   192.168.121.11
master  hadoop-master2  192.168.121.12
slave   hadoop-slave    192.168.121.21
slave   hadoop-slave2   192.168.121.22
client  hadoop-client   192.168.121.101

Add the SecondaryNameNode settings.
(Add them on all nodes so the configuration stays consistent across the cluster.)

/etc/hadoop-0.20/conf.cluster/core-site.xml
<configuration>
..(snip)..
  <property>
    <name>fs.checkpoint.period</name>
    <value>60</value>
  </property>
  <property>
    <name>fs.checkpoint.size</name>
    <value>67108864</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/var/lib/hadoop/dfs/snn</value>
  </property>
  <property>
    <name>fs.checkpoint.edits.dir</name>
    <value>/var/lib/hadoop/dfs/snn</value>
  </property>
</configuration>

The default value of fs.checkpoint.period is 3600 (seconds), but it is set to 60 here so the checkpoint cycle can be observed quickly during testing. A checkpoint is triggered when fs.checkpoint.period seconds have elapsed or the edits file reaches fs.checkpoint.size bytes, whichever comes first.

Add an entry to /etc/hosts so that the new host name can be resolved.
(Add it on all nodes so the configuration stays consistent across the cluster.)

/etc/hosts
192.168.121.12 hadoop-master2
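
You can confirm the new name resolves before going on, for example with getent, which consults /etc/hosts:

$ getent hosts hadoop-master2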

Start the SecondaryNameNode.

$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start

After it starts, confirm that the metadata has been transferred.

$ ls -l /var/lib/hadoop/dfs/snn
total 16
-rw-r--r--. 1 hdfs hdfs    4 Sep 10 07:03 edits
-rw-r--r--. 1 hdfs hdfs 1979 Sep 10 07:03 fsimage
-rw-r--r--. 1 hdfs hdfs    8 Sep 10 07:03 fstime
-rw-r--r--. 1 hdfs hdfs  101 Sep 10 07:03 VERSION
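
Because fs.checkpoint.period was lowered to 60, the checkpoint files should refresh roughly once a minute; watching the directory timestamps is an easy way to confirm the cycle (a sketch, assuming the watch command is installed):

$ watch -n 30 'ls -l /var/lib/hadoop/dfs/snn'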

  • Restoring the metadata

The metadata is restored by switching the server that runs the NameNode to hadoop-master2.

Confirm that the fsimage exists on the SecondaryNameNode.
(hadoop-master2 only)

$ ls -l /var/lib/hadoop/dfs/snn/current
total 16
-rw-r--r--. 1 hdfs hdfs    4 Sep 10 14:32 edits
-rw-r--r--. 1 hdfs hdfs 2079 Sep 10 14:32 fsimage
-rw-r--r--. 1 hdfs hdfs    8 Sep 10 14:32 fstime
-rw-r--r--. 1 hdfs hdfs  101 Sep 10 14:32 VERSION

Stop the SecondaryNameNode.
(hadoop-master2 only)

$ sudo /etc/init.d/hadoop-0.20-secondarynamenode stop

Stop the NameNode.
(hadoop-master only)

$ sudo /etc/init.d/hadoop-0.20-namenode stop

Stop the DataNodes.
(hadoop-slave and hadoop-slave2)

$ sudo /etc/init.d/hadoop-0.20-datanode stop

Change the host that runs the NameNode from hadoop-master to hadoop-master2.
(all nodes in the cluster)

/etc/hadoop-0.20/conf.cluster/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master2:8020</value>
  </property>
..(snip)..
</configuration>
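
Since this change must be identical on every node, one way to distribute it is to copy the file from hadoop-master2 to the rest of the cluster (a sketch, assuming ssh access with sufficient permissions between the nodes):

$ for h in hadoop-master hadoop-slave hadoop-slave2 hadoop-client; do
>   scp /etc/hadoop-0.20/conf.cluster/core-site.xml ${h}:/etc/hadoop-0.20/conf.cluster/
> done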

Confirm that no metadata exists yet in the following directory on hadoop-master2.
(hadoop-master2 only)

$ ls -l /var/lib/hadoop/dfs/nn
total 0

Restore the metadata on hadoop-master2.
(hadoop-master2 only)

$ sudo su - hdfs
$ hadoop namenode -importCheckpoint
14/09/10 14:53:00 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop-master2/192.168.121.12
STARTUP_MSG:   args = [-importCheckpoint]
STARTUP_MSG:   version = 0.20.2-cdh3u6
STARTUP_MSG:   build = file:///data/1/tmp/topdir/BUILD/hadoop-0.20.2-cdh3u6 -r efb405d2aa54039bdf39e0733cd0bb9423a1eb0a; compiled by 'root' on Wed Mar 20 13:11:26 PDT 2013
************************************************************/
14/09/10 14:53:00 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
14/09/10 14:53:00 INFO metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
14/09/10 14:53:00 INFO util.GSet: VM type       = 64-bit
14/09/10 14:53:00 INFO util.GSet: 2% max memory = 19.33375 MB
14/09/10 14:53:00 INFO util.GSet: capacity      = 2^21 = 2097152 entries
14/09/10 14:53:00 INFO util.GSet: recommended=2097152, actual=2097152
14/09/10 14:53:00 INFO namenode.FSNamesystem: fsOwner=hdfs (auth:SIMPLE)
14/09/10 14:53:00 INFO namenode.FSNamesystem: supergroup=supergroup
14/09/10 14:53:00 INFO namenode.FSNamesystem: isPermissionEnabled=false
14/09/10 14:53:00 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=1000
14/09/10 14:53:00 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/09/10 14:53:01 INFO metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
14/09/10 14:53:01 INFO common.Storage: Storage directory /var/lib/hadoop/dfs/nn is not formatted.
14/09/10 14:53:01 INFO common.Storage: Formatting ...
14/09/10 14:53:01 INFO common.Storage: Number of files = 22
14/09/10 14:53:01 INFO common.Storage: Number of files under construction = 0
14/09/10 14:53:01 INFO common.Storage: Image file of size 2079 loaded in 0 seconds.
14/09/10 14:53:01 INFO common.Storage: Edits file /var/lib/hadoop/dfs/snn/current/edits of size 4 edits # 0 loaded in 0 seconds.
14/09/10 14:53:01 INFO common.Storage: Image file of size 2079 saved in 0 seconds.
14/09/10 14:53:01 INFO namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
14/09/10 14:53:01 INFO common.Storage: Image file of size 2079 saved in 0 seconds.
14/09/10 14:53:01 INFO namenode.FSNamesystem: Finished loading FSImage in 1258 msecs
14/09/10 14:53:01 INFO hdfs.StateChange: STATE* Safe mode ON.
The reported blocks 0 needs additional 6 blocks to reach the threshold 0.9990 of total blocks 7. Safe mode will be turned off automatically.
14/09/10 14:53:01 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
14/09/10 14:53:01 INFO metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
14/09/10 14:53:01 INFO ipc.Server: Starting Socket Reader #1 for port 8020
14/09/10 14:53:01 INFO metrics.RpcDetailedMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
14/09/10 14:53:01 INFO namenode.NameNode: Namenode up at: hadoop-master2/192.168.121.12:8020
14/09/10 14:53:02 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
14/09/10 14:53:02 INFO http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
14/09/10 14:53:02 INFO http.HttpServer: dfs.webhdfs.enabled = false
14/09/10 14:53:02 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
14/09/10 14:53:02 INFO http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
14/09/10 14:53:02 INFO http.HttpServer: Jetty bound to port 50070
14/09/10 14:53:02 INFO mortbay.log: jetty-6.1.26.cloudera.2
14/09/10 14:53:02 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50070
14/09/10 14:53:02 INFO namenode.NameNode: Web-server up at: 0.0.0.0:50070
14/09/10 14:53:02 INFO ipc.Server: IPC Server Responder: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server listener on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 0 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 1 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 2 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 3 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 4 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 5 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 6 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 7 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 8 on 8020: starting
14/09/10 14:53:02 INFO ipc.Server: IPC Server handler 9 on 8020: starting
14/09/10 14:54:30 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master2/192.168.121.12
************************************************************/

(The NameNode launched with -importCheckpoint runs in the foreground; stop it with Ctrl + C once the import has finished.)

Confirm that the metadata has been restored.
(hadoop-master2 only)

$ ls -l /var/lib/hadoop/dfs/nn
total 12
drwxrwxr-x. 2 hdfs hdfs 4096 Sep 10 14:53 current
drwxrwxr-x. 2 hdfs hdfs 4096 Sep 10 14:53 image
drwxrwxr-x. 2 hdfs hdfs 4096 Sep 10 14:53 previous.checkpoint

$ ls -l /var/lib/hadoop/dfs/nn/current
total 16
-rw-rw-r--. 1 hdfs hdfs    4 Sep 10 14:53 edits
-rw-rw-r--. 1 hdfs hdfs 2079 Sep 10 14:53 fsimage
-rw-rw-r--. 1 hdfs hdfs    8 Sep 10 14:53 fstime
-rw-rw-r--. 1 hdfs hdfs  101 Sep 10 14:53 VERSION

Start the NameNode.
(hadoop-master2 only)

$ sudo /etc/init.d/hadoop-0.20-namenode start

Start the SecondaryNameNode.
(hadoop-master2 only)

$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start

Start the DataNodes.
(hadoop-slave and hadoop-slave2 only)

$ sudo /etc/init.d/hadoop-0.20-datanode start
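
On each node you can confirm that the expected daemon came up with jps, which is bundled with the JDK: NameNode and SecondaryNameNode should be listed on hadoop-master2, and DataNode on each slave.

$ sudo jps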

Confirm that HDFS can be read and written.
(hadoop-client)

$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - hdfs supergroup          0 2014-09-10 06:59 /user/hdfs/test
drwxr-xr-x   - hdfs supergroup          0 2014-09-09 10:19 /user/hdfs/test2

$ hadoop fs -ls test/
Found 5 items
-rw-r--r--   1 hdfs supergroup      13366 2014-09-09 22:31 /user/hdfs/test/LICENSE.txt
-rw-r--r--   1 hdfs supergroup        101 2014-09-10 06:59 /user/hdfs/test/NOTICE.txt
-rw-r--r--   1 hdfs supergroup       1366 2014-09-09 07:46 /user/hdfs/test/README.txt
-rw-r--r--   1 hdfs supergroup         22 2014-09-09 10:29 /user/hdfs/test/hostname.txt
-rw-r--r--   1 hdfs supergroup         39 2014-09-09 10:30 /user/hdfs/test/hostname.txt.gz

$ dmesg | hadoop fs -put - test/dmesg.txt

$ hadoop fs -ls test/
Found 6 items
-rw-r--r--   1 hdfs supergroup      13366 2014-09-09 22:31 /user/hdfs/test/LICENSE.txt
-rw-r--r--   1 hdfs supergroup        101 2014-09-10 06:59 /user/hdfs/test/NOTICE.txt
-rw-r--r--   1 hdfs supergroup       1366 2014-09-09 07:46 /user/hdfs/test/README.txt
-rw-r--r--   1 hdfs supergroup      21881 2014-09-10 15:05 /user/hdfs/test/dmesg.txt
-rw-r--r--   1 hdfs supergroup         22 2014-09-09 10:29 /user/hdfs/test/hostname.txt
-rw-r--r--   1 hdfs supergroup         39 2014-09-09 10:30 /user/hdfs/test/hostname.txt.gz
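
Reading the file back, for example with hadoop fs -cat, confirms the read path as well:

$ hadoop fs -cat test/dmesg.txt | head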
