
Building a Hadoop Cluster with CDH3

Posted at 2014-09-08

Introduction

This article describes how to build a Hadoop cluster using CDH3 (Cloudera's
Distribution Including Apache Hadoop).

Environment

  • CentOS 6.5
  • CDH3u6
  • jdk 1.6

Cluster Layout

  • master x 1
  • slave x 2
  • client x 1
Role   Hostname        IP address
master hadoop-master   192.168.121.11
slave  hadoop-slave    192.168.121.21
slave  hadoop-slave2   192.168.121.22
client hadoop-client   192.168.121.101

Installing Java

CDH3 requires Oracle JDK 1.6 and recommends 1.6.0_26, so install that
version of the JDK.

$ chmod +x jdk-6u26-linux-x64-rpm.bin
$ sudo ./jdk-6u26-linux-x64-rpm.bin

Verify the installed Java version:

$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

Installing CDH3

  • Add the yum repository
$ wget http://archive.cloudera.com/redhat/6/x86_64/cdh/cdh3-repository-1.0-1.noarch.rpm
$ sudo yum localinstall cdh3-repository-1.0-1.noarch.rpm
  • Check the repository list:
$ sudo yum clean all
$ yum repolist
...(omitted)...
repo id                     repo name                                                   status
base                        CentOS-6 - Base                                             6,367
cloudera-cdh3               Cloudera's Distribution for Hadoop, Version 3                  67
extras                      CentOS-6 - Extras                                              15
updates                     CentOS-6 - Updates                                          1,467
repolist: 7,916
  • Install the Hadoop packages

Install the packages required for each role in the cluster.

On the master

$ sudo yum install hadoop-0.20 hadoop-0.20-namenode hadoop-0.20-secondarynamenode hadoop-0.20-jobtracker

On the slaves

$ sudo yum install hadoop-0.20 hadoop-0.20-datanode hadoop-0.20-tasktracker

On the client

$ sudo yum install hadoop-0.20

Configuring the Hadoop Cluster

On every node in the cluster

$ sudo cp -r  /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.cluster
/etc/hadoop-0.20/conf.cluster/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/lib/hadoop/cache</value>
  </property>
</configuration>
/etc/hadoop-0.20/conf.cluster/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop/dfs/nn</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/lib/hadoop/dfs/dn</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
</configuration>
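The dfs.block.size value is in bytes: 134217728 is exactly 128 MB, a common
HDFS block size. A quick arithmetic check:

```shell
# 128 MB expressed in bytes; matches the dfs.block.size value above
echo $((128 * 1024 * 1024))   # -> 134217728
```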
/etc/hadoop-0.20/conf.cluster/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop-master:8021</value>
  </property>

  <property>
    <name>mapred.local.dir</name>
    <value>/var/lib/hadoop/dfs/mapred/local</value>
  </property>
</configuration>
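After editing these files it is easy to leave a typo behind. One way to
double-check a property is to pull its value out with sed; this is only a
sketch that assumes the flat one-tag-per-line layout shown above (the /tmp
path is illustrative, not part of the setup):

```shell
# Write a copy of core-site.xml to a scratch path (illustrative only).
cat > /tmp/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:8020</value>
  </property>
</configuration>
EOF

# Print the line after the matching <name>, then strip the <value> tags.
sed -n '/<name>fs.default.name<\/name>/{n;p;}' /tmp/core-site.xml \
  | sed -e 's/.*<value>//' -e 's/<\/value>.*//'
# -> hdfs://hadoop-master:8020
```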
  • Create the required directories.
$ sudo mkdir -p /var/lib/hadoop/cache
$ sudo chown hdfs:hadoop /var/lib/hadoop/cache
$ sudo chmod 1777 /var/lib/hadoop/cache

$ sudo mkdir -p /var/lib/hadoop/dfs/nn
$ sudo mkdir -p /var/lib/hadoop/dfs/dn
$ sudo chown -R hdfs:hadoop /var/lib/hadoop/dfs

$ sudo mkdir -p /var/lib/hadoop/dfs/mapred/local
$ sudo chown -R mapred:hadoop /var/lib/hadoop/dfs/mapred

$ sudo chmod -R 775 /var/lib/hadoop/dfs
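Mode 1777 on the cache directory is the same sticky-bit permission as /tmp:
any user can create files there, but only the owner can remove them. A
minimal illustration on a scratch directory (the path is hypothetical):

```shell
# Sticky-bit (1777) demo on a scratch directory; the path is illustrative.
mkdir -p /tmp/hadoop-cache-demo
chmod 1777 /tmp/hadoop-cache-demo
stat -c %a /tmp/hadoop-cache-demo   # -> 1777
```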
  • Add an alternatives entry so that conf.cluster is used.
$ sudo alternatives --display hadoop-0.20-conf
hadoop-0.20-conf - status is auto.
 link currently points to /etc/hadoop-0.20/conf.empty
/etc/hadoop-0.20/conf.empty - priority 10
Current `best' version is /etc/hadoop-0.20/conf.empty.

$ sudo alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster 50
$ sudo alternatives --set hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster
$ sudo alternatives --display hadoop-0.20-conf
hadoop-0.20-conf - status is manual.
 link currently points to /etc/hadoop-0.20/conf.cluster
/etc/hadoop-0.20/conf.empty   - priority 10
/etc/hadoop-0.20/conf.cluster - priority 50
Current `best' version is /etc/hadoop-0.20/conf.cluster.
  • Add entries to /etc/hosts so that each node can be reached by hostname.
/etc/hosts
192.168.121.11  hadoop-master
192.168.121.21  hadoop-slave
192.168.121.22  hadoop-slave2
192.168.121.101 hadoop-client
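A malformed /etc/hosts line fails silently, so a quick format check is
worthwhile before starting services. A sketch that counts well-formed
entries (the heredoc stands in for the real file):

```shell
# Count entries that look like an IPv4 address followed by a hostname.
cat <<'EOF' | awk 'NF == 2 && $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { ok++ } END { print ok }'
192.168.121.11  hadoop-master
192.168.121.21  hadoop-slave
192.168.121.22  hadoop-slave2
192.168.121.101 hadoop-client
EOF
# -> 4 (all four entries are well-formed)
```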

On the master only

  • Format HDFS.
$ sudo su - hdfs
$ hadoop namenode -format
  • Start the services

On the master

Start the namenode and jobtracker.

$ sudo /etc/init.d/hadoop-0.20-namenode start
$ sudo /etc/init.d/hadoop-0.20-jobtracker start

Verify that the services are running:

$ sudo /usr/java/default/bin/jps
XXXXX Jps
XXXXX JobTracker
XXXXX NameNode

On the slaves

Start the datanode and tasktracker.

$ sudo /etc/init.d/hadoop-0.20-datanode start
$ sudo /etc/init.d/hadoop-0.20-tasktracker start

Verify that the services are running:

$ sudo /usr/java/default/bin/jps
XXXXX Jps
XXXXX TaskTracker
XXXXX DataNode

Verifying the Cluster

Run the following sample program on hadoop-client.

# su - hdfs
$ hadoop jar hadoop-examples-0.20.2-cdh3u6.jar pi 1 300
Number of Maps  = 1
Samples per Map = 300
Wrote input for Map #0
Starting Job
14/09/08 23:49:35 INFO mapred.FileInputFormat: Total input paths to process : 1
14/09/08 23:49:36 INFO mapred.JobClient: Running job: job_201409082238_0003
14/09/08 23:49:37 INFO mapred.JobClient:  map 0% reduce 0%
14/09/08 23:49:44 INFO mapred.JobClient:  map 100% reduce 0%
14/09/08 23:49:52 INFO mapred.JobClient:  map 100% reduce 33%
14/09/08 23:49:54 INFO mapred.JobClient:  map 100% reduce 100%
14/09/08 23:49:55 INFO mapred.JobClient: Job complete: job_201409082238_0003
14/09/08 23:49:55 INFO mapred.JobClient: Counters: 27
14/09/08 23:49:55 INFO mapred.JobClient:   Job Counters
14/09/08 23:49:55 INFO mapred.JobClient:     Launched reduce tasks=1
14/09/08 23:49:55 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=8058
14/09/08 23:49:55 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/08 23:49:55 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/08 23:49:55 INFO mapred.JobClient:     Rack-local map tasks=1
14/09/08 23:49:55 INFO mapred.JobClient:     Launched map tasks=1
14/09/08 23:49:55 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9537
14/09/08 23:49:55 INFO mapred.JobClient:   FileSystemCounters
14/09/08 23:49:55 INFO mapred.JobClient:     FILE_BYTES_READ=28
14/09/08 23:49:55 INFO mapred.JobClient:     HDFS_BYTES_READ=243
14/09/08 23:49:55 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=109744
14/09/08 23:49:55 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
14/09/08 23:49:55 INFO mapred.JobClient:   Map-Reduce Framework
14/09/08 23:49:55 INFO mapred.JobClient:     Map input records=1
14/09/08 23:49:55 INFO mapred.JobClient:     Reduce shuffle bytes=28
14/09/08 23:49:55 INFO mapred.JobClient:     Spilled Records=4
14/09/08 23:49:55 INFO mapred.JobClient:     Map output bytes=18
14/09/08 23:49:55 INFO mapred.JobClient:     CPU time spent (ms)=2040
14/09/08 23:49:55 INFO mapred.JobClient:     Total committed heap usage (bytes)=176230400
14/09/08 23:49:55 INFO mapred.JobClient:     Map input bytes=24
14/09/08 23:49:55 INFO mapred.JobClient:     Combine input records=0
14/09/08 23:49:55 INFO mapred.JobClient:     SPLIT_RAW_BYTES=125
14/09/08 23:49:55 INFO mapred.JobClient:     Reduce input records=2
14/09/08 23:49:55 INFO mapred.JobClient:     Reduce input groups=2
14/09/08 23:49:55 INFO mapred.JobClient:     Combine output records=0
14/09/08 23:49:55 INFO mapred.JobClient:     Physical memory (bytes) snapshot=281022464
14/09/08 23:49:55 INFO mapred.JobClient:     Reduce output records=0
14/09/08 23:49:55 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1429348352
14/09/08 23:49:55 INFO mapred.JobClient:     Map output records=2
Job Finished in 19.828 seconds
Estimated value of Pi is 3.16000000000000000000
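The pi example estimates pi by Monte Carlo sampling: each map task throws
random points into the unit square and counts how many land inside the
quarter circle, so with only 300 samples a rough value like 3.16 is
expected. The same idea in a few lines of awk (the seed and sample count
are arbitrary):

```shell
# Monte Carlo estimate of pi, mirroring what the example job computes.
awk 'BEGIN {
  srand(42)                        # fixed seed for repeatable runs
  n = 300                          # same sample count as the job above
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1) inside++   # point falls inside the quarter circle
  }
  printf "%.2f\n", 4 * inside / n
}'
```

The result varies with the awk implementation's random generator, but lands
near 3.14 with the same spread the 300-sample job shows.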
