References
Official docs
- Well written and of course correct, though I sometimes lose track of which one to read next
- http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Quick-Start/cdh4qs_topic_3.html
- http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_6_3.html?scroll=topic_6_3_6_unique_4
- http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_21.html#topic_21
- http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_21_3.html#topic_21_3_1_unique_1
- http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_20.html#topic_20
My mentor's blog
- Read whichever of the "Hadoop(1)" and "Hadoop(2)" posts seem relevant to you
This article by モリス様
- Reading it before the official docs makes it much easier to picture which components go on which node
Steps
zookeeper
# yum install zookeeper
# yum install zookeeper-server
# service zookeeper-server init
ZooKeeper data directory already exists at /var/lib/zookeeper/xxx (or use --force to force re-initialization)
# service zookeeper-server init --force
Force enabled, data/txnlog directories will be re-initialized
No myid provided, be sure to specify it in /var/lib/zookeeper/xxx/myid if using non-standalone
# service zookeeper-server start
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Starting zookeeper ... STARTED
# service zookeeper-server status
zookeeper-server is running
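The "No myid provided" warning from `service zookeeper-server init` only matters for a multi-node ensemble; in a standalone pseudo-distributed setup it can be ignored. For reference, a hedged sketch of what an ensemble would need (the hostnames zk1–zk3 and the server entries are hypothetical, and dataDir is shown with the stock default, not the path from the init output above): one `server.N` line per node in /etc/zookeeper/conf/zoo.cfg, and each node's own N written into its myid file.

```
# /etc/zookeeper/conf/zoo.cfg (excerpt) -- hypothetical 3-node ensemble
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```

On zk1 you would then write `1` into the myid file under dataDir, `2` on zk2, and so on.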
Progress
For now, with things in the following state,
[root@cdh45-pseudo] # for srv in hadoop-* hive-* mysqld zookeeper-server
do
service $srv status
done
Hadoop datanode is running [ OK ]
Hadoop namenode is running [ OK ]
Hadoop secondarynamenode is running [ OK ]
Hadoop historyserver is running [ OK ]
Hadoop nodemanager is running [ OK ]
Hadoop resourcemanager is running [ OK ]
Hive Metastore is running [ OK ]
Hive Server2 is running [ OK ]
mysqld (pid 11180) is running...
zookeeper-server is running
the following Hive job ran.
-bash-4.1$ hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/hive/hive_job_log_4bd19a6b-df46-4ff6-b837-f31288eee6f7_1996749314.txt
WARNING: Encountered an error while trying to initialize Hive's history file. History will not be available during this session.
/var/lib/hive/.hivehistory (Permission denied)
hive> show tables;
OK
logs
Time taken: 2.271 seconds
hive> select * from logs where id=1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1393287325156_0002, Tracking URL = http://cdh45-pseudo:8088/proxy/application_1393287325156_0002/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1393287325156_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-02-25 11:03:54,437 Stage-1 map = 0%, reduce = 0%
2014-02-25 11:04:05,395 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.18 sec
MapReduce Total cumulative CPU time: 1 seconds 180 msec
Ended Job = job_1393287325156_0002
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.18 sec HDFS Read: 262 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 180 msec
OK
Time taken: 21.058 seconds
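About the "Permission denied" WARNING in the session above: Hive could not create its .hivehistory file under /var/lib/hive, so command history is unavailable for the session; the queries themselves still run. One plausible fix — an assumption, it depends on /var/lib/hive being the invoking user's home directory with drifted ownership — is to hand the directory back to the hive user. Shown dry-run with echo; drop the echo to actually apply it as root:

```shell
# Hypothetical fix for the .hivehistory warning: restore ownership of the
# hive user's home directory. The leading "echo" makes this a dry run that
# only prints the command.
echo chown -R hive:hive /var/lib/hive
```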
The packages that look relevant:
[root@cdh45-pseudo] # rpm -qa | grep -Ei '(^(hadoop|hive|zookeeper|mysql)|\-yarn\-)' | sort -u
hadoop-0.20-mapreduce-2.0.0+1518-1.cdh4.5.0.p0.24.el5.x86_64
hadoop-2.0.0+1518-1.cdh4.5.0.p0.24.el5.x86_64
hadoop-client-2.0.0+1518-1.cdh4.5.0.p0.24.el5.x86_64
hadoop-conf-pseudo-2.0.0+1518-1.cdh4.5.0.p0.24.el5.x86_64
hadoop-hdfs-2.0.0+1518-1.cdh4.5.0.p0.24.el5.x86_64
hadoop-hdfs-datanode-2.0.0+1518-1.cdh4.5.0.p0.24.el5.x86_64
hadoop-hdfs-namenode-2.0.0+1518-1.cdh4.5.0.p0.24.el5.x86_64
hadoop-hdfs-secondarynamenode-2.0.0+1518-1.cdh4.5.0.p0.24.el5.x86_64
hadoop-mapreduce-2.0.0+1518-1.cdh4.5.0.p0.24.el5.x86_64
hadoop-mapreduce-historyserver-2.0.0+1518-1.cdh4.5.0.p0.24.el5.x86_64
hadoop-yarn-2.0.0+1518-1.cdh4.5.0.p0.24.el5.x86_64
hadoop-yarn-nodemanager-2.0.0+1518-1.cdh4.5.0.p0.24.el5.x86_64
hadoop-yarn-resourcemanager-2.0.0+1518-1.cdh4.5.0.p0.24.el5.x86_64
hive-0.10.0+214-1.cdh4.5.0.p0.25.el5.noarch
hive-hbase-0.10.0+214-1.cdh4.5.0.p0.25.el5.noarch
hive-jdbc-0.10.0+214-1.cdh4.5.0.p0.25.el5.noarch
hive-metastore-0.10.0+214-1.cdh4.5.0.p0.25.el5.noarch
hive-server2-0.10.0+214-1.cdh4.5.0.p0.25.el5.noarch
mysql-5.1.73-3.el6_5.x86_64
mysql-connector-java-5.1.17-6.el6.noarch
mysql-libs-5.1.73-3.el6_5.x86_64
mysql-server-5.1.73-3.el6_5.x86_64
zookeeper-3.4.5+24-1.cdh4.5.0.p0.23.el5.noarch
zookeeper-server-3.4.5+24-1.cdh4.5.0.p0.23.el5.noarch
# sudo -u hdfs hdfs dfs -mkdir -p /tmp && sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
# sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn && sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
# sudo -u hdfs hadoop fs -mkdir -p /hbase && sudo -u hdfs hadoop fs -chown -R hbase:hadoop /hbase
# sudo -u hdfs hadoop fs -mkdir -p /user/hive/warehouse && sudo -u hdfs hadoop fs -chown -R hive:hadoop /user/hive
# sudo -u hdfs hdfs dfs -ls -R /
drwxr-xr-x - hbase hadoop 0 2014-03-05 11:20 /hbase
drwxrwxrwt - hdfs supergroup 0 2014-03-05 11:22 /tmp
drwxr-xr-x - hdfs supergroup 0 2014-03-05 11:21 /user
drwxr-xr-x - hive hadoop 0 2014-03-05 11:21 /user/hive
drwxr-xr-x - hive hadoop 0 2014-03-05 11:21 /user/hive/warehouse
drwxr-xr-x - hdfs supergroup 0 2014-03-05 11:21 /var
drwxr-xr-x - hdfs supergroup 0 2014-03-05 11:21 /var/log
drwxr-xr-x - yarn mapred 0 2014-03-05 11:21 /var/log/hadoop-yarn
# sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done_intermediate \
    && sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging \
    && sudo -u hdfs hadoop fs -chmod -R 1777 /tmp \
    && sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn \
    && sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
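The directory setup above is a series of near-identical mkdir/chown(/chmod) calls, so it can be folded into one loop over "path owner:group mode" tuples. A sketch under assumptions: the paths and owners are copied from the commands above, but the 755 modes on everything except /tmp are my own guess, and the loop only collects and prints the commands rather than running them:

```shell
# Build the HDFS setup commands from "path owner:group mode" tuples and
# print them (dry run). Tuples mirror the one-off commands above; the 755
# modes are assumptions, 1777 on /tmp is from the original.
cmds=""
while read -r path owner mode; do
  cmds="$cmds
sudo -u hdfs hadoop fs -mkdir -p $path
sudo -u hdfs hadoop fs -chown -R $owner $path
sudo -u hdfs hadoop fs -chmod -R $mode $path"
done <<'EOF'
/tmp hdfs:supergroup 1777
/var/log/hadoop-yarn yarn:mapred 755
/hbase hbase:hadoop 755
/user/hive hive:hadoop 755
EOF
echo "$cmds"
```

Keeping it as a dry run makes the generated commands easy to review first; piping the output to `sh` (as root, against a live HDFS) would execute them.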