
Installing Hadoop (pseudo-distributed mode) on a DigitalOcean instance

More than 3 years have passed since last update.

A record of installing Hadoop on a DigitalOcean droplet (instance) as part of learning Hadoop.

I worked through this with the book「Hadoop徹底入門 第2版」at hand. Note: the book does not tell you to use DigitalOcean.

Launching a droplet on DigitalOcean

Clicked through the web UI: 2 GB RAM, CentOS 6.5 x64. I named the instance "practice". Note: the example jobs did not run with 512 MB RAM.

Installing Java

Oracle JDK is apparently preferred over OpenJDK. Java 8 would probably work too, but to avoid getting stuck on something obscure I conservatively downloaded jdk-6u45-linux-x64-rpm.bin from Oracle's site.

# chmod a+x jdk-6u45-linux-x64-rpm.bin
# ./jdk-6u45-linux-x64-rpm.bin
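If a later step complains that JAVA_HOME is unset, it can be exported system-wide. The path below is an assumption based on the usual install location of the Oracle 6u45 RPM; verify it with `ls /usr/java` first.

```shell
# Assumed path for the jdk-6u45 RPM layout; adjust if `ls /usr/java` shows otherwise.
export JAVA_HOME=/usr/java/jdk1.6.0_45
export PATH=$JAVA_HOME/bin:$PATH
```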

Installing Hadoop (CDH)

According to Cloudera:

CDH encompasses Apache Hadoop and all related projects; it is functionally validated and is the most widely deployed Hadoop distribution in the world.

So that is what I'll use.

# sudo rpm --import http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
# sudo rpm -ivh http://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
# rpm -ql cloudera-cdh

# sudo emacs -nw /etc/yum.repos.d/cloudera-cdh4.repo
### - baseurl=http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/4/
### + baseurl=http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/4.2.1/
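The same baseurl edit can be done non-interactively with sed. Sketched here on a temporary copy so the substitution can be checked safely; run the same `sed -i` against /etc/yum.repos.d/cloudera-cdh4.repo for the real change.

```shell
# Pin the repo to CDH 4.2.1 with sed instead of editing by hand.
# Demonstrated on a temporary copy of the relevant line.
repo=$(mktemp)
echo 'baseurl=http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/4/' > "$repo"
sed -i 's|/cdh/4/|/cdh/4.2.1/|' "$repo"
cat "$repo"
```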

# sudo yum install hadoop-0.20-conf-pseudo

That completes the installation.

Verification

# hadoop version
Result:
Hadoop 2.0.0-cdh4.2.1
Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.2.1/src/hadoop-common-project/hadoop-common -r 144bd548d481c2774fab2bec2ac2645d190f705b
Compiled by jenkins on Mon Apr 22 10:26:03 PDT 2013
From source with checksum aef88defdddfb22327a107fbd7063395

# cat /etc/passwd
Confirm that the following users have been created:
hdfs:x:497:496:Hadoop HDFS:/var/lib/hadoop-hdfs:/bin/bash
mapred:x:496:495:Hadoop MapReduce:/usr/lib/hadoop-0.20-mapreduce:/bin/bash

Setup

Configuring /etc/hosts

# sudo vi /etc/hosts
### + 127.0.0.1   practice

Formatting the NameNode metadata area

# sudo -u hdfs hdfs namenode -format

Starting the NameNode and DataNode processes

# sudo service hadoop-hdfs-namenode start
# sudo service hadoop-hdfs-datanode start

Creating working directories on HDFS

# sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
# sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
# sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
# sudo useradd tsujino
# sudo -u hdfs hadoop fs -mkdir -p /user/tsujino
# sudo -u hdfs hadoop fs -chown tsujino /user/tsujino
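The 1777 mode given to the staging directory above sets the sticky bit: any user may create files there, but only a file's owner can delete them, the same scheme /tmp uses. A quick illustration of the mode on an ordinary local directory (HDFS interprets 1777 the same way):

```shell
# Sticky-bit demo on a local directory, not HDFS.
d=$(mktemp -d)
chmod 1777 "$d"
stat -c '%a' "$d"    # prints 1777
```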

Starting the JobTracker and TaskTracker processes

# sudo service hadoop-0.20-mapreduce-jobtracker start
# sudo service hadoop-0.20-mapreduce-tasktracker start

Running the pi example

# sudo -u tsujino hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 10 1000000
Output:
...
Job Finished in 41.997 seconds
Estimated value of Pi is 3.14158440000000000000
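For reference, the pi example estimates pi by scattering points over the unit square and counting how many land inside the quarter circle; the Hadoop job spreads that counting across map tasks (and uses a quasi-random point sequence rather than plain rand()). The underlying idea in a single-machine awk sketch:

```shell
# Monte Carlo sketch of the pi estimate: pi ~= 4 * inside / total.
awk 'BEGIN {
  srand(1); n = 200000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1) inside++   # point fell inside the quarter circle
  }
  printf "%.3f\n", 4 * inside / n  # prints an estimate near 3.14
}'
```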

Installing Pig

# yum install pig