Hadoop (HDFS + YARN + MapReduce) Single-Node Setup Guide
Single-node Hadoop on Ubuntu 24.04 with BigTop and Java 8
Prerequisites
OS: Ubuntu 24.04
Hadoop: Apache BigTop 3.3.0 packages (Hadoop 3.3.6)
Java: OpenJDK 8
Layout: single node (NameNode, DataNode, YARN, and MapReduce all on one host)
HDFS NameNode: localhost:9000
Hadoop runtime user: hadoop
2 GB of RAM is enough for this setup.
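A few optional sanity checks before starting (a minimal sketch, assuming a stock Ubuntu host):
# Confirm the OS release, architecture, and available memory
lsb_release -ds            # should report Ubuntu 24.04
dpkg --print-architecture  # must be amd64 to match the repository below
free -h                    # 2 GB or more of total memory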
1. Add the BigTop APT repository
Note: BigTop 3.3.0 publishes repositories up to Ubuntu 22.04, so the 22.04 (jammy) amd64 repository is used here.
sudo tee /etc/apt/sources.list.d/bigtop.list << 'EOF'
deb [trusted=yes] http://repos.bigtop.apache.org/releases/3.3.0/ubuntu/22.04/amd64 bigtop contrib
EOF
sudo apt update
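A quick way to confirm the repository is visible and will serve the packages:
# The candidate version should come from repos.bigtop.apache.org
apt-cache policy hadoop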
2. Install Java 8 and the Hadoop packages
sudo apt install -y \
openjdk-8-jdk \
hadoop \
hadoop-hdfs \
hadoop-yarn \
hadoop-mapreduce
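To verify the install (hadoop version needs JAVA_HOME, which is not set system-wide yet):
# List the installed Hadoop packages and print the Hadoop version
dpkg -l 'hadoop*' | grep '^ii'
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 hadoop version   # expect: Hadoop 3.3.6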
3. Create the hadoop user
The BigTop packages create the hadoop group; if useradd complains that it does not exist, create it first with sudo groupadd hadoop.
sudo useradd -m -g hadoop -s /bin/bash hadoop
sudo passwd hadoop
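Confirm the account looks right:
# The primary group must be hadoop, matching the chown in the next step
id hadoop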
4. Create Hadoop directories
sudo mkdir -p \
/var/lib/hadoop/tmp \
/var/lib/hadoop/yarn/local \
/var/lib/hadoop/yarn/logs
sudo chown -R hadoop:hadoop /var/lib/hadoop
5. Configure core-site.xml
sudo tee /etc/hadoop/conf/core-site.xml << 'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/lib/hadoop/tmp</value>
</property>
<!-- ATS v2 file-system writer root (HDFS path; created in step 11).
     Placed in core-site so every daemon picks it up. -->
<property>
<name>yarn.timeline-service.fs-writer.root-dir</name>
<value>/atsv2</value>
</property>
</configuration>
EOF
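hdfs getconf reads the effective configuration, so it is a convenient check that the file is being picked up:
# Expect: hdfs://localhost:9000
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 hdfs getconf -confKey fs.defaultFS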
6. Configure yarn-site.xml
sudo tee /etc/hadoop/conf/yarn-site.xml << 'EOF'
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Required: MapReduce jobs fail without these local/log dirs -->
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/var/lib/hadoop/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/var/lib/hadoop/yarn/logs</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>128</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>
/etc/hadoop/conf,
/etc/hadoop/conf/*,
/usr/lib/hadoop/*,
/usr/lib/hadoop/lib/*,
/usr/lib/hadoop-hdfs/*,
/usr/lib/hadoop-hdfs/lib/*,
/usr/lib/hadoop-mapreduce/*,
/usr/lib/hadoop-mapreduce/lib/*,
/usr/lib/hadoop-yarn/*,
/usr/lib/hadoop-yarn/lib/*
</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>0.0.0.0:8088</value>
</property>
<property>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.version</name>
<value>2.0</value>
</property>
<!-- Use 0.0.0.0 here if the UI must be reachable from other hosts -->
<property>
<name>yarn.timeline-service.hostname</name>
<value>localhost</value>
</property>
<!-- RPC endpoint (default: 10200) -->
<property>
<name>yarn.timeline-service.address</name>
<value>0.0.0.0:10200</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/app-logs</value>
</property>
<!-- Client-facing redirect URL: browsers must be able to resolve it, so not 0.0.0.0.
     Replace localhost with the host's real name/IP for remote access. -->
<property>
<name>yarn.log.server.url</name>
<value>http://localhost:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.webapp.ui2.enable</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.writer.class</name>
<value>org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineWriterImpl</value>
</property>
<property>
<name>yarn.timeline-service.reader.webapp.address</name>
<value>0.0.0.0:8188</value>
</property>
</configuration>
EOF
sudo tee -a /etc/hadoop/conf/yarn-env.sh >/dev/null <<'EOF'
# ATS v2 timelineservice jars
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:/usr/lib/hadoop-yarn/timelineservice/*"
export HADOOP_USER_CLASSPATH_FIRST=true
EOF
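The classpath line above assumes the BigTop layout; the ATS v2 jars should already be in place, which is easy to confirm:
# The timelineservice jars referenced by HADOOP_CLASSPATH
ls /usr/lib/hadoop-yarn/timelineservice/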
7. Configure mapred-site.xml
sudo tee /etc/hadoop/conf/mapred-site.xml << 'EOF'
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>
HADOOP_COMMON_HOME=/usr/lib/hadoop,
HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs,
HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>
HADOOP_COMMON_HOME=/usr/lib/hadoop,
HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs,
HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>
HADOOP_COMMON_HOME=/usr/lib/hadoop,
HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs,
HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/etc/hadoop/conf,
/etc/hadoop/conf/*,
/usr/lib/hadoop/*,
/usr/lib/hadoop/lib/*,
/usr/lib/hadoop-hdfs/*,
/usr/lib/hadoop-hdfs/lib/*,
/usr/lib/hadoop-mapreduce/*,
/usr/lib/hadoop-mapreduce/lib/*,
/usr/lib/hadoop-yarn/*,
/usr/lib/hadoop-yarn/lib/*
</value>
</property>
<!-- JobHistoryServer RPC -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>localhost:10020</value>
</property>
<!-- JobHistoryServer Web UI -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
<!-- Job history destination (HDFS) -->
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/mr-history/done</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/tmp</value>
</property>
</configuration>
EOF
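hadoop conftest validates the XML syntax of every file in the config directory, which catches heredoc mistakes early:
# Validates the *.xml files under /etc/hadoop/conf
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 hadoop conftest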
8. Configure the hadoop user's .bashrc
sudo tee -a /home/hadoop/.bashrc << 'EOF'
export HADOOP_CONF_DIR=/etc/hadoop/conf
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:/usr/lib/hadoop/bin
EOF
sudo chown hadoop:hadoop /home/hadoop/.bashrc
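To confirm the environment without switching users (on Ubuntu a login shell sources .bashrc via .profile):
# Expect the Java 8 path and the Hadoop version banner
sudo -u hadoop bash -lc 'echo "$JAVA_HOME"; hadoop version | head -1'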
9. Switch to the hadoop user
sudo su - hadoop
10. Format HDFS
hdfs namenode -format -nonInteractive
exit
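With hadoop.tmp.dir set as above, the NameNode metadata lands under /var/lib/hadoop/tmp/dfs/name (dfs.namenode.name.dir defaults to ${hadoop.tmp.dir}/dfs/name), so a successful format can be confirmed there:
# The VERSION file exists only after a successful format
sudo ls -l /var/lib/hadoop/tmp/dfs/name/current/VERSION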
11. Create systemd services
Shared environment file
sudo tee /etc/default/hadoop <<'EOF'
HADOOP_HOME=/usr/lib/hadoop
HADOOP_CONF_DIR=/etc/hadoop/conf
YARN_CONF_DIR=/etc/hadoop/conf
MAPRED_CONF_DIR=/etc/hadoop/conf
# Pin Java explicitly if needed
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# Avoid tput warnings (optional)
TERM=dumb
EOF
NameNode
sudo tee /etc/systemd/system/hadoop-hdfs-namenode.service <<'EOF'
[Unit]
Description=Hadoop HDFS NameNode
After=network.target
Wants=network.target
[Service]
Type=simple
User=hadoop
Group=hadoop
EnvironmentFile=-/etc/default/hadoop
# Match this to the actual data directory (e.g. /var/lib/hadoop-hdfs)
WorkingDirectory=/var/lib/hadoop-hdfs
ExecStart=/usr/bin/hdfs --config ${HADOOP_CONF_DIR} namenode
Restart=on-failure
RestartSec=5
LimitNOFILE=100000
[Install]
WantedBy=multi-user.target
EOF
DataNode
sudo tee /etc/systemd/system/hadoop-hdfs-datanode.service <<'EOF'
[Unit]
Description=Hadoop HDFS DataNode
After=network.target
Wants=network.target
[Service]
Type=simple
User=hadoop
Group=hadoop
EnvironmentFile=-/etc/default/hadoop
WorkingDirectory=/var/lib/hadoop-hdfs
ExecStart=/usr/bin/hdfs --config ${HADOOP_CONF_DIR} datanode
Restart=on-failure
RestartSec=5
LimitNOFILE=100000
[Install]
WantedBy=multi-user.target
EOF
ResourceManager
sudo tee /etc/systemd/system/hadoop-yarn-resourcemanager.service <<'EOF'
[Unit]
Description=Hadoop YARN ResourceManager
After=network.target
Wants=network.target
[Service]
Type=simple
User=hadoop
Group=hadoop
EnvironmentFile=-/etc/default/hadoop
WorkingDirectory=/var/lib/hadoop-yarn
ExecStart=/usr/bin/yarn --config ${HADOOP_CONF_DIR} resourcemanager
Restart=on-failure
RestartSec=5
LimitNOFILE=100000
[Install]
WantedBy=multi-user.target
EOF
NodeManager
sudo tee /etc/systemd/system/hadoop-yarn-nodemanager.service <<'EOF'
[Unit]
Description=Hadoop YARN NodeManager
After=network.target
Wants=network.target
[Service]
Type=simple
User=hadoop
Group=hadoop
EnvironmentFile=-/etc/default/hadoop
WorkingDirectory=/var/lib/hadoop-yarn
ExecStart=/usr/bin/yarn --config ${HADOOP_CONF_DIR} nodemanager
Restart=on-failure
RestartSec=5
LimitNOFILE=100000
[Install]
WantedBy=multi-user.target
EOF
TimelineServer
sudo tee /etc/systemd/system/hadoop-yarn-timelineserver.service << 'EOF'
[Unit]
Description=Hadoop YARN Timeline Server (ATS v2)
After=network.target
Wants=network.target
[Service]
Type=forking
User=hadoop
Group=hadoop
EnvironmentFile=-/etc/default/hadoop
ExecStart=/usr/lib/hadoop-yarn/bin/yarn --daemon start timelinereader
ExecStop=/usr/lib/hadoop-yarn/bin/yarn --daemon stop timelinereader
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
JobHistoryServer
sudo tee /etc/systemd/system/hadoop-mapreduce-historyserver.service <<'EOF'
[Unit]
Description=Hadoop MapReduce JobHistoryServer
After=network.target
Wants=network.target
[Service]
Type=simple
User=hadoop
Group=hadoop
EnvironmentFile=-/etc/default/hadoop
# Start with the config dir made explicit
ExecStart=/usr/bin/mapred --config ${HADOOP_CONF_DIR} historyserver
Restart=on-failure
RestartSec=5
LimitNOFILE=100000
[Install]
WantedBy=multi-user.target
EOF
Start the services (HDFS)
sudo systemctl daemon-reload
sudo systemctl enable --now hadoop-hdfs-namenode
sudo systemctl enable --now hadoop-hdfs-datanode
Check:
sudo su - hadoop
jps
You should see:
NameNode
DataNode
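Beyond jps, dfsadmin confirms the DataNode has registered with the NameNode (run in the same hadoop shell):
# Expect: Live datanodes (1)
hdfs dfsadmin -report | grep 'Live datanodes'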
Create HDFS directories for JobHistory, YARN log aggregation, and the TimelineServer
# JobHistory directories
hdfs dfs -mkdir -p /mr-history/done /mr-history/tmp
hdfs dfs -chown -R hadoop:hadoop /mr-history
hdfs dfs -chmod -R 1777 /mr-history
# YARN log aggregation directory
hdfs dfs -mkdir -p /app-logs
hdfs dfs -chown -R hadoop:hadoop /app-logs
hdfs dfs -chmod 1777 /app-logs
# TimelineServer directory
hdfs dfs -mkdir -p /atsv2
hdfs dfs -chown hadoop:hadoop /atsv2   # every daemon in this guide runs as hadoop
hdfs dfs -chmod 1777 /atsv2
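Before leaving the hadoop shell, confirm the tree and permissions:
# /app-logs, /atsv2, and /mr-history should be listed with mode drwxrwxrwt
hdfs dfs -ls /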
exit
Start the services (YARN and history servers)
sudo systemctl enable --now hadoop-yarn-resourcemanager
sudo systemctl enable --now hadoop-yarn-nodemanager
sudo systemctl enable --now hadoop-yarn-timelineserver
sudo systemctl enable --now hadoop-mapreduce-historyserver
Check:
sudo su - hadoop
jps
You should see:
NameNode
DataNode
ResourceManager
NodeManager
TimelineReaderServer
JobHistoryServer
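Two more checks worth running from the same hadoop shell (the second assumes curl is installed):
# One NodeManager should be listed as RUNNING
yarn node -list
# The ResourceManager REST API should answer with cluster info
curl -s http://localhost:8088/ws/v1/cluster/info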
12. Smoke test (WordCount, as the hadoop user)
hdfs dfs -mkdir -p /input
echo "hello hadoop hadoop yarn" | hdfs dfs -put - /input/test.txt
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-3.3.6.jar \
wordcount \
-D mapreduce.map.memory.mb=256 \
-D mapreduce.reduce.memory.mb=256 \
-D yarn.app.mapreduce.am.resource.mb=512 \
-D mapreduce.map.java.opts="-Xmx200m" \
-D mapreduce.reduce.java.opts="-Xmx200m" \
-D yarn.app.mapreduce.am.command-opts="-Xmx400m" \
/input /output
Check the result:
hdfs dfs -cat /output/part-r-00000
Each word should be listed with its count.
To rerun the job, remove the output directory first:
hdfs dfs -rm -r -skipTrash /output
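The finished job should also be visible to YARN and the JobHistoryServer; a quick CLI check as the hadoop user:
# The WordCount application should appear with final status SUCCEEDED
yarn application -list -appStates FINISHED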
13. Smoke test (web UIs)
The following web UIs should be reachable.
| Web UI | URL |
|---|---|
| NameNode | http://<host-IP>:9870 |
| DataNode | http://<host-IP>:9864 |
| ResourceManager | http://<host-IP>:8088 |
| NodeManager | http://<host-IP>:8042 |
| JobHistory | http://<host-IP>:19888 |
| TimelineService (API response) | http://<host-IP>:8188/ws/v2/timeline |
| YARN UI2 | http://<host-IP>:8088/ui2 |
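From a shell on the host itself, the same endpoints can be probed with curl (a sketch; assumes curl is installed and the ports configured above):
# Expect HTTP 200 for each URL (redirects are followed)
for url in \
  http://localhost:9870/ \
  http://localhost:9864/ \
  http://localhost:8088/ \
  http://localhost:8042/ \
  http://localhost:19888/ \
  http://localhost:8188/ws/v2/timeline; do
  printf '%-45s ' "$url"
  curl -sL -o /dev/null -w '%{http_code}\n' "$url"
done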