Building a Hadoop cluster (CentOS 7)

Overview

I built a Hadoop cluster with the following configuration:
・1 master node, 3 slave nodes
・OS: CentOS Linux release 7.5.1804
・Hadoop: 3.1.1

Creating a filesystem for Hadoop (slave nodes)

・Check the disks

[root@localhost ~]#  lsblk -a
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   50G  0 disk
├─sda1            8:1    0  500M  0 part /boot
└─sda2            8:2    0 49.5G  0 part
  ├─centos-root 253:0    0 45.6G  0 lvm  /
  └─centos-swap 253:1    0  3.9G  0 lvm  [SWAP]
sdb               8:16   0   50G  0 disk
sr0              11:0    1 1024M  0 rom

・Create a partition

[root@localhost ~]# fdisk /dev/sdb
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0x32350de9.

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-104857599, default 2048): 2048
Last sector, +sectors or +size{K,M,G} (2048-104857599, default 104857599): 104857599
Partition 1 of type Linux and of size 50 GiB is set

Command (m for help): p

Disk /dev/sdb: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x32350de9

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048   104855551    52426752   83  Linux
Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

・Format the device

[root@localhost ~]#  mkfs.xfs /dev/sdb1
meta-data=/dev/sdb1              isize=512    agcount=4, agsize=3276672 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=13106688, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=6399, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

・Mount the filesystem

mkdir /Hadoop
mount /dev/sdb1 /Hadoop
[root@localhost ~]# df -T
Filesystem              Type     1K-blocks    Used Available Use% Mounted on
/dev/mapper/centos-root xfs       47781076 1443976  46337100   4% /
devtmpfs                devtmpfs   1928504       0   1928504   0% /dev
tmpfs                   tmpfs      1940480       0   1940480   0% /dev/shm
tmpfs                   tmpfs      1940480    9032   1931448   1% /run
tmpfs                   tmpfs      1940480       0   1940480   0% /sys/fs/cgroup
/dev/sda1               xfs         508588  288936    219652  57% /boot
tmpfs                   tmpfs       388096       0    388096   0% /run/user/1000
/dev/sdb1               xfs       52401156   32944  52368212   1% /Hadoop

・Make the mount persist across OS reboots
Append the following line to /etc/fstab:
/dev/sdb1 /Hadoop xfs defaults 0 0
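
A minimal way to add the entry and confirm it parses (my addition; an error from mount -a here would also show up at the next boot):

echo '/dev/sdb1 /Hadoop xfs defaults 0 0' >> /etc/fstab
umount /Hadoop && mount -a   # remount via fstab to verify the new entry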

※References
https://qiita.com/aosho235/items/ad9a4764e77ba43c9d76#%E3%83%91%E3%83%BC%E3%83%86%E3%82%A3%E3%82%B7%E3%83%A7%E3%83%B3%E3%82%92%E5%88%87%E3%82%8B
https://kazmax.zpp.jp/linux_beginner/fdisk.html

Disabling the firewall and SELinux

systemctl stop firewalld
systemctl disable firewalld
vi /etc/selinux/config ← change SELINUX to "disabled"
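
The same edit without opening vi, as a sketch (my addition; the config-file change takes full effect only after a reboot):

sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
setenforce 0   # drop to permissive mode immediately for the current session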

OS configuration

・Set the hostname
nmcli general hostname test.localdomain
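
The example above uses test.localdomain; on this cluster each node would get the name assigned to it in the hosts file below, e.g.:

nmcli general hostname hadoopmaster.local   # on the master node
nmcli general hostname hadoopslave1.local   # on slave 1, and so on
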
・Configure hosts
Append the following to /etc/hosts:

192.168.11.237 hadoopmaster.local
192.168.11.238 hadoopslave1.local
192.168.11.239 hadoopslave2.local
192.168.11.240 hadoopslave3.local

・Create the user for the Hadoop cluster

useradd -m hadoop
echo hadoop | passwd hadoop --stdin
chown -R hadoop:hadoop /Hadoop ← change ownership so the hadoop user can write

・Generate an SSH key pair on the master node

[hadoop@hadoopmaster ~]$ whoami
hadoop
[hadoop@hadoopmaster ~]$ ssh-keygen -t rsa -P '' -f /home/hadoop/.ssh/id_rsa
[hadoop@hadoopmaster ~]$ ls -l /home/hadoop/.ssh
total 8
-rw-------. 1 hadoop hadoop 1679 Nov 25 07:16 id_rsa
-rw-r--r--. 1 hadoop hadoop  407 Nov 25 07:16 id_rsa.pub

・Copy the public key to the slave nodes
To let the hadoop user log in from the master node to the slave nodes over SSH, copy the master node's public key to every slave node.

[hadoop@hadoopmaster ~]$ ssh-copy-id hadoop@hadoopslave1.local
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'hadoopslave1.local (192.168.11.238)' can't be established.
ECDSA key fingerprint is SHA256:/GfKYMZHk4GrIkT7q6cvY/DD4fxWHrQZVEoLay3U6UY.
ECDSA key fingerprint is MD5:50:56:37:b9:3b:a0:b7:12:bf:aa:e2:e3:14:4f:b9:e2.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@hadoopslave1.local's password: ← enter "hadoop"

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hadoop@hadoopslave1.local'"
and check to make sure that only the key(s) you wanted were added.
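
After repeating this for hadoopslave2 and hadoopslave3, passwordless login can be confirmed from the master (my addition):

[hadoop@hadoopmaster ~]$ ssh hadoop@hadoopslave1.local hostname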

Installing Java

[root@hadoopmaster ~]# yum install -y java
[root@hadoopmaster ~]# yum install -y java-1.8.0-openjdk-devel
[root@hadoopmaster ~]# java -version
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

・Set environment variables
Append the following to /home/hadoop/.bash_profile:

export LANG=en_US.utf8
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/home/hadoop/hadoop-3.1.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
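
To apply the settings to the current shell and check them (my addition):

[hadoop@hadoopmaster ~]$ source ~/.bash_profile
[hadoop@hadoopmaster ~]$ echo $HADOOP_HOME
/home/hadoop/hadoop-3.1.1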

Configuring Hadoop

・Installation

[hadoop@hadoopmaster ~]$ pwd
/home/hadoop
[hadoop@hadoopmaster ~]$ wget http://ftp.tsukuba.wide.ad.jp/software/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
[hadoop@hadoopmaster ~]$ tar xzvf hadoop-3.1.1.tar.gz -C ./
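
The same tree has to exist on every node (the slave-side configuration below assumes it); one way to distribute it, using the hostnames from /etc/hosts (my addition):

for h in hadoopslave1.local hadoopslave2.local hadoopslave3.local; do
  scp hadoop-3.1.1.tar.gz hadoop@$h:/home/hadoop/
  ssh hadoop@$h 'tar xzf hadoop-3.1.1.tar.gz -C ~/'
done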

・Create the configuration files
① $HADOOP_HOME/etc/hadoop/core-site.xml
Specifies the master node, the buffer size for data I/O, and so on.
Set fs.default.name to hdfs://hadoopmaster.local:9000 (in Hadoop 3 this property has been renamed fs.defaultFS; the old name is deprecated but still works).

[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
 <name>fs.default.name</name>
  <value>hdfs://hadoopmaster.local:9000</value>
</property>
</configuration>

② $HADOOP_HOME/etc/hadoop/yarn-site.xml
Specifies the ResourceManager, which manages resources across the whole Hadoop cluster, memory allocation sizes, and so on.
Set hadoopmaster.local as the ResourceManager.

[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<property>
 <name>yarn.resourcemanager.hostname</name>
  <value>hadoopmaster.local</value>
</property>
<property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
</property>
</configuration>

③ $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Specifies the replica count, the HDFS directories, and so on.
dfs.replication sets the number of replicas.
dfs.name.dir sets the path of the directory backing HDFS on the master node (NameNode).
dfs.data.dir sets the path of the directory backing HDFS on the data nodes.
(The latter two have since been renamed dfs.namenode.name.dir and dfs.datanode.data.dir; the old names still work.)

[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
 <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
 <name>dfs.name.dir</name>
  <value>file:///home/hadoop/hdfs/namenode</value>
</property>
</configuration>
[hadoop@hadoopslave1 ~]$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
 <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
 <name>dfs.data.dir</name>
  <value>file:///Hadoop/hdfs/datanode</value>
</property>
</configuration>

④ $HADOOP_HOME/etc/hadoop/mapred-site.xml
Sets the various parameters of MapReduce, the distributed-processing model.
Specify YARN, the framework that schedules hardware resources such as CPU and memory for applications on the distributed-processing platform, as the execution framework for MapReduce.

[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property>
</configuration>

・Create the slaves file
Create a file listing the hostnames of the slave nodes. (Hadoop 3 renamed this file to workers; it is only consulted by the start-dfs.sh / start-yarn.sh helper scripts, and below the daemons are started manually on each node.)

[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/slaves
hadoopslave1.local
hadoopslave2.local
hadoopslave3.local
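
If the helper scripts are to be used, the same list would go into the renamed file (my addition):

cp $HADOOP_HOME/etc/hadoop/slaves $HADOOP_HOME/etc/hadoop/workers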

Starting Hadoop

・Format HDFS (master node)

[hadoop@hadoopmaster ~]$ which hdfs
~/hadoop-3.1.1/bin/hdfs
[hadoop@hadoopmaster ~]$ hdfs namenode -format

・Start the NameNode (master node)

[hadoop@hadoopmaster ~]$ hadoop-daemon.sh start namenode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.

Apparently the hdfs --daemon start command is now the recommended way.
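Going by the warning, the equivalent would be:

[hadoop@hadoopmaster ~]$ hdfs --daemon start namenode
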
・Start the ResourceManager (master node)

[hadoop@hadoopmaster ~]$ yarn-daemon.sh start resourcemanager
WARNING: Use of this script to start YARN daemons is deprecated.
WARNING: Attempting to execute replacement "yarn --daemon start" instead.

Apparently the yarn --daemon start command is now the recommended way.
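
Again going by the warning, the equivalent would be:

[hadoop@hadoopmaster ~]$ yarn --daemon start resourcemanager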

・Check the HDFS metadata (master node)

[hadoop@hadoopmaster ~]$ ls -l /home/hadoop/hdfs/namenode/current/
total 16
-rw-rw-r--. 1 hadoop hadoop 391 Nov 25 08:59 fsimage_0000000000000000000
-rw-rw-r--. 1 hadoop hadoop  62 Nov 25 08:59 fsimage_0000000000000000000.md5
-rw-rw-r--. 1 hadoop hadoop   2 Nov 25 08:59 seen_txid
-rw-rw-r--. 1 hadoop hadoop 217 Nov 25 08:59 VERSION

・Start the DataNode (slave nodes)

[hadoop@hadoopslave1 ~]$ hadoop-daemon.sh start datanode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
[hadoop@hadoopslave1 ~]$ ls -l /Hadoop/hdfs/datanode/
total 4
drwxrwxr-x 3 hadoop hadoop 70 Nov 25 09:22 current
-rw-rw-r-- 1 hadoop hadoop 23 Nov 25 09:22 in_use.lock

Apparently hdfs --daemon start is recommended here as well; it is the same command as for the NameNode, only the daemon name argument differs.
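
On the slave nodes the equivalent would be:

[hadoop@hadoopslave1 ~]$ hdfs --daemon start datanode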

・Start the NodeManager (slave nodes)

[hadoop@hadoopslave1 ~]$ yarn-daemon.sh start nodemanager
WARNING: Use of this script to start YARN daemons is deprecated.
WARNING: Attempting to execute replacement "yarn --daemon start" instead.

Apparently yarn --daemon start is recommended here as well; it is the same command as for the ResourceManager, only the daemon name argument differs.
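
The equivalent would be:

[hadoop@hadoopslave1 ~]$ yarn --daemon start nodemanager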

・Check HDFS (master node)

[hadoop@hadoopmaster ~]$ hdfs dfsadmin -report
Configured Capacity: 160978448384 (149.92 GB)
Present Capacity: 160876912640 (149.83 GB)
DFS Remaining: 160876900352 (149.83 GB)
DFS Used: 12288 (12 KB)
DFS Used%: 0.00%
Replicated Blocks:
        Under replicated blocks: 0
        Blocks with corrupt replicas: 0
        Missing blocks: 0
        Missing blocks (with replication factor 1): 0
        Pending deletion blocks: 0
Erasure Coded Block Groups:
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (3): ← number of data nodes currently running

Name: 192.168.11.238:9866 (hadoopslave1.local)
Hostname: hadoopslave1.local
Decommission Status : Normal
Configured Capacity: 53658783744 (49.97 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 33845248 (32.28 MB)
DFS Remaining: 53624934400 (49.94 GB)
DFS Used%: 0.00%
DFS Remaining%: 99.94%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Nov 25 09:31:36 EST 2018
Last Block Report: Sun Nov 25 09:22:42 EST 2018
Num of Blocks: 0
(remaining output omitted)

・Check the slave nodes (master node)

[hadoop@hadoopmaster ~]$ yarn node -list
2018-11-25 09:35:21,739 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster.local/192.168.11.237:8032
Total Nodes:3
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
hadoopslave1.local:41836                RUNNING hadoopslave1.local:8042                            0
hadoopslave3.local:36843                RUNNING hadoopslave3.local:8042                            0
hadoopslave2.local:39948                RUNNING hadoopslave2.local:8042                            0

・Web console
The ResourceManager web UI is available at http://192.168.11.237:8088/cluster (in Hadoop 3 the NameNode also serves a web UI, on port 9870 by default).

Test

・Create test data

[hadoop@hadoopslave1 ~]$ mkdir localdir01
[hadoop@hadoopslave1 ~]$ cat find.sh
#!/bin/sh
# Copy every file under /usr/share/doc into localdir01, appending a
# sequence number to each name to avoid collisions between duplicates.
n=1
for i in `find /usr/share/doc -type f`; do
  cp -a "$i" /home/hadoop/localdir01/"`basename $i`_${n}"
  n=`expr ${n} + 1`
done
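
The transcript omits the run itself; the invocation would simply be (my addition):

[hadoop@hadoopslave1 ~]$ sh find.sh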

・Create the destination directory

[hadoop@hadoopslave1 ~]$ hdfs dfs -ls /
[hadoop@hadoopslave1 ~]$ hdfs dfs -mkdir -p /user/hadoop/datadir01
[hadoop@hadoopslave1 ~]$ hdfs dfs -ls /user/hadoop
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2018-11-25 09:57 /user/hadoop/datadir01

・Copy the data into HDFS
hdfs dfs -put /home/hadoop/localdir01/* /user/hadoop/datadir01/
・Check the copied files
hdfs dfs -ls /user/hadoop/datadir01
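
As a further end-to-end check of YARN and MapReduce (my addition, using the examples jar bundled with the distribution; with a configuration this minimal, extra mapred-site.xml settings such as mapreduce.application.classpath may be needed), the classic wordcount job could be run over the copied data:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount /user/hadoop/datadir01 /user/hadoop/wcout
hdfs dfs -ls /user/hadoop/wcout   # a _SUCCESS marker indicates the job completed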
