Overview
I built a Hadoop cluster with the following configuration:
・1 master node, 3 slave nodes
・OS: CentOS Linux release 7.5.1804
・Hadoop: 3.1.1
Creating the filesystem for Hadoop (slave nodes)
・Check the disks
[root@localhost ~]# lsblk -a
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 50G 0 disk
sda1 8:1 0 500M 0 part /boot
sda2 8:2 0 49.5G 0 part
centos-root 253:0 0 45.6G 0 lvm /
centos-swap 253:1 0 3.9G 0 lvm [SWAP]
sdb 8:16 0 50G 0 disk
sr0 11:0 1 1024M 0 rom
・Create a partition
[root@localhost ~]# fdisk /dev/sdb
Welcome to fdisk (util-linux 2.23.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0x32350de9.
Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-104857599, default 2048): 2048
Last sector, +sectors or +size{K,M,G} (2048-104857599, default 104857599): 104857599
Partition 1 of type Linux and of size 50 GiB is set
Command (m for help): p
Disk /dev/sdb: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x32350de9
Device Boot Start End Blocks Id System
/dev/sdb1 2048 104855551 52426752 83 Linux
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
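The interactive session above has to be repeated on each of the three slaves. If you would rather script it, parted should produce the same single full-disk partition non-interactively (a sketch, same device name assumed):
[root@localhost ~]# parted -s /dev/sdb mklabel msdos
[root@localhost ~]# parted -s /dev/sdb mkpart primary xfs 1MiB 100%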
・Format the partition
[root@localhost ~]# mkfs.xfs /dev/sdb1
meta-data=/dev/sdb1 isize=512 agcount=4, agsize=3276672 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0, sparse=0
data = bsize=4096 blocks=13106688, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=6399, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
・Mount it
[root@localhost ~]# mkdir /Hadoop
[root@localhost ~]# mount /dev/sdb1 /Hadoop
[root@localhost ~]# df -T
Filesystem Type 1K-blocks Used Available Use% Mounted on
/dev/mapper/centos-root xfs 47781076 1443976 46337100 4% /
devtmpfs devtmpfs 1928504 0 1928504 0% /dev
tmpfs tmpfs 1940480 0 1940480 0% /dev/shm
tmpfs tmpfs 1940480 9032 1931448 1% /run
tmpfs tmpfs 1940480 0 1940480 0% /sys/fs/cgroup
/dev/sda1 xfs 508588 288936 219652 57% /boot
tmpfs tmpfs 388096 0 388096 0% /run/user/1000
/dev/sdb1 xfs 52401156 32944 52368212 1% /Hadoop
・Make the mount persist across OS reboots
Append the following line to /etc/fstab:
/dev/sdb1 /Hadoop xfs defaults 0 0
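One way to add the entry and confirm it mounts cleanly without waiting for a reboot (device and mount point as above):
echo '/dev/sdb1 /Hadoop xfs defaults 0 0' >> /etc/fstab
umount /Hadoop
mount -a        # mounts everything listed in /etc/fstab; an error here means the entry is bad
df -T /Hadoop   # /dev/sdb1 should be mounted on /Hadoop again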
※ References
https://qiita.com/aosho235/items/ad9a4764e77ba43c9d76#%E3%83%91%E3%83%BC%E3%83%86%E3%82%A3%E3%82%B7%E3%83%A7%E3%83%B3%E3%82%92%E5%88%87%E3%82%8B
https://kazmax.zpp.jp/linux_beginner/fdisk.html
Disabling the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
vi /etc/selinux/config ← change SELINUX= to disabled
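The same edit scripted, plus turning SELinux off for the running session (the config file change itself only applies at the next reboot):
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
setenforce 0    # Permissive immediately; Disabled after reboot
getenforce      # confirms the current mode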
OS configuration
・Set the hostname
Run this on every node, substituting that node's own name (the value below is only an example):
nmcli general hostname test.localdomain
・Configure hosts
Append the following to /etc/hosts on every node:
192.168.11.237 hadoopmaster.local
192.168.11.238 hadoopslave1.local
192.168.11.239 hadoopslave2.local
192.168.11.240 hadoopslave3.local
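A quick way to check from each node that every entry resolves and answers (hostnames as above):
for h in hadoopmaster hadoopslave1 hadoopslave2 hadoopslave3; do
    ping -c 1 ${h}.local
done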
・Create the user that will run the Hadoop cluster
useradd -m hadoop
echo hadoop | passwd hadoop --stdin
chown -R hadoop:hadoop /Hadoop ← change ownership so the hadoop user can write to it (slave nodes)
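A quick sanity check, on each slave, that the hadoop user really can write to the data disk:
su - hadoop -c 'touch /Hadoop/write_test && rm /Hadoop/write_test && echo OK'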
・Generate an SSH key pair on the master node
[hadoop@hadoopmaster ~]$ whoami
hadoop
[hadoop@hadoopmaster ~]$ ssh-keygen -t rsa -P '' -f /home/hadoop/.ssh/id_rsa
[hadoop@hadoopmaster ~]$ ls -l /home/hadoop/.ssh
total 8
-rw-------. 1 hadoop hadoop 1679 Nov 25 07:16 id_rsa
-rw-r--r--. 1 hadoop hadoop 407 Nov 25 07:16 id_rsa.pub
・Copy the public key to the slave nodes
So that the hadoop user can log in over SSH from the master node to the slave nodes without a password, copy the master's public key to every slave node.
[hadoop@hadoopmaster ~]$ ssh-copy-id hadoop@hadoopslave1.local
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
The authenticity of host 'hadoopslave1.local (192.168.11.238)' can't be established.
ECDSA key fingerprint is SHA256:/GfKYMZHk4GrIkT7q6cvY/DD4fxWHrQZVEoLay3U6UY.
ECDSA key fingerprint is MD5:50:56:37:b9:3b:a0:b7:12:bf:aa:e2:e3:14:4f:b9:e2.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@hadoopslave1.local's password: ← enter hadoop
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@hadoopslave1.local'"
and check to make sure that only the key(s) you wanted were added.
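Repeat the ssh-copy-id for hadoopslave2.local and hadoopslave3.local. Afterwards, a loop like this should print each slave's hostname without ever prompting for a password:
for h in hadoopslave1 hadoopslave2 hadoopslave3; do
    ssh hadoop@${h}.local hostname
done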
Installing Java
Install on every node:
[root@hadoopmaster ~]# yum install -y java
[root@hadoopmaster ~]# yum install -y java-1.8.0-openjdk-devel
[root@hadoopmaster ~]# java -version
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
・Set environment variables
vi /home/hadoop/.bash_profile
Append the following (on every node):
export LANG=en_US.utf8
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_HOME=/home/hadoop/hadoop-3.1.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
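Reload the profile and spot-check that the variables resolve (the hadoop commands themselves only become runnable after the tarball is unpacked in the next step):
[hadoop@hadoopmaster ~]$ source ~/.bash_profile
[hadoop@hadoopmaster ~]$ echo $JAVA_HOME
/usr/lib/jvm/jre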
Hadoop configuration
・Install
[hadoop@hadoopmaster ~]$ pwd
/home/hadoop
[hadoop@hadoopmaster ~]$ wget http://ftp.tsukuba.wide.ad.jp/software/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
[hadoop@hadoopmaster ~]$ tar xzvf hadoop-3.1.1.tar.gz -C ./
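The same tree is needed under /home/hadoop on every slave node as well. Since passwordless SSH is already set up, one way to push and unpack it from the master is a loop like this (a sketch):
for h in hadoopslave1 hadoopslave2 hadoopslave3; do
    scp hadoop-3.1.1.tar.gz hadoop@${h}.local:~/
    ssh hadoop@${h}.local 'tar xzf hadoop-3.1.1.tar.gz -C ./'
done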
・Create the configuration files
①$HADOOP_HOME/etc/hadoop/core-site.xml
Specifies the default filesystem (i.e., the master node's NameNode), the data I/O buffer size, and so on.
Set "fs.defaultFS" to "hdfs://hadoopmaster.local:9000". (fs.default.name is the deprecated older name of this property.)
[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoopmaster.local:9000</value>
</property>
</configuration>
②$HADOOP_HOME/etc/hadoop/yarn-site.xml
Specifies the ResourceManager, which manages resources across the whole cluster, memory allocation sizes, and so on.
Point the ResourceManager at hadoopmaster.local.
[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoopmaster.local</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
③$HADOOP_HOME/etc/hadoop/hdfs-site.xml
Specifies the number of replicas, the directories backing HDFS, and so on.
"dfs.replication" sets the replica count.
"dfs.namenode.name.dir" sets the directory backing the NameNode metadata (master node).
"dfs.datanode.data.dir" sets the directory backing the DataNode block storage (slave nodes).
(dfs.name.dir and dfs.data.dir are the deprecated older names of these properties.)
[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hdfs/namenode</value>
</property>
</configuration>
[hadoop@hadoopslave1 ~]$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///Hadoop/hdfs/datanode</value>
</property>
</configuration>
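Hadoop creates these directories when formatting or starting up, but making them ahead of time (as the hadoop user) surfaces any permission problem early:
[hadoop@hadoopmaster ~]$ mkdir -p /home/hadoop/hdfs/namenode
[hadoop@hadoopslave1 ~]$ mkdir -p /Hadoop/hdfs/datanode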
④$HADOOP_HOME/etc/hadoop/mapred-site.xml
Sets the various parameters of MapReduce, the distributed-processing model.
Specify YARN, the framework that schedules hardware resources such as CPU and memory and hosts applications on the distributed-processing platform, as the framework MapReduce runs on.
[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
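One caveat documented for Hadoop 3.x: MapReduce job submission can fail with a missing MRAppMaster class unless the MapReduce classpath is also declared in mapred-site.xml. The Hadoop 3 setup guide adds a property along these lines (not verified on this cluster):
<property>
<name>mapreduce.application.classpath</name>
<!-- lets YARN containers find the MapReduce jars -->
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>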
・Create the workers file
Create a file listing the slave nodes' hostnames. In Hadoop 3.x this file is named etc/hadoop/workers (it was called slaves in Hadoop 2.x); it is read by the start-dfs.sh / start-yarn.sh cluster scripts.
[hadoop@hadoopmaster ~]$ vi $HADOOP_HOME/etc/hadoop/workers
hadoopslave1.local
hadoopslave2.local
hadoopslave3.local
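core-site.xml, yarn-site.xml, and mapred-site.xml must agree on every node (only hdfs-site.xml differs by role, as shown above). A sketch of pushing them from the master:
for h in hadoopslave1 hadoopslave2 hadoopslave3; do
    scp $HADOOP_HOME/etc/hadoop/core-site.xml \
        $HADOOP_HOME/etc/hadoop/yarn-site.xml \
        $HADOOP_HOME/etc/hadoop/mapred-site.xml \
        hadoop@${h}.local:hadoop-3.1.1/etc/hadoop/
done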
Starting Hadoop
・Format HDFS (master node)
[hadoop@hadoopmaster ~]$ which hdfs
~/hadoop-3.1.1/bin/hdfs
[hadoop@hadoopmaster ~]$ hdfs namenode -format
・Start the NameNode (master node)
[hadoop@hadoopmaster ~]$ hadoop-daemon.sh start namenode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
As the warning says, the recommended command is now:
hdfs --daemon start namenode
・Start the ResourceManager (master node)
[hadoop@hadoopmaster ~]$ yarn-daemon.sh start resourcemanager
WARNING: Use of this script to start YARN daemons is deprecated.
WARNING: Attempting to execute replacement "yarn --daemon start" instead.
Likewise, the recommended command is now:
yarn --daemon start resourcemanager
・Check the HDFS metadata (master node)
[hadoop@hadoopmaster ~]$ ls -l /home/hadoop/hdfs/namenode/current/
total 16
-rw-rw-r--. 1 hadoop hadoop 391 Nov 25 08:59 fsimage_0000000000000000000
-rw-rw-r--. 1 hadoop hadoop 62 Nov 25 08:59 fsimage_0000000000000000000.md5
-rw-rw-r--. 1 hadoop hadoop 2 Nov 25 08:59 seen_txid
-rw-rw-r--. 1 hadoop hadoop 217 Nov 25 08:59 VERSION
・Start the DataNode (slave nodes)
[hadoop@hadoopslave1 ~]$ hadoop-daemon.sh start datanode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
[hadoop@hadoopslave1 ~]$ ls -l /Hadoop/hdfs/datanode/
total 4
drwxrwxr-x 3 hadoop hadoop 70 Nov 25 09:22 current
-rw-rw-r-- 1 hadoop hadoop 23 Nov 25 09:22 in_use.lock
Here too the recommended command is:
hdfs --daemon start datanode
So yes, it is the same hdfs --daemon start command as for the NameNode, just with datanode as the argument.
・Start the NodeManager (slave nodes)
[hadoop@hadoopslave1 ~]$ yarn-daemon.sh start nodemanager
WARNING: Use of this script to start YARN daemons is deprecated.
WARNING: Attempting to execute replacement "yarn --daemon start" instead.
And again the recommended command is:
yarn --daemon start nodemanager
The same yarn --daemon start form as for the ResourceManager, with nodemanager as the argument.
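With all daemons started, jps (installed with the OpenJDK devel package earlier) gives a quick per-node view of what is running:
[hadoop@hadoopmaster ~]$ jps    # expect NameNode and ResourceManager (plus Jps itself)
[hadoop@hadoopslave1 ~]$ jps    # expect DataNode and NodeManager on each slave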
・Check HDFS (master node)
[hadoop@hadoopmaster ~]$ hdfs dfsadmin -report
Configured Capacity: 160978448384 (149.92 GB)
Present Capacity: 160876912640 (149.83 GB)
DFS Remaining: 160876900352 (149.83 GB)
DFS Used: 12288 (12 KB)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (3): ← number of DataNodes currently up
Name: 192.168.11.238:9866 (hadoopslave1.local)
Hostname: hadoopslave1.local
Decommission Status : Normal
Configured Capacity: 53658783744 (49.97 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 33845248 (32.28 MB)
DFS Remaining: 53624934400 (49.94 GB)
DFS Used%: 0.00%
DFS Remaining%: 99.94%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Nov 25 09:31:36 EST 2018
Last Block Report: Sun Nov 25 09:22:42 EST 2018
Num of Blocks: 0
(output for the remaining DataNodes omitted)
・Check the slave nodes (master node)
[hadoop@hadoopmaster ~]$ yarn node -list
2018-11-25 09:35:21,739 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster.local/192.168.11.237:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
hadoopslave1.local:41836 RUNNING hadoopslave1.local:8042 0
hadoopslave3.local:36843 RUNNING hadoopslave3.local:8042 0
hadoopslave2.local:39948 RUNNING hadoopslave2.local:8042 0
・Web console
http://192.168.11.237:8088/cluster
(This is the ResourceManager UI. In Hadoop 3.x the NameNode web UI is served separately on port 9870.)
Test
・Create test data
[hadoop@hadoopslave1 ~]$ mkdir localdir01
[hadoop@hadoopslave1 ~]$ cat find.sh
#!/bin/sh
# Stage every file under /usr/share/doc into localdir01,
# appending a running number so duplicate basenames do not collide.
n=1
find /usr/share/doc -type f | while read -r i; do
    cp -a "$i" "/home/hadoop/localdir01/$(basename "$i")_${n}"
    n=$((n + 1))
done
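Run the script and count what it staged (the number depends on which packages are installed under /usr/share/doc):
[hadoop@hadoopslave1 ~]$ sh find.sh
[hadoop@hadoopslave1 ~]$ ls /home/hadoop/localdir01 | wc -l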
・Create the destination directory in HDFS
[hadoop@hadoopslave1 ~]$ hdfs dfs -ls /
[hadoop@hadoopslave1 ~]$ hdfs dfs -mkdir -p /user/hadoop/datadir01
[hadoop@hadoopslave1 ~]$ hdfs dfs -ls /user/hadoop
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2018-11-25 09:57 /user/hadoop/datadir01
・Copy the data into HDFS
hdfs dfs -put /home/hadoop/localdir01/* /user/hadoop/datadir01/
・Check the copied files
hdfs dfs -ls /user/hadoop/datadir01
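To confirm that the replication factor of 3 from hdfs-site.xml actually took effect, hdfs fsck reports per-block replication; each healthy block should show repl=3:
[hadoop@hadoopmaster ~]$ hdfs fsck /user/hadoop/datadir01 -files -blocks | head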