Hive 3.1.3 + Java 8 + Beeline Setup Guide
Configuration
| Item | Value |
|---|---|
| OS | Ubuntu 24.04 |
| Hadoop | 3.3.6 |
| Hive | 3.1.3 |
| DB | PostgreSQL (Metastore) |
| Java | OpenJDK 8 |
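| Execution engine | Hive-on-MR |
| User | hadoop |
This setup runs with 2 GB of memory.
Prerequisites
Hadoop must already be installed by following the procedure below.
Installation steps
1. Install PostgreSQL (for the Metastore)
Install PostgreSQL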
sudo apt install -y postgresql postgresql-contrib
Create the Metastore user and database
sudo -u postgres psql <<'SQL'
CREATE USER hive WITH PASSWORD 'HiveStrongPassword';
CREATE DATABASE metastore OWNER hive;
\c metastore
ALTER SCHEMA public OWNER TO hive;
SQL
2. Install Hive
sudo apt install -y hive hive-metastore hive-server2
3. Configure Hive (hive-site.xml)
sudo tee /etc/hive/conf/hive-site.xml << 'EOF'
<configuration>
<!-- Metastore DB: PostgreSQL -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://localhost:5432/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>HiveStrongPassword</value>
</property>
<!-- Metastore thrift -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
</property>
<!-- HS2 -->
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<!-- warehouse (HDFS) -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>binary</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hive.metastore.client.notification.event.poll.interval</name>
<value>0s</value>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
</configuration>
EOF
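After writing the file, it can help to read a setting back and confirm what Hive will pick up. The following is a minimal sketch, assuming each `<name>` and `<value>` element stays on a single line as in the file above. `hive_site_get` is a hypothetical helper name, and this is not a general XML parser.

```shell
# Print the <value> paired with a given <name> in a Hadoop-style XML config.
# Assumption: every <name> and <value> element fits on one line.
# $1 = property name, $2 = config file (defaults to the path used above).
hive_site_get() {
  awk -v want="$1" '
    /<name>/ {
      n = $0
      gsub(/.*<name>|<\/name>.*/, "", n)   # strip the tags, keep the name
      hit = (n == want)
    }
    /<value>/ && hit {
      v = $0
      gsub(/.*<value>|<\/value>.*/, "", v) # strip the tags, keep the value
      print v
      exit
    }
  ' "${2:-/etc/hive/conf/hive-site.xml}"
}

# Usage:
#   hive_site_get hive.metastore.uris
#   hive_site_get hive.server2.thrift.port
```

An unknown property name prints nothing, which makes a missing setting easy to spot in a script.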
4. Prepare HDFS
sudo su - hadoop
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chown -R hive:hive /user/hive/warehouse
hdfs dfs -chmod 1777 /user/hive/warehouse
exit
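To confirm that the warehouse directory ended up with the intended mode, the permission column of `hdfs dfs -ls -d` can be compared against `drwxrwxrwt`, which is how mode 1777 is displayed. This is a rough sketch. The helper names `warehouse_perm` and `is_sticky_world_writable` are made up for illustration.

```shell
# Extract the permission column from "hdfs dfs -ls -d <path>" output.
# Only lines that start with a file-type character (d or -) are considered.
warehouse_perm() {
  hdfs dfs -ls -d "${1:-/user/hive/warehouse}" | awk '$1 ~ /^[d-]/ { print $1; exit }'
}

# chmod 1777 on a directory is shown as drwxrwxrwt
# (world-writable with the sticky bit set).
is_sticky_world_writable() {
  [ "$1" = "drwxrwxrwt" ]
}

# Usage (as the hadoop user):
#   is_sticky_world_writable "$(warehouse_perm)" && echo "warehouse permissions OK"
```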
5. Add the PostgreSQL JDBC driver to Hive
sudo apt install -y libpostgresql-jdbc-java
dpkg -L libpostgresql-jdbc-java | grep -E 'postgresql.*\.jar$'
sudo ln -sf /usr/share/java/postgresql.jar /usr/lib/hive/lib/postgresql.jar
sudo rm -rf /usr/lib/hive/lib/postgresql-9.4.1208.jre7.jar
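The commands above leave a single PostgreSQL driver in Hive's lib directory: the symlink to the system jar. The sketch below checks for exactly that, on the assumption that two driver versions on the classpath would make it unclear which one gets loaded. `pg_jars` and `single_pg_jar` are hypothetical helper names.

```shell
# List PostgreSQL JDBC jars (regular files or symlinks) in a Hive lib directory.
pg_jars() {
  find "${1:-/usr/lib/hive/lib}" -maxdepth 1 -name 'postgresql*.jar' | sort
}

# Succeed only when exactly one driver jar is present.
single_pg_jar() {
  [ "$(pg_jars "$1" | wc -l)" -eq 1 ]
}

# Usage:
#   pg_jars
#   single_pg_jar && echo "exactly one PostgreSQL driver on the Hive classpath"
```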
6. Initialize the Metastore schema (once only)
sudo su - hadoop
/usr/lib/hive/bin/schematool -dbType postgres -initSchema
exit
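Because the initialization is meant to run only once, a guard can make the step safe to repeat. This is a minimal sketch that assumes `schematool -info` exits non-zero when no schema exists yet. The function name `init_schema_once` and the `SCHEMATOOL` override variable are illustrative, not part of Hive.

```shell
# Path to schematool; set SCHEMATOOL beforehand to point elsewhere.
SCHEMATOOL="${SCHEMATOOL:-/usr/lib/hive/bin/schematool}"

# Initialize the Metastore schema only if -info cannot read an existing one.
# Assumption: "schematool -info" fails when the schema is missing.
init_schema_once() {
  if "$SCHEMATOOL" -dbType postgres -info >/dev/null 2>&1; then
    echo "schema already initialized; skipping"
  else
    "$SCHEMATOOL" -dbType postgres -initSchema
  fi
}

# Usage (as the hadoop user):
#   init_schema_once
```

Note that `-info` can also fail for other reasons, such as a wrong database password. In that case `-initSchema` is expected to fail as well and report the underlying error.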
7. Configure systemd services
Shared environment file
sudo tee /etc/default/hive <<'EOF'
HIVE_HOME=/usr/lib/hive
HIVE_CONF_DIR=/etc/hive/conf
HADOOP_CONF_DIR=/etc/hadoop/conf
YARN_CONF_DIR=/etc/hadoop/conf
MAPRED_CONF_DIR=/etc/hadoop/conf
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
TERM=dumb
EOF
Hive Metastore
sudo tee /etc/systemd/system/hive-metastore.service <<'EOF'
[Unit]
Description=Apache Hive Metastore Service
After=network.target
Wants=network.target
After=hadoop-hdfs-namenode.service hadoop-hdfs-datanode.service hadoop-yarn-resourcemanager.service hadoop-yarn-nodemanager.service
Wants=hadoop-hdfs-namenode.service hadoop-hdfs-datanode.service hadoop-yarn-resourcemanager.service hadoop-yarn-nodemanager.service
[Service]
Type=simple
User=hadoop
Group=hadoop
EnvironmentFile=-/etc/default/hive
Environment="HIVE_CONF_DIR=/etc/hive/conf"
ExecStart=/usr/lib/hive/bin/hive --service metastore
Restart=on-failure
RestartSec=5
LimitNOFILE=100000
[Install]
WantedBy=multi-user.target
EOF
HiveServer2
sudo tee /etc/systemd/system/hiveserver2.service <<'EOF'
[Unit]
Description=Apache HiveServer2 Service
After=network.target hive-metastore.service
Wants=network.target hive-metastore.service
[Service]
Type=simple
User=hadoop
Group=hadoop
EnvironmentFile=-/etc/default/hive
Environment="HADOOP_CLIENT_OPTS=-Dlog4j2.debug=false"
Environment="HIVE_OPTS=--hiveconf hive.root.logger=INFO,console --hiveconf hive.server2.logging.operation.enabled=true"
Environment="HADOOP_HEAPSIZE=1024"
ExecStartPre=/bin/bash -lc 'for i in {1..30}; do nc -z localhost 9083 && exit 0; sleep 1; done; exit 1'
ExecStart=/usr/lib/hive/bin/hive --service hiveserver2
Restart=on-failure
RestartSec=5
LimitNOFILE=100000
[Install]
WantedBy=multi-user.target
EOF
Start the services (Hive)
sudo systemctl daemon-reload
sudo systemctl enable --now hive-metastore
sudo systemctl enable --now hiveserver2
8. Check listening ports
ss -lntp | grep -E ':9083|:10000'
If the following ports appear in the output, the services are up:
9083 → Hive Metastore
10000 → HiveServer2
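HiveServer2 may need some time after `systemctl` returns before it starts listening, so a single `ss` call can miss port 10000. The sketch below polls with `nc -z`, the same probe the hiveserver2 unit uses in `ExecStartPre`. `wait_for_port` is a hypothetical helper, and the retry counts in the usage lines are arbitrary.

```shell
# Poll a TCP port with nc -z until it accepts connections or the tries run out.
# $1 = host, $2 = port, $3 = number of one-second tries (default 30).
wait_for_port() {
  host=$1 port=$2 tries=${3:-30}
  while [ "$tries" -gt 0 ]; do
    nc -z "$host" "$port" 2>/dev/null && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_for_port localhost 9083 30  && echo "Metastore is listening"
#   wait_for_port localhost 10000 120 && echo "HiveServer2 is listening"
```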
Verification
1. Connect with Beeline (with memory limits)
sudo su - hadoop
/usr/lib/hive/bin/beeline \
--hiveconf mapreduce.map.memory.mb=512 \
--hiveconf mapreduce.reduce.memory.mb=512 \
--hiveconf yarn.app.mapreduce.am.resource.mb=512 \
--hiveconf mapreduce.map.java.opts="-Xmx384m" \
--hiveconf mapreduce.reduce.java.opts="-Xmx384m" \
--hiveconf yarn.app.mapreduce.am.command-opts="-Xmx384m" \
-u 'jdbc:hive2://localhost:10000/default' \
-n hadoop
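In the command above, each JVM heap (`-Xmx384m`) is 75% of its 512 MB container, which leaves headroom for non-heap memory. The sketch below generates the same six flags for any container size. Treating 75% as a general rule is an assumption of this example, and `mr_memory_flags` is a hypothetical helper name.

```shell
# Print Beeline --hiveconf flags for a given YARN container size in MB.
# The heap is set to 3/4 of the container, matching 512 MB -> -Xmx384m.
mr_memory_flags() {
  mb=$1
  heap=$((mb * 3 / 4))
  for kind in map reduce; do
    echo "--hiveconf mapreduce.$kind.memory.mb=$mb"
    echo "--hiveconf mapreduce.$kind.java.opts=-Xmx${heap}m"
  done
  echo "--hiveconf yarn.app.mapreduce.am.resource.mb=$mb"
  echo "--hiveconf yarn.app.mapreduce.am.command-opts=-Xmx${heap}m"
}

# Usage (the unquoted substitution is intentional so each flag becomes its own word):
#   /usr/lib/hive/bin/beeline $(mr_memory_flags 512) \
#     -u 'jdbc:hive2://localhost:10000/default' -n hadoop
```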
2. Check the execution engine
set hive.execution.engine;
Result:
hive.execution.engine=mr
3. Create a table, INSERT, and SELECT
CREATE TABLE t1 (
col1 INT,
col2 STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
INSERT INTO t1 VALUES (1,'a'),(2,'b');
SELECT * FROM t1;
Result:
1 a
2 b
To start over, run the following command and then repeat the steps.
drop table t1;
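The same check can be replayed without an interactive session by saving the statements to a file and passing it to Beeline with `-f`. This is a sketch only. `write_smoke_sql` and the path `/tmp/smoke.sql` are hypothetical, and a leading `DROP TABLE IF EXISTS` is added so reruns need no manual cleanup.

```shell
# Write the smoke-test statements to the file given as $1.
# DROP TABLE IF EXISTS makes the script safe to run repeatedly.
write_smoke_sql() {
  cat > "$1" <<'SQL'
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (col1 INT, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
INSERT INTO t1 VALUES (1,'a'),(2,'b');
SELECT * FROM t1;
SQL
}

# Usage (on the Hive host, as the hadoop user):
#   write_smoke_sql /tmp/smoke.sql
#   /usr/lib/hive/bin/beeline -u 'jdbc:hive2://localhost:10000/default' \
#     -n hadoop -f /tmp/smoke.sql
```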
4. Confirm in the YARN UI (cross-check that the job succeeded)
http://<host IP address>:8088
Application Type: MAPREDUCE
State: FINISHED
No exceptions appear in Logs
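The same information is available from the command line through the ResourceManager REST API or the `yarn` CLI. A minimal sketch follows. `yarn_finished_mr_url` is a hypothetical helper, and the default host `localhost` is an assumption. State FINISHED alone does not prove success, so the `finalStatus` field of each application should read SUCCEEDED.

```shell
# Build the ResourceManager REST URL that lists finished MapReduce applications.
# $1 = ResourceManager host (default localhost), $2 = port (default 8088).
yarn_finished_mr_url() {
  echo "http://${1:-localhost}:${2:-8088}/ws/v1/cluster/apps?states=FINISHED&applicationTypes=MAPREDUCE"
}

# Usage:
#   curl -s "$(yarn_finished_mr_url)"    # JSON; inspect finalStatus per app
#   yarn application -list -appStates FINISHED -appTypes MAPREDUCE
```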