Using Hive on CDH5 (Embedded Mode)

Posted at 2014-10-06

Introduction

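This article describes how to use Hive in Embedded Mode on CDH5. In Embedded Mode, the metastore runs inside the Hive process itself and uses an embedded Derby database.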

Environment

  • CentOS 6.5
  • CDH 5
  • Hive 0.12.0-cdh5.1.3
  • jdk 1.7.0_55

Cluster layout

Hostname IP address ResourceManager Namenode NodeManager Datanode JobHistoryServer
hadoop-master 192.168.122.101 - -
hadoop-master2 192.168.122.102 - - -
hadoop-slave 192.168.122.111 - - -
hadoop-slave2 192.168.122.112 - - -
hadoop-slave3 192.168.122.113 - - -
hadoop-client 192.168.122.201 - - - - -

※ For how to build the Hadoop cluster, see "Building a Hadoop cluster with CDH5".

Hive setup

※ Hive is installed on hadoop-client.

  • Install Hive
$ sudo yum install hive
  • Create a directory for Hive on HDFS.
$ sudo -u hdfs hadoop fs -mkdir /user/hive
$ sudo -u hdfs hadoop fs -chown hive:hadoop /user/hive
$ sudo -u hdfs hadoop fs -ls /user/
Found 3 items
drwxr-xr-x   - hdfs   hadoop          0 2014-09-20 08:09 /user/hdfs
drwxrwxrwt   - mapred hadoop          0 2014-09-20 05:39 /user/history
drwxr-xr-x   - hive   hadoop          0 2014-10-06 13:34 /user/hive
  • Adjust the permissions of the local directory
$ sudo chown hive /var/lib/hive
$ ls -ld /var/lib/hive
drwxr-xr-x 3 hive root 4096 Oct  6 13:34 /var/lib/hive
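
The setup steps above can be collected into one script. The following is a minimal sketch, not the author's procedure: `DRY_RUN`, `run`, and `setup_hive` are hypothetical names, and the `-y` and `-p` flags are additions so that the script can be re-run without prompts or "already exists" errors. By default it only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: the Hive setup steps as one re-runnable script.
# DRY_RUN, run and setup_hive are hypothetical helpers, not part of CDH.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_hive() {
    run sudo yum install -y hive
    # -p keeps mkdir from failing when the directory already exists
    run sudo -u hdfs hadoop fs -mkdir -p /user/hive
    run sudo -u hdfs hadoop fs -chown hive:hadoop /user/hive
    run sudo -u hdfs hadoop fs -ls /user/
    run sudo chown hive /var/lib/hive
}

setup_hive
```

Setting `DRY_RUN=0` before running the script on hadoop-client would execute the commands for real.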

Preparing the data

This example uses Japanese postal code data.

$ cd /tmp
$ curl -O http://www.post.japanpost.jp/zipcode/dl/roman/ken_all_rome.zip
$ unzip ken_all_rome.zip
$ nkf -S -w ken_all_rome/KEN_ALL_ROME.CSV > ken_all_rome/KEN_ALL_ROME.UTF8.CSV
$ head ken_all_rome/KEN_ALL_ROME.UTF8.CSV
"0600000","北海道","札幌市 中央区","以下に掲載がない場合","HOKKAIDO","SAPPORO SHI CHUO KU","IKANIKEISAIGANAIBAAI"
"0640941","北海道","札幌市 中央区","旭ケ丘","HOKKAIDO","SAPPORO SHI CHUO KU","ASAHIGAOKA"
"0600041","北海道","札幌市 中央区","大通東","HOKKAIDO","SAPPORO SHI CHUO KU","ODORIHIGASHI"
"0600042","北海道","札幌市 中央区","大通西(1~19丁目)","HOKKAIDO","SAPPORO SHI CHUO KU","ODORINISHI(1-19-CHOME)"
"0640820","北海道","札幌市 中央区","大通西(20~28丁目)","HOKKAIDO","SAPPORO SHI CHUO KU","ODORINISHI(20-28-CHOME)"
"0600031","北海道","札幌市 中央区","北一条東","HOKKAIDO","SAPPORO SHI CHUO KU","KITA1-JOHIGASHI"
"0600001","北海道","札幌市 中央区","北一条西(1~19丁目)","HOKKAIDO","SAPPORO SHI CHUO KU","KITA1-JONISHI(1-19-CHOME)"
"0640821","北海道","札幌市 中央区","北一条西(20~28丁目)","HOKKAIDO","SAPPORO SHI CHUO KU","KITA1-JONISHI(20-28-CHOME)"
"0600032","北海道","札幌市 中央区","北二条東","HOKKAIDO","SAPPORO SHI CHUO KU","KITA2-JOHIGASHI"
"0600002","北海道","札幌市 中央区","北二条西(1~19丁目)","HOKKAIDO","SAPPORO SHI CHUO KU","KITA2-JONISHI(1-19-CHOME)"

※ The postal code data is encoded in SHIFT_JIS, which is awkward to work with, so it is converted to UTF-8 before use.
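
Each sample row above has seven fields. Because the table in this article is split on every comma, a comma inside a quoted value would shift the columns, so it may be worth counting the fields before loading. A minimal sketch; `count_bad_lines` is a hypothetical helper name.

```shell
# count_bad_lines FILE: print how many lines do not split into exactly
# seven comma-separated fields (commas inside quotes are counted too,
# which mirrors a plain split on ',').
# count_bad_lines is a hypothetical helper name.
count_bad_lines() {
    awk -F',' 'NF != 7 { bad++ } END { print bad + 0 }' "$1"
}

# Example (run from /tmp):
# count_bad_lines ken_all_rome/KEN_ALL_ROME.UTF8.CSV
```

A result of 0 would mean that every line splits into seven fields.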

Loading the data

  • Create the database and the table
$ cd /tmp
$ sudo -u hive hive
hive> create database sample;
OK

hive> show databases;
OK
default
sample
Time taken: 5.021 seconds, Fetched: 2 row(s)

hive> use sample;
OK

hive> create table zip_all (
    > zip string,
    > pref string,
    > city string,
    > town string,
    > pref_r string,
    > city_r string,
    > town_r string
    > )
    > row format delimited
    > fields terminated by ','
    > lines terminated by '\n'
    > ;

hive> show tables;
OK
zip_all
Time taken: 0.022 seconds, Fetched: 1 row(s)

hive> load data local inpath '/tmp/ken_all_rome/KEN_ALL_ROME.UTF8.CSV' into table zip_all;
Copying data from file:/tmp/ken_all_rome/KEN_ALL_ROME.UTF8.CSV
Copying file: file:/tmp/ken_all_rome/KEN_ALL_ROME.UTF8.CSV
Loading data to table sample.zip_all
Table sample.zip_all stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 12527284, raw_data_size: 0]
OK
Time taken: 0.817 seconds
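
With `fields terminated by ','`, the double quotes in the CSV are stored as part of each value. One option is to strip them before loading. This is a minimal sketch that assumes no field contains a comma or an embedded quote; `strip_quotes` and the output file name are hypothetical.

```shell
# strip_quotes: read CSV on stdin and write it to stdout without double quotes.
# Assumes no field contains a comma or an embedded quote (not verified here).
# strip_quotes is a hypothetical helper name.
strip_quotes() {
    sed 's/"//g'
}

# Example (the output file name is illustrative):
# strip_quotes < /tmp/ken_all_rome/KEN_ALL_ROME.UTF8.CSV \
#     > /tmp/ken_all_rome/KEN_ALL_ROME.NOQUOTE.CSV
```

If the stripped file is loaded instead, conditions could be written against the bare values, for example `pref_r = 'TOKYO TO'`.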

Querying the data

hive> select count(*) from zip_all;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
14/10/06 15:19:14 WARN conf.Configuration: file:/tmp/hive/hive_2014-10-06_15-19-10_668_7801925482149728044-1/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/10/06 15:19:14 WARN conf.Configuration: file:/tmp/hive/hive_2014-10-06_15-19-10_668_7801925482149728044-1/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/10/06 15:19:14 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/10/06 15:19:14 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/10/06 15:19:14 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/10/06 15:19:14 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/10/06 15:19:14 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/10/06 15:19:14 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/10/06 15:19:14 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
Execution log at: /tmp/hive/hive_20141006151919_0ae7a324-9f85-4b3f-8036-61e18070c4bd.log
Job running in-process (local Hadoop)
2014-10-06 15:19:18,218 null map = 100%,  reduce = 0%
2014-10-06 15:19:19,226 null map = 100%,  reduce = 100%
Ended Job = job_local1780116823_0001
Execution completed successfully
MapredLocal task succeeded
OK
123699
Time taken: 9.292 seconds, Fetched: 1 row(s)
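
The count returned by Hive can be cross-checked against the number of lines in the local file. A minimal sketch; `check_row_count` is a hypothetical helper name, and the comparison assumes one record per line.

```shell
# check_row_count FILE EXPECTED: compare the line count of FILE with the
# row count reported by Hive. Returns non-zero on a mismatch.
# check_row_count is a hypothetical helper name.
check_row_count() {
    actual=$(wc -l < "$1" | tr -d ' ')
    if [ "$actual" -eq "$2" ]; then
        echo "match: $actual"
    else
        echo "mismatch: file has $actual lines, Hive reported $2"
        return 1
    fi
}

# Example, using the count shown above:
# check_row_count /tmp/ken_all_rome/KEN_ALL_ROME.UTF8.CSV 123699
```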

hive> select * from zip_all where pref_r = '"TOKYO TO"' and city_r = '"SHIBUYA KU"' and town_r = '"SHIBUYA"';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
14/10/06 15:22:59 WARN conf.Configuration: file:/tmp/hive/hive_2014-10-06_15-22-56_612_8533133809152280651-1/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/10/06 15:22:59 WARN conf.Configuration: file:/tmp/hive/hive_2014-10-06_15-22-56_612_8533133809152280651-1/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/10/06 15:22:59 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/10/06 15:22:59 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/10/06 15:22:59 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/10/06 15:22:59 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/10/06 15:22:59 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/10/06 15:22:59 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/10/06 15:22:59 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
Execution log at: /tmp/hive/hive_20141006152222_ad9fa2f4-76da-4d3b-b083-6d84a7c48ab8.log
Job running in-process (local Hadoop)
2014-10-06 15:23:03,607 null map = 0%,  reduce = 0%
2014-10-06 15:23:04,617 null map = 100%,  reduce = 0%
Ended Job = job_local2040699265_0001
Execution completed successfully
MapredLocal task succeeded
OK
"1500002"       "東京都"        "渋谷区"        "渋谷"  "TOKYO TO"      "SHIBUYA KU"    "SHIBUYA"
Time taken: 8.706 seconds, Fetched: 1 row(s)
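
The same lookup can also be run without the interactive prompt by passing the query to the Hive CLI with `-e`. The sketch below only builds the query string; `build_zip_query` is a hypothetical helper, and it wraps each literal in double quotes because the stored values keep theirs.

```shell
# build_zip_query PREF CITY TOWN: print a HiveQL query against sample.zip_all.
# Each literal is wrapped in "..." because the loaded values keep their quotes.
# Hypothetical helper; inputs are not escaped, so pass trusted values only.
build_zip_query() {
    printf "select * from sample.zip_all where pref_r = '\"%s\"' and city_r = '\"%s\"' and town_r = '\"%s\"';\n" "$1" "$2" "$3"
}

# Example on hadoop-client (-S is silent mode, -e takes an inline query):
# sudo -u hive hive -S -e "$(build_zip_query 'TOKYO TO' 'SHIBUYA KU' 'SHIBUYA')"
```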
