LoginSignup
0
0

More than 5 years have passed since last update.

Hiveで郵便番号を検索してみます。

Last updated at Posted at 2018-09-23

日本郵便株式会社が公開している郵便番号のCSVファルをHiveでロードしてHiveで検索した際のメモです。

実験

テストデータダウンロード

  • 郵政省株式会社が公開している郵便番号CSVデータをもとに実験します。

-ダウンロード

wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/01hokkai.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/02aomori.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/03iwate.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/04miyagi.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/05akita.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/06yamaga.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/07fukush.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/08ibarak.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/09tochig.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/10gumma.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/11saitam.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/12chiba.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/13tokyo.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/14kanaga.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/15niigat.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/16toyama.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/17ishika.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/18fukui.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/19yamana.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/20nagano.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/21gifu.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/22shizuo.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/23aichi.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/24mie.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/25shiga.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/26kyouto.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/27osaka.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/28hyogo.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/29nara.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/30wakaya.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/31tottor.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/32shiman.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/33okayam.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/34hirosh.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/35yamagu.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/36tokush.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/37kagawa.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/38ehime.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/39kochi.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/40fukuok.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/41saga.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/42nagasa.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/43kumamo.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/44oita.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/45miyaza.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/46kagosh.zip
wget https://www.post.japanpost.jp/zipcode/dl/kogaki/zip/47okinaw.zip

ファイルダ一括ウンロードスクリプト作成

  • このようなファイルを準備する。<br.
01hokkai.zip
02aomori.zip
03iwate.zip
04miyagi.zip
05akita.zip
06yamaga.zip
07fukush.zip
08ibarak.zip
09tochig.zip
10gumma.zip
11saitam.zip
12chiba.zip
13tokyo.zip
14kanaga.zip
15niigat.zip
16toyama.zip
17ishika.zip
18fukui.zip
19yamana.zip
20nagano.zip
21gifu.zip
22shizuo.zip
23aichi.zip
24mie.zip
25shiga.zip
26kyouto.zip
27osaka.zip
28hyogo.zip
29nara.zip
30wakaya.zip
31tottor.zip
32shiman.zip
33okayam.zip
34hirosh.zip
35yamagu.zip
36tokush.zip
37kagawa.zip
38ehime.zip
39kochi.zip
40fukuok.zip
41saga.zip
42nagasa.zip
43kumamo.zip
44oita.zip
45miyaza.zip
46kagosh.zip
47okinaw.zip
  • 準備した後、一括ダウンロードスクリプトを作成する。
    viで編集
:1,$s/.*/wget https:\/\/www.post.japanpost.jp\/zipcode\/dl\/kogaki\/zip\/&/
  • ダウンロード後、一括展開スクリプトを作成する。
    viで編集
:1,$s/.*/unzip &/
  • 展開すると拡張子が「*.CSV」のファイルに変換されるので、一括文字コード変換スクリプトを作成する。
    CSVの文字コードがSJISなのでUTF-8に変換しておきます。 viで編集
:1,$s/.*/iconv -f sjis -t utf-8 & -o &/
  • 文字コード変換後、一括データロードスクリプトを作成する。
    viで編集
:1,$s/.*/LOAD DATA LOCAL INPATH '\/home\/hadoop\/tmp\/zipcode\/&' INTO TABLE ZIPCODE;/

Hadoop起動

$ start-dfs.sh; start-yarn.sh

Hive起動

$ hive

実験用データベース作成

hive> CREATE DATABASE testdb;

実験用データベース切替

hive> USE testdb;

実験用テーブル作成

CREATE TABLE ZIPCODE(
  LOCAL_PUB_BODY_CODE INT
, OLD_ZIPCODE STRING
, ZiPCODE STRING
, PREF_NAME_KANA STRING
, CITY_NAME_KANA STRING
, TOWN_NAME_KANA STRING
, PREF_NAME_KANJI STRING
, CITY_NAME_KANJI STRING
, TOWN_NAME_KANJI STRING
, ZIPCODE_TYPE01 TINYINT
, ZIPCODE_TYPE02 TINYINT
, ZIPCODE_TYPE03 TINYINT
, ZIPCODE_TYPE04 TINYINT
, UPDATE_CODE TINYINT
, UPDATE_REASON_CODE TINYINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
;

実験用データロード

  • 一括ロードデータスクリプトを作成した後の1行目に「use testdb;」、2行目のロードオプションに「OVERWRITE」を指定する。
USE testdb;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/01HOKKAI.CSV' OVERWRITE INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/02AOMORI.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/03IWATE.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/04MIYAGI.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/05AKITA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/06YAMAGA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/07FUKUSH.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/08IBARAK.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/09TOCHIG.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/10GUMMA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/11SAITAM.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/12CHIBA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/13TOKYO.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/14KANAGA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/15NIIGAT.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/16TOYAMA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/17ISHIKA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/18FUKUI.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/19YAMANA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/20NAGANO.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/21GIFU.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/22SHIZUO.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/23AICHI.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/24MIE.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/25SHIGA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/26KYOUTO.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/27OSAKA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/28HYOGO.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/29NARA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/30WAKAYA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/31TOTTOR.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/32SHIMAN.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/33OKAYAM.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/34HIROSH.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/35YAMAGU.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/36TOKUSH.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/37KAGAWA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/38EHIME.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/39KOCHI.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/40FUKUOK.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/41SAGA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/42NAGASA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/43KUMAMO.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/44OITA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/45MIYAZA.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/46KAGOSH.CSV'  INTO TABLE ZIPCODE;
LOAD DATA LOCAL INPATH '/home/hadoop/tmp/zipcode/47OKINAW.CSV'  INTO TABLE ZIPCODE;
  • データロード
    上記のファイルの名称に「dataload.sql」を指定した場合。
$ hive -f dataload.sql

検索

  • 旧郵便番号でグルーピング
hive> USE testdb;
hive> SELECT COUNT(*) AS COUNT, OLD_ZIPCODE FROM ZIPCODE GROUP BY OLD_ZIPCODE ORDER BY OLD_ZIPCODE;
  • 出力例
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180923195724_72d0774a-d522-4976-8335-9c4977dd481c
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2018-09-23 19:57:30,952 Stage-1 map = 0%,  reduce = 0%
2018-09-23 19:57:32,995 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local2442718_0001
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2018-09-23 19:57:34,983 Stage-2 map = 100%,  reduce = 100%
Ended Job = job_local1694582313_0002
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 3081598 HDFS Write: 0 SUCCESS
Stage-Stage-2:  HDFS Read: 3081598 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
55      "001  "
20      "002  "
84      "010  "
26      "01001"

~中略~

21      "99022"
25      "99023"
1       43  "
Time taken: 10.953 seconds, Fetched: 433 row(s)
0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0