Using LTSV with Hive

Posted at 2013-11-19

You could write a SerDe for this, but Hive already lets you specify the delimiters used for map columns, so with that alone you can read an LTSV line as a map<string, string>.

ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\n'
    COLLECTION ITEMS TERMINATED BY '\t'
    MAP KEYS TERMINATED BY ':'

For example, create a table like this, with a single map column that receives the whole line:

CREATE EXTERNAL TABLE some_ltsv_table (
    record  map<string, string>
)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\n'
    COLLECTION ITEMS TERMINATED BY '\t'
    MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE
LOCATION '/path/to/ltsv/dir'
;

Then put an LTSV file like this (fields separated by tab characters) in /path/to/ltsv/dir:

a:1 b:2
a:3 b:4 c:5

and you can read it from Hive like this:

SELECT
    record
FROM
    some_ltsv_table

--
{"a":"1", "b":"2"}
{"a":"3", "b":"4", "c":"5"}

SELECT
    record['a']
FROM
    some_ltsv_table

--
1
3
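
If you look up the same labels often, one option is to wrap the map lookups in a view that exposes them as typed columns. The following is only a rough sketch: the view name some_ltsv_view, the column names, and the choice of INT are assumptions made for illustration, not part of the original setup. It relies on a map lookup for a label that is absent from a line returning NULL.

```sql
-- Hypothetical view over the table defined above; the view name,
-- the column names, and the INT type are assumptions for this example.
CREATE VIEW some_ltsv_view AS
SELECT
    CAST(record['a'] AS INT) AS a,
    CAST(record['b'] AS INT) AS b,
    CAST(record['c'] AS INT) AS c   -- NULL when the line has no "c" label
FROM
    some_ltsv_table
;

-- Keep only the lines that actually carry a "c" label.
SELECT
    a, b, c
FROM
    some_ltsv_view
WHERE
    c IS NOT NULL
;
```

With the two-line sample file above (assuming it really is tab-separated), the second query should return only the line that has a c label.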

That's about it. It makes working with these logs a lot easier.
