6
4

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 5 years have passed since last update.

apache drill触ってみた

Last updated at Posted at 2016-01-05

ランチの時にapache drillの話をふわっと聞いた。
なんでもRDS、DynamoDBのテーブルとファイルシステム上、S3上のJSON/tsv/csvファイルをSQLライクなクエリでJoinしたりできるらしい。

面白そうなので触ってみる

document

インストール

brewで一撃

brew install apache-drill

mockデータ作成

mockarooでmockデータ作成

JSON(カラムはデフォルトのまま

/tmp/work/MOCK_DATA.json
[{"id":1,"gender":"Male","first_name":"Willie","last_name":"Gardner","email":"wgardner0@csmonitor.com","ip_address":"171.75.157.165"},
{"id":2,"gender":"Female","first_name":"Lois","last_name":"Jacobs","email":"ljacobs1@photobucket.com","ip_address":"188.171.106.166"},
{"id":3,"gender":"Male","first_name":"Shawn","last_name":"Rivera","email":"srivera2@storify.com","ip_address":"160.224.138.213"},
{"id":4,"gender":"Female","first_name":"Stephanie","last_name":"Hernandez","email":"shernandez3@shinystat.com","ip_address":"77.67.203.114"},
{"id":5,"gender":"Male","first_name":"Ralph","last_name":"Wright","email":"rwright4@booking.com","ip_address":"89.240.134.189"},
....

CSV(colorとcompany_nameを追加

/tmp/work/MOCK_DATA.csv
id,first_name,last_name,email,gender,ip_address,color,company_name
1,Kenneth,Wheeler,kwheeler0@domainmarket.com,Male,54.230.82.29,Khaki,Voonyx
2,Brenda,Young,byoung1@networkadvertising.org,Female,79.121.226.209,Goldenrod,Feedbug
3,Steven,Knight,sknight2@hud.gov,Male,161.79.31.161,Purple,Aimbo
4,Timothy,Daniels,tdaniels3@ovh.net,Male,233.15.252.170,Mauv,Oba
5,Adam,Wright,awright4@cbc.ca,Male,56.247.113.21,Indigo,Kwilith
....

動作確認

まずはdrillシェルに入る。just drill it!!

$ sqlline -u jdbc:drill:zk=local
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Jan 05, 2016 3:32:08 PM org.glassfish.jersey.server.ApplicationHandler initialize                                                                  INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...                                                                  apache drill 1.4.0
"just drill it"
 0: jdbc:drill:zk=local>

WEBコンソールも用意されており、シェルに入った状態で
http://127.0.0.1:8047/
にアクセスすることができる

Screen Shot 2016-01-05 at 16.06.45.png

jsonの方を引いてみる

0: jdbc:drill:zk=local> SELECT tbl.* FROM dfs.`/tmp/work/MOCK_DATA.json` AS tbl LIMIT 10;
+-----+---------+-------------+------------+------------------------------------+------------------+
| id  | gender  | first_name  | last_name  |               email                |    ip_address    |
+-----+---------+-------------+------------+------------------------------------+------------------+
| 1   | Male    | Willie      | Gardner    | wgardner0@csmonitor.com            | 171.75.157.165   |
| 2   | Female  | Lois        | Jacobs     | ljacobs1@photobucket.com           | 188.171.106.166  |
| 3   | Male    | Shawn       | Rivera     | srivera2@storify.com               | 160.224.138.213  |
| 4   | Female  | Stephanie   | Hernandez  | shernandez3@shinystat.com          | 77.67.203.114    |
| 5   | Male    | Ralph       | Wright     | rwright4@booking.com               | 89.240.134.189   |
| 6   | Male    | Donald      | Graham     | dgraham5@google.com.br             | 239.6.250.80     |
| 7   | Male    | Justin      | Howell     | jhowell6@dion.ne.jp                | 159.104.8.8      |
| 8   | Male    | Russell     | Mills      | rmills7@nsw.gov.au                 | 245.130.6.110    |
| 9   | Female  | Rose        | Gonzalez   | rgonzalez8@scientificamerican.com  | 16.9.119.25      |
| 10  | Female  | Mildred     | Ramos      | mramos9@unc.edu                    | 82.232.80.156    |
+-----+---------+-------------+------------+------------------------------------+------------------+
10 rows selected (0.215 seconds)

うほほほほ。こりゃいい。しかも結構はやい。

csvの方も引いてみる

0: jdbc:drill:zk=local> SELECT tbl.* FROM dfs.`/tmp/work/MOCK_DATA.csv` AS tbl  LIMIT 10;
+----------------------------------------------------------------------------------------------------------+
|                                                 columns                                                  |
+----------------------------------------------------------------------------------------------------------+
| ["id","first_name","last_name","email","gender","ip_address","color","company_name"]                     |
| ["1","Kenneth","Wheeler","kwheeler0@domainmarket.com","Male","54.230.82.29","Khaki","Voonyx"]            |
| ["2","Brenda","Young","byoung1@networkadvertising.org","Female","79.121.226.209","Goldenrod","Feedbug"]  |
| ["3","Steven","Knight","sknight2@hud.gov","Male","161.79.31.161","Purple","Aimbo"]                       |
| ["4","Timothy","Daniels","tdaniels3@ovh.net","Male","233.15.252.170","Mauv","Oba"]                       |
| ["5","Adam","Wright","awright4@cbc.ca","Male","56.247.113.21","Indigo","Kwilith"]                        |
| ["6","Jane","Wheeler","jwheeler5@unc.edu","Female","42.132.133.182","Teal","Gabcube"]                    |
| ["7","Jack","Smith","jsmith6@domainmarket.com","Male","199.219.253.212","Fuscia","Pixope"]               |
| ["8","James","Hayes","jhayes7@businessinsider.com","Male","237.34.224.202","Purple","Yabox"]             |
| ["9","Susan","Murphy","smurphy8@joomla.org","Female","85.200.49.196","Turquoise","Jaloo"]                |
+----------------------------------------------------------------------------------------------------------+

ほほー。csvの方はこうなっているのですね。これでJOINなんてできるの・・?
なんかオプションがあるのだろうか。

と思ってドキュメント軽く見たら書いてた。
https://drill.apache.org/docs/text-files-csv-tsv-psv/

この辺りを定義すればいける模様。

  "skipFirstLine": false,
  "extractHeader": true,

cfgファイルではなくWebコンソールかcurlで設定するらしい。
今回はwebコンソールで既存のcsvの定義を変更することにした。

メニュータブのStorageに遷移し、dfsの[Update]をクリック

Screen_Shot_2016-01-05_at_16_16_36.png

csv,tsvにskipFirstLineとextractHeaderを追記し、[Update]をクリック
Screen_Shot_2016-01-05_at_16_27_39.png

0: jdbc:drill:zk=local> SELECT tbl.* FROM dfs.`/tmp/work/MOCK_DATA.csv` AS tbl  LIMIT 10;
+-----+-------------+------------+---------------------------------+---------+------------------+------------+---------------+
| id  | first_name  | last_name  |              email              | gender  |    ip_address    |   color    | company_name  |
+-----+-------------+------------+---------------------------------+---------+------------------+------------+---------------+
| 1   | Kenneth     | Wheeler    | kwheeler0@domainmarket.com      | Male    | 54.230.82.29     | Khaki      | Voonyx        |
| 2   | Brenda      | Young      | byoung1@networkadvertising.org  | Female  | 79.121.226.209   | Goldenrod  | Feedbug       |
| 3   | Steven      | Knight     | sknight2@hud.gov                | Male    | 161.79.31.161    | Purple     | Aimbo         |
| 4   | Timothy     | Daniels    | tdaniels3@ovh.net               | Male    | 233.15.252.170   | Mauv       | Oba           |
| 5   | Adam        | Wright     | awright4@cbc.ca                 | Male    | 56.247.113.21    | Indigo     | Kwilith       |
| 6   | Jane        | Wheeler    | jwheeler5@unc.edu               | Female  | 42.132.133.182   | Teal       | Gabcube       |
| 7   | Jack        | Smith      | jsmith6@domainmarket.com        | Male    | 199.219.253.212  | Fuscia     | Pixope        |
| 8   | James       | Hayes      | jhayes7@businessinsider.com     | Male    | 237.34.224.202   | Purple     | Yabox         |
| 9   | Susan       | Murphy     | smurphy8@joomla.org             | Female  | 85.200.49.196    | Turquoise  | Jaloo         |
| 10  | Ann         | Hill       | ahill9@youku.com                | Female  | 254.251.33.227   | Red        | Yadel         |
+-----+-------------+------------+---------------------------------+---------+------------------+------------+---------------+

キターーー!

大正義Join!!!

おめぇ、これできたらと思うとおらワクワクすっべ・・・

0: jdbc:drill:zk=local> SELECT tbl1.first_name, tbl1.last_name, tbl2.color, tbl2.company_name FROM dfs.`/tmp/work/MOCK_DATA.json` AS tbl1
. . . . . . . . . . . > JOIN
. . . . . . . . . . . > dfs.`/tmp/work/MOCK_DATA.csv` AS tbl2 ON tbl1.first_name=tbl2.first_name and tbl1.last_name=tbl2.last_name;
+-------------+-------------+-------------+----------------+
| first_name  |  last_name  |    color    |  company_name  |
+-------------+-------------+-------------+----------------+
| Scott       | Ray         | Maroon      | Twinte         |
| Andrew      | Brown       | Orange      | Photolist      |
| Rose        | Washington  | Orange      | Livetube       |
| Steven      | Adams       | Maroon      | Zoombeat       |
| Nancy       | Porter      | Indigo      | Kwinu          |
| Lois        | Williams    | Purple      | Flashset       |
| Justin      | Hart        | Red         | Miboo          |
| Arthur      | Scott       | Indigo      | Thoughtbeat    |
| Wanda       | Peters      | Khaki       | Teklist        |
| Christine   | Dean        | Aquamarine  | Buzzshare      |
| Paula       | Nelson      | Green       | Kare           |
| Betty       | Wagner      | Aquamarine  | Photobean      |
| Jerry       | Ross        | Mauv        | Linktype       |
| Harry       | Washington  | Green       | Reallinks      |
| Nancy       | Lynch       | Turquoise   | Twitternation  |
| Heather     | Moore       | Violet      | Buzzbean       |
| Linda       | Gray        | Violet      | Fadeo          |
+-------------+-------------+-------------+----------------+
17 rows selected (1.026 seconds)

あまり意味のないクエリですがきちんとJOINできました。
素晴らしいですね。

仕事で使えそうなケースがあれば大規模データでの検証やSparkとの比較も実施してみようと思います。

6
4
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
6
4

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?