

embulk usage notes

  • These are mostly notes for myself
  • They cover the following steps
    • Installation
    • Installing plugins
    • Auto-generating a config file
    • Loading JSONL data into SQLite

Installation

$ curl -o ~/bin/embulk -L https://bintray.com/artifact/download/embulk/maven/embulk-0.8.13.jar
$ chmod +x ~/bin/embulk
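The commands above assume that ~/bin already exists (curl -o does not create missing directories unless --create-dirs is given) and that it is on PATH, because every later step calls plain `embulk`. A minimal sketch of that setup, assuming bash or another POSIX shell; run it before the download if either assumption does not hold:

```shell
# Make sure the download target directory exists; `curl -o` will not create it
mkdir -p "$HOME/bin"

# Prepend ~/bin to PATH for the current shell, but only if it is not already there
case ":$PATH:" in
  *":$HOME/bin:"*) ;;                      # already on PATH, nothing to do
  *) export PATH="$HOME/bin:$PATH" ;;      # so that `embulk` resolves to ~/bin/embulk
esac
```

To make the PATH change permanent, the same case block can go into the shell's startup file.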

Create a working directory and an embulk bundle

$ mkdir embulktest && cd $_
$ embulk mkbundle bundle
$ echo "gem 'embulk-parser-jsonl'" | tee -a ./bundle/Gemfile
$ echo "gem 'embulk-output-sqlite3'" | tee -a ./bundle/Gemfile
$ (cd bundle/ && embulk bundle)
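Each `echo ... | tee -a` above appends another line every time it runs, so repeating the steps leaves duplicate gem entries in the Gemfile. A minimal sketch of an idempotent variant; `add_gem` is a hypothetical helper for illustration, not part of embulk:

```shell
# Hypothetical helper (not part of embulk): append a gem line to a Gemfile
# only when an identical line is not already present, so re-running is safe.
add_gem() {
  local gemfile="$1"
  local line="gem '$2'"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF "$line" "$gemfile" 2>/dev/null || printf '%s\n' "$line" >> "$gemfile"
}

# Apply it to the bundle created above, if it exists in the current directory
if [ -f ./bundle/Gemfile ]; then
  add_gem ./bundle/Gemfile embulk-parser-jsonl
  add_gem ./bundle/Gemfile embulk-output-sqlite3
fi
```

After changing the Gemfile, `(cd bundle/ && embulk bundle)` still has to be run to install the gems.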

Create seed.yml.liquid like this

in:
  type: file
  path_prefix: "{{ env.PWD }}/{{ env.YYYYmmdd }}"
out:
  type: stdout
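The Liquid template fills in `env.PWD` and `env.YYYYmmdd` from environment variables, and the run log further down shows the file input listing the directory and "filtering filename by prefix". A small sketch, assuming the same `date +'%Y%m%d'` format used in the commands below, that prints the prefix the template should expand to and the files it would match:

```shell
# Reproduce the values the template reads (env.PWD, env.YYYYmmdd) and print the prefix
prefix="$(pwd)/$(date +'%Y%m%d')"
echo "path_prefix: $prefix"

# The file input picks up files whose name starts with the prefix; list them if any
ls -d "$prefix"* 2>/dev/null || echo "(no files match the prefix yet)"
```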

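For the data, I threw together some very rough JSONL test records.
I wanted to try embedding environment variables in the YAML, so the file name is the output of date +"%Y%m%d" followed by .jsonl (for example 20160822.jsonl).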
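The pbpaste command that follows reads the macOS clipboard, so it only works on macOS. On other systems, a sketch that writes the same seven sample records with a heredoc instead (the variable name `datafile` is just for illustration):

```shell
# Portable alternative to `pbpaste | tee`: write the same sample records with a heredoc.
# Quoting 'EOF' stops the shell from expanding anything inside the JSON.
datafile="./$(date +'%Y%m%d').jsonl"
cat > "$datafile" <<'EOF'
{"id":1, "title":"Sample Konfabulator Widget","nameName":"main_window","width":500,"height":500}
{"id":2, "title":"NEXT Widget","nameName":"sub_window","width":100,"height":200}
{"id":3, "title":"tiger","nameName":"so_l","width":250,"height":400}
{"id":4, "title":"Sample Konfabulator Widget","nameName":"main_window","width":500,"height":500}
{"id":5, "title":"NEXT Widget","nameName":"sub_window","width":100,"height":200}
{"id":5, "title":"tiger","nameName":"so_l","width":250,"height":400}
{"id":7, "title":"Sample Konfabulator Widget","nameName":"main_window","width":500,"height":500}
EOF
```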

$ pbpaste | tee ./`date +"%Y%m%d"`.jsonl
{"id":1, "title":"Sample Konfabulator Widget","nameName":"main_window","width":500,"height":500}
{"id":2, "title":"NEXT Widget","nameName":"sub_window","width":100,"height":200}
{"id":3, "title":"tiger","nameName":"so_l","width":250,"height":400}
{"id":4, "title":"Sample Konfabulator Widget","nameName":"main_window","width":500,"height":500}
{"id":5, "title":"NEXT Widget","nameName":"sub_window","width":100,"height":200}
{"id":5, "title":"tiger","nameName":"so_l","width":250,"height":400}
{"id":7, "title":"Sample Konfabulator Widget","nameName":"main_window","width":500,"height":500}
$ PWD=`pwd` YYYYmmdd=`date +'%Y%m%d'` embulk guess -b ./bundle/ -g jsonl seed.yml.liquid
$ PWD=`pwd` YYYYmmdd=`date +'%Y%m%d'` embulk guess -b ./bundle/ -g jsonl seed.yml.liquid -o jsonl-test.yml
$ PWD=`pwd` YYYYmmdd=`date +'%Y%m%d'` embulk preview ./jsonl-test.yml -b ./bundle/
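Every command above repeats the same `PWD=... YYYYmmdd=...` prefix so that the Liquid template can read both variables. A minimal sketch of a shell function that sets them once per call; the name `embulk_env` is hypothetical, and it simply passes all its arguments through to `embulk`:

```shell
# Hypothetical wrapper: run embulk with the two variables the Liquid template reads,
# so each command does not have to repeat the PWD=... YYYYmmdd=... prefix.
embulk_env() {
  PWD="$(pwd)" YYYYmmdd="$(date +'%Y%m%d')" embulk "$@"
}

# Usage, mirroring the commands above:
#   embulk_env guess -b ./bundle/ -g jsonl seed.yml.liquid -o jsonl-test.yml
#   embulk_env preview ./jsonl-test.yml -b ./bundle/
#   embulk_env run ./jsonl-test.yml -b ./bundle/
```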

...

+---------+----------------------------+-----------------+------------+-------------+
| id:long |               title:string | nameName:string | width:long | height:long |
+---------+----------------------------+-----------------+------------+-------------+
|       1 | Sample Konfabulator Widget |     main_window |        500 |         500 |
|       2 |                NEXT Widget |      sub_window |        100 |         200 |
|       3 |                      tiger |            so_l |        250 |         400 |
|       4 | Sample Konfabulator Widget |     main_window |        500 |         500 |
|       5 |                NEXT Widget |      sub_window |        100 |         200 |
|       5 |                      tiger |            so_l |        250 |         400 |
|       7 | Sample Konfabulator Widget |     main_window |        500 |         500 |
+---------+----------------------------+-----------------+------------+-------------+

The guess looks correct, so add the SQLite output settings to jsonl-test.yml

# the part I added (in place of the out: stdout section)
out:
  type: sqlite3
  database: '/tmp/embulktest.db'  # no need to create the DB file or the table beforehand
  table: 'embulktest1'

Run

$ PWD=`pwd` YYYYmmdd=`date +'%Y%m%d'` embulk run ./jsonl-test.yml -b ./bundle/
2016-08-22 23:49:21.754 +0900: Embulk v0.8.13
2016-08-22 23:49:23.289 +0900 [INFO] (0001:transaction): Loaded plugin embulk-output-sqlite3 (0.0.1)
2016-08-22 23:49:23.334 +0900 [INFO] (0001:transaction): Loaded plugin embulk-parser-jsonl (0.2.0)
2016-08-22 23:49:23.364 +0900 [INFO] (0001:transaction): Listing local files at directory '/Users/sakamotoakira/work/embulktest' filtering filename by prefix '20160822'
2016-08-22 23:49:23.377 +0900 [INFO] (0001:transaction): Loading files [/Users/sakamotoakira/work/embulktest/20160822.jsonl]
2016-08-22 23:49:23.454 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=8 / output tasks 4 = input tasks 1 * 4
2016-08-22 23:49:23.716 +0900 [INFO] (0001:transaction): {done:  0 / 1, running: 0}
2016-08-22 23:49:23.919 +0900 [INFO] (0001:transaction): {done:  1 / 1, running: 0}
2016-08-22 23:49:23.955 +0900 [INFO] (main): Committed.
2016-08-22 23:49:23.956 +0900 [INFO] (main): Next config diff: {"in":{"last_path":"/Users/sakamotoakira/work/embulktest/20160822.jsonl"},"out":{}}

Starting with no DB file at all, it creates everything from the DB file to the table (which surprised me a little)

$ sqlite3 /tmp/embulktest.db "select * from embulktest1;"
1|Sample Konfabulator Widget|main_window|500|500
2|NEXT Widget|sub_window|100|200
3|tiger|so_l|250|400
4|Sample Konfabulator Widget|main_window|500|500
5|NEXT Widget|sub_window|100|200
5|tiger|so_l|250|400
7|Sample Konfabulator Widget|main_window|500|500
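As an example of the kind of quick investigation the summary below has in mind, a query that finds ids loaded more than once (the sample data contains id 5 twice). The helper name `dup_ids` is hypothetical; the DB path and table name come from the config above, and it assumes the sqlite3 command-line tool is installed:

```shell
# Hypothetical helper: print ids that appear more than once in embulktest1,
# with how many times each appears (sqlite3's default "id|count" list output)
dup_ids() {
  local db="${1:-/tmp/embulktest.db}"
  sqlite3 "$db" "SELECT id, COUNT(*) FROM embulktest1 GROUP BY id HAVING COUNT(*) > 1;"
}
```

Against the table loaded above, `dup_ids` should print 5|2.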

Summary

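  • embulk is easy to start using: a single jar is all you need
  • It comes with features that help while you work, such as guess and preview
  • The sqlite plugin is handy and lets you cut quite a few corners
  • It seems well suited to creating throwaway SQLite files for investigation or aggregation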
