Trying Out the Treasure Data Toolbelt (Mac) #TreasureData

Setting up the td command

Check the Ruby version

Terminal

$ ruby --version
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin16]

Install the td command (Ruby gem)

Terminal

$ gem install td

Confirm the command is available

Terminal

$ which td
/Users/hoge/.rbenv/shims/td
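
The same check can be scripted. The following is a minimal sketch: `check_td` is a helper name made up for this example, and the `rbenv rehash` hint assumes rbenv manages Ruby, as the shim path above suggests.

```shell
# Report whether the td executable is reachable on PATH.
# Prints its location on success; prints a hint and returns 1 otherwise.
# check_td is a hypothetical helper, not part of the td toolbelt.
check_td() {
  if command -v td >/dev/null 2>&1; then
    echo "td found: $(command -v td)"
  else
    echo "td not found: try 'gem install td', then 'rbenv rehash' if you use rbenv"
    return 1
  fi
}

check_td || true
```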

Check the version

Terminal

$ td --version
0.15.8

Upgrade

Terminal

$ gem update td

Account setup (Google SSO users)

Terminal

$ td apikey:set <your_apikey>

Verify the account

Terminal

$ less ~/.td/td.conf
[account]
  apikey = **********************************************
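
To confirm which key is configured without printing it in full, something like the following could be used. This is a sketch only: `show_masked_apikey` is a hypothetical helper, and it assumes the file has the `apikey = ...` layout shown above.

```shell
# Print the apikey value from a td.conf-style file with all but the
# last four characters masked. The path defaults to ~/.td/td.conf.
# show_masked_apikey is a hypothetical helper for this sketch.
show_masked_apikey() {
  conf="${1:-$HOME/.td/td.conf}"
  [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
  awk -F'=' '
    /^[ \t]*apikey[ \t]*=/ {
      key = $2
      gsub(/[ \t\r]/, "", key)            # drop surrounding whitespace
      n = length(key)
      if (n <= 4) { print "apikey = ****"; exit }
      masked = ""
      for (i = 0; i < n - 4; i++) masked = masked "*"
      print "apikey = " masked substr(key, n - 3)
      exit
    }
  ' "$conf"
}
```

Because the file stores the key in plain text, restricting it with `chmod 600 ~/.td/td.conf` is a reasonable precaution.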

Command help

The td command

Terminal

$ td
usage: td [options] COMMAND [args]

options:
  -c, --config PATH                path to the configuration file (default: ~/.td/td.conf)
  -k, --apikey KEY                 use this API key instead of reading the config file
  -e, --endpoint API_SERVER        specify the URL for API server to use (default: https://api.treasuredata.com).
                                     The URL must contain a scheme (http:// or https:// prefix) to be valid.
                                     Valid IPv4 addresses are accepted as well in place of the host name.
      --insecure                   Insecure access: disable SSL (enabled by default)
  -v, --verbose                    verbose mode
  -h, --help                       show help
  -r, --retry-post-requests        retry on failed post requests.
                                   Warning: can cause resource duplication, such as duplicated job submissions.
      --version                    show version

Basic commands:

  db             # create/delete/list databases
  table          # create/delete/list/import/export/tail tables
  query          # issue a query
  job            # show/kill/list jobs
  import         # manage bulk import sessions (Java based fast processing)
  bulk_import    # manage bulk import sessions (Old Ruby-based implementation)
  result         # create/delete/list result URLs
  sched          # create/delete/list schedules that run a query periodically
  schema         # create/delete/modify schemas of tables
  connector      # manage connectors
  workflow       # manage workflows

Additional commands:

  status         # show scheds, jobs, tables and results
  apikey         # show/set API key
  server         # show status of the Treasure Data server
  sample         # create a sample log file
  help           # show help messages

The td query command

Terminal

$ td query --help
usage:
  $ td query [sql]

example:
  $ td query -d example_db -w -r rset1 "select count(*) from table1"
  $ td query -d example_db -w -r rset1 -q query.txt

description:
  Issue a query

options:
  -d, --database DB_NAME           use the database (required)
  -w, --wait[=SECONDS]             wait for finishing the job (for seconds)
  -G, --vertical                   use vertical table to show results
  -o, --output PATH                write result to the file
  -f, --format FORMAT              format of the result to write to the file (tsv, csv, json, msgpack, and msgpack.gz)
  -r, --result RESULT_URL          write result to the URL (see also result:create subcommand)
                                    It is suggested for this option to be used with the -x / --exclude option to suppress printing
                                    of the query result to stdout or -o / --output to dump the query result into a file.
  -u, --user NAME                  set user name for the result URL
  -p, --password                   ask password for the result URL
  -P, --priority PRIORITY          set priority
  -R, --retry COUNT                automatic retrying count
  -q, --query PATH                 use file instead of inline query
  -T, --type TYPE                  set query type (hive, presto)
      --sampling DENOMINATOR       OBSOLETE - enable random sampling to reduce records 1/DENOMINATOR
  -l, --limit ROWS                 limit the number of result rows shown when not outputting to file
  -c, --column-header              output of the columns' header when the schema is available for the table (only applies to json, tsv and csv formats)
  -x, --exclude                    do not automatically retrieve the job result
  -O, --pool-name NAME             specify resource pool by name
      --domain-key DOMAIN_KEY      optional user-provided unique ID. You can include this ID with your `create` request to ensure idempotence
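
As an illustration of combining these options, the sketch below saves a query result as CSV with a header row. It uses only flags listed in the help above (`-d`, `-w`, `-T`, `-q`, `-f`, `-c`, `-o`). The function name `query_to_csv` and the choice of `presto` are assumptions made for this example.

```shell
# Run the query stored in a file and save the result as CSV with a header row.
# Arguments: database name, query file, output path (all supplied by the caller).
# query_to_csv is a hypothetical wrapper, not a td subcommand.
query_to_csv() {
  [ "$#" -eq 3 ] || { echo "usage: query_to_csv DB QUERY_FILE OUT_CSV" >&2; return 2; }
  # -w waits for the job, -f csv sets the format, -c adds the column header,
  # -o writes the result to the given file.
  td query -d "$1" -w -T presto -q "$2" -f csv -c -o "$3"
}

# Example call (requires td and a configured API key):
#   query_to_csv hoge hoge.huga.sql result.csv
```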

Running a query from the command line

Run a SELECT against the huga table in the hoge database.

Terminal

$ td query -d hoge -T presto "SELECT * FROM hoge.huga LIMIT 10"

Job 23437**** is queued.
Use 'td job:show 23437****' to show the status.
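
Without `-w`, the command returns as soon as the job is queued, so a script needs the job ID to follow up. A minimal sketch, assuming the output keeps the "Job <id> is queued." wording shown above (`extract_job_id` is a hypothetical helper and the ID 12345 is made up):

```shell
# Extract the numeric job ID from the "Job <id> is queued." line on stdin.
# extract_job_id is a hypothetical helper for this sketch.
extract_job_id() {
  sed -n 's/^Job \([0-9][0-9]*\) is queued\.$/\1/p' | head -n 1
}

# Example with canned output (no td installation needed):
printf 'Job 12345 is queued.\n' | extract_job_id   # prints 12345

# With td installed, the ID could then be passed to job:show:
#   job_id=$(td query -d hoge -T presto "SELECT 1" | extract_job_id)
#   td job:show "$job_id"
```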

Running a query with a specific API key

Run a SELECT against the huga table in the hoge database, reading the query from a file.

Terminal

$ td -k ********************** query -w -T hive -d hoge -q hoge.huga.sql

hoge.huga.sql

SELECT time FROM hoge.huga LIMIT 10;

Note: replace "**********************" with the API key you want to use.
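
Typing the key directly on the command line leaves it in the shell history. One way to avoid that is to read it from an environment variable, sketched below. `TD_KEY_OVERRIDE` and `td_with_key` are names invented for this example; td itself does not read them.

```shell
# Run td with an API key taken from the TD_KEY_OVERRIDE variable
# (a hypothetical name chosen for this sketch, not one td itself reads).
td_with_key() {
  if [ -z "${TD_KEY_OVERRIDE:-}" ]; then
    echo "TD_KEY_OVERRIDE is not set" >&2
    return 1
  fi
  # Pass the key through the -k option, then forward the remaining arguments.
  td -k "$TD_KEY_OVERRIDE" "$@"
}

# Example call (requires td):
#   td_with_key query -w -T hive -d hoge -q hoge.huga.sql
```

The key is still visible in the process list while td runs, so this only keeps it out of the history file.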

References

Treasure Data CLI (install)
https://docs.treasuredata.com/articles/installing-the-cli#ruby-gem
Treasure Data Toolbelt: Command-line Interface
https://docs.treasuredata.com/articles/command-line
Fetching data for a specified date range from Treasure Data
https://dev.classmethod.jp/treasuredata/data-migration-treasuredata-to-redshift-with-specify-date-range/
Running Presto queries from the td command line
http://www.shigemk2.com/entry/execute_presto_td_command-line