
Where TreasureData connection settings written in ~/.td/td.conf take effect

Posted at 2016-10-27

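Goal

There are several ways to connect to TreasureData: the td command, the embulk command, Digdag's td> operator, and Digdag's embulk> operator.
I want to avoid writing the same TreasureData connection settings (endpoint and API key) in multiple places as far as possible.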

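Results

I put the settings in ~/.td/td.conf and found that they take effect in some places and not in others:

  1. Effective
    1. td command
    2. Digdag td> (local mode)
  2. Not effective
    1. Digdag td> (server mode)
    2. Embulk embulk-input-td plugin
    3. Embulk embulk-output-td plugin

It makes sense that td.conf has no effect in Digdag server mode.

It is a pity that it has no effect on the Embulk td plugins.
I would be glad to see this improved.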

What I did

Creating td.conf

The endpoint must include https:// (omitting it causes an error).

$ id
uid=500(vagrant) gid=500(vagrant) groups=500(vagrant) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
$ pwd
/home/vagrant
$ td --version
0.15.0
$ ls .td
ls: cannot access .td: No such file or directory
$ td -e https://<endpoint> account
Enter your Treasure Data credentials. For Google SSO user, please see https://docs.treasuredata.com/articles/command-line#google-sso-users
Email: <username>
Password (typing will be hidden): <password>
Authenticated successfully.
Use 'td -e https://<endpoint> db:create <db_name>' to create a database.
$ ls .td
td.conf
$ cat .td/td.conf
[account]
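  user = <username>
  apikey = <API key>
  endpoint = https://<endpoint>

In practice, when provisioning with Vagrant I create this td.conf directly.
(Running td account would also create it, but I was more reluctant to write a password in the Vagrantfile than an API key.)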

  config.vm.provision "shell", privileged: false, inline: <<-EOT
    mkdir                                         ~/.td
    echo '[account]'                           >  ~/.td/td.conf
    echo 'user     = <username>'               >> ~/.td/td.conf
    echo 'apikey   = <API key>'                >> ~/.td/td.conf
    echo 'endpoint = https://<endpoint>'       >> ~/.td/td.conf
  EOT
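
The provisioning step above writes the API key with whatever default permissions the shell has. As a hedged variant, the same file could be produced by a plain shell function that restricts access to the owner. This is only a sketch: `write_td_conf` is a hypothetical helper name, and the `umask 077` step is an added assumption, not something done in the original setup.

```shell
# Hypothetical helper: write an INI-style td.conf.
# Usage: write_td_conf <user> <apikey> <endpoint> [path/to/td.conf]
write_td_conf() {
  user=$1 apikey=$2 endpoint=$3
  conf=${4:-"$HOME/.td/td.conf"}
  mkdir -p "$(dirname "$conf")"
  # Assumption: run in a subshell with umask 077 so a newly created file
  # is readable by the owner only (an existing file keeps its permissions).
  (
    umask 077
    printf '[account]\n  user = %s\n  apikey = %s\n  endpoint = %s\n' \
      "$user" "$apikey" "$endpoint" > "$conf"
  )
}
```

Inside the Vagrantfile heredoc this would be called as `write_td_conf '<username>' '<API key>' 'https://<endpoint>'`.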

td command

Effective, as shown below.

$ cat xxx.sql
SELECT COUNT(*) AS count FROM xxx
$ td query -d xxx -w -q xxx.sql
Job 9999999 is queued.
Use 'td job:show 9999999' to show the status.
queued...
~ (omitted) ~
Status      : success
Result      :
+-------+
| count |
+-------+
| 42    |
+-------+
1 row in set

Digdag td> (local mode)

Effective, as shown below.

$ cat xxx.dig
+task1:
   td>: xxx.sql
   database: xxx
   store_last_results: true
+task2:
   echo>: ${td.last_results.count}
$ digdag run xxx.dig
2016-10-26 10:56:52 +0900: Digdag v0.8.17
2016-10-26 10:56:55 +0900 [WARN] (main): Using a new session time 2016-10-26T00:00:00+00:00.
2016-10-26 10:56:55 +0900 [INFO] (main): Using session /tmp/test/.digdag/status/20161026T000000+0000.
2016-10-26 10:56:55 +0900 [INFO] (main): Starting a new session project id=1 workflow name=xxx session_time=2016-10-26T00:00:00+00:00
2016-10-26 10:56:58 +0900 [INFO] (0016@+xxx+task1): td>: xxx.sql
2016-10-26 10:56:59 +0900 [INFO] (0016@+xxx+task1): td-client version: 0.7.26
2016-10-26 10:56:59 +0900 [INFO] (0016@+xxx+task1): Logging initialized @6699ms
2016-10-26 10:57:00 +0900 [INFO] (0016@+xxx+task1): td>: xxx.sql
2016-10-26 10:57:01 +0900 [INFO] (0016@+xxx+task1): Started presto job id=9999999:
SELECT COUNT(*) AS count FROM xxx

2016-10-26 10:57:04 +0900 [INFO] (0016@+xxx+task1): td>: xxx.sql
2016-10-26 10:57:06 +0900 [INFO] (0016@+xxx+task2): echo>: 42
42
Success. Task state is saved at /tmp/test/.digdag/status/20161026T000000+0000 directory.
  * Use --session <daily | hourly | "yyyy-MM-dd[ HH:mm:ss]"> to not reuse the last session time.
  * Use --rerun, --start +NAME, or --goal +NAME argument to rerun skipped tasks.

Digdag td> (server mode)

Not effective, as shown below.

$ cat ~/.config/digdag/config
client.http.endpoint = http://<Digdag server IP address>:<port>/
$ digdag push proj1
2016-10-27 12:27:30 +0900: Digdag v0.8.17
Creating .digdag/tmp/archive-7184579153809927090.tar.gz...
  Archiving xxx.dig
  Archiving xxx.sql
Workflows:
  xxx

Uploaded:
  id: 10
  name: proj1
  revision: a5b35e9e-8ae5-4942-af2a-7b0a4ed12c3d
  archive type: db
  project created at: 2016-10-27T03:27:33Z
  revision updated at: 2016-10-27T03:27:33Z

Use `digdag workflows` to show all workflows.
$ digdag start proj1 xxx --session now
2016-10-27 12:28:16 +0900: Digdag v0.8.17
Started a session attempt:
  session id: 112
  attempt id: 111
  uuid: 4ed676b1-01bf-4dee-ba5d-d9b5e032588d
  project: proj1
  workflow: xxx
  session time: 2016-10-27 03:28:19 +0000
  retry attempt name:
  params: {}
  created at: 2016-10-27 12:28:19 +0900

* Use `digdag session 112` to show session status.
* Use `digdag task 111` and `digdag log 111` to show task status and logs.
$ digdag log 111
2016-10-27 12:29:32 +0900: Digdag v0.8.17
2016-10-27 12:28:22.488 +0900 [INFO] (0074@+xxx+task1) io.digdag.core.agent.OperatorManager: td>: xxx.sql
2016-10-27 12:28:23.146 +0900 [INFO] (0074@+xxx+task1) com.treasuredata.client.TDClient: td-client version: 0.7.26
2016-10-27 12:28:23.161 +0900 [ERROR] (0074@+xxx+task1) io.digdag.core.agent.OperatorManager: Configuration error at task +xxx+task1: The 'td.apikey' secret is missing (config)
2016-10-27 12:28:23.971 +0900 [INFO] (0074@+xxx^failure-alert) io.digdag.core.agent.OperatorManager: type: notify
$ ssh <Digdag server IP address> ps -ef|grep digdag
vagrant@<Digdag server IP address>'s password:
vagrant   1524     1  2 12:18 ?        00:00:16 java -XX:+AggressiveOpts -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -Xverify:none -jar /usr/local/bin/digdag server -c /home/vagrant/.config/digdag/config -O /home/vagrant/digdag-server/task-log
$ ssh <Digdag server IP address> cat ~/.td/td.conf
vagrant@<Digdag server IP address>'s password:
[account]
  user = <username>
  apikey = <API key>
  endpoint = https://<endpoint>
$ digdag version
2016-10-27 12:34:41 +0900: Digdag v0.8.17
Client version: 0.8.17
Server version: 0.8.17

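A ~/.td/td.conf on the client side is not consulted when the workflow runs on the server.

A ~/.td/td.conf in the home directory of the user running the server process on the Digdag server machine is not consulted either.
Presumably this is because multiple users push multiple projects to one server, and those projects should not share a single endpoint and API key.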
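
The error in the log above names a `td.apikey` secret, which suggests that in server mode the key is expected to be registered on the server per project rather than read from any td.conf. A minimal sketch of doing that from the client's existing td.conf follows. It assumes a Digdag release that provides the `digdag secrets` client command (this may not apply to the v0.8.17 used here) and a server that is set up to store secrets. `register_td_apikey` is a hypothetical helper name.

```shell
# Hypothetical helper: read apikey from a td.conf and register it as the
# project's td.apikey secret on the Digdag server.
# Usage: register_td_apikey <project> [path/to/td.conf]
register_td_apikey() {
  project=$1
  conf=${2:-"$HOME/.td/td.conf"}
  # Take the value after "apikey =" (first match only)
  apikey=$(sed -n 's/^[[:space:]]*apikey[[:space:]]*=[[:space:]]*//p' "$conf" | head -n 1)
  if [ -z "$apikey" ]; then
    echo "apikey not found in $conf" >&2
    return 1
  fi
  # Assumption: `digdag secrets --project <name> --set key=value` is available
  digdag secrets --project "$project" --set "td.apikey=$apikey"
}
```

For the project pushed above this would be `register_td_apikey proj1`. The key still lives in only one file on the client, at the cost of one extra registration step per project.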

Embulk embulk-input-td plugin

First, a working example.
Here the endpoint must not include https:// (including it causes an error).

$ cat input.yml
in:
  type: td
  apikey: <API key>
  endpoint: <endpoint>
  database: xxx
  query: SELECT * FROM xxx
out:
  type: file
  path_prefix: xxx
  file_ext: csv
  formatter:
    type: csv
    header_line: true
$ embulk run input.yml
2016-10-27 12:43:01.916 +0900: Embulk v0.8.14
2016-10-27 12:43:06.817 +0900 [INFO] (0001:transaction): Loaded plugin embulk-input-td (0.1.0)
2016-10-27 12:43:06.925 +0900 [INFO] (0001:transaction): td-client version: 0.7.24
2016-10-27 12:43:06.938 +0900 [INFO] (0001:transaction): Reading configuration file: /home/vagrant/.td/td.conf
2016-10-27 12:43:07.006 +0900 [INFO] (0001:transaction): Logging initialized @13514ms
2016-10-27 12:43:07.648 +0900 [INFO] (0001:transaction): Submit a query for database 'xxx': SELECT * FROM xxx
2016-10-27 12:43:08.650 +0900 [INFO] (0001:transaction): Job 8065368 is queued.
2016-10-27 12:43:08.650 +0900 [INFO] (0001:transaction): Confirm that job 8065368 finished
2016-10-27 12:43:14.317 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=2 / tasks=1
2016-10-27 12:43:14.460 +0900 [INFO] (0001:transaction): {done:  0 / 1, running: 0}
2016-10-27 12:43:14.668 +0900 [INFO] (0023:task-0000): Writing local file 'xxx000.00.csv'
2016-10-27 12:43:15.141 +0900 [INFO] (0001:transaction): {done:  1 / 1, running: 0}
2016-10-27 12:43:15.168 +0900 [INFO] (main): Committed.
2016-10-27 12:43:15.170 +0900 [INFO] (main): Next config diff: {"in":{},"out":{}}

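The log says "Reading configuration file: /home/vagrant/.td/td.conf", so it looks as if the API key and endpoint would be taken from td.conf.
However, commenting out endpoint and running again results in an error.
As described at https://github.com/muga/embulk-input-td#configuration , the request probably went to the default endpoint, api.treasuredata.com.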

$ embulk run input.yml
2016-10-27 12:46:48.242 +0900: Embulk v0.8.14
2016-10-27 12:46:52.945 +0900 [INFO] (0001:transaction): Loaded plugin embulk-input-td (0.1.0)
2016-10-27 12:46:53.061 +0900 [INFO] (0001:transaction): td-client version: 0.7.24
2016-10-27 12:46:53.068 +0900 [INFO] (0001:transaction): Reading configuration file: /home/vagrant/.td/td.conf
2016-10-27 12:46:53.158 +0900 [INFO] (0001:transaction): Logging initialized @13498ms
2016-10-27 12:46:53.743 +0900 [INFO] (0001:transaction): Submit a query for database 'xxx': SELECT * FROM xxx
2016-10-27 12:46:55.023 +0900 [WARN] (0001:transaction): API request failed
java.util.concurrent.ExecutionException: org.eclipse.jetty.client.HttpResponseException: HTTP protocol violation: Authentication challenge without WWW-Authenticate header
        at org.eclipse.jetty.client.util.FutureResponseListener.getResult(FutureResponseListener.java:118) ~[jetty-client-9.2.2.v20140723.jar:9.2.2.v20140723]
        at org.eclipse.jetty.client.util.FutureResponseListener.get(FutureResponseListener.java:101) ~[jetty-client-9.2.2.v20140723.jar:9.2.2.v20140723]
~ (omitted) ~

Restoring endpoint and commenting out apikey also results in an error.
https://github.com/muga/embulk-input-td#configuration states that apikey is required.

$ embulk run input.yml
2016-10-27 12:49:09.632 +0900: Embulk v0.8.14
2016-10-27 12:49:14.507 +0900 [INFO] (0001:transaction): Loaded plugin embulk-input-td (0.1.0)
org.embulk.exec.PartialExecutionException: org.embulk.config.ConfigException: com.fasterxml.jackson.databind.JsonMappingException: Field 'apikey' is required but not set
 at [Source: N/A; line: -1, column: -1]
        at org.embulk.exec.BulkLoader$LoaderState.buildPartialExecuteException(org/embulk/exec/BulkLoader.java:363)

Embulk embulk-output-td plugin

The results are the same as for the embulk-input-td plugin.

$ cat output_for_guess.yml
in:
 type: file
 path_prefix: xxx
out:
  type: td
  apikey: <API key>
  endpoint: <endpoint>
  database: xxx
  table: xxx2
  mode: truncate
$ embulk guess output_for_guess.yml -o output.yml
2016-10-27 13:01:49.899 +0900: Embulk v0.8.14
2016-10-27 13:01:51.835 +0900 [INFO] (0001:guess): Listing local files at directory '.' filtering filename by prefix 'xxx'
2016-10-27 13:01:51.839 +0900 [INFO] (0001:guess): Loading files [xxx000.00.csv]
2016-10-27 13:01:52.038 +0900 [INFO] (0001:guess): Loaded plugin embulk/guess/gzip from a load path
2016-10-27 13:01:52.062 +0900 [INFO] (0001:guess): Loaded plugin embulk/guess/bzip2 from a load path
2016-10-27 13:01:52.102 +0900 [INFO] (0001:guess): Loaded plugin embulk/guess/json from a load path
2016-10-27 13:01:52.119 +0900 [INFO] (0001:guess): Loaded plugin embulk/guess/csv from a load path
in:
  type: file
  path_prefix: xxx
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    escape: '"'
    trim_if_not_quoted: false
    skip_header_lines: 1
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: col1, type: string}
    - {name: col2, type: string}
    - {name: time, type: long}
out: {type: td, apikey: <API key>, endpoint: <endpoint>,
  database: xxx, table: xxx2, mode: truncate}
Created 'output.yml' file.
$ embulk run output.yml
2016-10-27 13:02:22.364 +0900: Embulk v0.8.14
2016-10-27 13:02:27.395 +0900 [INFO] (0001:transaction): Loaded plugin embulk-output-td (0.3.8)
2016-10-27 13:02:27.511 +0900 [INFO] (0001:transaction): Listing local files at directory '.' filtering filename by prefix 'xxx'
2016-10-27 13:02:27.522 +0900 [INFO] (0001:transaction): Loading files [xxx000.00.csv]
2016-10-27 13:02:27.690 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=2 / tasks=1
2016-10-27 13:02:27.803 +0900 [INFO] (0001:transaction): td-client version: 0.7.24
2016-10-27 13:02:27.815 +0900 [INFO] (0001:transaction): Reading configuration file: /home/vagrant/.td/td.conf
2016-10-27 13:02:27.887 +0900 [INFO] (0001:transaction): Logging initialized @13895ms
2016-10-27 13:02:29.793 +0900 [INFO] (0001:transaction): Using time:long column as the data partitioning key
2016-10-27 13:02:29.796 +0900 [INFO] (0001:transaction): Create bulk_import session embulk_20161027_040227_014000000
2016-10-27 13:02:30.176 +0900 [INFO] (0001:transaction): {done:  0 / 1, running: 0}
2016-10-27 13:02:30.540 +0900 [INFO] (0022:task-0000): {uploading: {rows: 20, size: 1,166 bytes (compressed)}}
2016-10-27 13:02:30.974 +0900 [INFO] (0001:transaction): {done:  1 / 1, running: 0}
2016-10-27 13:02:31.761 +0900 [INFO] (0001:transaction): Performing bulk import session 'embulk_20161027_040227_014000000'
2016-10-27 13:03:12.793 +0900 [INFO] (0001:transaction):     job id: 8065734
2016-10-27 13:03:13.262 +0900 [INFO] (0001:transaction): Committing bulk import session 'embulk_20161027_040227_014000000'
2016-10-27 13:03:13.263 +0900 [INFO] (0001:transaction):     valid records: 20
2016-10-27 13:03:13.263 +0900 [INFO] (0001:transaction):     error records: 0
2016-10-27 13:03:13.263 +0900 [INFO] (0001:transaction):     valid parts: 1
2016-10-27 13:03:13.263 +0900 [INFO] (0001:transaction):     error parts: 0
2016-10-27 13:03:13.263 +0900 [INFO] (0001:transaction):     new columns:
2016-10-27 13:03:13.265 +0900 [INFO] (0001:transaction):       - col1: string
2016-10-27 13:03:13.266 +0900 [INFO] (0001:transaction):       - col2: string
2016-10-27 13:03:20.469 +0900 [INFO] (0001:transaction): Deleting bulk import session 'embulk_20161027_040227_014000000'
2016-10-27 13:03:20.876 +0900 [INFO] (main): Committed.
2016-10-27 13:03:20.877 +0900 [INFO] (main): Next config diff: {"in":{"last_path":"xxx000.00.csv"},"out":{"last_session":"embulk_20161027_040227_014000000"}}
$ vi output.yml # remove the endpoint definition
$ embulk run output.yml
2016-10-27 13:07:21.973 +0900: Embulk v0.8.14
2016-10-27 13:07:26.872 +0900 [INFO] (0001:transaction): Loaded plugin embulk-output-td (0.3.8)
2016-10-27 13:07:26.993 +0900 [INFO] (0001:transaction): Listing local files at directory '.' filtering filename by prefix 'xxx'
2016-10-27 13:07:27.004 +0900 [INFO] (0001:transaction): Loading files [xxx000.00.csv]
2016-10-27 13:07:27.159 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=2 / tasks=1
2016-10-27 13:07:27.271 +0900 [INFO] (0001:transaction): td-client version: 0.7.24
2016-10-27 13:07:27.277 +0900 [INFO] (0001:transaction): Reading configuration file: /home/vagrant/.td/td.conf
2016-10-27 13:07:27.345 +0900 [INFO] (0001:transaction): Logging initialized @13797ms
2016-10-27 13:07:29.389 +0900 [WARN] (0001:transaction): API request failed
java.util.concurrent.ExecutionException: org.eclipse.jetty.client.HttpResponseException: HTTP protocol violation: Authentication challenge without WWW-Authenticate header
        at org.eclipse.jetty.client.util.FutureResponseListener.getResult(FutureResponseListener.java:118) ~[jetty-client-9.2.2.v20140723.jar:9.2.2.v20140723]
~ (omitted) ~
$ vi output.yml # remove the apikey definition
$ embulk run output.yml
2016-10-27 13:08:38.456 +0900: Embulk v0.8.14
2016-10-27 13:08:43.483 +0900 [INFO] (0001:transaction): Loaded plugin embulk-output-td (0.3.8)
2016-10-27 13:08:43.607 +0900 [INFO] (0001:transaction): Listing local files at directory '.' filtering filename by prefix 'xxx'
2016-10-27 13:08:43.617 +0900 [INFO] (0001:transaction): Loading files [xxx000.00.csv]
2016-10-27 13:08:43.788 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=2 / tasks=1
org.embulk.exec.PartialExecutionException: org.embulk.config.ConfigException: com.fasterxml.jackson.databind.JsonMappingException: Field 'apikey' is required but not set
 at [Source: N/A; line: -1, column: -1]
        at org.embulk.exec.BulkLoader$LoaderState.buildPartialExecuteException(org/embulk/exec/BulkLoader.java:363)
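
Since neither Embulk plugin picked up apikey or endpoint from td.conf, one possible workaround for keeping the values in a single place is to export them from td.conf as environment variables and reference them from a Liquid-templated Embulk config (a file named `*.yml.liquid`, using `{{ env.NAME }}`). The following is a minimal sketch, not a tested recipe for these plugin versions. `td_conf_get`, `export_td_env`, `TD_APIKEY`, and `TD_ENDPOINT` are hypothetical names. The scheme is stripped because the plugins above rejected an endpoint that included https://.

```shell
# Hypothetical helper: print the value of <key> from an INI-style td.conf.
# Usage: td_conf_get <key> [path/to/td.conf]
td_conf_get() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "${2:-$HOME/.td/td.conf}" | head -n 1
}

# Export the values for Embulk. Usage: export_td_env [path/to/td.conf]
export_td_env() {
  TD_APIKEY=$(td_conf_get apikey "$1")
  endpoint=$(td_conf_get endpoint "$1")
  # td wants "https://host"; the Embulk td plugins wanted just "host"
  TD_ENDPOINT=${endpoint#*://}
  export TD_APIKEY TD_ENDPOINT
}

# Then, in input.yml.liquid (the .liquid suffix enables templating):
#   apikey: {{ env.TD_APIKEY }}
#   endpoint: {{ env.TD_ENDPOINT }}
# and run: export_td_env && embulk run input.yml.liquid
```

The same two template lines would go under `out:` for embulk-output-td. The values are still written only once, in td.conf, and the Embulk configs themselves no longer contain the key.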