Using Amazon Athena from the CLI

Last updated at 2017-06-05, posted at 2017-05-22

Athena can finally be used from the CLI.
Let's try it right away.

Prerequisites

Dummy data to query is already stored in S3.
A directory for saving Athena query results has already been created in S3.
Athena is run in us-east-1.

Environment

Command

aws --version

Result

aws-cli/1.11.89 Python/3.5.0 Darwin/16.5.0 botocore/1.5.52

If your installed AWS CLI does not have the athena command yet, update it.

Update

sudo pip install -U awscli --ignore-installed six

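In my environment, the sample data is a backup of a DynamoDB table exported to S3 with Data Pipeline.
As for the database, one named dynamodb already exists; I created it earlier while testing connections from Java.
The table used is dynamodb.sample1, which was created to match the DynamoDB JSON data.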

Running a query

Command

aws athena start-query-execution \
	--query-string 'select count(*) from dynamodb.sample1;' \
	--query-execution-context Database=dynamodb \
	--result-configuration OutputLocation=s3://xxxxxxxxxxxxxxxxxxxxxxxxxxx/cli-test/

Result

{
    "QueryExecutionId": "xxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
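
When the command is used in a script, it is convenient to capture only the ID in a shell variable. The following is a minimal sketch, assuming the AWS CLI global options `--query` (a JMESPath expression) and `--output text`. The function name `start_query` and its positional parameters are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: start an Athena query and print only its QueryExecutionId.
#   $1 = SQL string
#   $2 = database name
#   $3 = S3 output location (for example s3://<bucket>/cli-test/)
start_query() {
    aws athena start-query-execution \
        --query-string "$1" \
        --query-execution-context "Database=$2" \
        --result-configuration "OutputLocation=$3" \
        --query 'QueryExecutionId' \
        --output text
}
```

It could be called as `query_id=$(start_query 'select count(*) from dynamodb.sample1;' dynamodb s3://xxxxxxxxxxxxxxxxxxxxxxxxxxx/cli-test/)`, after which `$query_id` can be passed to the commands that follow.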

Fetching the results

Command

aws athena get-query-results \
	--query-execution-id "xxxxxxxxxxxxxxxxxxxxxxxxxxx"

Result

{
    "ResultSet": {
        "ResultSetMetadata": {
            "ColumnInfo": [
                {
                    "CatalogName": "hive",
                    "Name": "_col0",
                    "Scale": 0,
                    "SchemaName": "",
                    "Precision": 19,
                    "Type": "bigint",
                    "CaseSensitive": false,
                    "Nullable": "UNKNOWN",
                    "Label": "_col0",
                    "TableName": ""
                }
            ]
        },
        "Rows": [
            {
                "Data": [
                    {
                        "VarCharValue": "_col0"
                    }
                ]
            },
            {
                "Data": [
                    {
                        "VarCharValue": "500000"
                    }
                ]
            }
        ]
    }
}
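
In this output the first element of Rows is the header row (`_col0`), and the actual count is in the second element. For a query that returns a single value like this one, the value can be pulled out directly. This is a sketch under the assumption that the AWS CLI global options `--query` and `--output text` are used; the function name `get_scalar_result` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the single value of a one-column, one-row result.
# Rows[0] is the header row, so the data is taken from Rows[1].
#   $1 = QueryExecutionId
get_scalar_result() {
    aws athena get-query-results \
        --query-execution-id "$1" \
        --query 'ResultSet.Rows[1].Data[0].VarCharValue' \
        --output text
}
```

For results with multiple rows or columns, the JMESPath expression would need to be adjusted accordingly.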

Getting the details of an executed query

Command

aws athena get-query-execution \
	--query-execution-id "xxxxxxxxxxxxxxxxxxxxxxxxxxx"

Result

{
    "QueryExecution": {
        "Status": {
            "SubmissionDateTime": 1495440535.706,
            "State": "SUCCEEDED",
            "CompletionDateTime": 1495440549.995
        },
        "QueryExecutionId": "xxxxxxxxxxxxxxxxxxxxxxxxxxx",
        "Statistics": {
            "DataScannedInBytes": 71512882,
            "EngineExecutionTimeInMillis": 13930
        },
        "Query": "select count(*) from dynamodb.sample1",
        "ResultConfiguration": {
            "OutputLocation": "s3://xxxxxxxxxxxxxxxxxxxxxxxxxxx/cli-test/xxxxxxxxxxxxxxxxxxxxxxxxxxx.csv"
        }
    }
}

This returns the object path where the results are stored.
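
Two fields of this output are especially useful in scripts: `State` tells you whether the query has finished, and `OutputLocation` tells you where the CSV is. The following sketch polls the state and then copies the CSV locally with `aws s3 cp`. The function names `wait_for_query` and `fetch_result_csv`, their parameters, and the default 2-second polling interval are assumptions made for illustration, not part of the AWS CLI.

```shell
#!/bin/sh
# Hypothetical helper: block until the query leaves the QUEUED/RUNNING states.
#   $1 = QueryExecutionId
#   $2 = seconds to sleep between polls (assumed default: 2)
wait_for_query() {
    while :; do
        state=$(aws athena get-query-execution \
            --query-execution-id "$1" \
            --query 'QueryExecution.Status.State' \
            --output text) || return 2
        case "$state" in
            SUCCEEDED) return 0 ;;
            FAILED|CANCELLED)
                echo "query $1 ended in state $state" >&2
                return 1 ;;
            *) sleep "${2:-2}" ;;   # still QUEUED or RUNNING
        esac
    done
}

# Hypothetical helper: copy the result CSV of a finished query to a local file.
#   $1 = QueryExecutionId
#   $2 = local destination path
fetch_result_csv() {
    location=$(aws athena get-query-execution \
        --query-execution-id "$1" \
        --query 'QueryExecution.ResultConfiguration.OutputLocation' \
        --output text) || return 1
    aws s3 cp "$location" "$2"
}
```

A typical use would be `wait_for_query "$query_id" && fetch_result_csv "$query_id" ./result.csv`, where `$query_id` holds the ID returned by start-query-execution.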

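With this, Athena is easy to use! It is not available in the Tokyo region yet, but since data is exchanged through S3, my impression is that it is already reasonably usable.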
