LoginSignup
5
2

More than 5 years have passed since last update.

PythonでS3 Selectを試しました。2018-04-20

Posted at

概要

S3 Selectをシンプルなコードで試しました
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectSELECTContent.html

2018-04-20
Node.jsで動かそうと思ったけど、SDKが対応してなさそうなので、Pythonにします

準備

てきとうなバケットを作って、JSONを置いておきます。

name.json
{"id": "16509", "name": "hoge"}
{"id": "6217", "name": "fuga"}
{"id": "1398", "name": "piyo"}

コード

# boto3をインストール
% pip install boto3
s3select.py
# -*- coding: utf-8 -*-

import boto3
s3 = boto3.client('s3')

response = s3.select_object_content(
    Bucket='<BUCKET_NAME>',
    Key='name.json',
    ExpressionType='SQL',
    Expression='select s.id, s.name from S3Object s',
    InputSerialization={
        'CompressionType': 'NONE',
        'JSON': {
            'Type':'LINES'
        }
    },
    OutputSerialization={
        'JSON': {
            'RecordDelimiter': '\n',
        }
    }
)

for event in response['Payload']:
    if 'Records' in event:
        print(event['Records']['Payload'].decode('utf-8'))

% python s3select.py
{"id":"16509","name":"hoge"}
{"id":"6217","name":"fuga"}
{"id":"1398","name":"piyo"}

Expression のSQLだけ変えてもう少し試します

一部の列だけ持ってくる

Expression='select s.name from S3Object s',
% python ./s3select.py
{"name":"hoge"}
{"name":"fuga"}
{"name":"piyo"}

group byはできないけど、

Expression='select s.name, sum(cast(s.id as int)), count(s.id) from S3Object s group by s.name',
% python s3select.py
Traceback (most recent call last):
  File "s3select.py", line 19, in <module>
    'RecordDelimiter': '\n',
  File "/usr/local/lib/python2.7/dist-packages/botocore/client.py", line 314, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/botocore/client.py", line 612, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (UnsupportedSqlOperation) when calling the SelectObjectContent operation: Unsupported SQL operation GROUP BY. Please check the service documentation for supported operations.

ちょっとした集計くらいはできる

idsum するというよくわからないことをしているところには目をつむる)

Expression='select sum(cast(s.id as int)), count(s.id) from S3Object s',
% python s3select.py
{"_1":24124,"_2":3}

まとめ

S3 Selectを使うと、selectできて便利っぽいです

5
2
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
5
2