
A Python generator for retrieving every object name in an Amazon S3 bucket

Last updated at 2021-03-09. Posted at 2021-02-05.

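The Amazon S3 ListObjectsV2 API returns at most 1,000 objects per call, so to list more than that you have to call the API repeatedly, passing the previous response's NextContinuationToken as ContinuationToken.

This "call ListObjectsV2 repeatedly" loop is not difficult at all, but it is handy to have as a general-purpose generator function, so I am noting it here.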

Implementation

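def list_all_objects(s3, **kwargs):
    """Yield every object dict that ListObjectsV2 returns for the given kwargs."""
    continuation_token = None
    while True:
        # From the second request on, resume where the previous page ended.
        if continuation_token:
            kwargs["ContinuationToken"] = continuation_token
        res = s3.list_objects_v2(**kwargs)
        # "Contents" is missing when no objects match, so there is nothing to yield.
        if "Contents" not in res:
            return
        for obj in res["Contents"]:
            yield obj
        # IsTruncated is False on the last page.
        if not res["IsTruncated"]:
            break
        continuation_token = res["NextContinuationToken"]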
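
As an aside, boto3 clients also expose a built-in paginator that handles ContinuationToken internally. The following is a minimal sketch of the same generator written on top of it. The function name `list_all_objects_paginated` is my own, hypothetical choice, and this is an alternative to the hand-written loop, not a replacement the original implementation depends on.

```python
def list_all_objects_paginated(s3, **kwargs):
    """Yield every object dict, letting boto3's paginator manage the tokens.

    `s3` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    The function name is hypothetical; only get_paginator() and
    paginate() are boto3 APIs.
    """
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(**kwargs):
        # "Contents" is absent on a page that holds no objects.
        yield from page.get("Contents", [])
```

It accepts the same keyword arguments as the hand-written version, for example `list_all_objects_paginated(boto3.client("s3"), Bucket="my-bucket", Prefix="foo/")`.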

Usage example

import boto3

for obj in list_all_objects(boto3.client("s3"), Bucket="my-bucket", Prefix="foo/"):
    print(obj)

Output

{'Key': 'foo/0001.csv.gz', 'LastModified': datetime.datetime(2021, 2, 4, 13, 46, tzinfo=tzlocal()), 'ETag': '"488cda75e864ced2c0afd5529f19a105"', 'Size': 2961, 'StorageClass': 'STANDARD'}
{'Key': 'foo/0002.csv.gz', 'LastModified': datetime.datetime(2021, 2, 4, 13, 46, tzinfo=tzlocal()), 'ETag': '"fa09d5eefae8672c163884f0d82b265f"', 'Size': 5702, 'StorageClass': 'STANDARD'}
{'Key': 'foo/0003.csv.gz', 'LastModified': datetime.datetime(2021, 2, 4, 13, 46, tzinfo=tzlocal()), 'ETag': '"712f063ad327d11de0c20cba27e17bf2"', 'Size': 7407, 'StorageClass': 'STANDARD'}
    :
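
To check the paging logic without touching AWS, the generator can be run against a small in-memory stand-in. The sketch below is only an illustration under a few assumptions: `FakeS3` is a hypothetical class (not part of boto3), its continuation token is simply a stringified offset (real S3 tokens are opaque strings), and `list_all_objects` is copied from the implementation above so that the block runs on its own. It also shows how to pull out just the object names via the "Key" field.

```python
def list_all_objects(s3, **kwargs):
    # Copied from the implementation above so this block is self-contained.
    continuation_token = None
    while True:
        if continuation_token:
            kwargs["ContinuationToken"] = continuation_token
        res = s3.list_objects_v2(**kwargs)
        if "Contents" not in res:
            return
        for obj in res["Contents"]:
            yield obj
        if not res["IsTruncated"]:
            break
        continuation_token = res["NextContinuationToken"]


class FakeS3:
    """Hypothetical in-memory stand-in for a boto3 S3 client."""

    def __init__(self, keys, page_size=2):
        self.keys = sorted(keys)
        self.page_size = page_size  # stands in for the real 1,000-object limit
        self.calls = 0

    def list_objects_v2(self, Bucket, Prefix="", ContinuationToken=None):
        self.calls += 1
        matching = [k for k in self.keys if k.startswith(Prefix)]
        # Assumption: the token is just the offset of the next page.
        start = int(ContinuationToken) if ContinuationToken else 0
        page = matching[start:start + self.page_size]
        end = start + len(page)
        res = {"IsTruncated": end < len(matching), "KeyCount": len(page)}
        if page:
            res["Contents"] = [{"Key": k} for k in page]
        if res["IsTruncated"]:
            res["NextContinuationToken"] = str(end)
        return res


fake = FakeS3(["foo/0001.csv.gz", "foo/0002.csv.gz", "foo/0003.csv.gz", "bar/x.txt"])
# Keep only the object names (the "Key" field of each yielded dict).
keys = [obj["Key"] for obj in list_all_objects(fake, Bucket="my-bucket", Prefix="foo/")]
print(keys)        # ['foo/0001.csv.gz', 'foo/0002.csv.gz', 'foo/0003.csv.gz']
print(fake.calls)  # 2 (three matching keys at two per page)
```

Because `list_all_objects` only relies on `list_objects_v2`, any object that provides that method with the same response shape can be passed in place of the real client.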
