
A Python generator that retrieves the names of all objects in an Amazon S3 bucket

Posted at 2021-02-05

The Amazon S3 ListObjectsV2 API returns at most 1,000 objects per call, so to list more than that you have to call the API repeatedly, passing a ContinuationToken each time.

This "call ListObjectsV2 repeatedly" logic is not difficult at all, but it is handy to have as a general-purpose generator function, so I am noting it here.

Implementation

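def list_all_objects(s3, **kwargs):
    """Yield every object that matches the given ListObjectsV2 parameters."""
    continuation_token = None
    while True:
        # From the second request on, resume where the previous page ended.
        if continuation_token:
            kwargs["ContinuationToken"] = continuation_token
        res = s3.list_objects_v2(**kwargs)
        # "Contents" is absent when no objects match.
        if "Contents" not in res:
            return
        for obj in res["Contents"]:
            yield obj
        # IsTruncated is False on the last page, so stop there.
        if not res["IsTruncated"]:
            break
        continuation_token = res["NextContinuationToken"]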
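
To see the continuation-token loop in action without touching AWS, the generator can be run against a small in-memory stand-in. The sketch below is only an illustration: `FakeS3` is a hypothetical class (not part of boto3) that mimics the handful of response fields the generator reads, and its use of a numeric offset as the token is an assumption made for simplicity. The generator is repeated so the block runs on its own.

```python
def list_all_objects(s3, **kwargs):
    """Same generator as in the article, repeated so this sketch is self-contained."""
    continuation_token = None
    while True:
        if continuation_token:
            kwargs["ContinuationToken"] = continuation_token
        res = s3.list_objects_v2(**kwargs)
        if "Contents" not in res:
            return
        yield from res["Contents"]
        if not res["IsTruncated"]:
            break
        continuation_token = res["NextContinuationToken"]


class FakeS3:
    """Hypothetical in-memory stand-in for an S3 client (assumption, not boto3)."""

    def __init__(self, keys, page_size=1000):
        self.keys = sorted(keys)
        self.page_size = page_size
        self.calls = 0  # number of list_objects_v2 requests received

    def list_objects_v2(self, Bucket, Prefix="", ContinuationToken=None):
        self.calls += 1
        matching = [k for k in self.keys if k.startswith(Prefix)]
        # The token here is just a stringified offset into the matching keys.
        start = int(ContinuationToken) if ContinuationToken else 0
        end = start + self.page_size
        page = matching[start:end]
        res = {"KeyCount": len(page), "IsTruncated": end < len(matching)}
        if page:
            res["Contents"] = [{"Key": k} for k in page]
        if res["IsTruncated"]:
            res["NextContinuationToken"] = str(end)
        return res


fake = FakeS3([f"foo/{i:04d}.csv.gz" for i in range(2500)] + ["bar/x.txt"])
keys = [obj["Key"] for obj in list_all_objects(fake, Bucket="my-bucket", Prefix="foo/")]
print(len(keys), fake.calls)  # prints: 2500 3
```

With 2,500 matching keys and a page size of 1,000, the generator issues three requests and yields every object exactly once.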

Usage example

import boto3

for obj in list_all_objects(boto3.client("s3"), Bucket="my-bucket", Prefix="foo/"):
    print(obj)

Output

{'Key': 'foo/0001.csv.gz', 'LastModified': datetime.datetime(2021, 2, 4, 13, 46, tzinfo=tzlocal()), 'ETag': '"488cda75e864ced2c0afd5529f19a105"', 'Size': 2961, 'StorageClass': 'STANDARD'}
{'Key': 'foo/0002.csv.gz', 'LastModified': datetime.datetime(2021, 2, 4, 13, 46, tzinfo=tzlocal()), 'ETag': '"fa09d5eefae8672c163884f0d82b265f"', 'Size': 5702, 'StorageClass': 'STANDARD'}
{'Key': 'foo/0003.csv.gz', 'LastModified': datetime.datetime(2021, 2, 4, 13, 46, tzinfo=tzlocal()), 'ETag': '"712f063ad327d11de0c20cba27e17bf2"', 'Size': 7407, 'StorageClass': 'STANDARD'}
    :
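
Each yielded item is a full metadata dict, so getting just the object names means picking out the "Key" field. The following is a minimal sketch: `iter_keys` is a hypothetical helper name, and the sample records are the ones from the output above with some fields trimmed. Because everything stays lazy, `itertools.islice` can stop early without fetching further pages.

```python
from itertools import islice


def iter_keys(objects):
    """Yield only the "Key" field from an iterable of S3 object dicts."""
    for obj in objects:
        yield obj["Key"]


# Sample records shaped like the output above (fields trimmed for brevity).
sample = [
    {"Key": "foo/0001.csv.gz", "Size": 2961},
    {"Key": "foo/0002.csv.gz", "Size": 5702},
    {"Key": "foo/0003.csv.gz", "Size": 7407},
]

first_two = list(islice(iter_keys(sample), 2))
print(first_two)  # ['foo/0001.csv.gz', 'foo/0002.csv.gz']

total_size = sum(obj["Size"] for obj in sample)
print(total_size)  # 16070
```

In real use, the result of `list_all_objects(...)` would be passed in place of `sample`.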