Gakken LEAP Advent Calendar 2024

Fetching a Full List in Lambda Using ContinuationToken

Last updated at 2024-12-12 / Posted at 2024-12-12

Introduction

Hi, I'm shogawa, an engineer at Gakken LEAP.
Lately I've been spending a lot of time on infrastructure, and I've been taking on more automation work with Lambda.

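We needed to delete temporary data uploaded to S3 every day, so I wrote a Lambda function for the deletion and ran it on a schedule with EventBridge.
About a week later, though, I checked the data and found that some of it had not been deleted. This post describes how I investigated and fixed the problem.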

Prerequisites

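An AWS environment is set up
The S3 bucket directory (prefix) to delete from exists
Note: this time the problem was with the S3 list API, but the same seems to apply when listing other resources, such as EC2 instances.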

Code before the fix

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Get the S3 bucket name and path passed in from EventBridge
    bucket = event['bucket']
    path = event['path']

    try:
        # List the objects under the specified path in the S3 bucket
        response = s3.list_objects_v2(Bucket=bucket, Prefix=path)

        if 'Contents' in response:
            for obj in response['Contents']:
                key = obj['Key']

                # Delete files only; keep the prefix (directory) itself.
                if key != path and not key.endswith('/'):
                    print(f"Deleting {key}")
                    s3.delete_object(Bucket=bucket, Key=key)

        return {
            'statusCode': 200,
            'body': f"Successfully deleted files in {path}"
        }
    except Exception as e:
        print(e)
        return {
            'statusCode': 500,
            'body': str(e)
        }

In the pre-fix code, I assumed that the line below fetched all the data under the specified S3 bucket and path,
so that everything could then be deleted.

response = s3.list_objects_v2(Bucket=bucket, Prefix=path)

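However, s3.list_objects_v2, like many AWS list APIs, caps the number of items a single call returns. In our environment only 100 objects came back. The S3 API reference documents up to 1,000 keys per ListObjectsV2 request; whatever the page size, the response sets IsTruncated to true when more keys remain.
That makes sense: an API can't simply return everything when it has no idea how many items there might be...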
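To see why a single call misses objects, here is a minimal, self-contained sketch. `FakeS3Client` is a hypothetical in-memory stand-in, not part of boto3: it mimics only the response fields used in this article (`Contents`, `IsTruncated`, `NextContinuationToken`) and caps each page at `max_keys` keys. The cap of 100 is chosen only to mirror the observation above; the real page size is whatever S3 returns.

```python
class FakeS3Client:
    """Hypothetical in-memory stand-in for a boto3 S3 client (illustration only).

    It mimics just the list_objects_v2 response fields used in this article.
    Its continuation token is simply the next start index as a string; real
    S3 tokens are opaque strings and should only be passed back unchanged.
    """

    def __init__(self, keys, max_keys=1000):
        self.keys = sorted(keys)
        self.max_keys = max_keys  # page size cap, like S3's per-request limit

    def list_objects_v2(self, Bucket, Prefix="", ContinuationToken=None):
        matching = [k for k in self.keys if k.startswith(Prefix)]
        start = int(ContinuationToken) if ContinuationToken else 0
        end = start + self.max_keys
        page = matching[start:end]
        response = {"KeyCount": len(page), "IsTruncated": end < len(matching)}
        if page:  # like S3, omit "Contents" when the page is empty
            response["Contents"] = [{"Key": k} for k in page]
        if response["IsTruncated"]:
            response["NextContinuationToken"] = str(end)
        return response


# 250 temp files, but this fake returns at most 100 keys per call
client = FakeS3Client([f"tmp/file{i:03}.csv" for i in range(250)], max_keys=100)
response = client.list_objects_v2(Bucket="example-bucket", Prefix="tmp/")
print(len(response["Contents"]), response["IsTruncated"])  # 100 True
```

The pre-fix handler only iterates over `response['Contents']` and never looks at `IsTruncated`, so in this scenario the remaining 150 keys would be left in place, which matches the symptom of files not being deleted.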

Code after the fix

To fetch every object, the listing has to be repeated, passing the ContinuationToken from each response into the next request.

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Get the S3 bucket name and path passed in from EventBridge
    bucket = event['bucket']
    path = event['path']

    try:
        continuation_token = None
        while True:
            # Fetch objects page by page using ContinuationToken
            if continuation_token:
                response = s3.list_objects_v2(
                    Bucket=bucket,
                    Prefix=path,
                    ContinuationToken=continuation_token
                )
            else:
                response = s3.list_objects_v2(
                    Bucket=bucket,
                    Prefix=path
                )

            if 'Contents' in response:
                for obj in response['Contents']:
                    key = obj['Key']

                    # Delete files only; keep the prefix (directory) itself.
                    if key != path and not key.endswith('/'):
                        print(f"Deleting {key}")
                        s3.delete_object(Bucket=bucket, Key=key)

            # Check whether there is a next page
            if response.get('IsTruncated'):
                continuation_token = response.get('NextContinuationToken')
            else:
                break

        return {
            'statusCode': 200,
            'body': f"Successfully deleted files in {path}"
        }
    except Exception as e:
        print(e)
        return {
            'statusCode': 500,
            'body': str(e)
        }

In the part below, the first call is a normal request, and from the second call onward ContinuationToken=continuation_token is passed so that S3 returns the next page of results.

            # Fetch objects page by page using ContinuationToken
            if continuation_token:
                response = s3.list_objects_v2(
                    Bucket=bucket,
                    Prefix=path,
                    ContinuationToken=continuation_token
                )
            else:
                response = s3.list_objects_v2(
                    Bucket=bucket,
                    Prefix=path
                )
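The if/else above issues the same call twice, differing only in whether ContinuationToken is passed. One common alternative is to build the request arguments once and add `ContinuationToken` only when a previous response supplied one. The sketch below does that in a hypothetical generator, `iter_object_keys`, which is not part of the original code; it accepts any client with a boto3-style `list_objects_v2` method, so it is exercised here with a small fake (`PagedFakeClient`, also hypothetical) instead of real AWS.

```python
def iter_object_keys(s3, bucket, prefix):
    """Yield every key under `prefix`, following ContinuationToken across pages.

    Hypothetical helper; `s3` is expected to behave like boto3.client("s3").
    """
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = s3.list_objects_v2(**kwargs)
        # "Contents" is absent when a page has no keys, hence .get()
        for obj in response.get("Contents", []):
            yield obj["Key"]
        if not response.get("IsTruncated"):
            break
        # Ask for the next page using the token from this response
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


class PagedFakeClient:
    """Hypothetical stand-in that serves matching keys in pages of `page_size`."""

    def __init__(self, keys, page_size):
        self.keys, self.page_size = sorted(keys), page_size

    def list_objects_v2(self, Bucket, Prefix="", ContinuationToken=None):
        matching = [k for k in self.keys if k.startswith(Prefix)]
        start = int(ContinuationToken or 0)  # fake token: next start index
        end = start + self.page_size
        response = {
            "Contents": [{"Key": k} for k in matching[start:end]],
            "IsTruncated": end < len(matching),
        }
        if response["IsTruncated"]:
            response["NextContinuationToken"] = str(end)
        return response


fake = PagedFakeClient(
    [f"tmp/file{i:03}.csv" for i in range(250)] + ["other/keep.csv"], page_size=100
)
keys = list(iter_object_keys(fake, "example-bucket", "tmp/"))
print(len(keys))  # 250
```

Because the generator hides the paging, the deletion loop in the handler could iterate over `iter_object_keys(s3, bucket, path)` directly and keep its existing key filter unchanged.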

Making it easier

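While writing this article, I double-checked the API specifications and found that you don't have to handle ContinuationToken yourself:
the SDKs provide paginators that fetch every page for you. The AWS example linked below uses paginateListObjectsV2 (AWS SDK for JavaScript v3); in boto3 the equivalent is s3.get_paginator('list_objects_v2').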

Reference:
https://docs.aws.amazon.com/ja_jp/AmazonS3/latest/userguide/example_s3_Scenario_DeleteAllObjects_section.html

Now that's convenient! I plan to refactor the function to use it at some point.
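As a rough idea of what such a refactor could look like in Python, here is a sketch built on boto3's paginator (`get_paginator("list_objects_v2")`), which follows ContinuationToken internally. It also replaces the per-object `delete_object` calls with batched `delete_objects` calls (at most 1,000 keys per request); that batching is an extra change not found in the original handler. `delete_files_under_prefix` is a hypothetical name, and the function expects a boto3 S3 client to be passed in.

```python
def delete_files_under_prefix(s3, bucket, path):
    """Delete every file under `path`, keeping the prefix itself and "folder" keys.

    Hypothetical refactor sketch; `s3` is a boto3 S3 client, e.g. boto3.client("s3").
    Returns the number of keys submitted for deletion.
    """
    paginator = s3.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=path):
        # Same filter as the original handler: skip the prefix and keys ending in "/"
        keys = [
            obj["Key"]
            for obj in page.get("Contents", [])
            if obj["Key"] != path and not obj["Key"].endswith("/")
        ]
        # DeleteObjects accepts at most 1,000 keys per request
        for i in range(0, len(keys), 1000):
            batch = keys[i:i + 1000]
            response = s3.delete_objects(
                Bucket=bucket,
                Delete={"Objects": [{"Key": k} for k in batch], "Quiet": True},
            )
            # In quiet mode, only failed keys are reported, under "Errors"
            errors = response.get("Errors", [])
            if errors:
                raise RuntimeError(f"Failed to delete {len(errors)} objects: {errors[:3]}")
            deleted += len(batch)
    return deleted
```

In the Lambda handler this would be called as `delete_files_under_prefix(s3, event['bucket'], event['path'])`, reusing the module-level `s3 = boto3.client('s3')` from the original code. Unlike the original, failures are raised rather than converted into a 500 response, so keep the existing try/except around the call if you want that behavior.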

We're hiring engineers

Gakken LEAP is actively looking for engineers who want to help update education!!
