Trying out S3 Annotations together with S3 Metadata

Last updated at 2026-06-29, posted at 2026-06-29

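Background and goal

In the previous article I went over the features of Amazon S3 Annotations. Annotations let you attach rich data of up to 1 MiB to an object after the fact, but attaching them to individual objects does not by itself let you search across objects.

When you enable the annotation table in S3 Metadata, the annotations you attach are automatically stored in an Apache Iceberg table that you can query with SQL from Athena.
In this article I run the whole flow end to end, from attaching annotations to querying them in Athena through S3 Metadata.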

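Summary

Item	S3 Annotations	S3 Metadata
Role	The "write mechanism" that attaches rich data to objects after the fact	The "search foundation" that automatically collects metadata in the bucket and exposes it as Iceberg tables
Operations	Dedicated APIs such as PutObjectAnnotation / GetObjectAnnotation	S3 updates the tables automatically (read-only)
Data flow	The user writes to the object	S3 captures and stores the data automatically
Relationship	One of the data sources for S3 Metadata	The query foundation for all metadata, including annotations
Services used	Operated directly through the AWS CLI / SDK	Queried with SQL from Athena, Redshift, EMR, and so on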

Overview

S3 has four kinds of object metadata

Amazon S3 provides several ways to attach metadata to your objects. Annotations let you attach rich data payloads of up to 1 MB to objects using dedicated API operations. Object tags are key-value pairs for categorizing and controlling access to objects. You can also set user-defined metadata at upload time, and Amazon S3 maintains system-defined metadata automatically.

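As the official documentation says, S3 offers several ways to attach metadata to objects. They differ in capacity and mutability, so choose among them according to the use case.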

The PUT request header is limited to 8 KB in size. Within the PUT request header, the user-defined metadata is limited to 2 KB in size.

You can associate up to 10 tags with an object. A tag key can be up to 128 Unicode characters in length, and tag values can be up to 256 Unicode characters in length.

Each object version supports up to 1,000 annotations. An annotation payload must be between 1 byte and 1 MiB in size. The total annotation storage per object can be up to 1 GiB.

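Summarized, they compare as follows.

Type	Capacity	Mutability	Use
System-defined metadata	No separate limit quoted (the PUT request header as a whole is limited to 8 KB)	Managed automatically by S3	Creation date, size, storage class, encryption state, and so on
User-defined metadata	2 KB	Immutable after upload (changing it requires re-uploading the object)	Arbitrary key-value pairs set through `x-amz-meta-*` headers
Object tags	10 tags (key up to 128 characters, value up to 256 characters)	Can be changed at any time	Integrates with IAM policies, lifecycle rules, and cost allocation
Annotations	Up to 1 MiB each, up to 1,000 per object version (up to 1 GiB in total per object)	Can be changed at any time	Attach rich data such as JSON, XML, or YAML after the fact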
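
The annotation payload limit quoted above (between 1 byte and 1 MiB) is easy to check locally before calling the API. The following is a minimal sketch, assuming a POSIX shell; `check_annotation_payload` is a hypothetical helper name used only for illustration and is not part of the AWS CLI.

```shell
# Limits taken from the documentation excerpt quoted in this article.
MIN_PAYLOAD_BYTES=1
MAX_PAYLOAD_BYTES=1048576   # 1 MiB

# Hypothetical helper: print the file size and succeed only if the file
# could be used as an annotation payload under the quoted limits.
check_annotation_payload() {
  file=$1
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 2; }
  # wc -c pads its output with spaces on some platforms, so strip them.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -lt "$MIN_PAYLOAD_BYTES" ] || [ "$size" -gt "$MAX_PAYLOAD_BYTES" ]; then
    echo "payload size $size bytes is outside 1 byte .. 1 MiB" >&2
    return 1
  fi
  echo "$size"
}
```

For example, `check_annotation_payload ./classification.json` could be run before attaching a payload file; the per-object limits (1,000 annotations, 1 GiB in total) are not covered by this sketch.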

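What is S3 Annotations?

A feature for attaching named data payloads to an object after the object has been created. I covered it in detail in the previous article; the key points are:

- Annotations can be created, retrieved, listed, and deleted without re-uploading the object
- They can be operated on even for objects in Glacier, without a restore
- They propagate automatically on copy and replication
- They are operated through dedicated APIs (PutObjectAnnotation and others)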

What is S3 Metadata?

Amazon S3 Metadata accelerates data discovery by automatically capturing metadata for objects in your general purpose buckets and storing it in read-only, fully managed Apache Iceberg tables that you can query.

S3 Metadata automatically captures the object metadata in a bucket (all four kinds) and stores it in read-only Apache Iceberg tables. Users do not write data to it directly; S3 updates the tables automatically.

The three S3 Metadata tables

By default, S3 Metadata provides three types of metadata: System-defined metadata, such as an object's creation time and storage class. Custom object metadata, such as tags, annotations, and user-defined metadata that was included during object upload. Event metadata, such as when an object is updated or deleted, and the AWS account that made the request.

Journal table – captures events that occur for the objects in your bucket. The journal table records changes made to your data in near real time.

Live inventory table – provides a simple, queryable inventory of all the objects and their versions in your bucket so that you can determine the latest state of your data.

Annotation table – tracks the latest annotations on the objects in your bucket and makes annotation content directly queryable.

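An S3 Metadata configuration offers the following three tables.

Table	Contents	Use
Journal table	Records object change events (creation, update, deletion) in near real time	Change tracking, event-driven workflows
Live inventory table	Holds every object in the bucket together with its latest state	Object search, storage analysis
Annotation table	Stores the latest state of annotations	Cross-object search and queries over annotations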

How S3 Annotations, S3 Metadata, and S3 Tables relate

You can enable an annotation table as part of your S3 Metadata configuration to query annotation data at scale using Athena and other analytics services. S3 Metadata stores annotation data in fully managed Apache Iceberg tables that Amazon S3 automatically keeps up to date.

Your metadata tables are stored in an AWS managed S3 table bucket, which provides storage that's optimized for tabular data.

AWS managed table buckets are AWS managed resources that automatically store tables created by AWS services, such as the live inventory and journal tables created by S3 Metadata.

S3 Annotations, S3 Metadata, and S3 Tables are separate features, but they work together as follows.

Concept	Role
S3 Annotations	The "write API" that attaches rich data to objects
S3 Metadata	The "pipeline" that automatically collects metadata into Iceberg tables
S3 Tables	The "storage engine" that holds the Iceberg tables

The overall flow looks like this.

[User]
    │
    │ PutObjectAnnotation API
    ▼
[S3 object] ← annotation is attached
    │
    │ captured automatically by S3 Metadata
    ▼
[S3 Tables (managed table bucket: aws-s3)]
    ├── Journal table (change events)
    ├── Live inventory table (latest state)
    └── Annotation table (annotation contents)
    │
    │ Glue catalog integration (s3tablescatalog)
    ▼
[Athena / Redshift / EMR]

The AWS managed table bucket that S3 Metadata creates is read-only, but S3 Tables itself is an independent feature: you can create your own table buckets and read from and write to them.

AWS managed table buckets are AWS managed resources that automatically store tables created by AWS services. You have read-only access to query the data, while AWS handles all table creation, updates, and maintenance operations.

Customer-managed table buckets are resources for storing Amazon S3 Tables created and managed by customers. You create these buckets explicitly, choose their names, and maintain full control over the tables and namespaces within them.

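For this article, it is enough to keep in mind that the table bucket is managed automatically as the destination that S3 Metadata copies into. The part you have to handle yourself is the Glue catalog integration (glue create-catalog) that makes the tables accessible from Athena.

In terms of where the data lives, the picture is as follows.

General purpose bucket (data)       S3 Tables (for search)            Athena (query)
┌──────────────────┐                ┌──────────────────┐              ┌──────────┐
│ Object body      │  S3 Metadata   │ Iceberg tables   │ Glue catalog │  SELECT  │
│ + annotations    │ ──auto copy──→ │ (read-only)      │ ────via────→ │  * FROM  │
└──────────────────┘                └──────────────────┘              └──────────┘

The annotation itself always lives in the general purpose bucket; what S3 Tables holds is a read-only copy for querying.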

Annotations vs. object tags: when to use which

Choose annotations when you need to store structured data (such as JSON or XML), payloads larger than 256 characters, or more than 10 metadata entries per object. Choose object tags when you need IAM policy integration, Amazon S3 Lifecycle rule filtering, or cost allocation reporting.

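Aspect	Object tags	Annotations
Maximum count	10	1,000
Maximum size	Key 128 characters + value 256 characters	Name 512 bytes + payload 1 MiB
Data format	Key-value strings	Arbitrary UTF-8 text (JSON, XML, YAML, and so on)
Set at upload time	Yes	No (attached afterwards only)
IAM policy integration	Yes	No
Lifecycle integration	Yes	No
Cost allocation	Yes	No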

Hands-on

Now let's actually go from attaching annotations to querying them through S3 Metadata.

Prerequisites

- The AWS CLI (2.35 or later) is configured
- A test S3 bucket exists (written as ${BUCKET_NAME} below)
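
The version prerequisite can be checked mechanically. The sketch below assumes that `aws --version` prints a string beginning with `aws-cli/<version>`; `parse_aws_cli_version` and `version_ge` are hypothetical helper names, and only the major and minor components are compared.

```shell
# Hypothetical helper: extract "2.35.0" from a string such as
# "aws-cli/2.35.0 Python/3.13.4 Linux/6.8 exe/x86_64".
parse_aws_cli_version() {
  printf '%s\n' "$1" | sed -n 's|^aws-cli/\([0-9][0-9.]*\).*|\1|p'
}

# Hypothetical helper: succeed if version $1 is >= version $2,
# comparing MAJOR and MINOR numerically (PATCH is ignored).
version_ge() {
  have=$1
  want=$2
  have_major=${have%%.*}; rest=${have#*.}; have_minor=${rest%%.*}
  want_major=${want%%.*}; rest=${want#*.}; want_minor=${rest%%.*}
  [ "$have_major" -gt "$want_major" ] && return 0
  [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]
}

# Possible usage (requires the AWS CLI to be installed):
#   v=$(parse_aws_cli_version "$(aws --version 2>&1)")
#   version_ge "$v" 2.35 || echo "AWS CLI $v is older than 2.35" >&2
```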

Registering the catalog

List the data catalogs. No S3 Tables catalog has been registered yet.

aws athena list-data-catalogs

{
    "DataCatalogsSummary": [
        {
            "CatalogName": "AwsDataCatalog",
            "Type": "GLUE",
            "Status": "CREATE_COMPLETE"
        }
    ]
}

To connect S3 Tables with Athena, you need to enable the Glue Data Catalog integration (SageMaker Lakehouse integration). First, check the details of the table bucket and its integration state.

aws s3tables get-table-bucket --table-bucket-arn arn:aws:s3tables:ap-northeast-1:XXXXXXXX:bucket/aws-s3

{
    "arn": "arn:aws:s3tables:ap-northeast-1:XXXXXXXX:bucket/aws-s3",
    "name": "aws-s3",
    "ownerAccountId": "XXXXXXXX",
    "createdAt": "2026-06-28T13:40:09.914156+00:00",
    "tableBucketId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXX",
    "type": "aws"
}

Create a federated catalog for S3 Tables in Glue.

aws glue create-catalog \
  --name "s3tablescatalog" \
  --catalog-input '{
    "Description": "Federated catalog for S3 Tables",
    "FederatedCatalog": {
      "Identifier": "arn:aws:s3tables:ap-northeast-1:<ACCOUNT_ID>:bucket/*",
      "ConnectionName": "aws:s3tables"
    },
    "CreateDatabaseDefaultPermissions": [{
      "Principal": {
        "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
      },
      "Permissions": ["ALL"]
    }],
    "CreateTableDefaultPermissions": [{
      "Principal": {
        "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
      },
      "Permissions": ["ALL"]
    }]
  }'
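
The account ID placeholder in the `Identifier` above has to be filled in by hand. As a small sketch, the ARN could instead be assembled from variables; `s3tables_catalog_identifier` is a hypothetical helper, and the ARN shape is simply the one used in the command above.

```shell
# Hypothetical helper: build the FederatedCatalog Identifier
# (arn:aws:s3tables:<region>:<account>:bucket/*) from its parts.
s3tables_catalog_identifier() {
  region=$1
  account_id=$2
  printf 'arn:aws:s3tables:%s:%s:bucket/*\n' "$region" "$account_id"
}

# Possible usage (requires configured AWS credentials):
#   ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
#   s3tables_catalog_identifier ap-northeast-1 "$ACCOUNT_ID"
```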

Check the child catalogs under s3tablescatalog.

aws glue get-catalogs --parent-catalog-id s3tablescatalog

{
    "CatalogList": [
        {
            "CatalogId": "XXXXXXXX:s3tablescatalog/aws-s3",
            "Name": "aws-s3",
            "ResourceArn": "arn:aws:glue:ap-northeast-1:XXXXXXXX:catalog/s3tablescatalog/aws-s3",
            "CreateTime": "2026-06-28T22:40:09.914000+09:00",
            "FederatedCatalog": {
                "Identifier": "arn:aws:s3tables:ap-northeast-1:XXXXXXXX:bucket/*",
                "ConnectionName": "aws:s3tables",
                "ConnectionType": "aws:s3tables"
            },
            "CatalogProperties": {},
            "CreateTableDefaultPermissions": [
                {
                    "Principal": {
                        "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
                    },
                    "Permissions": [
                        "ALL"
                    ]
                }
            ],
            "CreateDatabaseDefaultPermissions": [
                {
                    "Principal": {
                        "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
                    },
                    "Permissions": [
                        "ALL"
                    ]
                }
            ]
        }
    ]
}

Step 1: Upload a test object

Create a test file and upload it to S3.

echo "This is a test document for annotation demo." > test-doc.txt

aws s3 cp test-doc.txt s3://${BUCKET_NAME}/docs/test-doc.txt

upload: ./test-doc.txt to s3://XXXXXXXX/docs/test-doc.txt

Step 2: Attach annotations

Create an annotation payload in JSON format.

cat > classification.json << 'EOF'
{
  "category": "technical",
  "confidence": 0.95,
  "model": "claude-v3",
  "tags": ["aws", "s3", "demo"],
  "processed_at": "2026-06-28T12:00:00Z"
}
EOF

Attach the annotation to the object with PutObjectAnnotation.

aws s3api put-object-annotation \
  --bucket ${BUCKET_NAME} \
  --key docs/test-doc.txt \
  --annotation-name classification \
  --annotation-payload ./classification.json

{
    "ETag": "\"XXXXXXXXXXXXXXX\"",
    "ChecksumCRC64NVME": "XXXXXXXXXXXXXXX",
    "ChecksumType": "FULL_OBJECT",
    "ServerSideEncryption": "AES256",
    "Key": "docs/test-doc.txt",
    "AnnotationName": "classification"
}

Attach a second, different annotation as well.

echo "A test document demonstrating S3 annotation capabilities." > summary.txt

aws s3api put-object-annotation \
  --bucket ${BUCKET_NAME} \
  --key docs/test-doc.txt \
  --annotation-name ai_summary \
  --annotation-payload ./summary.txt

{
    "ETag": "\"XXXXXXXXXXXXXXX\"",
    "ChecksumCRC64NVME": "XXXXXXXXXXXXXXX",
    "ChecksumType": "FULL_OBJECT",
    "ServerSideEncryption": "AES256",
    "Key": "docs/test-doc.txt",
    "AnnotationName": "ai_summary"
}

Step 3: Retrieve an annotation

Retrieve a specific annotation with GetObjectAnnotation.

aws s3api get-object-annotation \
  --bucket ${BUCKET_NAME} \
  --key docs/test-doc.txt \
  --annotation-name classification \
  ./output-classification.json

{
    "LastModified": "2026-06-28T13:21:14+00:00",
    "ContentLength": 151,
    "ETag": "\"XXXXXXXXXXXXXXX\"",
    "ChecksumCRC64NVME": "XXXXXXXXXXXXXXX",
    "ChecksumType": "FULL_OBJECT",
    "ServerSideEncryption": "AES256"
}

Check the retrieved content.

cat output-classification.json
{
  "category": "technical",
  "confidence": 0.95,
  "model": "claude-v3",
  "tags": ["aws", "s3", "demo"],
  "processed_at": "2026-06-28T12:00:00Z"
}

Step 4: List the annotations

List the annotations attached to the object with ListObjectAnnotations. The output shows two annotations: classification (151 bytes) and ai_summary (58 bytes).

aws s3api list-object-annotations \
  --bucket ${BUCKET_NAME} \
  --key docs/test-doc.txt

{
    "Annotations": [
        {
            "AnnotationName": "ai_summary",
            "LastModified": "2026-06-28T13:23:30+00:00",
            "ETag": "\"XXXXXXXXXXXXXXXXXX\"",
            "ChecksumAlgorithm": [
                "CRC64NVME"
            ],
            "Size": 58
        },
        {
            "AnnotationName": "classification",
            "LastModified": "2026-06-28T13:21:14+00:00",
            "ETag": "\"XXXXXXXXXXXXXXXXXX\"",
            "ChecksumAlgorithm": [
                "CRC64NVME"
            ],
            "Size": 151
        }
    ],
    "AnnotationCount": 2,
    "AnnotationPrefix": null,
    "Bucket": "XXXXXXXXXXXXXXXXXX",
    "Key": "docs/test-doc.txt",
    "ObjectVersionId": null,
    "RequestCharged": null
}

Step 5: Enable S3 Metadata and query with Athena

5-1. Create an IAM role for the annotation table

Create the trust policy.

cat > trust-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "metadata.s3.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "<ACCOUNT_ID>"
        },
        "ArnLike": {
          "aws:SourceArn": "arn:aws:s3:::${BUCKET_NAME}"
        }
      }
    }
  ]
}
EOF
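
Because the heredoc above is unquoted, `${BUCKET_NAME}` is expanded by the shell while the account ID placeholder still has to be edited manually. The following sketch wraps the same trust policy in a function that takes both values as arguments and rejects an account ID that is not 12 digits; `render_trust_policy` is a hypothetical helper name.

```shell
# Hypothetical helper: print the trust policy shown above for the given
# 12-digit account ID and bucket name.
render_trust_policy() {
  account_id=$1
  bucket=$2
  case $account_id in
    ''|*[!0-9]*) echo "account ID must be numeric: $account_id" >&2; return 1 ;;
  esac
  [ "${#account_id}" -eq 12 ] || { echo "account ID must have 12 digits" >&2; return 1; }
  [ -n "$bucket" ] || { echo "bucket name is required" >&2; return 1; }
  cat << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "metadata.s3.amazonaws.com" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "aws:SourceAccount": "${account_id}" },
        "ArnLike": { "aws:SourceArn": "arn:aws:s3:::${bucket}" }
      }
    }
  ]
}
EOF
}

# Possible usage:
#   render_trust_policy "$ACCOUNT_ID" "$BUCKET_NAME" > trust-policy.json
```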

Create the permissions policy.

cat > permissions-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObjectAnnotation",
        "s3:GetObjectVersionAnnotation"
      ],
      "Resource": ["arn:aws:s3:::${BUCKET_NAME}/*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:ListBucketVersions"
      ],
      "Resource": ["arn:aws:s3:::${BUCKET_NAME}"]
    }
  ]
}
EOF

Create the IAM role and attach the policy.

aws iam create-role \
  --role-name S3MetadataAnnotationRole \
  --assume-role-policy-document file://trust-policy.json

{
   "Role": {
       "RoleName": "S3MetadataAnnotationRole",
       ...
   }
}

aws iam put-role-policy \
  --role-name S3MetadataAnnotationRole \
  --policy-name S3AnnotationReadPolicy \
  --policy-document file://permissions-policy.json

{
    "Role": {
        "Path": "/",
        "RoleName": "S3MetadataAnnotationRole",
        "RoleId": "XXXXXXXXX",
        "Arn": "arn:aws:iam::XXXXXXXXX:role/S3MetadataAnnotationRole",
        "CreateDate": "2026-06-28T13:35:04+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "metadata.s3.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole",
                    "Condition": {
                        "StringEquals": {
                            "aws:SourceAccount": "XXXXXXXXX"
                        },
                        "ArnLike": {
                            "aws:SourceArn": "arn:aws:s3:::XXXXXXXXX"
                        }
                    }
                }
            ]
        }
    }
}

5-2. Enable the annotation table in the S3 Metadata configuration

Check whether the bucket already has a metadata configuration. It did not.

aws s3api get-bucket-metadata-configuration --bucket ${BUCKET_NAME}

aws: [ERROR]: An error occurred (MetadataConfigurationNotFound) when calling the GetBucketMetadataConfiguration operation: The metadata configuration was not found

There is no configuration, so create a new one.

cat > /tmp/metadata-config.json << 'EOF'
{
  "JournalTableConfiguration": {
    "RecordExpiration": {
      "Expiration": "DISABLED"
    }
  },
  "InventoryTableConfiguration": {
    "ConfigurationState": "DISABLED"
  },
  "AnnotationTableConfiguration": {
    "ConfigurationState": "ENABLED",
    "Role": "arn:aws:iam::XXXXXXXXX:role/S3MetadataAnnotationRole"
  }
}
EOF

Create the metadata configuration.

aws s3api create-bucket-metadata-configuration --bucket ${BUCKET_NAME}  \
--metadata-configuration file:///tmp/metadata-config.json

Check the configuration. The current state is as follows.

Journal Table: CREATING
Annotation Table: CREATING (ENABLED)
Table namespace: b_XXXXXXXXX

aws s3api get-bucket-metadata-configuration --bucket ${BUCKET_NAME}
{
    "GetBucketMetadataConfigurationResult": {
        "MetadataConfigurationResult": {
            "DestinationResult": {
                "TableBucketType": "aws",
                "TableBucketArn": "arn:aws:s3tables:ap-northeast-1:XXXXXXX:bucket/aws-s3",
                "TableNamespace": "b_XXXXXXXXX"
            },
            "JournalTableConfigurationResult": {
                "TableStatus": "CREATING",
                "TableName": "journal",
                "RecordExpiration": {
                    "Expiration": "DISABLED"
                }
            },
            "InventoryTableConfigurationResult": {
                "ConfigurationState": "DISABLED"
            },
            "AnnotationTableConfigurationResult": {
                "ConfigurationState": "ENABLED",
                "TableStatus": "CREATING",
                "TableName": "annotation",
                "Role": "arn:aws:iam::XXXXXXX:role/S3MetadataAnnotationRole"
            }
        }
    }
}

Wait a while, then check again. After about 20 minutes, every TableStatus had become Active.
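
Rather than re-running the command by hand, the wait can be scripted. This is only a sketch: `wait_for_active` is a hypothetical helper that repeatedly runs whatever command it is given and inspects the printed status. The exact status strings `ACTIVE` and `FAILED` are assumptions here (the output above only shows `CREATING`), and `WAIT_ATTEMPTS` / `WAIT_INTERVAL` are illustrative knobs.

```shell
# Hypothetical helper: run "$@" until it prints ACTIVE.
# Returns 0 on ACTIVE, 1 on FAILED or timeout, 2 if the command itself fails.
wait_for_active() {
  attempts=${WAIT_ATTEMPTS:-60}
  interval=${WAIT_INTERVAL:-30}
  i=1
  while [ "$i" -le "$attempts" ]; do
    status=$("$@") || return 2
    case $status in
      ACTIVE) return 0 ;;                                  # assumed status value
      FAILED) echo "table creation failed" >&2; return 1 ;; # assumed status value
    esac
    echo "attempt $i: status=$status" >&2
    sleep "$interval"
    i=$((i + 1))
  done
  echo "timed out waiting for ACTIVE" >&2
  return 1
}

# Possible usage, with the --query path taken from the JSON output above:
#   fetch_annotation_table_status() {
#     aws s3api get-bucket-metadata-configuration --bucket "$BUCKET_NAME" \
#       --query 'GetBucketMetadataConfigurationResult.MetadataConfigurationResult.AnnotationTableConfigurationResult.TableStatus' \
#       --output text
#   }
#   wait_for_active fetch_annotation_table_status
```

Passing the status command as an argument keeps the loop independent of any particular API, so the same helper could also poll the journal table.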

5-3. Query annotations with Athena

Run the following SQL in Athena to query the annotation contents.
- "s3tablescatalog/aws-s3": the catalog name (the S3 Tables managed table bucket)
- "b_{bucket name}": the namespace (the bucket name with a b_ prefix)
- "annotation": the table name
```
aws athena start-query-execution \
  --query-string "SELECT * FROM \"s3tablescatalog/aws-s3\".\"b_XXXXXXXX\".\"annotation\" LIMIT 5" \
  --result-configuration "OutputLocation=s3://${BUCKET_NAME}/" \
  --work-group primary
```
```
{
    "QueryExecutionId": "XXXXXXXXXXXXXXXXXXXXXXX"
}
```
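
The three-part table name is tedious to escape by hand inside `--query-string`. As a sketch, it can be assembled from the bucket name, assuming (as described above) that the namespace is simply the bucket name with a `b_` prefix; `annotation_table_ref` and `build_preview_query` are hypothetical helpers.

```shell
# Hypothetical helper: fully qualified annotation table for a bucket,
# i.e. "catalog"."namespace"."table" with each part double-quoted.
annotation_table_ref() {
  bucket=$1
  printf '"s3tablescatalog/aws-s3"."b_%s"."annotation"\n' "$bucket"
}

# Hypothetical helper: preview query with an optional row limit (default 5).
build_preview_query() {
  printf 'SELECT * FROM %s LIMIT %d\n' "$(annotation_table_ref "$1")" "${2:-5}"
}

# Possible usage with the command shown above:
#   aws athena start-query-execution \
#     --query-string "$(build_preview_query "$BUCKET_NAME")" \
#     --result-configuration "OutputLocation=s3://${BUCKET_NAME}/" \
#     --work-group primary
```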

Retrieve the results of the query execution.

aws athena get-query-results --query-execution-id XXXXXXXXXXXXXXXXXXXXXXX

The retrieved columns are summarized below.

Column	Example value
bucket	Bucket name
object_key	docs/test-doc.txt
name	classification / ai_summary
text_value	Payload body (JSON or text)
size	151 / 58
e_tag	ETag of the annotation
last_modified_date	2026-06-28 13:21:14.000000
checksum_algorithm	CRC64NVME

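Discussion

- S3 Annotations is the "write" side, S3 Metadata is the "mechanism that copies data for search", and S3 Tables is the "storage it is copied into"; each sits at a different layer. The annotation itself always lives in the general purpose bucket, and S3 Tables holds only a read-only copy for querying.
- Annotations fill the gap for rich context that user-defined metadata (2 KB, immutable) and object tags (10 tags, short strings) could not express.
- Without S3 Metadata, annotations can only be retrieved object by object. Enabling the annotation table is required for cross-object search.
- To reference S3 Tables tables from Athena, you must first register a federated catalog with glue create-catalog. The S3 Metadata configuration alone does not make the tables visible from Athena.
- Backfilling the annotation table requires S3 to scan the whole bucket, so it can take several hours when there are many objects. For verification, it is best to try it with a small number of objects.
- The division of roles is clear: keep using object tags for integration with IAM policies and lifecycle rules, and use annotations to store larger, rich data.
- In Athena, fields inside the JSON can be extracted with json_extract_scalar, so you can query flexibly without defining a schema, which is practical.
- Given the setup effort, my impression is that annotations pay off in use cases where you attach labels in bulk at large scale and want metadata kept in sync automatically on copy and replication.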
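
To illustrate the json_extract_scalar point, here is a sketch of a query over the `classification` annotation created in Step 2. The column names (`object_key`, `name`, `text_value`) are the ones listed in the results table; `build_classification_query` is a hypothetical helper, and the query itself was not part of the original verification.

```shell
# Hypothetical helper: print SQL that pulls JSON fields out of the
# "classification" annotation payload stored in text_value.
# $1 is the quoted table reference, e.g.
#   "s3tablescatalog/aws-s3"."b_<bucket name>"."annotation"
build_classification_query() {
  table_ref=$1
  cat << EOF
SELECT object_key,
       json_extract_scalar(text_value, '\$.category') AS category,
       CAST(json_extract_scalar(text_value, '\$.confidence') AS DOUBLE) AS confidence
FROM ${table_ref}
WHERE name = 'classification'
ORDER BY confidence DESC
EOF
}
```

json_extract_scalar returns a string, so the sketch casts `confidence` to DOUBLE before sorting on it.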
