
Trying out the basics of Iceberg


This walkthrough creates Iceberg data, uploads it to S3, registers it in Glue, and gets it to the point where it can be queried from Athena.

Create the S3 bucket

aws s3 mb s3://my-iceberg-tokyo-20251206 --region ap-northeast-1

Create the Glue database

aws glue create-database \
    --database-input '{"Name": "default"}' \
    --region ap-northeast-1
.env
AWS_ACCESS_KEY_ID=AKIXXXXXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
AWS_REGION=ap-northeast-1
ICEBERG_BUCKET=my-iceberg-tokyo-20251206    # <- change this to the name of your new Tokyo-region bucket
GLUE_DATABASE=default
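
If ICEBERG_BUCKET is missing from the environment, os.getenv returns None and the warehouse path silently becomes s3://None/warehouse/. A minimal sketch of checking the values up front, using only the standard library; the helper name read_settings and the returned dictionary keys are assumptions made for illustration:

```python
import os

# Variables that the .env file above is expected to provide.
REQUIRED_KEYS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "ICEBERG_BUCKET",
)


def read_settings(env=None):
    """Return the settings the script needs, failing early if one is missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    bucket = env["ICEBERG_BUCKET"]
    return {
        "region": env["AWS_REGION"],
        "bucket": bucket,
        "database": env.get("GLUE_DATABASE", "default"),
        "warehouse": f"s3://{bucket}/warehouse/",
    }
```

Called as read_settings() after load_dotenv(), this raises a readable error instead of letting a None bucket name reach the catalog.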
from dotenv import load_dotenv
load_dotenv()

import os
import pandas as pd
from pyiceberg.catalog import load_catalog
import pyarrow as pa

# ------------------- Configuration -------------------
bucket = os.getenv("ICEBERG_BUCKET")
database = os.getenv("GLUE_DATABASE", "default")

catalog = load_catalog(
    "glue",
    **{
        "type": "glue",
        "warehouse": f"s3://{bucket}/warehouse/",
        "s3.region": "ap-northeast-1",
    }
)

# The table already exists, so just load it
table = catalog.load_table(f"{database}.sample_table")

# ------------------- Data to append (change only this part on each run) -------------------
df = pd.DataFrame({
    "id": [100, 101, 102, 103],
    "name": ["青森", "秋田", "山形", "岩手"],
    "amount": [999999.0, 888888.0, 777777.0, 666666.0],
    "created_at": pd.to_datetime(["2025-12-11", "2025-12-11", "2025-12-12", "2025-12-12"])
})

# Key step: normalize dtypes to match the Iceberg schema (resolves the required/optional and timestamp[us] mismatches)
df["created_at"] = df["created_at"].dt.tz_localize(None).astype("datetime64[us]")
df = df.astype({"id": "int64"})  # make id an int64 (Iceberg long)

# Most important: explicitly force the "required" (non-nullable) fields in the PyArrow schema
arrow_table = pa.Table.from_pandas(df).cast(
    target_schema=pa.schema([
        pa.field("id", pa.int64(), nullable=False),           # required long
        pa.field("name", pa.string(), nullable=True),         # optional string
        pa.field("amount", pa.float64(), nullable=True),      # optional double
        pa.field("created_at", pa.timestamp('us'), nullable=False)  # required timestamp(us)
    ])
)

# Write the rows
table.append(arrow_table)

print("Append succeeded.")
print(f"Current row count: {table.scan().to_arrow().num_rows}")
print("Query it from Athena with: SELECT * FROM default.sample_table;")
