Reading Databricks Tables from BigQuery via UniForm and Cross-cloud Lakehouse Federation / UniFormとGoogle Cloud Lakehouseフェデレーションを介してBigQueryからDatabricksテーブルを読み込む

Posted at 2026-04-30


Verified end-to-end on 2026-04-30. Hands-on test summary; English first, Japanese below.

2026年4月30日にエンドツーエンドで検証済み。実地テストの概要(英語が先、日本語は後述)。


English

Why — two paths to "BigQuery reads Databricks tables"

For customers running primary data engineering on Databricks but with BigQuery still in the consumer/serving path (a common state during BQ-to-Databricks migrations), there are two architectural approaches to let BigQuery query Databricks-managed tables directly. They look similar at first glance — both produce "BQ can SELECT from a Databricks table" — but they differ meaningfully in how they scale.

Path (A): BigLake external table on Delta path

  • BigQuery side creates a BigLake external table pointing at the Delta directory in cloud storage
  • BigLake natively understands the Delta transaction log and serves the current snapshot
  • Setup is per-table — each table gets its own BigLake external table, refreshed via scripts
  • GA, well-documented, used in production at many sites
  • Brittle along two axes: scales O(N) with table count (each table needs its own pipeline), and constrained to classic Delta checkpoints (BigLake doesn't support V2 checkpoint, which newer Delta features increasingly require)

Path (C): BigQuery Cross-cloud Lakehouse Federation → Unity Catalog

  • BigQuery side mounts the entire Databricks Unity Catalog as a foreign catalog via the Iceberg REST Catalog (IRC) protocol
  • BigLake polls UC's IRC endpoint on a schedule and exposes everything it sees as queryable from BigQuery
  • One-time setup at the catalog level — tables added on the Databricks side automatically become queryable from BigQuery on the next refresh cycle
  • Public Preview, requires gcloud 566.0.0+ (released 2026-04-24)
  • Resilient to Delta protocol evolution: as long as UniForm's Iceberg metadata generation continues working, BigLake doesn't care which Delta features you enable

| Dimension | (A) BigLake on Delta | (C) Cross-cloud Lakehouse Federation |
|---|---|---|
| Direction | BQ reads DBX | BQ reads DBX |
| Setup granularity | Per-table | Per-catalog |
| Discovery | Manual / scripted | Automatic (refresh interval ≥ 5 min) |
| Delta-side write capability | Read-only from BQ | Read-only from BQ |
| Underlying protocol | BigLake's native Delta reader | Iceberg REST (IRC) |
| Maturity | GA | Public Preview |
| Resilience to Delta protocol changes | Limited (V2 checkpoint not supported) | High (UniForm absorbs it) |
| BigQuery UI navigator support | Yes | No (4-part path only) |

The choice between (A) and (C) depends on the customer's specific constraints — table count, tolerance for Public Preview status, importance of newer Delta features that require V2 checkpoint, operational ownership of the BQ-side discovery scripts, etc. Both work today; the comparison above lays out the trade-offs without prescribing one.

What — Managed UniForm Delta tables (for Path C)

The remainder of this article focuses on Path (C). Path (A) is well-documented in the existing BigLake-on-Delta materials; we don't repeat it here.

Path (C) requires the Databricks-side tables to be Iceberg-readable. Three options exist:

  1. Native Managed Iceberg — UC creates Iceberg directly. Loses CDF, Z-order, generated columns, constraints, predictive optimization. Public Preview only.
  2. External Delta + UniForm — Delta table created with explicit LOCATION, with UniForm enabled. Works.
  3. Managed Delta + UniForm — UC-managed Delta table, with UniForm enabled. Storage layout managed by UC. Recommended.

The third is the cleanest path: you keep all Delta features (CDF, Z-order, predictive optimization, generated columns, constraints), get UC to handle storage lifecycle, and gain the BigQuery interop on top via a single TBLPROPERTIES change.

Critical compatibility caveat: UniForm/IcebergCompatV2 is mutually exclusive with delta.enableDeletionVectors=true. If your Delta tables use deletion vectors today, you must either disable DV (and run REORG TABLE … APPLY (PURGE) to clean up) or maintain a UniForm-disabled copy for tables that need DV. Deletion vectors are a real Delta optimization for MERGE/UPDATE/DELETE-heavy workloads; weigh the trade-off honestly.

Other prerequisites on the Delta-side:

  • delta.minReaderVersion ≥ 2, delta.minWriterVersion ≥ 7 (Databricks may auto-set minReaderVersion=2 even if you request 3)
  • delta.columnMapping.mode='name'
  • delta.enableIcebergCompatV2='true'
  • delta.universalFormat.enabledFormats='iceberg'
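
As a quick self-check, the prerequisites above can be validated against a table's properties (e.g. as returned by SHOW TBLPROPERTIES). A minimal sketch — the property keys and thresholds are the ones listed above; the sample dict is illustrative, not fetched from a live workspace:

```python
# Validate a Delta table's properties against the UniForm prerequisites above.
REQUIRED = {
    "delta.columnMapping.mode": "name",
    "delta.enableIcebergCompatV2": "true",
    "delta.universalFormat.enabledFormats": "iceberg",
}

def check_uniform_prereqs(props: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key, want in REQUIRED.items():
        if props.get(key) != want:
            problems.append(f"{key} should be '{want}', got {props.get(key)!r}")
    # Protocol versions: reader >= 2, writer >= 7
    if int(props.get("delta.minReaderVersion", 0)) < 2:
        problems.append("delta.minReaderVersion must be >= 2")
    if int(props.get("delta.minWriterVersion", 0)) < 7:
        problems.append("delta.minWriterVersion must be >= 7")
    # Deletion vectors are mutually exclusive with IcebergCompatV2
    if props.get("delta.enableDeletionVectors") == "true":
        problems.append("delta.enableDeletionVectors must be disabled for UniForm")
    return problems

# Example: properties matching the CREATE TABLE in Step 1 below
sample = {
    "delta.columnMapping.mode": "name",
    "delta.minReaderVersion": "2",
    "delta.minWriterVersion": "7",
    "delta.enableDeletionVectors": "false",
    "delta.enableIcebergCompatV2": "true",
    "delta.universalFormat.enabledFormats": "iceberg",
}
print(check_uniform_prereqs(sample))  # → []
```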

How — verified end-to-end setup for Path C

Setup steps below are specifically for Path (C). For Path (A), refer to the BigLake-on-Delta documentation referenced at the end.

Below is the actual sequence that worked on 2026-04-30. Test environment: Databricks workspace my-awsome-workspace (AWS, ap-northeast-1), GCP project my-awsome-bq-project, federated catalog created in us-east4.

Step 1 — Create the UniForm-enabled Delta table on Databricks

-- Create schema if it doesn't exist (skip if already provisioned)
CREATE SCHEMA IF NOT EXISTS wanyu_bq_test2.test;

USE CATALOG wanyu_bq_test2;
USE SCHEMA test;

CREATE OR REPLACE TABLE t_inventory_uniform (
  inventory_id BIGINT,
  sku STRING,
  region STRING,
  qty INT,
  classification STRING,
  updated_at TIMESTAMP
)
USING DELTA
TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '7',
  'delta.enableDeletionVectors' = 'false',
  'delta.checkpointPolicy' = 'classic',
  'delta.checkpoint.writeStatsAsStruct' = 'true',
  'delta.enableChangeDataFeed' = 'true',
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);

INSERT INTO t_inventory_uniform VALUES
  (1, 'SKU-001', 'JP', 100, 'public',     current_timestamp()),
  (2, 'SKU-002', 'KR',  50, 'public',     current_timestamp()),
  (3, 'SKU-003', 'TW',  75, 'restricted', current_timestamp()),
  (4, 'SKU-004', 'HK',  30, 'public',     current_timestamp()),
  (5, 'SKU-005', 'IN', 200, 'public',     current_timestamp());

Prerequisites:

  • The catalog must use external location storage (verify with databricks catalogs get <name> — storage_root should be an s3:// / gs:// / abfss:// path). Default-storage UC catalogs are not supported by Cross-cloud Lakehouse Federation.
  • The metastore must have external_access_enabled=true (verify with databricks metastores summary).
  • The user/cluster running this CREATE must have CREATE TABLE privilege on the schema and the storage credential bound to the catalog must be assumable by the SQL warehouse's runtime IAM (see Gotcha #8).

Step 2 — Verify the table is exposed via UC's Iceberg REST endpoint

TOKEN=$(databricks auth token --profile my-awsome-workspace \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['access_token'])")

curl -s "https://my-awsome-workspace.cloud.databricks.com/api/2.1/unity-catalog/iceberg-rest/v1/catalogs/wanyu_bq_test2/namespaces/test/tables/t_inventory_uniform" \
  -H "Authorization: Bearer $TOKEN" \
  | python3 -m json.tool | head -30

Expected response: a JSON document with metadata-location, metadata.format-version: 2, full schema, and Iceberg-style fields (long, string, int, timestamptz). This confirms the IRC endpoint is serving the UniForm-Delta table as Iceberg v2.
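
That sanity check can be scripted. The sketch below validates the shape of a loadTable response (metadata-location present, format-version 2, a schema attached); the sample payload is abbreviated and illustrative, not a full IRC response:

```python
import json

def looks_like_iceberg_v2(load_table_response: dict) -> bool:
    """Check that an IRC loadTable response exposes an Iceberg v2 table."""
    meta = load_table_response.get("metadata", {})
    return (
        "metadata-location" in load_table_response
        and meta.get("format-version") == 2
        and bool(meta.get("schemas") or meta.get("schema"))
    )

# Abbreviated sample shaped like the Step 2 curl output
sample = json.loads("""
{
  "metadata-location": "s3://bucket/path/metadata/00001.metadata.json",
  "metadata": {
    "format-version": 2,
    "schema": {"type": "struct", "fields": [
      {"id": 1, "name": "inventory_id", "required": false, "type": "long"}
    ]}
  }
}
""")
print(looks_like_iceberg_v2(sample))  # → True
```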

Step 3 — Create a Databricks Service Principal and grant it access

# Create SP via SCIM API
databricks api post --profile my-awsome-workspace \
  /api/2.0/preview/scim/v2/ServicePrincipals \
  --json '{"displayName": "bq-uniform-federation-sp", "active": true}'

# Note the applicationId (= OAuth client_id) from the response

# Grant catalog-level privileges including EXTERNAL_USE_SCHEMA
databricks grants update --profile my-awsome-workspace \
  catalog wanyu_bq_test2 \
  --json '{
    "changes": [{
      "principal": "<sp-application-id>",
      "add": ["USE_CATALOG", "USE_SCHEMA", "SELECT", "EXTERNAL_USE_SCHEMA"]
    }]
  }'

EXTERNAL_USE_SCHEMA is the privilege that authorizes IRC clients to retrieve table metadata. Without it, the SP can list namespaces but loadTable returns an error.

Step 4 — Generate the OAuth client secret (UI step)

This is the only step that cannot be done programmatically from a workspace token. In the Databricks Account Console (accounts.cloud.databricks.com):

  1. User management → Service principals → bq-uniform-federation-sp
  2. Secrets tab → Generate secret
  3. Save the secret (shown once)

Verify the OAuth flow works:

CLIENT_ID="<sp-application-id>"
CLIENT_SECRET="<from-account-console>"
HOST="my-awsome-workspace.cloud.databricks.com"

curl -s -u "$CLIENT_ID:$CLIENT_SECRET" \
  -X POST "https://$HOST/oidc/v1/token" \
  -d "grant_type=client_credentials&scope=all-apis"
# Should return JSON with access_token, expires_in:3600, scope:all-apis

Step 5 — Store the secret in regional Secret Manager as JSON

This is where two non-obvious gotchas combine: the secret must be regional (not global), and the value must be a JSON document containing both client_id and client_secret (not just the secret string alone).

QP="my-awsome-bq-project"
LOCATION="us-east4"
SECRET_NAME="bq-uniform-test2-dbx-oauth"

# Set the regional Secret Manager API endpoint (required — gcloud's standard
# secret commands don't accept --location for regional secrets)
gcloud config set api_endpoint_overrides/secretmanager \
  https://secretmanager.${LOCATION}.rep.googleapis.com/

# Create the secret resource
gcloud secrets create $SECRET_NAME --project=$QP --location=$LOCATION

# Add a version with JSON payload (NOT just the bare secret string)
echo "{\"client_id\":\"${CLIENT_ID}\",\"client_secret\":\"${CLIENT_SECRET}\"}" \
  | gcloud secrets versions add $SECRET_NAME \
      --project=$QP --location=$LOCATION --data-file=-

# Reset endpoint when done
gcloud config unset api_endpoint_overrides/secretmanager

Storing only the bare client secret (the most natural interpretation of the docs) causes the federation refresh to fail with "Failed to fetch credentials for Unity Catalog refresh" — a misleading error that initially looks like an IAM problem.
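
Because the payload is JSON, the shell quoting in the echo above is easy to get wrong (e.g. if the secret contains characters special to the shell). A hedged alternative is to build the payload with json.dumps, which escapes correctly by construction, then pipe it to the same gcloud secrets versions add … --data-file=- command:

```python
import json

def oauth_secret_payload(client_id: str, client_secret: str) -> str:
    """Build the JSON document the federation expects in Secret Manager."""
    return json.dumps({"client_id": client_id, "client_secret": client_secret})

# Illustrative values; quotes and @-signs survive the round trip
payload = oauth_secret_payload("1234-abcd", 'p@ss"word')
print(payload)
assert json.loads(payload)["client_secret"] == 'p@ss"word'
```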

Step 6 — Create the BigLake federated catalog

Requires gcloud 566.0.0 or later (released 2026-04-24). Older versions reject --catalog-type=federated with Invalid choice: 'federated'.

gcloud alpha biglake iceberg catalogs create wanyu_bq_test2_fed \
  --project=$QP \
  --primary-location=$LOCATION \
  --catalog-type=federated \
  --federated-catalog-type=unity \
  --secret-name=projects/$QP/locations/$LOCATION/secrets/$SECRET_NAME \
  --unity-instance-name=my-awsome-workspace.cloud.databricks.com \
  --unity-catalog-name=wanyu_bq_test2 \
  --refresh-interval=300s

This returns the BigLake-managed service account that the federation will use to access the secret. Note it for the next step.

BIGLAKE_SA=$(gcloud alpha biglake iceberg catalogs describe wanyu_bq_test2_fed \
  --project=$QP \
  --format='value(biglake-service-account)')
echo "BigLake SA: $BIGLAKE_SA"
# Example: blirc-83839625123-7dh1@gcp-sa-biglakerestcatalog.iam.gserviceaccount.com

Step 7 — Grant the BigLake SA access to the regional secret

gcloud config set api_endpoint_overrides/secretmanager \
  https://secretmanager.${LOCATION}.rep.googleapis.com/

gcloud secrets add-iam-policy-binding $SECRET_NAME \
  --project=$QP \
  --location=$LOCATION \
  --member="serviceAccount:$BIGLAKE_SA" \
  --role=roles/secretmanager.secretAccessor

gcloud config unset api_endpoint_overrides/secretmanager

Step 8 — Wait for first refresh and query from BigQuery

The first refresh fires after the refresh interval (so 5 min after catalog creation, not at create time). Watch for completion:

gcloud alpha biglake iceberg catalogs describe wanyu_bq_test2_fed \
  --project=$QP --format=json \
  | python3 -c "
import json, sys
d=json.load(sys.stdin)
rs=d.get('federated-catalog-options',{}).get('refresh-status',{})
print('end:', rs.get('end-time'))
print('code:', rs.get('status',{}).get('code', '0=ok'))
print('msg:', rs.get('status',{}).get('message', '(empty=success)'))
"

code=0 (or empty) indicates success.

Then query from BigQuery using the 4-part path project.federated_catalog.namespace.table:

bq --project_id=$QP --location=$LOCATION query --use_legacy_sql=false \
  'SELECT inventory_id, sku, region, qty, classification, updated_at
   FROM `my-awsome-bq-project.wanyu_bq_test2_fed.test.t_inventory_uniform`
   ORDER BY inventory_id'

In the BigQuery Web UI: open the SQL editor, set query location to us-east4 (Settings → Query settings → Data location), then paste the same query. The federated catalog will not appear in the navigator panel — this is a documented limitation; you must reference it by full path.
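
Since the federated catalog is addressable only by its full path, a tiny helper that assembles the backticked 4-part identifier can prevent typos. The names below are the ones from this walkthrough:

```python
def four_part_path(project: str, fed_catalog: str, namespace: str, table: str) -> str:
    """Assemble the backticked BigQuery path for a federated-catalog table."""
    return f"`{project}.{fed_catalog}.{namespace}.{table}`"

path = four_part_path("my-awsome-bq-project", "wanyu_bq_test2_fed",
                      "test", "t_inventory_uniform")
print(path)
# → `my-awsome-bq-project.wanyu_bq_test2_fed.test.t_inventory_uniform`
```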

Result for the test:

+---+---------+--------+-----+-------------+-------------------+
| 1 | SKU-001 | JP     | 100 | public      | 2026-04-30 13:27  |
| 2 | SKU-002 | KR     |  50 | public      | 2026-04-30 13:27  |
| 3 | SKU-003 | TW     |  75 | restricted  | 2026-04-30 13:27  |
| 4 | SKU-004 | HK     |  30 | public      | 2026-04-30 13:27  |
| 5 | SKU-005 | IN     | 200 | public      | 2026-04-30 13:27  |
+---+---------+--------+-----+-------------+-------------------+

End-to-end confirmed: BQ Cross-cloud Lakehouse Federation → UC Iceberg REST endpoint → UniForm-enabled Managed Delta table.

Cleanup (when re-running or removing the test)

# 1. Delete the federated catalog
gcloud alpha biglake iceberg catalogs delete wanyu_bq_test2_fed \
  --project=$QP --quiet

# 2. Delete the regional secret (use regional endpoint override)
gcloud config set api_endpoint_overrides/secretmanager \
  https://secretmanager.${LOCATION}.rep.googleapis.com/

gcloud secrets delete bq-uniform-test2-dbx-oauth \
  --project=$QP --location=$LOCATION --quiet

gcloud config unset api_endpoint_overrides/secretmanager

# 3. (Optional) Drop the Databricks-side resources
# DROP TABLE wanyu_bq_test2.test.t_inventory_uniform;
# DROP SCHEMA wanyu_bq_test2.test CASCADE;
# Account console: delete the SP if no longer needed

Gotchas, in priority order

  1. Update gcloud first to 566.0.0+ (released 2026-04-24). The --catalog-type=federated flag does not exist in older versions.
  2. Secret value is JSON, not the bare secret string. Use {"client_id":"…","client_secret":"…"}.
  3. Secret must be regional. Use the regional API endpoint override; standard gcloud secrets create --location does not work the way you'd expect.
  4. Grant EXTERNAL_USE_SCHEMA to the SP at the catalog level, not just SELECT. Without it, federation sees zero tables.
  5. UniForm + delta.enableDeletionVectors=true are incompatible. Disable DV (and run REORG TABLE … APPLY (PURGE) to clean up existing DV files) before enabling UniForm.
  6. First refresh takes 5+ minutes. No "force refresh" command exists — only delete + recreate. Plan for the lag.
  7. BigQuery UI navigator does not show federated catalogs. Reference by 4-part path. Set query location to the catalog's region.
  8. Storage credential resolution at runtime can fail (credentialName = None) even when grants look correct. The catalog's storage credential must be one your SQL warehouses' runtime IAM can assume. Verify before designing for it.

Reference URLs


日本語

Why(なぜ)— BigQuery から Databricks テーブルを参照する2つの方法

データエンジニアリングを Databricks 中心で行いつつ、BigQuery を消費・サービング層に残しているお客様(BQ→Databricks 移行中によくある状態)にとって、BigQuery から Databricks 管理テーブルを直接参照する方法は2つあります。表面的には同じ「BQ から SELECT できる」ですが、スケール特性が大きく異なります。

(A) BigLake 外部テーブル on Delta パス

  • BQ 側で BigLake 外部テーブルを作成し、Delta ディレクトリを指定
  • BigLake は Delta トランザクションログを直接解釈し、現スナップショットを返す
  • セットアップはテーブル単位 — 1テーブルごとに BigLake 外部テーブルを作成し、スクリプトで再生成
  • GA、ドキュメント整備済、本番運用実績あり
  • 2軸で脆弱性あり:テーブル数 N に対して O(N) で運用コストが膨らむ/Classic checkpoint 限定(V2 checkpoint は BigLake 非対応で、新しい Delta 機能ほど V2 を要求するようになっている)

(C) BigQuery Cross-cloud Lakehouse Federation → Unity Catalog

  • BQ 側が Unity Catalog 全体を foreign catalog としてマウント、Iceberg REST Catalog (IRC) プロトコルで通信
  • BigLake が UC の IRC エンドポイントを定期ポーリングし、見つかったテーブルを BQ から参照可能にする
  • セットアップはカタログ単位で1回 — Databricks 側でテーブルを追加すれば、次の refresh サイクルで自動的に BQ から見えるようになる
  • Public Preview、gcloud 566.0.0+ が必要(2026-04-24 リリース)
  • Delta プロトコル進化に強い:UniForm の Iceberg メタデータ生成が動いていれば、BigLake 側はどの Delta 機能を使っても透過

| 観点 | (A) BigLake on Delta | (C) Cross-cloud Lakehouse Federation |
|---|---|---|
| 方向 | BQ → DBX 参照 | BQ → DBX 参照 |
| セットアップ粒度 | テーブル単位 | カタログ単位 |
| 探索 | 手動/スクリプト | 自動(refresh 間隔 ≥ 5分) |
| Delta 側書込み(BQ から) | 不可 | 不可 |
| プロトコル | BigLake ネイティブ Delta reader | Iceberg REST (IRC) |
| 成熟度 | GA | Public Preview |
| Delta プロトコル変更耐性 | 限定的(V2 checkpoint 非対応) | 高(UniForm が吸収) |
| BigQuery UI ナビゲータ表示 | あり | なし(4-part path 指定のみ) |

(A) と (C) のどちらを選ぶかは、お客様固有の制約(テーブル数、Public Preview ステータスへの許容度、V2 checkpoint を要求する新しい Delta 機能の重要度、BQ 側 discovery スクリプトの運用負荷など)に依存します。両方とも今日動作するため、上記の比較は一方を推奨せず、トレードオフを並べた形にしています。

What(何を)— Managed UniForm Delta テーブル(Path C 用)

本記事の残りは (C) に焦点を当てます。(A) は既存の BigLake-on-Delta 資料で十分整備されているため、ここでは繰り返しません。

(C) を機能させるには、Databricks 側のテーブルが Iceberg 互換である必要があります。3つの選択肢:

  1. Native Managed Iceberg — UC が直接 Iceberg として作成。CDF / Z-order / 生成列 / 制約 / Predictive Optimization を失う。Public Preview のみ。
  2. External Delta + UniForm — LOCATION 明示で作成した Delta テーブル + UniForm 有効化。動作する。
  3. Managed Delta + UniForm — UC 管理の Delta テーブル + UniForm 有効化。ストレージレイアウトは UC が管理。推奨

3番目が最もクリーン:Delta 機能(CDF / Z-order / Predictive Optimization / 生成列 / 制約)を全て維持しつつ、UC にストレージライフサイクルを任せ、TBLPROPERTIES 1つで BigQuery 連携を獲得。

重要な互換性注意点: UniForm/IcebergCompatV2 は delta.enableDeletionVectors=true と排他的です。既存テーブルが deletion vectors を使っている場合、無効化(と REORG TABLE … APPLY (PURGE) での既存 DV ファイルクリーンアップ)が必要、または DV が必要なテーブル用に UniForm 無効版を別途維持する必要があります。Deletion vectors は MERGE/UPDATE/DELETE 多用ワークロードで実際の最適化機能なので、トレードオフは正直に評価する必要があります。

Delta 側のその他の前提条件:

  • delta.minReaderVersion ≥ 2, delta.minWriterVersion ≥ 7 (3 を指定しても Databricks が自動的に 2 に設定する場合あり)
  • delta.columnMapping.mode='name'
  • delta.enableIcebergCompatV2='true'
  • delta.universalFormat.enabledFormats='iceberg'

How(どうやって)— Path C の検証済みエンドツーエンドセットアップ

以下のセットアップ手順は (C) 専用です。(A) については末尾参照の BigLake-on-Delta ドキュメントを参照してください。

以下は 2026-04-30 に実際に動作したコマンド列です。テスト環境:Databricks ワークスペース my-awsome-workspace(AWS, ap-northeast-1)、GCP プロジェクト my-awsome-bq-project、federated catalog は us-east4

Step 1 — Databricks 側で UniForm 有効化 Delta テーブル作成

-- スキーマが無ければ作成(既にプロビジョン済みならスキップ)
CREATE SCHEMA IF NOT EXISTS wanyu_bq_test2.test;

USE CATALOG wanyu_bq_test2;
USE SCHEMA test;

CREATE OR REPLACE TABLE t_inventory_uniform (
  inventory_id BIGINT,
  sku STRING,
  region STRING,
  qty INT,
  classification STRING,
  updated_at TIMESTAMP
)
USING DELTA
TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '7',
  'delta.enableDeletionVectors' = 'false',
  'delta.checkpointPolicy' = 'classic',
  'delta.checkpoint.writeStatsAsStruct' = 'true',
  'delta.enableChangeDataFeed' = 'true',
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);

INSERT INTO t_inventory_uniform VALUES
  (1, 'SKU-001', 'JP', 100, 'public',     current_timestamp()),
  (2, 'SKU-002', 'KR',  50, 'public',     current_timestamp()),
  (3, 'SKU-003', 'TW',  75, 'restricted', current_timestamp()),
  (4, 'SKU-004', 'HK',  30, 'public',     current_timestamp()),
  (5, 'SKU-005', 'IN', 200, 'public',     current_timestamp());

前提条件:

  • カタログが external location 構成であること(databricks catalogs get <name> で storage_root が s3:// / gs:// / abfss:// パスであることを確認)。Default-storage UC カタログは Cross-cloud Lakehouse Federation 非対応。
  • Metastore で external_access_enabled=true であること(databricks metastores summary で確認)。
  • CREATE TABLE を実行するユーザー/クラスタにスキーマへの CREATE TABLE 権限があること。さらに、カタログに紐付くストレージクレデンシャルが SQL warehouse のランタイム IAM から assume 可能であること(落とし穴 #8 参照)。

Step 2 — UC Iceberg REST endpoint 経由で公開されているか確認

TOKEN=$(databricks auth token --profile my-awsome-workspace \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['access_token'])")

curl -s "https://my-awsome-workspace.cloud.databricks.com/api/2.1/unity-catalog/iceberg-rest/v1/catalogs/wanyu_bq_test2/namespaces/test/tables/t_inventory_uniform" \
  -H "Authorization: Bearer $TOKEN" \
  | python3 -m json.tool | head -30

期待値:metadata-locationmetadata.format-version: 2、フルスキーマ、Iceberg 型(long, string, int, timestamptz)を含む JSON。これで UniForm-Delta が IRC 経由で Iceberg v2 として公開されていることを確認。

Step 3 — Databricks Service Principal 作成と権限付与

# SCIM API で SP 作成
databricks api post --profile my-awsome-workspace \
  /api/2.0/preview/scim/v2/ServicePrincipals \
  --json '{"displayName": "bq-uniform-federation-sp", "active": true}'

# レスポンスから applicationId(= OAuth client_id)を控える

# カタログレベル権限を EXTERNAL_USE_SCHEMA 含めて付与
databricks grants update --profile my-awsome-workspace \
  catalog wanyu_bq_test2 \
  --json '{
    "changes": [{
      "principal": "<sp-application-id>",
      "add": ["USE_CATALOG", "USE_SCHEMA", "SELECT", "EXTERNAL_USE_SCHEMA"]
    }]
  }'

EXTERNAL_USE_SCHEMA は IRC クライアントがテーブルメタデータを取得することを許可する権限です。これが無いと、SP は namespace のリストはできても loadTable で失敗します。

Step 4 — OAuth client secret 生成(UI 操作)

ワークスペーストークンからプログラム的に実行できない唯一のステップです。Databricks Account Console (accounts.cloud.databricks.com) で:

  1. User management → Service principals → bq-uniform-federation-sp
  2. Secrets タブ → Generate secret
  3. シークレットを保存(一度しか表示されません)

OAuth フローが動作することを確認:

CLIENT_ID="<sp-application-id>"
CLIENT_SECRET="<from-account-console>"
HOST="my-awsome-workspace.cloud.databricks.com"

curl -s -u "$CLIENT_ID:$CLIENT_SECRET" \
  -X POST "https://$HOST/oidc/v1/token" \
  -d "grant_type=client_credentials&scope=all-apis"
# access_token, expires_in:3600, scope:all-apis を含む JSON が返るはず

Step 5 — Secret Manager に regional + JSON で保存

ここで非自明な落とし穴が2つ重なります:シークレットは regional でなければならず、値は client_idclient_secret 両方を含む JSON 文書である必要があります(裸のシークレット文字列ではない)。

QP="my-awsome-bq-project"
LOCATION="us-east4"
SECRET_NAME="bq-uniform-test2-dbx-oauth"

# Regional Secret Manager API endpoint を設定(必須 — gcloud の標準 secret コマンドは
# regional secret 用の --location を期待通りに受け付けない)
gcloud config set api_endpoint_overrides/secretmanager \
  https://secretmanager.${LOCATION}.rep.googleapis.com/

# シークレットリソース作成
gcloud secrets create $SECRET_NAME --project=$QP --location=$LOCATION

# JSON ペイロードでバージョン追加(裸のシークレット文字列ではない)
echo "{\"client_id\":\"${CLIENT_ID}\",\"client_secret\":\"${CLIENT_SECRET}\"}" \
  | gcloud secrets versions add $SECRET_NAME \
      --project=$QP --location=$LOCATION --data-file=-

# 終わったら endpoint をリセット
gcloud config unset api_endpoint_overrides/secretmanager

裸の client secret だけを保存(ドキュメントの最も自然な解釈)すると、federation refresh が "Failed to fetch credentials for Unity Catalog refresh" で失敗します — 一見 IAM 問題に見える誤解を招くエラー。

Step 6 — BigLake federated catalog 作成

gcloud 566.0.0 以上が必要(2026-04-24 リリース)。それ以前のバージョンでは --catalog-type=federatedInvalid choice: 'federated' で拒否されます。

gcloud alpha biglake iceberg catalogs create wanyu_bq_test2_fed \
  --project=$QP \
  --primary-location=$LOCATION \
  --catalog-type=federated \
  --federated-catalog-type=unity \
  --secret-name=projects/$QP/locations/$LOCATION/secrets/$SECRET_NAME \
  --unity-instance-name=my-awsome-workspace.cloud.databricks.com \
  --unity-catalog-name=wanyu_bq_test2 \
  --refresh-interval=300s

これで federation がシークレットアクセスに使う BigLake 管理 Service Account が返されます。次のステップで使うので控える。

BIGLAKE_SA=$(gcloud alpha biglake iceberg catalogs describe wanyu_bq_test2_fed \
  --project=$QP \
  --format='value(biglake-service-account)')
echo "BigLake SA: $BIGLAKE_SA"
# 例: blirc-83839625123-7dh1@gcp-sa-biglakerestcatalog.iam.gserviceaccount.com

Step 7 — BigLake SA に regional secret アクセスを付与

gcloud config set api_endpoint_overrides/secretmanager \
  https://secretmanager.${LOCATION}.rep.googleapis.com/

gcloud secrets add-iam-policy-binding $SECRET_NAME \
  --project=$QP \
  --location=$LOCATION \
  --member="serviceAccount:$BIGLAKE_SA" \
  --role=roles/secretmanager.secretAccessor

gcloud config unset api_endpoint_overrides/secretmanager

Step 8 — 初回 refresh を待ち、BigQuery から query

初回 refresh はカタログ作成時ではなく、refresh interval 経過後(つまり作成から5分後)に発火します。完了を確認:

gcloud alpha biglake iceberg catalogs describe wanyu_bq_test2_fed \
  --project=$QP --format=json \
  | python3 -c "
import json, sys
d=json.load(sys.stdin)
rs=d.get('federated-catalog-options',{}).get('refresh-status',{})
print('end:', rs.get('end-time'))
print('code:', rs.get('status',{}).get('code', '0=ok'))
print('msg:', rs.get('status',{}).get('message', '(empty=success)'))
"

code=0(または空)が成功です。

それから BigQuery から 4-part path project.federated_catalog.namespace.table で query:

bq --project_id=$QP --location=$LOCATION query --use_legacy_sql=false \
  'SELECT inventory_id, sku, region, qty, classification, updated_at
   FROM `my-awsome-bq-project.wanyu_bq_test2_fed.test.t_inventory_uniform`
   ORDER BY inventory_id'

BigQuery Web UI の場合:SQL editor を開き、query location を us-east4 に設定(Settings → Query settings → Data location)、同じ query をペースト。Federated catalog は navigator パネルに表示されません — これは仕様で、フルパスで参照する必要があります。

テスト結果:

+---+---------+--------+-----+-------------+-------------------+
| 1 | SKU-001 | JP     | 100 | public      | 2026-04-30 13:27  |
| 2 | SKU-002 | KR     |  50 | public      | 2026-04-30 13:27  |
| 3 | SKU-003 | TW     |  75 | restricted  | 2026-04-30 13:27  |
| 4 | SKU-004 | HK     |  30 | public      | 2026-04-30 13:27  |
| 5 | SKU-005 | IN     | 200 | public      | 2026-04-30 13:27  |
+---+---------+--------+-----+-------------+-------------------+

エンドツーエンド確認:BQ Cross-cloud Lakehouse Federation → UC Iceberg REST endpoint → UniForm 有効化 Managed Delta テーブル。

クリーンアップ(再実行時/テストリソース削除時)

# 1. Federated catalog 削除
gcloud alpha biglake iceberg catalogs delete wanyu_bq_test2_fed \
  --project=$QP --quiet

# 2. Regional secret 削除(regional endpoint override 必須)
gcloud config set api_endpoint_overrides/secretmanager \
  https://secretmanager.${LOCATION}.rep.googleapis.com/

gcloud secrets delete bq-uniform-test2-dbx-oauth \
  --project=$QP --location=$LOCATION --quiet

gcloud config unset api_endpoint_overrides/secretmanager

# 3. (任意)Databricks 側リソース削除
# DROP TABLE wanyu_bq_test2.test.t_inventory_uniform;
# DROP SCHEMA wanyu_bq_test2.test CASCADE;
# Account console で SP を削除(不要であれば)

落とし穴(優先順)

  1. gcloud をまずアップデート して 566.0.0 以上に(2026-04-24 リリース)。--catalog-type=federated フラグはそれ以前のバージョンには存在しません。
  2. シークレット値は JSON、裸のシークレット文字列ではない。{"client_id":"…","client_secret":"…"} を使う。
  3. シークレットは regional でなければならない。Regional API endpoint override を使う。標準の gcloud secrets create --location は期待通りに動かない。
  4. EXTERNAL_USE_SCHEMA を SP に付与SELECT だけでは不十分。これが無いと federation がテーブルを 0 個と認識する。
  5. UniForm と delta.enableDeletionVectors=true は非互換。 UniForm 有効化前に DV を無効化(と既存 DV ファイルクリーンアップのため REORG TABLE … APPLY (PURGE))が必要。
  6. 初回 refresh は 5 分以上かかる。 "Force refresh" コマンドは無く、delete + recreate のみ。遅延を計画する。
  7. BigQuery UI navigator は federated catalog を表示しない。 4-part path で参照。Query location をカタログのリージョンに合わせる。
  8. ストレージクレデンシャル解決がランタイムで失敗すること(credentialName = None)が、grants が正しく見えても起こり得る。カタログのストレージクレデンシャルは、SQL warehouse のランタイム IAM が assume できるものでなければならない。設計前に確認する。

参考 URL
