Reading Databricks Tables from BigQuery via UniForm and Cross-cloud Lakehouse Federation / UniFormとCross-cloud Lakehouseフェデレーションを介してBigQueryからDatabricksテーブルを読み込む
Verified end-to-end on 2026-04-30. Hands-on test summary; English first, Japanese below.
2026年4月30日にエンドツーエンドで検証済み。実地テストの概要(英語が先、日本語は後述)。
English
Why — two paths to "BigQuery reads Databricks tables"
For customers running primary data engineering on Databricks but with BigQuery still in the consumer/serving path (a common state during BQ-to-Databricks migrations), there are two architectural approaches to let BigQuery query Databricks-managed tables directly. They look similar at first glance — both produce "BQ can SELECT from a Databricks table" — but they differ meaningfully in how they scale.
Path (A): BigLake external table on Delta path
- BigQuery side creates a BigLake external table pointing at the Delta directory in cloud storage
- BigLake natively understands the Delta transaction log and serves the current snapshot
- Setup is per-table — each table gets its own BigLake external table, refreshed via scripts
- GA, well-documented, used in production at many sites
- Brittle along two axes: scales O(N) with table count (each table needs its own pipeline), and constrained to classic Delta checkpoints (BigLake doesn't support V2 checkpoint, which newer Delta features increasingly require)
Path (C): BigQuery Cross-cloud Lakehouse Federation → Unity Catalog
- BigQuery side mounts the entire Databricks Unity Catalog as a foreign catalog via the Iceberg REST Catalog (IRC) protocol
- BigLake polls UC's IRC endpoint on a schedule and exposes everything it sees as queryable from BigQuery
- One-time setup at the catalog level — adding tables on the Databricks side automatically becomes queryable from BigQuery on the next refresh cycle
- Public Preview, requires `gcloud` 566.0.0+ (released 2026-04-24)
- Resilient to Delta protocol evolution: as long as UniForm's Iceberg metadata generation continues working, BigLake doesn't care which Delta features you enable
| Dimension | (A) BigLake on Delta | (C) Cross-cloud Lakehouse Federation |
|---|---|---|
| Direction | BQ reads DBX | BQ reads DBX |
| Setup granularity | Per-table | Per-catalog |
| Discovery | Manual / scripted | Automatic (refresh interval ≥ 5 min) |
| Delta-side write capability | Read-only from BQ | Read-only from BQ |
| Underlying protocol | BigLake's native Delta reader | Iceberg REST (IRC) |
| Maturity | GA | Public Preview |
| Resilience to Delta protocol changes | Limited (V2 checkpoint not supported) | High (UniForm absorbs it) |
| BigQuery UI navigator support | Yes | No (4-part path only) |
The choice between (A) and (C) depends on the customer's specific constraints — table count, tolerance for Public Preview status, importance of newer Delta features that require V2 checkpoint, operational ownership of the BQ-side discovery scripts, etc. Both work today; the comparison above lays out the trade-offs without prescribing one.
What — Managed UniForm Delta tables (for Path C)
The remainder of this article focuses on Path (C). Path (A) is well-documented in the existing BigLake-on-Delta materials; we don't repeat it here.
Path (C) requires the Databricks-side tables to be Iceberg-readable. Three options exist:
- Native Managed Iceberg — UC creates Iceberg directly. Loses CDF, Z-order, generated columns, constraints, predictive optimization. Public Preview only.
- External Delta + UniForm — Delta table created with an explicit `LOCATION`, with UniForm enabled. Works.
- Managed Delta + UniForm — UC-managed Delta table, with UniForm enabled. Storage layout managed by UC. Recommended.
The third is the cleanest path: you keep all Delta features (CDF, Z-order, predictive optimization, generated columns, constraints), get UC to handle the storage lifecycle, and gain BigQuery interop on top via a handful of TBLPROPERTIES settings.
Critical compatibility caveat: UniForm/IcebergCompatV2 is mutually exclusive with delta.enableDeletionVectors=true. If your Delta tables use deletion vectors today, you must either disable DV (and run REORG TABLE … APPLY (PURGE) to clean up) or maintain a UniForm-disabled copy for tables that need DV. Deletion vectors are a real Delta optimization for MERGE/UPDATE/DELETE-heavy workloads; weigh the trade-off honestly.
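For an existing Delta table that already uses deletion vectors, the conversion sequence looks roughly like this (a sketch; `main.sales.orders` is a hypothetical table name, and the test table in Step 1 below is created with DV off from the start):

-- 1. Stop writing new deletion vectors
ALTER TABLE main.sales.orders
  SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'false');
-- 2. Rewrite files so existing DVs are physically applied and purged
REORG TABLE main.sales.orders APPLY (PURGE);
-- 3. Enable UniForm (same properties as the prerequisites below)
ALTER TABLE main.sales.orders SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '7',
  'delta.columnMapping.mode' = 'name',
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);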
Other prerequisites on the Delta-side:
- `delta.minReaderVersion` ≥ 2, `delta.minWriterVersion` ≥ 7 (Databricks may auto-set `minReaderVersion=2` even if you request 3)
- `delta.columnMapping.mode = 'name'`
- `delta.enableIcebergCompatV2 = 'true'`
- `delta.universalFormat.enabledFormats = 'iceberg'`
How — verified end-to-end setup for Path C
Setup steps below are specifically for Path (C). For Path (A), refer to the BigLake-on-Delta documentation referenced at the end.
Below is the actual sequence that worked on 2026-04-30. Test environment: Databricks workspace my-awsome-workspace (AWS, ap-northeast-1), GCP project my-awsome-bq-project, federated catalog created in us-east4.
Step 1 — Create the UniForm-enabled Delta table on Databricks
-- Create schema if it doesn't exist (skip if already provisioned)
CREATE SCHEMA IF NOT EXISTS wanyu_bq_test2.test;
USE CATALOG wanyu_bq_test2;
USE SCHEMA test;
CREATE OR REPLACE TABLE t_inventory_uniform (
inventory_id BIGINT,
sku STRING,
region STRING,
qty INT,
classification STRING,
updated_at TIMESTAMP
)
USING DELTA
TBLPROPERTIES (
'delta.columnMapping.mode' = 'name',
'delta.minReaderVersion' = '2',
'delta.minWriterVersion' = '7',
'delta.enableDeletionVectors' = 'false',
'delta.checkpointPolicy' = 'classic',
'delta.checkpoint.writeStatsAsStruct' = 'true',
'delta.enableChangeDataFeed' = 'true',
'delta.enableIcebergCompatV2' = 'true',
'delta.universalFormat.enabledFormats' = 'iceberg'
);
INSERT INTO t_inventory_uniform VALUES
(1, 'SKU-001', 'JP', 100, 'public', current_timestamp()),
(2, 'SKU-002', 'KR', 50, 'public', current_timestamp()),
(3, 'SKU-003', 'TW', 75, 'restricted', current_timestamp()),
(4, 'SKU-004', 'HK', 30, 'public', current_timestamp()),
(5, 'SKU-005', 'IN', 200, 'public', current_timestamp());
Prerequisites:
- The catalog must use external location storage (verify with `databricks catalogs get <name>` — `storage_root` should be an `s3://` / `gs://` / `abfss://` path). Default-storage UC catalogs are not supported by Cross-cloud Lakehouse Federation.
- The metastore must have `external_access_enabled=true` (verify with `databricks metastores summary`); both checks are sketched below.
- The user/cluster running this CREATE must have CREATE TABLE privilege on the schema, and the storage credential bound to the catalog must be assumable by the SQL warehouse's runtime IAM (see Gotcha #8).
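A quick way to check the first two prerequisites from the CLI (a sketch; assumes your Databricks CLI version supports `--output json`):

# storage_root should print an s3:// / gs:// / abfss:// path, not empty
databricks catalogs get wanyu_bq_test2 --profile my-awsome-workspace --output json \
  | python3 -c "import json,sys; print(json.load(sys.stdin).get('storage_root'))"
# external_access_enabled should print True
databricks metastores summary --profile my-awsome-workspace --output json \
  | python3 -c "import json,sys; print(json.load(sys.stdin).get('external_access_enabled'))"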
Step 2 — Verify the table is exposed via UC's Iceberg REST endpoint
TOKEN=$(databricks auth token --profile my-awsome-workspace \
| python3 -c "import json,sys; print(json.load(sys.stdin)['access_token'])")
curl -s "https://my-awsome-workspace.cloud.databricks.com/api/2.1/unity-catalog/iceberg-rest/v1/catalogs/wanyu_bq_test2/namespaces/test/tables/t_inventory_uniform" \
-H "Authorization: Bearer $TOKEN" \
| python3 -m json.tool | head -30
Expected response: a JSON document with metadata-location, metadata.format-version: 2, full schema, and Iceberg-style fields (long, string, int, timestamptz). This confirms the IRC endpoint is serving the UniForm-Delta table as Iceberg v2.
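For orientation, the loadTable response is shaped roughly like this (illustrative and abridged, not verbatim output; paths and field IDs will differ):

{
  "metadata-location": "s3://…/metadata/00003-….metadata.json",
  "metadata": {
    "format-version": 2,
    "schemas": [{
      "type": "struct",
      "fields": [
        {"id": 1, "name": "inventory_id", "required": false, "type": "long"},
        {"id": 2, "name": "sku", "required": false, "type": "string"},
        {"id": 6, "name": "updated_at", "required": false, "type": "timestamptz"}
      ]
    }]
  }
}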
Step 3 — Create a Databricks Service Principal and grant it access
# Create SP via SCIM API
databricks api post --profile my-awsome-workspace \
/api/2.0/preview/scim/v2/ServicePrincipals \
--json '{"displayName": "bq-uniform-federation-sp", "active": true}'
# Note the applicationId (= OAuth client_id) from the response
# Grant catalog-level privileges including EXTERNAL_USE_SCHEMA
databricks grants update --profile my-awsome-workspace \
catalog wanyu_bq_test2 \
--json '{
"changes": [{
"principal": "<sp-application-id>",
"add": ["USE_CATALOG", "USE_SCHEMA", "SELECT", "EXTERNAL_USE_SCHEMA"]
}]
}'
EXTERNAL_USE_SCHEMA is the privilege that authorizes IRC clients to retrieve table metadata. Without it, the SP can list namespaces but loadTable returns an error.
Step 4 — Generate the OAuth client secret (UI step)
This is the only step that cannot be done programmatically from a workspace token. In the Databricks Account Console (accounts.cloud.databricks.com):
- User management → Service principals → `bq-uniform-federation-sp`
- Secrets tab → Generate secret
- Save the secret (shown once)
Verify the OAuth flow works:
CLIENT_ID="<sp-application-id>"
CLIENT_SECRET="<from-account-console>"
HOST="my-awsome-workspace.cloud.databricks.com"
curl -s -u "$CLIENT_ID:$CLIENT_SECRET" \
-X POST "https://$HOST/oidc/v1/token" \
-d "grant_type=client_credentials&scope=all-apis"
# Should return JSON with access_token, expires_in:3600, scope:all-apis
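Optionally, replay the Step 2 IRC call with the SP's token instead of your user token; this proves the EXTERNAL_USE_SCHEMA grant works before any GCP setup is involved (a sketch reusing the variables above):

# Exchange the SP credentials for a workspace access token
SP_TOKEN=$(curl -s -u "$CLIENT_ID:$CLIENT_SECRET" \
  -X POST "https://$HOST/oidc/v1/token" \
  -d "grant_type=client_credentials&scope=all-apis" \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['access_token'])")
# 200 means the SP can loadTable; 403 typically means EXTERNAL_USE_SCHEMA is missing
curl -s -o /dev/null -w "%{http_code}\n" \
  "https://$HOST/api/2.1/unity-catalog/iceberg-rest/v1/catalogs/wanyu_bq_test2/namespaces/test/tables/t_inventory_uniform" \
  -H "Authorization: Bearer $SP_TOKEN"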
Step 5 — Store the secret in regional Secret Manager as JSON
This is where two non-obvious gotchas combine: the secret must be regional (not global), and the value must be a JSON document containing both client_id and client_secret (not just the secret string alone).
QP="my-awsome-bq-project"
LOCATION="us-east4"
SECRET_NAME="bq-uniform-test2-dbx-oauth"
# Set the regional Secret Manager API endpoint (required — gcloud's standard
# secret commands don't accept --location for regional secrets)
gcloud config set api_endpoint_overrides/secretmanager \
https://secretmanager.${LOCATION}.rep.googleapis.com/
# Create the secret resource
gcloud secrets create $SECRET_NAME --project=$QP --location=$LOCATION
# Add a version with JSON payload (NOT just the bare secret string)
echo "{\"client_id\":\"${CLIENT_ID}\",\"client_secret\":\"${CLIENT_SECRET}\"}" \
| gcloud secrets versions add $SECRET_NAME \
--project=$QP --location=$LOCATION --data-file=-
# Reset endpoint when done
gcloud config unset api_endpoint_overrides/secretmanager
Storing only the bare client secret (the most natural interpretation of the docs) causes the federation refresh to fail with "Failed to fetch credentials for Unity Catalog refresh" — a misleading error that initially looks like an IAM problem.
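To rule that out quickly, read the stored version back and confirm it is the JSON form (a sketch; requires the regional endpoint override again):

gcloud config set api_endpoint_overrides/secretmanager \
  https://secretmanager.${LOCATION}.rep.googleapis.com/
# Expect: {"client_id":"…","client_secret":"…"}
gcloud secrets versions access latest --secret=$SECRET_NAME \
  --project=$QP --location=$LOCATION
gcloud config unset api_endpoint_overrides/secretmanager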
Step 6 — Create the BigLake federated catalog
Requires gcloud 566.0.0 or later (released 2026-04-24). Older versions reject `--catalog-type=federated` with `Invalid choice: 'federated'`, with no hint that upgrading fixes it.
gcloud alpha biglake iceberg catalogs create wanyu_bq_test2_fed \
--project=$QP \
--primary-location=$LOCATION \
--catalog-type=federated \
--federated-catalog-type=unity \
--secret-name=projects/$QP/locations/$LOCATION/secrets/$SECRET_NAME \
--unity-instance-name=my-awsome-workspace.cloud.databricks.com \
--unity-catalog-name=wanyu_bq_test2 \
--refresh-interval=300s
This returns the BigLake-managed service account that the federation will use to access the secret. Note it for the next step.
BIGLAKE_SA=$(gcloud alpha biglake iceberg catalogs describe wanyu_bq_test2_fed \
--project=$QP \
--format='value(biglake-service-account)')
echo "BigLake SA: $BIGLAKE_SA"
# Example: blirc-83839625123-7dh1@gcp-sa-biglakerestcatalog.iam.gserviceaccount.com
Step 7 — Grant the BigLake SA access to the regional secret
gcloud config set api_endpoint_overrides/secretmanager \
https://secretmanager.${LOCATION}.rep.googleapis.com/
gcloud secrets add-iam-policy-binding $SECRET_NAME \
--project=$QP \
--location=$LOCATION \
--member="serviceAccount:$BIGLAKE_SA" \
--role=roles/secretmanager.secretAccessor
gcloud config unset api_endpoint_overrides/secretmanager
Step 8 — Wait for first refresh and query from BigQuery
The first refresh fires after the refresh interval (so 5 min after catalog creation, not at create time). Watch for completion:
gcloud alpha biglake iceberg catalogs describe wanyu_bq_test2_fed \
--project=$QP --format=json \
| python3 -c "
import json, sys
d=json.load(sys.stdin)
rs=d.get('federated-catalog-options',{}).get('refresh-status',{})
print('end:', rs.get('end-time'))
print('code:', rs.get('status',{}).get('code', '0=ok'))
print('msg:', rs.get('status',{}).get('message', '(empty=success)'))
"
code=0 (or empty) indicates success.
Then query from BigQuery using the 4-part path project.federated_catalog.namespace.table:
bq --project_id=$QP --location=$LOCATION query --use_legacy_sql=false \
'SELECT inventory_id, sku, region, qty, classification, updated_at
FROM `my-awsome-bq-project.wanyu_bq_test2_fed.test.t_inventory_uniform`
ORDER BY inventory_id'
In the BigQuery Web UI: open the SQL editor, set query location to us-east4 (Settings → Query settings → Data location), then paste the same query. The federated catalog will not appear in the navigator panel — this is a documented limitation; you must reference it by full path.
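The same 4-part path works from application code, as long as the job location matches the catalog's region; a minimal sketch using the google-cloud-bigquery Python client:

from google.cloud import bigquery

# location must match the federated catalog's region (us-east4 here);
# the catalog is only resolvable for jobs running in that location.
client = bigquery.Client(project="my-awsome-bq-project", location="us-east4")
sql = """
  SELECT inventory_id, sku, region, qty, classification, updated_at
  FROM `my-awsome-bq-project.wanyu_bq_test2_fed.test.t_inventory_uniform`
  ORDER BY inventory_id
"""
for row in client.query(sql).result():
    print(row.inventory_id, row.sku, row.qty)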
Result for the test:
+--------------+---------+--------+-----+----------------+------------------+
| inventory_id | sku     | region | qty | classification | updated_at       |
+--------------+---------+--------+-----+----------------+------------------+
|            1 | SKU-001 | JP     | 100 | public         | 2026-04-30 13:27 |
|            2 | SKU-002 | KR     |  50 | public         | 2026-04-30 13:27 |
|            3 | SKU-003 | TW     |  75 | restricted     | 2026-04-30 13:27 |
|            4 | SKU-004 | HK     |  30 | public         | 2026-04-30 13:27 |
|            5 | SKU-005 | IN     | 200 | public         | 2026-04-30 13:27 |
+--------------+---------+--------+-----+----------------+------------------+
End-to-end confirmed: BQ Cross-cloud Lakehouse Federation → UC Iceberg REST endpoint → UniForm-enabled Managed Delta table.
Cleanup (when re-running or removing the test)
# 1. Delete the federated catalog
gcloud alpha biglake iceberg catalogs delete wanyu_bq_test2_fed \
--project=$QP --quiet
# 2. Delete the regional secret (use regional endpoint override)
gcloud config set api_endpoint_overrides/secretmanager \
https://secretmanager.${LOCATION}.rep.googleapis.com/
gcloud secrets delete $SECRET_NAME \
  --project=$QP --location=$LOCATION --quiet
gcloud config unset api_endpoint_overrides/secretmanager
# 3. (Optional) Drop the Databricks-side resources
# DROP TABLE wanyu_bq_test2.test.t_inventory_uniform;
# DROP SCHEMA wanyu_bq_test2.test CASCADE;
# Account console: delete the SP if no longer needed
Gotchas, in priority order
1. Update gcloud first to 566.0.0+ (released 2026-04-24). The `--catalog-type=federated` flag does not exist in older versions.
2. Secret value is JSON, not the bare secret string. Use `{"client_id":"…","client_secret":"…"}`.
3. Secret must be regional. Use the regional API endpoint override; standard `gcloud secrets create --location` does not work the way you'd expect.
4. Grant `EXTERNAL_USE_SCHEMA` to the SP at the catalog level, not just `SELECT`. Without it, federation sees zero tables.
5. UniForm + `delta.enableDeletionVectors=true` are incompatible. Disable DV (and run `REORG TABLE … APPLY (PURGE)` to clean up existing DV files) before enabling UniForm.
6. First refresh takes 5+ minutes. No "force refresh" command exists — only delete + recreate. Plan for the lag.
7. BigQuery UI navigator does not show federated catalogs. Reference by 4-part path. Set query location to the catalog's region.
8. Storage credential resolution at runtime can fail (`credentialName = None`) even when grants look correct. The catalog's storage credential must be one your SQL warehouses' runtime IAM can assume. Verify before designing for it.
Reference URLs
- Databricks UC Iceberg REST endpoint
- Databricks UniForm
- Google Cloud — Set up Cross-cloud Lakehouse
- Google Cloud — Use Cross-cloud Lakehouse
日本語
Why(なぜ)— BigQuery から Databricks テーブルを参照する2つの方法
データエンジニアリングを Databricks 中心で行いつつ、BigQuery を消費・サービング層に残しているお客様(BQ→Databricks 移行中によくある状態)にとって、BigQuery から Databricks 管理テーブルを直接参照する方法は2つあります。表面的には同じ「BQ から SELECT できる」ですが、スケール特性が大きく異なります。
(A) BigLake 外部テーブル on Delta パス
- BQ 側で BigLake 外部テーブルを作成し、Delta ディレクトリを指定
- BigLake は Delta トランザクションログを直接解釈し、現スナップショットを返す
- セットアップはテーブル単位 — 1テーブルごとに BigLake 外部テーブルを作成し、スクリプトで再生成
- GA、ドキュメント整備済、本番運用実績あり
- 2軸で脆弱性あり:テーブル数 N に対して O(N) で運用コストが膨らむ/Classic checkpoint 限定(V2 checkpoint は BigLake 非対応で、新しい Delta 機能ほど V2 を要求するようになっている)
(C) BigQuery Cross-cloud Lakehouse Federation → Unity Catalog
- BQ 側が Unity Catalog 全体を foreign catalog としてマウント、Iceberg REST Catalog (IRC) プロトコルで通信
- BigLake が UC の IRC エンドポイントを定期ポーリングし、見つかったテーブルを BQ から参照可能にする
- セットアップはカタログ単位で1回 — Databricks 側でテーブルを追加すれば、次の refresh サイクルで自動的に BQ から見えるようになる
- Public Preview、`gcloud` 566.0.0+ が必要(2026-04-24 リリース)
- Delta プロトコル進化に強い:UniForm の Iceberg メタデータ生成が動いていれば、BigLake 側はどの Delta 機能を使っても透過
| 観点 | (A) BigLake on Delta | (C) Cross-cloud Lakehouse Federation |
|---|---|---|
| 方向 | BQ → DBX 参照 | BQ → DBX 参照 |
| セットアップ粒度 | テーブル単位 | カタログ単位 |
| 探索 | 手動/スクリプト | 自動(refresh 間隔 ≥ 5分) |
| Delta 側書込み(BQ から) | 不可 | 不可 |
| プロトコル | BigLake ネイティブ Delta reader | Iceberg REST (IRC) |
| 成熟度 | GA | Public Preview |
| Delta プロトコル変更耐性 | 限定的(V2 checkpoint 非対応) | 高(UniForm が吸収) |
| BigQuery UI ナビゲータ表示 | あり | なし(4-part path 指定のみ) |
(A) と (C) のどちらを選ぶかは、お客様固有の制約(テーブル数、Public Preview ステータスへの許容度、V2 checkpoint を要求する新しい Delta 機能の重要度、BQ 側 discovery スクリプトの運用負荷など)に依存します。両方とも今日動作するため、上記の比較は一方を推奨せず、トレードオフを並べた形にしています。
What(何を)— Managed UniForm Delta テーブル(Path C 用)
本記事の残りは (C) に焦点を当てます。(A) は既存の BigLake-on-Delta 資料で十分整備されているため、ここでは繰り返しません。
(C) を機能させるには、Databricks 側のテーブルが Iceberg 互換である必要があります。3つの選択肢:
- Native Managed Iceberg — UC が直接 Iceberg として作成。CDF / Z-order / 生成列 / 制約 / Predictive Optimization を失う。Public Preview のみ。
- External Delta + UniForm — `LOCATION` 明示で作成した Delta テーブル + UniForm 有効化。動作する。
- Managed Delta + UniForm — UC 管理の Delta テーブル + UniForm 有効化。ストレージレイアウトは UC が管理。推奨。
3番目が最もクリーン:Delta 機能(CDF / Z-order / Predictive Optimization / 生成列 / 制約)を全て維持しつつ、UC にストレージライフサイクルを任せ、TBLPROPERTIES の数項目で BigQuery 連携を獲得。
重要な互換性注意点: UniForm/IcebergCompatV2 は delta.enableDeletionVectors=true と排他的です。既存テーブルが deletion vectors を使っている場合、無効化(と REORG TABLE … APPLY (PURGE) での既存 DV ファイルクリーンアップ)が必要、または DV が必要なテーブル用に UniForm 無効版を別途維持する必要があります。Deletion vectors は MERGE/UPDATE/DELETE 多用ワークロードで実際の最適化機能なので、トレードオフは正直に評価する必要があります。
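既存テーブルが既に deletion vectors を使っている場合、変換手順の概略は以下のとおりです(スケッチ。`main.sales.orders` は仮のテーブル名。Step 1 のテストテーブルは最初から DV 無効で作成しています):

-- 1. 新規 deletion vectors の書き込みを停止
ALTER TABLE main.sales.orders
  SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'false');
-- 2. 既存 DV をファイルに物理適用してパージ
REORG TABLE main.sales.orders APPLY (PURGE);
-- 3. UniForm 有効化(下記の前提条件と同じプロパティ)
ALTER TABLE main.sales.orders SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '7',
  'delta.columnMapping.mode' = 'name',
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);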
Delta 側のその他の前提条件:
- `delta.minReaderVersion` ≥ 2、`delta.minWriterVersion` ≥ 7(`minReaderVersion=3` を指定しても Databricks が自動的に 2 に設定する場合あり)
- `delta.columnMapping.mode = 'name'`
- `delta.enableIcebergCompatV2 = 'true'`
- `delta.universalFormat.enabledFormats = 'iceberg'`
How(どうやって)— Path C の検証済みエンドツーエンドセットアップ
以下のセットアップ手順は (C) 専用です。(A) については末尾参照の BigLake-on-Delta ドキュメントを参照してください。
以下は 2026-04-30 に実際に動作したコマンド列です。テスト環境:Databricks ワークスペース my-awsome-workspace(AWS, ap-northeast-1)、GCP プロジェクト my-awsome-bq-project、federated catalog は us-east4。
Step 1 — Databricks 側で UniForm 有効化 Delta テーブル作成
-- スキーマが無ければ作成(既にプロビジョン済みならスキップ)
CREATE SCHEMA IF NOT EXISTS wanyu_bq_test2.test;
USE CATALOG wanyu_bq_test2;
USE SCHEMA test;
CREATE OR REPLACE TABLE t_inventory_uniform (
inventory_id BIGINT,
sku STRING,
region STRING,
qty INT,
classification STRING,
updated_at TIMESTAMP
)
USING DELTA
TBLPROPERTIES (
'delta.columnMapping.mode' = 'name',
'delta.minReaderVersion' = '2',
'delta.minWriterVersion' = '7',
'delta.enableDeletionVectors' = 'false',
'delta.checkpointPolicy' = 'classic',
'delta.checkpoint.writeStatsAsStruct' = 'true',
'delta.enableChangeDataFeed' = 'true',
'delta.enableIcebergCompatV2' = 'true',
'delta.universalFormat.enabledFormats' = 'iceberg'
);
INSERT INTO t_inventory_uniform VALUES
(1, 'SKU-001', 'JP', 100, 'public', current_timestamp()),
(2, 'SKU-002', 'KR', 50, 'public', current_timestamp()),
(3, 'SKU-003', 'TW', 75, 'restricted', current_timestamp()),
(4, 'SKU-004', 'HK', 30, 'public', current_timestamp()),
(5, 'SKU-005', 'IN', 200, 'public', current_timestamp());
前提条件:
- カタログが external location 構成であること(`databricks catalogs get <name>` で `storage_root` が `s3://` / `gs://` / `abfss://` パスであることを確認)。Default-storage UC カタログは Cross-cloud Lakehouse Federation 非対応。
- Metastore で `external_access_enabled=true` であること(`databricks metastores summary` で確認)。両チェックのスケッチは下記参照。
- CREATE TABLE を実行するユーザー/クラスタにスキーマへの CREATE TABLE 権限があること。さらに、カタログに紐付くストレージクレデンシャルが SQL warehouse のランタイム IAM から assume 可能であること(落とし穴 #8 参照)。
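最初の2つの前提条件を CLI から確認する簡単な方法(スケッチ。Databricks CLI が `--output json` をサポートしている前提):

# storage_root に s3:// / gs:// / abfss:// パスが表示されること(空でない)
databricks catalogs get wanyu_bq_test2 --profile my-awsome-workspace --output json \
  | python3 -c "import json,sys; print(json.load(sys.stdin).get('storage_root'))"
# external_access_enabled が True と表示されること
databricks metastores summary --profile my-awsome-workspace --output json \
  | python3 -c "import json,sys; print(json.load(sys.stdin).get('external_access_enabled'))"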
Step 2 — UC Iceberg REST endpoint 経由で公開されているか確認
TOKEN=$(databricks auth token --profile my-awsome-workspace \
| python3 -c "import json,sys; print(json.load(sys.stdin)['access_token'])")
curl -s "https://my-awsome-workspace.cloud.databricks.com/api/2.1/unity-catalog/iceberg-rest/v1/catalogs/wanyu_bq_test2/namespaces/test/tables/t_inventory_uniform" \
-H "Authorization: Bearer $TOKEN" \
| python3 -m json.tool | head -30
期待値:metadata-location、metadata.format-version: 2、フルスキーマ、Iceberg 型(long, string, int, timestamptz)を含む JSON。これで UniForm-Delta が IRC 経由で Iceberg v2 として公開されていることを確認。
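参考までに、loadTable レスポンスの概形は以下のとおりです(例示用に省略したもので実際の出力そのものではありません。パスやフィールド ID は環境により異なります):

{
  "metadata-location": "s3://…/metadata/00003-….metadata.json",
  "metadata": {
    "format-version": 2,
    "schemas": [{
      "type": "struct",
      "fields": [
        {"id": 1, "name": "inventory_id", "required": false, "type": "long"},
        {"id": 2, "name": "sku", "required": false, "type": "string"},
        {"id": 6, "name": "updated_at", "required": false, "type": "timestamptz"}
      ]
    }]
  }
}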
Step 3 — Databricks Service Principal 作成と権限付与
# SCIM API で SP 作成
databricks api post --profile my-awsome-workspace \
/api/2.0/preview/scim/v2/ServicePrincipals \
--json '{"displayName": "bq-uniform-federation-sp", "active": true}'
# レスポンスから applicationId(= OAuth client_id)を控える
# カタログレベル権限を EXTERNAL_USE_SCHEMA 含めて付与
databricks grants update --profile my-awsome-workspace \
catalog wanyu_bq_test2 \
--json '{
"changes": [{
"principal": "<sp-application-id>",
"add": ["USE_CATALOG", "USE_SCHEMA", "SELECT", "EXTERNAL_USE_SCHEMA"]
}]
}'
EXTERNAL_USE_SCHEMA は IRC クライアントがテーブルメタデータを取得することを許可する権限です。これが無いと、SP は namespace のリストはできても loadTable で失敗します。
Step 4 — OAuth client secret 生成(UI 操作)
ワークスペーストークンからプログラム的に実行できない唯一のステップです。Databricks Account Console (accounts.cloud.databricks.com) で:
- User management → Service principals → `bq-uniform-federation-sp`
- Secrets タブ → Generate secret
- シークレットを保存(一度しか表示されません)
OAuth フローが動作することを確認:
CLIENT_ID="<sp-application-id>"
CLIENT_SECRET="<from-account-console>"
HOST="my-awsome-workspace.cloud.databricks.com"
curl -s -u "$CLIENT_ID:$CLIENT_SECRET" \
-X POST "https://$HOST/oidc/v1/token" \
-d "grant_type=client_credentials&scope=all-apis"
# access_token, expires_in:3600, scope:all-apis を含む JSON が返るはず
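任意で、Step 2 の IRC 呼び出しをユーザートークンではなく SP のトークンで再実行すると、GCP 側のセットアップ前に EXTERNAL_USE_SCHEMA 付与が機能していることを確認できます(上記の変数を再利用するスケッチ):

# SP クレデンシャルをワークスペースのアクセストークンに交換
SP_TOKEN=$(curl -s -u "$CLIENT_ID:$CLIENT_SECRET" \
  -X POST "https://$HOST/oidc/v1/token" \
  -d "grant_type=client_credentials&scope=all-apis" \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['access_token'])")
# 200 なら SP が loadTable 可能。403 は通常 EXTERNAL_USE_SCHEMA 不足
curl -s -o /dev/null -w "%{http_code}\n" \
  "https://$HOST/api/2.1/unity-catalog/iceberg-rest/v1/catalogs/wanyu_bq_test2/namespaces/test/tables/t_inventory_uniform" \
  -H "Authorization: Bearer $SP_TOKEN"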
Step 5 — Secret Manager に regional + JSON で保存
ここで非自明な落とし穴が2つ重なります:シークレットは regional でなければならず、値は client_id と client_secret 両方を含む JSON 文書である必要があります(裸のシークレット文字列ではない)。
QP="my-awsome-bq-project"
LOCATION="us-east4"
SECRET_NAME="bq-uniform-test2-dbx-oauth"
# Regional Secret Manager API endpoint を設定(必須 — gcloud の標準 secret コマンドは
# regional secret 用の --location を期待通りに受け付けない)
gcloud config set api_endpoint_overrides/secretmanager \
https://secretmanager.${LOCATION}.rep.googleapis.com/
# シークレットリソース作成
gcloud secrets create $SECRET_NAME --project=$QP --location=$LOCATION
# JSON ペイロードでバージョン追加(裸のシークレット文字列ではない)
echo "{\"client_id\":\"${CLIENT_ID}\",\"client_secret\":\"${CLIENT_SECRET}\"}" \
| gcloud secrets versions add $SECRET_NAME \
--project=$QP --location=$LOCATION --data-file=-
# 終わったら endpoint をリセット
gcloud config unset api_endpoint_overrides/secretmanager
裸の client secret だけを保存(ドキュメントの最も自然な解釈)すると、federation refresh が "Failed to fetch credentials for Unity Catalog refresh" で失敗します — 一見 IAM 問題に見える誤解を招くエラー。
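切り分けのため、保存したバージョンを読み戻して JSON 形式であることを確認できます(スケッチ。再度 regional endpoint override が必要):

gcloud config set api_endpoint_overrides/secretmanager \
  https://secretmanager.${LOCATION}.rep.googleapis.com/
# 期待値: {"client_id":"…","client_secret":"…"}
gcloud secrets versions access latest --secret=$SECRET_NAME \
  --project=$QP --location=$LOCATION
gcloud config unset api_endpoint_overrides/secretmanager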
Step 6 — BigLake federated catalog 作成
gcloud 566.0.0 以上が必要(2026-04-24 リリース)。それ以前のバージョンは `--catalog-type=federated` を `Invalid choice: 'federated'` で拒否します(アップグレードで解決するというヒントは出ません)。
gcloud alpha biglake iceberg catalogs create wanyu_bq_test2_fed \
--project=$QP \
--primary-location=$LOCATION \
--catalog-type=federated \
--federated-catalog-type=unity \
--secret-name=projects/$QP/locations/$LOCATION/secrets/$SECRET_NAME \
--unity-instance-name=my-awsome-workspace.cloud.databricks.com \
--unity-catalog-name=wanyu_bq_test2 \
--refresh-interval=300s
これで federation がシークレットアクセスに使う BigLake 管理 Service Account が返されます。次のステップで使うので控える。
BIGLAKE_SA=$(gcloud alpha biglake iceberg catalogs describe wanyu_bq_test2_fed \
--project=$QP \
--format='value(biglake-service-account)')
echo "BigLake SA: $BIGLAKE_SA"
# 例: blirc-83839625123-7dh1@gcp-sa-biglakerestcatalog.iam.gserviceaccount.com
Step 7 — BigLake SA に regional secret アクセスを付与
gcloud config set api_endpoint_overrides/secretmanager \
https://secretmanager.${LOCATION}.rep.googleapis.com/
gcloud secrets add-iam-policy-binding $SECRET_NAME \
--project=$QP \
--location=$LOCATION \
--member="serviceAccount:$BIGLAKE_SA" \
--role=roles/secretmanager.secretAccessor
gcloud config unset api_endpoint_overrides/secretmanager
Step 8 — 初回 refresh を待ち、BigQuery から query
初回 refresh は refresh interval 後に発火します(カタログ作成時ではなく、5分後)。完了を確認:
gcloud alpha biglake iceberg catalogs describe wanyu_bq_test2_fed \
--project=$QP --format=json \
| python3 -c "
import json, sys
d=json.load(sys.stdin)
rs=d.get('federated-catalog-options',{}).get('refresh-status',{})
print('end:', rs.get('end-time'))
print('code:', rs.get('status',{}).get('code', '0=ok'))
print('msg:', rs.get('status',{}).get('message', '(empty=success)'))
"
code=0(または空)が成功です。
それから BigQuery から 4-part path project.federated_catalog.namespace.table で query:
bq --project_id=$QP --location=$LOCATION query --use_legacy_sql=false \
'SELECT inventory_id, sku, region, qty, classification, updated_at
FROM `my-awsome-bq-project.wanyu_bq_test2_fed.test.t_inventory_uniform`
ORDER BY inventory_id'
BigQuery Web UI の場合:SQL editor を開き、query location を us-east4 に設定(Settings → Query settings → Data location)、同じ query をペースト。Federated catalog は navigator パネルに表示されません — これは仕様で、フルパスで参照する必要があります。
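アプリケーションコードからも、job の location をカタログのリージョンに合わせれば同じ 4-part path で参照できます。google-cloud-bigquery Python クライアントを使った最小スケッチ:

from google.cloud import bigquery

# location は federated catalog のリージョン(ここでは us-east4)に一致させる必要あり。
# navigator に出ないカタログも、この location の job からは解決可能。
client = bigquery.Client(project="my-awsome-bq-project", location="us-east4")
sql = """
  SELECT inventory_id, sku, region, qty, classification, updated_at
  FROM `my-awsome-bq-project.wanyu_bq_test2_fed.test.t_inventory_uniform`
  ORDER BY inventory_id
"""
for row in client.query(sql).result():
    print(row.inventory_id, row.sku, row.qty)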
テスト結果:
+--------------+---------+--------+-----+----------------+------------------+
| inventory_id | sku     | region | qty | classification | updated_at       |
+--------------+---------+--------+-----+----------------+------------------+
|            1 | SKU-001 | JP     | 100 | public         | 2026-04-30 13:27 |
|            2 | SKU-002 | KR     |  50 | public         | 2026-04-30 13:27 |
|            3 | SKU-003 | TW     |  75 | restricted     | 2026-04-30 13:27 |
|            4 | SKU-004 | HK     |  30 | public         | 2026-04-30 13:27 |
|            5 | SKU-005 | IN     | 200 | public         | 2026-04-30 13:27 |
+--------------+---------+--------+-----+----------------+------------------+
エンドツーエンド確認:BQ Cross-cloud Lakehouse Federation → UC Iceberg REST endpoint → UniForm 有効化 Managed Delta テーブル。
クリーンアップ(再実行時/テストリソース削除時)
# 1. Federated catalog 削除
gcloud alpha biglake iceberg catalogs delete wanyu_bq_test2_fed \
--project=$QP --quiet
# 2. Regional secret 削除(regional endpoint override 必須)
gcloud config set api_endpoint_overrides/secretmanager \
https://secretmanager.${LOCATION}.rep.googleapis.com/
gcloud secrets delete $SECRET_NAME \
  --project=$QP --location=$LOCATION --quiet
gcloud config unset api_endpoint_overrides/secretmanager
# 3. (任意)Databricks 側リソース削除
# DROP TABLE wanyu_bq_test2.test.t_inventory_uniform;
# DROP SCHEMA wanyu_bq_test2.test CASCADE;
# Account console で SP を削除(不要であれば)
落とし穴(優先順)
1. gcloud をまずアップデートして 566.0.0 以上に(2026-04-24 リリース)。`--catalog-type=federated` フラグはそれ以前のバージョンには存在しません。
2. シークレット値は JSON、裸のシークレット文字列ではない。`{"client_id":"…","client_secret":"…"}` を使う。
3. シークレットは regional でなければならない。Regional API endpoint override を使う。標準の `gcloud secrets create --location` は期待通りに動かない。
4. `EXTERNAL_USE_SCHEMA` を SP にカタログレベルで付与、`SELECT` だけでは不十分。これが無いと federation がテーブルを 0 個と認識する。
5. UniForm と `delta.enableDeletionVectors=true` は非互換。UniForm 有効化前に DV を無効化し、既存 DV ファイルのクリーンアップのため `REORG TABLE … APPLY (PURGE)` を実行する。
6. 初回 refresh は 5 分以上かかる。"Force refresh" コマンドは無く、delete + recreate のみ。遅延を計画する。
7. BigQuery UI navigator は federated catalog を表示しない。4-part path で参照。Query location をカタログのリージョンに合わせる。
8. ストレージクレデンシャル解決は、grants が正しく見えてもランタイムで失敗し得る(`credentialName = None`)。カタログのストレージクレデンシャルは、SQL warehouse のランタイム IAM が assume できるものでなければならない。設計前に確認する。