
Mounting Storage in Azure Databricks

Posted at 2022-06-10

Introduction

This article covers the following two ways to mount storage in Azure Databricks:

Using the storage account access key
Using Azure Active Directory credential passthrough

It walks through how to set up each one.

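Mounting with the storage account access key

1. Prepare the storage account

Create an Azure Data Lake Storage Gen2 (ADLS Gen2) storage account and upload a file to use for verification.

Make a note of the storage account's access key.

2. Mount

In a Databricks notebook, run the following code.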

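# Replace the placeholders with your own values
storage_account = "<storage-account>"
container = "<container>"
# Hard-coding the key is for testing only; see the conclusion about secret scopes
access_key = "<access-key>"

# Mount the container through the Blob endpoint (wasbs), authenticating with the account key
dbutils.fs.mount(
  source = f"wasbs://{container}@{storage_account}.blob.core.windows.net",
  mount_point = "/mnt/data",
  extra_configs = {f"fs.azure.account.key.{storage_account}.blob.core.windows.net": access_key}
)

# List the files under the mount point
display(dbutils.fs.ls("/mnt/data"))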

Check that the directory contents are listed.
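The source URI and the configuration key above both embed the same Blob endpoint host, which is easy to mistype. Below is a minimal sketch of a pure-Python helper that builds the keyword arguments for dbutils.fs.mount() from the account name, container, and key. The function name build_wasbs_mount_args and the sample values are hypothetical; only the URI and config-key formats come from the code above.

```python
def build_wasbs_mount_args(storage_account: str, container: str, access_key: str,
                           mount_point: str = "/mnt/data") -> dict:
    """Return keyword arguments for dbutils.fs.mount() using account-key authentication."""
    # Blob endpoint host; the same host appears in the URI and in the config key
    host = f"{storage_account}.blob.core.windows.net"
    return {
        "source": f"wasbs://{container}@{host}",
        "mount_point": mount_point,
        "extra_configs": {f"fs.azure.account.key.{host}": access_key},
    }


# Hypothetical values for illustration only
args = build_wasbs_mount_args("mystorageacct", "data", "<access-key>")
print(args["source"])  # wasbs://data@mystorageacct.blob.core.windows.net
```

In a notebook, the result could then be passed as dbutils.fs.mount(**args).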

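Mounting with Azure Active Directory credential passthrough

1. Prepare the storage account

As before, create an Azure Data Lake Storage Gen2 (ADLS Gen2) storage account and upload a file to use for verification.

Assign the [Storage Blob Data Owner] role to each user who will process data in Databricks.

2. Configure the Databricks cluster

On the cluster creation page of the Databricks console, enable [Enable credential passthrough for user-level data access].
With [Cluster mode] set to [Standard], as here, only one user can use the cluster.
Select [High Concurrency] instead if several users need to share it.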
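The same cluster settings can also be expressed as a request body for the Clusters API (POST /api/2.0/clusters/create). The sketch below only builds and prints the JSON; it does not call the API. It assumes the spark_conf key spark.databricks.passthrough.enabled, the single_user_name field for Standard mode, and the High Concurrency profile settings shown, which are recalled from the Databricks credential passthrough documentation rather than taken from this article, so verify them against the current API reference. The function name, runtime version, node type, and user are placeholders.

```python
import json
from typing import Optional


def passthrough_cluster_spec(cluster_name: str, spark_version: str, node_type_id: str,
                             num_workers: int, single_user: Optional[str] = None) -> dict:
    """Build a Clusters API request body with credential passthrough enabled (assumed field names)."""
    spec = {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Assumed equivalent of [Enable credential passthrough for user-level data access]
        "spark_conf": {"spark.databricks.passthrough.enabled": "true"},
    }
    if single_user:
        # Standard mode: passthrough is limited to this one user
        spec["single_user_name"] = single_user
    else:
        # High Concurrency mode (assumed settings): the cluster can be shared by several users
        spec["spark_conf"]["spark.databricks.cluster.profile"] = "serverless"
        spec["spark_conf"]["spark.databricks.repl.allowedLanguages"] = "python,sql"
        spec["custom_tags"] = {"ResourceClass": "Serverless"}
    return spec


# Placeholder values; replace with a real runtime version, node type, and user
standard = passthrough_cluster_spec("passthrough-standard", "<runtime-version>",
                                    "<node-type>", 1, single_user="user@example.com")
shared = passthrough_cluster_spec("passthrough-hc", "<runtime-version>", "<node-type>", 2)
print(json.dumps(standard, indent=2))
```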

3. Mount

In a Databricks notebook, run the following code.

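# Replace the placeholders with your own values
storage_account = "<storage-account>"
container = "<container>"
# Use the token provider that forwards the notebook user's Azure AD credentials to ADLS Gen2
configs = {
  "fs.azure.account.auth.type": "CustomAccessToken",
  "fs.azure.account.custom.token.provider.class": spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

# /mnt/data may still be mounted from the access-key section; mounting again would fail
if any(m.mountPoint == "/mnt/data" for m in dbutils.fs.mounts()):
  dbutils.fs.unmount("/mnt/data")

# Mount the container through the DFS endpoint (abfss)
dbutils.fs.mount(
  source = f"abfss://{container}@{storage_account}.dfs.core.windows.net/",
  mount_point = "/mnt/data",
  extra_configs = configs
)

# List the files under the mount point
display(dbutils.fs.ls("/mnt/data"))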

Check that the directory contents are listed.
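Rerunning either mount cell fails once /mnt/data is already mounted. Below is a minimal sketch of a rerun-safe wrapper: it skips the mount if the same source is already mounted at that point and remounts if a different source is there. It assumes fs behaves like dbutils.fs (mounts() returning entries with mountPoint and source, plus unmount() and mount()); the function name mount_if_needed and the FakeFs stand-in are hypothetical, the latter only so the logic can be tried outside Databricks.

```python
from collections import namedtuple


def mount_if_needed(fs, source: str, mount_point: str, extra_configs: dict) -> bool:
    """Mount source at mount_point unless it is already mounted there.

    Returns True if a new mount was created, False if the existing one was kept.
    """
    for m in fs.mounts():
        if m.mountPoint == mount_point:
            # Compare without trailing slashes, in case the stored source is normalized (assumption)
            if m.source.rstrip("/") == source.rstrip("/"):
                return False
            # A different storage location is mounted here; replace it
            fs.unmount(mount_point)
            break
    fs.mount(source=source, mount_point=mount_point, extra_configs=extra_configs)
    return True


# Hypothetical local stand-in for dbutils.fs
MountInfo = namedtuple("MountInfo", ["mountPoint", "source", "encryptionType"])


class FakeFs:
    def __init__(self):
        self._mounts = []

    def mounts(self):
        return list(self._mounts)

    def unmount(self, mount_point):
        self._mounts = [m for m in self._mounts if m.mountPoint != mount_point]
        return True

    def mount(self, source, mount_point, extra_configs=None):
        # Mimic the failure seen when a mount point is reused without unmounting
        if any(m.mountPoint == mount_point for m in self._mounts):
            raise RuntimeError(f"Directory already mounted: {mount_point}")
        self._mounts.append(MountInfo(mount_point, source, ""))
        return True


fs = FakeFs()
wasbs = "wasbs://data@mystorageacct.blob.core.windows.net"
abfss = "abfss://data@mystorageacct.dfs.core.windows.net/"
first = mount_if_needed(fs, wasbs, "/mnt/data", {})
again = mount_if_needed(fs, wasbs, "/mnt/data", {})     # same source: skipped
switched = mount_if_needed(fs, abfss, "/mnt/data", {})  # different source: remounted
print(first, again, switched)  # True False True
```

In a notebook, dbutils.fs would be passed in place of the stand-in, for example mount_if_needed(dbutils.fs, source, "/mnt/data", configs).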

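Conclusion

This article explained two ways to mount storage from Azure Databricks.
Rather than hard-coding the storage account access key in the notebook, retrieving it from a secret in a Databricks secret scope is more secure.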
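A minimal sketch of that approach: the access key is obtained through a caller-supplied function, so in a notebook it can come from dbutils.secrets.get(scope=..., key=...) and the value never appears in the notebook source. The function name account_key_mount_configs and the scope and key names are hypothetical, and a stand-in lambda is used so the sketch runs outside Databricks.

```python
from typing import Callable


def account_key_mount_configs(storage_account: str, get_access_key: Callable[[], str]) -> dict:
    """Build extra_configs for an account-key mount, fetching the key only when needed."""
    key = get_access_key()
    if not key:
        raise ValueError("access key is empty; check the secret scope and key name")
    return {f"fs.azure.account.key.{storage_account}.blob.core.windows.net": key}


# In a Databricks notebook (hypothetical scope and key names):
#   configs = account_key_mount_configs(
#       "mystorageacct",
#       lambda: dbutils.secrets.get(scope="storage-secrets", key="mystorageacct-key"))
# Outside Databricks, a stand-in function supplies a dummy key instead:
configs = account_key_mount_configs("mystorageacct", lambda: "dummy-key")
print(list(configs))
```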

