Trying out Databricks Workspace Files, now generally available (GA)

Databricks

Posted and last updated on 2023-04-25

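I'll try out this feature, Workspace Files.

Creating a Python module

Until now, .py files could only be created under Repos, but now you can create files anywhere in the workspace. That alone is a big convenience.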

mymodule.py

def funcname():
    print(__name__)

Create a notebook and import the module.

notebook

import mymodule
mymodule.funcname()

mymodule

This makes developing custom Python libraries much smoother.
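The import above works because the module sits next to the notebook. As a minimal sketch of what to do when the module lives in a different folder, one option is to add that folder to `sys.path` before importing. The folder here is a temporary directory standing in for a workspace folder, and `mymodule_demo` is a hypothetical name chosen for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for a workspace folder (assumption); on Databricks this would be
# a path under /Workspace/... instead of a temporary directory.
module_dir = Path(tempfile.mkdtemp())

# Write a small module file, mirroring mymodule.py (hypothetical name).
(module_dir / "mymodule_demo.py").write_text(
    "def funcname():\n"
    "    return __name__\n"
)

# Make the folder importable, then import the module by name.
if str(module_dir) not in sys.path:
    sys.path.append(str(module_dir))
importlib.invalidate_caches()

mymodule_demo = importlib.import_module("mymodule_demo")
result = mymodule_demo.funcname()
print(result)
```

Returning the name instead of printing it inside the function makes the result easier to check from the calling notebook.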

Storing data

You can easily upload files to the same location as your notebooks without having to think about DBFS.

Python

import pandas as pd

df = pd.read_csv("/Workspace/Users/takaaki.yayoi@databricks.com/20230425_workspace_files/japan_cases_20220818.csv")
display(df)
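Since the file is read through an ordinary path, standard Python file APIs should work on it as well. The following is a small sketch under that assumption, using only the standard library. The temporary directory stands in for the notebook's folder, and the file name and columns are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Stand-in for the notebook's folder (assumption); on Databricks this would be
# a /Workspace/... path like the one passed to pd.read_csv.
data_dir = Path(tempfile.mkdtemp())
csv_path = data_dir / "sample_cases.csv"  # hypothetical file name

# Hypothetical sample data; the columns are invented for this sketch.
csv_path.write_text("date,cases\n2022-08-17,100\n2022-08-18,120\n", encoding="utf-8")


def load_rows(path):
    """Read a CSV file into a list of dicts, failing early if the file is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


rows = load_rows(csv_path)
print(len(rows), rows[0])
```

Checking for the file first gives a clearer error when the upload landed in a different folder than expected.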

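Creating an init script

Note
There is a short time lag before this feature becomes available.

Previously, using a cluster init script meant writing a .sh file out to DBFS and then configuring it. Now init scripts can be created and configured much more easily.

See this for reference.

Write the init script directly in the workspace.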

#!/bin/bash
apt-get --yes install libsndfile1

In the cluster's init script settings, select Workspace as the destination, enter the path to the init script, and add it. Specify the path starting from /Users/.
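To illustrate the path rule, here is a minimal sketch that drops a leading `/Workspace` so the path starts at `/Users/`, then builds the fragment a cluster specification might carry for a workspace init script. The helper `to_init_script_path` and the example path are hypothetical, and the `init_scripts` shape is my understanding of the Clusters API, so verify it against the official reference before relying on it.

```python
import json

WORKSPACE_PREFIX = "/Workspace"


def to_init_script_path(path):
    """Drop a leading /Workspace so the path starts at /Users/ (hypothetical helper)."""
    if path.startswith(WORKSPACE_PREFIX + "/"):
        path = path[len(WORKSPACE_PREFIX):]
    if not path.startswith("/Users/"):
        raise ValueError(f"Expected a path under /Users/, got: {path}")
    return path


# Hypothetical location of the init script in the workspace.
full_path = "/Workspace/Users/someone@example.com/init/install_libsndfile.sh"
init_path = to_init_script_path(full_path)

# Assumed shape of the init script entry in a cluster specification.
cluster_fragment = {"init_scripts": [{"workspace": {"destination": init_path}}]}
print(json.dumps(cluster_fragment, indent=2))
```

Paths that already start at `/Users/` pass through unchanged, so the helper is safe to apply to either form.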

When the cluster starts, the init script runs.
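One way to confirm from a notebook that the init script had the intended effect is to ask the system whether the libsndfile shared library can be found. This is only a sketch: it assumes the package installs a library discoverable under the name `sndfile`, and the result depends on the machine where it runs.

```python
from ctypes.util import find_library

# Look up the shared library installed by the init script.
# find_library returns a name such as a .so file name, or None if nothing is found.
lib_name = find_library("sndfile")

if lib_name is None:
    print("libsndfile was not found; the init script may not have run.")
else:
    print(f"Found libsndfile as: {lib_name}")
```

If the library is missing, the cluster's init script logs are the next place to look.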

%pip install librosa

Python

# Beat tracking example
import librosa

# 1. Get the file path to an included audio example
filename = librosa.example('nutcracker')

# 2. Load the audio as a waveform `y`
#    Store the sampling rate as `sr`
y, sr = librosa.load(filename)

# 3. Run the default beat tracker
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

print('Estimated tempo: {:.2f} beats per minute'.format(tempo))

# 4. Convert the frame indices of beat events into timestamps
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

Downloading file 'Kevin_MacLeod_-_P_I_Tchaikovsky_Dance_of_the_Sugar_Plum_Fairy.ogg' from 'https://librosa.org/data/audio/Kevin_MacLeod_-_P_I_Tchaikovsky_Dance_of_the_Sugar_Plum_Fairy.ogg' to '/root/.cache/librosa'.
Estimated tempo: 107.67 beats per minute

It worked!

Databricks Quickstart Guide

Databricks free trial
