@ryoma-nagata(Ryoma Nagata)

Openhack resources

Last updated at 2023-04-12, posted at 2023-04-12

Partly as a personal memo, this post collects reference information that is useful for the OpenHack for Lakehouse.

Source code

For cloning: https://github.com/microsoft/openhack-for-lakehouse-japanese.git
For DBC import: https://github.com/microsoft/openhack-for-lakehouse-japanese/releases/tag/v1.1.1
Previous version (3-day edition, a larger version with more tips): https://github.com/microsoft/openhack-for-lakehouse-japanese/releases/tag/v1.0.0

Common

Databricks free version (community plan)

Learning

Databricks training site: https://learn.microsoft.com/ja-jp/training/modules/use-apache-spark-azure-databricks/
MS Docs: https://learn.microsoft.com/ja-jp/training/modules/use-apache-spark-azure-databricks/
Cheat sheet: https://pages.databricks.com/rs/094-YMS-629/images/Delta-Lake-cheat-sheet.pdf

Community

Day1

Git integration

Editor:

File I/O and integration

Samples

Delta Lake quick start: https://docs.delta.io/latest/quick-start.html
Delta Lake tutorial: https://learn.microsoft.com/ja-jp/azure/databricks/delta/tutorial
Datasets: https://qiita.com/ryoma-nagata/items/5f34c8f40cbced373ab0

VNet architecture

Auto Loader and COPY INTO

availableNow trigger: https://spark.apache.org/docs/latest/api/python/reference/pyspark.ss/api/pyspark.sql.streaming.DataStreamWriter.trigger.html
COPY INTO and Auto Loader: https://qiita.com/ryoma-nagata/items/74e1bd9ebaf0413c9fd6
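
As a rough illustration of how the two links above fit together, the following is a minimal sketch of an Auto Loader ingestion that uses the `availableNow` trigger, so the stream processes the files that are currently available and then stops. The function names, the default file format, and the choice to reuse the checkpoint path as the schema location are assumptions made for this example and do not come from the OpenHack material. The `spark` argument is assumed to be the SparkSession that a Databricks notebook provides.

```python
def autoloader_options(file_format, schema_location):
    """Build the source options for an Auto Loader (cloudFiles) stream."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }


def ingest_available_files(spark, source_path, target_table,
                           checkpoint_path, file_format="csv"):
    """Ingest every file currently in source_path, then stop the stream.

    Assumption: the checkpoint path is also used as the schema location.
    """
    reader = spark.readStream.format("cloudFiles")
    for key, value in autoloader_options(file_format, checkpoint_path).items():
        reader = reader.option(key, value)

    return (
        reader.load(source_path)
        .writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        # availableNow: process all available data, then terminate.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

Because the checkpoint records which files have already been loaded, running the same function again should pick up only files that arrived since the previous run, which is what makes this pattern comparable to COPY INTO for scheduled batch loads.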

Python API

Python API: https://docs.delta.io/latest/api/python/index.html
Delta Lake OSS: https://docs.delta.io/latest/delta-apidoc.html

Clusters, jobs, and pools

SQL Warehouse (SQLWH)

Other

Day2

ML Runtime

Built-in libraries: https://learn.microsoft.com/en-us/azure/databricks/release-notes/runtime/12.2ml

MLflow

Feature Store

Sample: https://learn.microsoft.com/en-us/azure/databricks/machine-learning/feature-store/example-notebooks

pandas on Spark

pandas on Spark: https://learn.microsoft.com/en-us/azure/databricks/pandas/pandas-on-spark

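How to reproduce the OpenHack

Prerequisites

Download the dataset
- https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce?select=olist_customers_dataset.csv
Create a Databricks environment

Enable the DBFS file browser in the admin settings

Browse DBFS from Data Explorer

Create the folders (for example, via right-click) and upload all of the dataset files to the "/FileStore/db_hackathon4lakehouse_2022/datasource" folder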
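
To double-check the upload step above, a small helper along the following lines might be useful. It is only a sketch: `plan_uploads` and `missing_uploads` are hypothetical names introduced here, and the code assumes the Kaggle archive has been extracted to a local directory of CSV files. It maps each local CSV to the DBFS path it should end up at and reports which files are still missing from the target folder.

```python
from pathlib import Path, PurePosixPath

# Target folder named in the reproduction steps.
DBFS_TARGET = PurePosixPath("/FileStore/db_hackathon4lakehouse_2022/datasource")


def plan_uploads(local_dir):
    """Map each local CSV file to the DBFS path it should be uploaded to."""
    return {
        str(csv_path): str(DBFS_TARGET / csv_path.name)
        for csv_path in sorted(Path(local_dir).glob("*.csv"))
    }


def missing_uploads(plan, uploaded_names):
    """Return planned file names that are not yet present in the target folder."""
    uploaded = set(uploaded_names)
    return sorted(
        PurePosixPath(target).name
        for target in plan.values()
        if PurePosixPath(target).name not in uploaded
    )
```

In a Databricks notebook, the list of uploaded names could presumably be obtained from `dbutils.fs.ls` on the target folder (taking the `name` of each entry) and passed as `uploaded_names`; an empty result would then suggest that every CSV has been uploaded.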
