
@ayoyoin

INTEC Inc.

Dataiku: from building a scatter plot web app to turning it into a plugin

Posted at 2021-04-06

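In an earlier article I covered turning an existing sample webapp into a plugin. I then found a tutorial on the official site that starts from creating the web app, so this time I worked through the whole flow, from creating the web app to converting it into a plugin.

I used Dataiku version 8.0.2.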

#Creating a Bokeh scatter plot web app
Reference: https://knowledge.dataiku.com/latest/courses/advanced-code/web-apps/bokeh.html

##Preparation
First, import the following project.
[+New Project] > [DSS Tutorials] > [General Topics] > [Haiku Starter]

Only the Orders_enriched_prepared dataset from this project is needed, so you can also download it directly instead.
https://downloads.dataiku.com/public/website-additional-assets/data/Orders_enriched_prepared.csv

##Creating the Bokeh webapp
Create an empty Bokeh webapp.
[>]>[+New Webapp]>[Code Webapp]>[Bokeh]>[An empty Bokeh app]

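You write code in the Edit tab and check the output in the View tab. Nothing has been configured yet, so the View tab is empty.

The Edit tab also has a Preview pane, so you can code while watching the preview and the Log there.

Enter the following Python code. It draws a scatter plot with age on the x-axis and total on the y-axis.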

# Library imports
from bokeh.io import curdoc
from bokeh.layouts import row, widgetbox
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import Slider, TextInput, Select
from bokeh.plotting import figure
import dataiku
import pandas as pd

# Input parameters
input_dataset = "Orders_enriched_prepared"
x_column = "age"
y_column = "total"
time_column = "order_date_year"
cat_column = "tshirt_category"

# Load the input dataset into a DataFrame
mydataset = dataiku.Dataset(input_dataset)
df = mydataset.get_dataframe()

# Get the x-axis and y-axis values and shape them for plotting
x = df[x_column]
y = df[y_column]
source = ColumnDataSource(data=dict(x=x, y=y))

# Plot settings
plot = figure(plot_height=400, plot_width=400, title=y_column+" by "+x_column,
              tools="crosshair,pan,reset,save,wheel_zoom",
              x_range=[min(x), max(x)], y_range=[min(y), max(y)])

plot.scatter('x', 'y', source=source)

# Layout and output
inputs = widgetbox()

curdoc().add_root(row(inputs, plot, width=800))

Click [SAVE] and the scatter plot appears in the Preview pane.

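###Configuring the widgets

Next, make the scatter plot customizable.
Add the widget definitions immediately before the layout and output section (the inputs = widgetbox() line) of the code above.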

# Widget settings
text = TextInput(title="Title", value=y_column+" by "+x_column)
time = df[time_column]
min_year = Slider(title="Time start", value=min(time), start=min(time), end=max(time), step=1)
max_year = Slider(title="Time max", value=max(time), start=min(time), end=max(time), step=1)
cat_categories = df[cat_column].unique().tolist()
cat_categories.insert(0,'All')
category = Select(title="Category", value="All", options=cat_categories)

# Layout and output
inputs = widgetbox()

curdoc().add_root(row(inputs, plot, width=800))

The widgets defined here are as follows.

text: sets the plot title
min_year, max_year: sliders that set the period to display
category: selects the T-shirt category
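
As a rough illustration of how the Select options and the slider bounds are derived, here is a plain-Python sketch that needs neither Bokeh nor pandas. The helper names `build_select_options` and `slider_bounds` and the sample values are made up for this example; the webapp itself uses `df[cat_column].unique().tolist()` and `min(time)` / `max(time)`.

```python
def build_select_options(values, all_label="All"):
    """Return the distinct values in first-seen order, with all_label first."""
    # dict.fromkeys keeps insertion order, so duplicates collapse in place
    distinct = list(dict.fromkeys(values))
    return [all_label] + distinct


def slider_bounds(years):
    """Return (start, end) for a year slider covering every value."""
    return min(years), max(years)


# Made-up sample values standing in for the dataset columns
categories = ["Hoodie", "Tennis Shirt", "Hoodie", "White T-Shirt M"]
years = [2014, 2013, 2016, 2014]

options = build_select_options(categories)
start, end = slider_bounds(years)
print(options)
print(start, end)
```

Putting "All" first gives the Select widget a default that applies no category filter, which matches `value="All"` in the widget definition.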

Next, set up the update functions and callbacks.
Add the following code after the code added in the previous step.

# Widget settings
text = TextInput(title="Title", value=y_column+" by "+x_column)
time = df[time_column]
min_year = Slider(title="Time start", value=min(time), start=min(time), end=max(time), step=1)
max_year = Slider(title="Time max", value=max(time), start=min(time), end=max(time), step=1)
cat_categories = df[cat_column].unique().tolist()
cat_categories.insert(0,'All')
category = Select(title="Category", value="All", options=cat_categories)

# Update functions and callbacks
def update_title(attrname, old, new):
    # Called when the title text changes; overwrites plot.title.text
    plot.title.text = text.value

# Use on_change to detect changes to the title
text.on_change('value', update_title)

def update_data(attrname, old, new):
    # Called when a slider or the Select widget (T-shirt category) changes.
    # Filters df according to the selected values.
    category_value = category.value
    selected = df[(time>=min_year.value) & (time<=max_year.value)]
    if category_value != "All":
        # Keep rows whose category contains the selected value
        selected = selected[selected[cat_column].str.contains(category_value)==True]
    # Generate the new plot
    x = selected[x_column]
    y = selected[y_column]
    source.data = dict(x=x, y=y)

# Use on_change to detect changes to the sliders and the Select widget
for w in [min_year, max_year, category]:
    w.on_change('value', update_data)

# Layout and output
inputs = widgetbox()

curdoc().add_root(row(inputs, plot, width=800))
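
The filtering rule inside update_data can be tried outside DSS with a small plain-Python sketch. The function `filter_rows` and the sample rows are hypothetical and exist only for illustration. One deliberate difference: this sketch compares the category with exact equality, whereas the webapp code uses `str.contains`, which treats the selected value as a pattern and also matches substrings.

```python
def filter_rows(rows, year_min, year_max, category,
                time_key="order_date_year", cat_key="tshirt_category"):
    """Keep rows inside [year_min, year_max] and, unless category is
    "All", only rows whose category equals the selected value."""
    selected = [r for r in rows if year_min <= r[time_key] <= year_max]
    if category != "All":
        selected = [r for r in selected if r[cat_key] == category]
    return selected


# Made-up rows using the column names from the article
rows = [
    {"age": 25, "total": 30.0, "order_date_year": 2014, "tshirt_category": "Hoodie"},
    {"age": 41, "total": 55.5, "order_date_year": 2016, "tshirt_category": "Tennis Shirt"},
    {"age": 33, "total": 12.0, "order_date_year": 2017, "tshirt_category": "Hoodie"},
]

selected = filter_rows(rows, 2014, 2016, "Hoodie")
# The new x/y lists play the role of what is assigned to source.data
new_data = dict(x=[r["age"] for r in selected],
                y=[r["total"] for r in selected])
print(new_data)
```

If the start year is set above the end year, no row satisfies the range and the result is empty, so the plot would show no points.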

Finally, replace inputs = widgetbox() with the following.

# Layout and output
inputs = widgetbox(text, min_year, max_year, category)

curdoc().add_root(row(inputs, plot, width=800))

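Click [SAVE] and the widgets appear in the Preview pane.

The finished webapp, widgets included, can also be placed on a dashboard via [ACTIONS]>[Publish].

In the dashboard's View mode you can operate the widgets and the scatter plot changes accordingly. Handy.

For reference, here is the complete program up to this point.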

# Library imports
from bokeh.io import curdoc
from bokeh.layouts import row, widgetbox
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import Slider, TextInput, Select
from bokeh.plotting import figure
import dataiku
import pandas as pd

# Input parameters
input_dataset = "Orders_enriched_prepared"
x_column = "age"
y_column = "total"
time_column = "order_date_year"
cat_column = "tshirt_category"

# Load the input dataset into a DataFrame
mydataset = dataiku.Dataset(input_dataset)
df = mydataset.get_dataframe()

# Get the x-axis and y-axis values and shape them for plotting
x = df[x_column]
y = df[y_column]
source = ColumnDataSource(data=dict(x=x, y=y))

# Plot settings
plot = figure(plot_height=400, plot_width=400, title=y_column+" by "+x_column,
              tools="crosshair,pan,reset,save,wheel_zoom",
              x_range=[min(x), max(x)], y_range=[min(y), max(y)])

plot.scatter('x', 'y', source=source)

# Widget settings
text = TextInput(title="Title", value=y_column+" by "+x_column)
time = df[time_column]
min_year = Slider(title="Time start", value=min(time), start=min(time), end=max(time), step=1)
max_year = Slider(title="Time max", value=max(time), start=min(time), end=max(time), step=1)
cat_categories = df[cat_column].unique().tolist()
cat_categories.insert(0,'All')
category = Select(title="Category", value="All", options=cat_categories)

# Update functions and callbacks
def update_title(attrname, old, new):
    # Called when the title text changes; overwrites plot.title.text
    plot.title.text = text.value

# Use on_change to detect changes to the title
text.on_change('value', update_title)

def update_data(attrname, old, new):
    # Called when a slider or the Select widget (T-shirt category) changes.
    # Filters df according to the selected values.
    category_value = category.value
    selected = df[(time>=min_year.value) & (time<=max_year.value)]
    if category_value != "All":
        # Keep rows whose category contains the selected value
        selected = selected[selected[cat_column].str.contains(category_value)==True]
    # Generate the new plot
    x = selected[x_column]
    y = selected[y_column]
    source.data = dict(x=x, y=y)

# Use on_change to detect changes to the sliders and the Select widget
for w in [min_year, max_year, category]:
    w.on_change('value', update_data)

# Layout and output
inputs = widgetbox(text, min_year, max_year, category)

curdoc().add_root(row(inputs, plot, width=800))

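#Converting to a plugin
Now convert the scatter plot web app built so far into a plugin.
Reference: https://academy.dataiku.com/plugin-development/513345

This assumes you have completed the preparation described in the following article.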
https://qiita.com/ayoyo/items/ffd2bfbf6a8c7f01ecd6

In the project, go to the following to open the web app creation screen.
[>]>[+New Webapp]>[Code Webapp]>[sample-webapp]

Click [ACTIONS]>[Plugin].

Select Existing dev plugin and choose test plugin as the Plugin id. (If it is not listed, the preparation from the linked article has not been completed.)
Enter custom-scatterplot as the New plugin webapp id and click [CONVERT].

The screen switches to the plugin development view. Converting the scatter plot web app generated a plugin skeleton automatically. You can also reach this view via [Apps]>[Plugins]>[Development]>[custom-scatterplot].

The left pane shows the plugin's directory structure. The files for this webapp are under test-plugin/webapps/custom-scatterplot. Incidentally, test-plugin/webapps/custom-histgram is the webapp created when I worked through the earlier article. Because I used the same plugin ID (test-plugin), it ended up under the same webapps directory.

##Editing webapp.json

To make this usable as a plugin, the dataset name and column names that are hard-coded in the code need to become parameters.
Rewrite params in webapp.json as follows.

"params": [
    {
        "name": "input_dataset",
        "type": "DATASET",
        "label": "Dataset",
        "description": "The dataset used to populate the web app",
        "mandatory": true,
        "canSelectForeign": true
    },
    {
        "name": "x_column",
        "type": "DATASET_COLUMN",
        "datasetParamName": "input_dataset",
        "label": "X-Axis Column",
        "description": "",
        "mandatory": true
    },
    {
        "name": "y_column",
        "type": "DATASET_COLUMN",
        "datasetParamName": "input_dataset",
        "label": "Y-Axis Column",
        "description": "",
        "mandatory": true
    },
    {
        "name": "time_column",
        "type": "DATASET_COLUMN",
        "datasetParamName": "input_dataset",
        "label": "Time Column",
        "description": "",
        "mandatory": true
    },
    {
        "name": "cat_column",
        "type": "DATASET_COLUMN",
        "datasetParamName": "input_dataset",
        "label": "Categorical Column",
        "description": "",
        "mandatory": true
    }
],

Uncomment roles.

    "roles": [
         {"type": "DATASET", "targetParamsKey": "input_dataset"} 
    ]

The parameters are now configured.
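
Each DATASET_COLUMN param points at its dataset through datasetParamName, and the role points at a param through targetParamsKey, so a typo in either name is easy to make. The following is a small hypothetical checker, not part of DSS, that looks only at those two cross-references. The name `check_descriptor` and the trimmed-down descriptor are invented for this sketch.

```python
def check_descriptor(descriptor):
    """Return a list of human-readable problems (empty if none found)."""
    params = {p["name"]: p for p in descriptor.get("params", [])}
    problems = []
    for name, param in params.items():
        if param["type"] == "DATASET_COLUMN":
            target = param.get("datasetParamName")
            # The referenced param must exist and be of type DATASET
            if target not in params or params[target]["type"] != "DATASET":
                problems.append(
                    "%s: datasetParamName %r is not a DATASET param" % (name, target))
    for role in descriptor.get("roles", []):
        key = role.get("targetParamsKey")
        if key not in params:
            problems.append("role targets unknown param %r" % key)
    return problems


# Trimmed-down descriptor with the same shape as the params in the article
descriptor = {
    "params": [
        {"name": "input_dataset", "type": "DATASET"},
        {"name": "x_column", "type": "DATASET_COLUMN",
         "datasetParamName": "input_dataset"},
        {"name": "y_column", "type": "DATASET_COLUMN",
         "datasetParamName": "input_datset"},  # deliberate typo
    ],
    "roles": [{"type": "DATASET", "targetParamsKey": "input_dataset"}],
}

problems = check_descriptor(descriptor)
for problem in problems:
    print(problem)
```

Here only y_column is reported, because its datasetParamName does not match any DATASET param.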

##Editing backend.py

Next, edit the Python code.
Rewrite the input parameter settings section of backend.py as follows.

# Input parameters
input_dataset = get_webapp_config()['input_dataset']
x_column = get_webapp_config()['x_column']
y_column = get_webapp_config()['y_column']
time_column = get_webapp_config()['time_column']
cat_column = get_webapp_config()['cat_column']
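
As an optional variation, the settings could be read once and checked before the plot is built, so that an unset value produces a clear message. This is only a sketch: `read_params` and `REQUIRED_KEYS` are hypothetical names, the dict below is a stand-in for the result of `get_webapp_config()`, and it assumes that result behaves like a plain dict.

```python
REQUIRED_KEYS = ("input_dataset", "x_column", "y_column",
                 "time_column", "cat_column")


def read_params(config, required=REQUIRED_KEYS):
    """Return the required settings, or raise if any is missing or empty."""
    missing = [key for key in required if not config.get(key)]
    if missing:
        raise ValueError("Missing webapp settings: " + ", ".join(missing))
    return {key: config[key] for key in required}


# Stand-in for get_webapp_config(); inside DSS you would pass its result
config = {
    "input_dataset": "Orders_enriched_prepared",
    "x_column": "age",
    "y_column": "total",
    "time_column": "order_date_year",
    "cat_column": "tshirt_category",
}
params = read_params(config)
print(params["x_column"], params["y_column"])
```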

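That completes the edits.
Click [SAVE ALL], then [ACTIONS]>[Reload this plugin].

The plugin is ready, so check that it works.

Go back to the Flow of the project used earlier and select the Orders_enriched_prepared dataset.
Under Webapps, custom scatterplot is not there.

After reloading the browser, Custom scatterplot showed up.

Select Custom scatterplot, leave the Webapp name at its default, and click CREATE.

Nothing appears in the View tab yet, so configure the webapp in the Settings tab.

Set the values and click [SAVE AND VIEW WEBAPP].

The same kind of scatter plot with widgets as before is displayed.

By changing the dataset and the column selections in Settings, the webapp can be customized from the UI.

For reference, here are the full webapp.json and backend.py.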

webapp.json

// This file is the descriptor for webapp custom-scatterplot
{
    "meta": {
        // label: name of the webapp as displayed, should be short
        "label": "Custom scatterplot",
        // description: longer string to help end users understand what this webapp does
        "description": "",
        // icon: must be one of the FontAwesome 3.2.1 icons, complete list here at https://fontawesome.com/v3.2.1/icons/
        "icon": "icon-puzzle-piece"
    },

    "baseType": "BOKEH", // WARNING: do not change
    "hasBackend": "true",
    "noJSSecurity": "true",
    "standardWebAppLibraries": null,

    /* The field "params" holds a list of all the params
       for which the user will be prompted for values in the Settings tab of the webapp.

       The available parameter types include:
       STRING, STRINGS, INT, DOUBLE, BOOLEAN, SELECT, MULTISELECT, MAP, TEXTAREA, PRESET, DATASET, DATASET_COLUMN, MANAGED_FOLDER

       For the full list and for more details, see the documentation: https://doc.dataiku.com/dss/latest/plugins/reference/params.html
    */
"params": [
    {
        "name": "input_dataset",
        "type": "DATASET",
        "label": "Dataset",
        "description": "The dataset used to populate the web app",
        "mandatory": true,
        "canSelectForeign": true
    },
    {
        "name": "x_column",
        "type": "DATASET_COLUMN",
        "datasetParamName": "input_dataset",
        "label": "X-Axis Column",
        "description": "",
        "mandatory": true
    },
    {
        "name": "y_column",
        "type": "DATASET_COLUMN",
        "datasetParamName": "input_dataset",
        "label": "Y-Axis Column",
        "description": "",
        "mandatory": true
    },
    {
        "name": "time_column",
        "type": "DATASET_COLUMN",
        "datasetParamName": "input_dataset",
        "label": "Time Column",
        "description": "",
        "mandatory": true
    },
    {
        "name": "cat_column",
        "type": "DATASET_COLUMN",
        "datasetParamName": "input_dataset",
        "label": "Categorical Column",
        "description": "",
        "mandatory": true
    }
],

    /* roles define where this webapp will appear in DSS GUI. They are used to pre-fill a macro parameter with context.

       Each role consists of:
        - type: where the macro will be shown
            * DATASET, DATASETS, SAVED_MODEL, MANAGED_FOLDER (a button to create webapps will be shown in the corresponding "action" menu)
        - targetParamsKey(s): name of the parameter(s) that will be filled with the selected object
    */
    "roles": [
         {"type": "DATASET", "targetParamsKey": "input_dataset"} 
    ]
}
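
The descriptor above contains // and /* */ comments, which Python's json module rejects. If you want to inspect such a file in a script outside DSS, one option is to strip the comments first. The sketch below is an assumption-laden helper, not something DSS provides: `strip_json_comments` is an invented name, and the sample text is a shortened stand-in for the real file.

```python
import json
import re

# Match either a JSON string (kept as-is) or a // or /* */ comment (dropped).
# Strings are listed first so that "https://..." inside a value is not cut.
_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*|/\*.*?\*/', re.DOTALL)


def strip_json_comments(text):
    """Remove // and /* */ comments that appear outside string literals."""
    def keep_strings(match):
        token = match.group(0)
        return token if token.startswith('"') else ""
    return _TOKEN.sub(keep_strings, text)


# Shortened stand-in for a commented descriptor file
descriptor_text = '''
// descriptor for a hypothetical webapp
{
    "meta": {
        /* icon list: https://example.com/icons */
        "label": "Custom scatterplot",
        "icon": "icon-puzzle-piece"
    },
    "baseType": "BOKEH", // do not change
    "docs": "https://example.com/params",
    "params": []
}
'''

descriptor = json.loads(strip_json_comments(descriptor_text))
print(descriptor["meta"]["label"])
```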

backend.py

from dataiku.customwebapp import *

# Access the parameters that end-users filled in using webapp config
# For example, for a parameter called "input_dataset"
# input_dataset = get_webapp_config()["input_dataset"]

# Library imports
from bokeh.io import curdoc
from bokeh.layouts import row, widgetbox
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import Slider, TextInput, Select
from bokeh.plotting import figure
import dataiku
import pandas as pd

# Input parameters
input_dataset = get_webapp_config()['input_dataset']
x_column = get_webapp_config()['x_column']
y_column = get_webapp_config()['y_column']
time_column = get_webapp_config()['time_column']
cat_column = get_webapp_config()['cat_column']

# Load the input dataset into a DataFrame
mydataset = dataiku.Dataset(input_dataset)
df = mydataset.get_dataframe()

# Get the x-axis and y-axis values and shape them for plotting
x = df[x_column]
y = df[y_column]
source = ColumnDataSource(data=dict(x=x, y=y))

# Plot settings
plot = figure(plot_height=400, plot_width=400, title=y_column+" by "+x_column,
              tools="crosshair,pan,reset,save,wheel_zoom",
              x_range=[min(x), max(x)], y_range=[min(y), max(y)])

plot.scatter('x', 'y', source=source)

# Widget settings
text = TextInput(title="Title", value=y_column+" by "+x_column)
time = df[time_column]
min_year = Slider(title="Time start", value=min(time), start=min(time), end=max(time), step=1)
max_year = Slider(title="Time max", value=max(time), start=min(time), end=max(time), step=1)
cat_categories = df[cat_column].unique().tolist()
cat_categories.insert(0,'All')
category = Select(title="Category", value="All", options=cat_categories)

# Update functions and callbacks
def update_title(attrname, old, new):
    # Called when the title text changes; overwrites plot.title.text
    plot.title.text = text.value

# Use on_change to detect changes to the title
text.on_change('value', update_title)

def update_data(attrname, old, new):
    # Called when a slider or the Select widget (category) changes.
    # Filters df according to the selected values.
    category_value = category.value
    selected = df[(time>=min_year.value) & (time<=max_year.value)]
    if category_value != "All":
        # Keep rows whose category contains the selected value
        selected = selected[selected[cat_column].str.contains(category_value)==True]
    # Generate the new plot
    x = selected[x_column]
    y = selected[y_column]
    source.data = dict(x=x, y=y)

# Use on_change to detect changes to the sliders and the Select widget
for w in [min_year, max_year, category]:
    w.on_change('value', update_data)

# Layout and output
inputs = widgetbox(text, min_year, max_year, category)

curdoc().add_root(row(inputs, plot, width=800))
