Dynamically highlighting specific points in a scatter plot with bokeh's MultiChoice widget

Posted at 2024-03-14, last updated at 2024-03-18

Environment

Python 3.12.1
bokeh 3.0.3

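Goal

I want to show the relationship between working time and score for each account as a scatter plot.

I generated the scatter plot with the following Python file.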

sample.py

from bokeh.io import output_file, save
import pandas
from bokeh.plotting import figure, ColumnDataSource
from bokeh.palettes import Category10

df = pandas.read_csv("data.csv")
print(f"{len(df)=}")
print(f"{df.dtypes=}")
fig = figure(
    width=600,
    height=400,
    x_axis_label="worktime",
    y_axis_label="score",
)

colors = {"X": Category10[10][0], "Y": Category10[10][1], "Z": Category10[10][2]}

for account_type in ["X", "Y", "Z"]:
    # Draw each type from its own data source so that it gets its own legend entry.
    df2 = df[df["type"] == account_type]
    source = ColumnDataSource(df2)
    fig.circle(
        source=source,
        x="worktime",
        y="score",
        legend_label=account_type,
        size=4,
        color=colors[account_type],
    )
    # Label each point with its account ID.
    fig.text(
        source=source,
        x="worktime",
        y="score",
        text="account_id",
        legend_label=account_type,
        text_font_style="normal",
        text_font_size="7pt",
    )

fig.legend.location = "top_left"
fig.legend.click_policy = "mute"
fig.legend.title = "Type"
# Show the legend outside the plot area
fig.add_layout(fig.legend[0], "left")

output_file("output1.html")
save(fig)

Here is part of data.csv.

data.csv

account_id,worktime,score,type
019b,93.03278985,0.349856935,X
05ca,19.73813326,0.25262116,X
0d2b,29.21835098,0.166359941,X

$ python sample.py
len(df)=47
df.dtypes=account_id     object
worktime      float64
score         float64
type           object
dtype: object

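The plot was displayed without any problems.

Making a specific account easier to find

When many accounts are plotted, it is hard to find where a particular account is.
So I use bokeh's MultiChoice widget to make specific accounts easier to find.
Specifically, for each account selected in the MultiChoice, the circle is enlarged and the label's font style is set to bold.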

sample2.py

from bokeh.io import output_file, save
from bokeh.models import CustomJS, MultiChoice
import pandas
from bokeh.plotting import figure
from bokeh.palettes import Category10

df = pandas.read_csv("data.csv")
fig = figure(
    width=600,
    height=400,
    x_axis_label="worktime",
    y_axis_label="score",
)

circle_glyphs = {}
text_glyphs = {}

colors = {"X": Category10[10][0], "Y": Category10[10][1], "Z": Category10[10][2]}

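# Create one circle renderer and one text renderer per account and keep them
# by account ID, so that the CustomJS callback can restyle each account individually.
for worktime, score, account_id, account_type in zip(
    df["worktime"], df["score"], df["account_id"], df["type"]
):
    circle_glyphs[account_id] = fig.circle(
        x=worktime,
        y=score,
        legend_label=account_type,
        size=4,
        color=colors[account_type],
    )
    text_glyphs[account_id] = fig.text(
        x=[worktime],
        y=[score],
        text=[account_id],
        legend_label=account_type,
        text_font_style="normal",
        text_font_size="7pt",
    )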


fig.legend.location = "top_left"
fig.legend.click_policy = "mute"
fig.legend.title = "Type"
# Show the legend outside the plot area
fig.add_layout(fig.legend[0], "left")

args = {"circleGlyphs": circle_glyphs, "textGlyphs": text_glyphs}

code = """
const selectedAccountIds = this.value;
for (let accountId in textGlyphs) {
    if (selectedAccountIds.includes(accountId)) {
        textGlyphs[accountId].glyph.text_font_style='bold';
        circleGlyphs[accountId].glyph.size = 8;
    } else {
        textGlyphs[accountId].glyph.text_font_style='normal';
        circleGlyphs[accountId].glyph.size = 4;
    }
}
"""

multi_choice = MultiChoice(options=list(df["account_id"]), title="Find account:")
multi_choice.js_on_change(
    "value",
    CustomJS(code=code, args=args),
)

output_file("output2.html")
save([fig, multi_choice])

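The key point is that fig.circle() and fig.text() are called once per account, and their return values (the renderers) are kept in dictionaries keyed by account ID.
This is necessary because the callback has to switch the style of each account individually.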
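
For reference, the rule that the CustomJS callback applies can be written out in plain Python. This is only an illustrative sketch: `styles_for_selection` is a hypothetical helper, not part of bokeh, and it simply mirrors the sizes (8 and 4) and font styles (bold and normal) used in the callback above.

```python
def styles_for_selection(account_ids, selected_ids):
    """Return the circle size and font style each account should get.

    Hypothetical helper that mirrors the CustomJS callback: selected
    accounts get size 8 and a bold label, all others size 4 and a
    normal label.
    """
    selected = set(selected_ids)
    styles = {}
    for account_id in account_ids:
        if account_id in selected:
            styles[account_id] = {"size": 8, "text_font_style": "bold"}
        else:
            styles[account_id] = {"size": 4, "text_font_style": "normal"}
    return styles


if __name__ == "__main__":
    # Account IDs taken from the data.csv excerpt.
    result = styles_for_selection(["019b", "05ca", "0d2b"], ["05ca"])
    for account_id, style in result.items():
        print(account_id, style)
```

Because every account is visited on each change, accounts that are removed from the selection are reset to the normal style, just as in the `else` branch of the JavaScript.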

Issues

Large file size

The file size grows in proportion to the number of points plotted.
With this data set (47 rows), output1.html was 17 KB, whereas output2.html was 148 KB.
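
One way to see where the growth comes from: every `fig.circle()` or `fig.text()` call adds one renderer to the document, and each renderer has to be serialized into the HTML. The sketch below only counts those calls for the two scripts. The function names are hypothetical, the account IDs are placeholders, and the results are call counts, not measured file sizes.

```python
def renderer_count_grouped(types):
    """First script: one circle call and one text call per distinct type."""
    return 2 * len(set(types))


def renderer_count_per_account(account_ids):
    """sample2.py: one circle call and one text call per account."""
    return 2 * len(account_ids)


if __name__ == "__main__":
    types = ["X", "Y", "Z"]
    # Placeholder IDs; 47 matches the len(df)=47 printed by the first script.
    account_ids = [f"account{i}" for i in range(47)]

    print(renderer_count_grouped(types))            # prints 6
    print(renderer_count_per_account(account_ids))  # prints 94
```

The first count stays fixed as rows are added, while the second grows linearly with the number of accounts.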

Generating the HTML file takes time

With bokeh 3.0.3, sample2.py took more than 10 times as long to run as sample1.py (the first script, listed above as sample.py).

$ time python sample1.py

real    0m1.564s
user    0m2.023s
sys     0m0.556s

$ time python sample2.py

real    0m19.713s
user    0m20.005s
sys     0m0.594s

With bokeh 3.3.4, however, sample2.py took only about 2 to 3 times as long as sample1.py.

$ time python sample1.py

real    0m1.401s
user    0m1.650s
sys     0m0.716s

$ time python sample2.py

real    0m4.003s
user    0m4.379s
sys     0m0.563s
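
The timings above come from the shell's `time` command. A comparable wall-clock measurement can be taken from Python itself. The following is a minimal sketch: `time_command` is a hypothetical helper, and the script names are assumed to exist in the current directory under the names used in the commands above.

```python
import subprocess
import sys
import time


def time_command(args):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(args, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Assumes sample1.py and sample2.py are in the current directory.
    for script in ["sample1.py", "sample2.py"]:
        elapsed = time_command([sys.executable, script])
        print(f"{script}: {elapsed:.3f}s")
```

This corresponds only to the `real` line of `time`; it does not separate user and system CPU time.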
