有限会社来栖川電算

bokeh: The `legend_label` argument accepts only strings

Last updated at 2024-04-12, posted at 2024-04-12

Environment

Python 3.12.1
bokeh 3.4.0
pandas 2.2.0

Background

I use the legend_label argument of the scatter() method to set up the legend, drawing one glyph per value of the type column.

sample0.py

from bokeh.io import output_file, save
import pandas
from bokeh.plotting import figure, ColumnDataSource

from bokeh.palettes import Category10

colors = Category10[3]
df = pandas.DataFrame(
    {"x": [1, 2, 3, 4], "y": [1, 4, 9, 16], "type": ["X", "Y", "Y", "Z"]}
)
print(f"{df["type"]=}")
fig = figure()

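# Draw one scatter glyph per unique value so that each value gets its own legend entry.
# The loop variable is named type_value to avoid shadowing the built-in type().
for index, type_value in enumerate(df["type"].unique()):
    df2 = df[df["type"] == type_value]
    source = ColumnDataSource(df2)
    fig.scatter(
        source=source, x="x", y="y", legend_label=type_value, color=colors[index], size=10
    )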

fig.legend.location = "top_left"
fig.legend.title = "Type"
fig.add_layout(fig.legend[0], "left")

output_file("output0.html")
save(fig)

$ python sample0.py
df["type"]=0    X
1    Y
2    Y
3    Z
Name: type, dtype: object

When the legend values contain missing values

Specifying the `legend_label` argument raises `ValueError`

The legend_label argument is documented as a str:

legend_label (str, optional) –
Specify that the glyph should produce a single basic legend label in the legend. The legend entry is labeled with the exact text supplied here. ¹

Consequently, passing a missing value (here, NaN) as legend_label raised a ValueError.

df = pandas.DataFrame(
    {"x": [1, 2, 3, 4], "y": [1, 4, 9, 16], "type": ["X", "Y", "Y", float("nan")]}
)

$ python sample0.py
df["type"]=0      X
1      Y
2      Y
3    NaN
Name: type, dtype: object
...
ValueError: legend_label value must be a string

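Workaround: replace missing values with an empty string

It is hardly worth calling a workaround, but replacing the missing values with an empty string prevents the ValueError.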

df = pandas.DataFrame(
    {"x": [1, 2, 3, 4], "y": [1, 4, 9, 16], "type": ["X", "Y", "Y", float("nan")]}
)
df = df.fillna({"type": ""})

$ python sample0.py
df["type"]=0    X
1    Y
2    Y
3
Name: type, dtype: object
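As an alternative to the empty string, the values could be converted to explicit label strings before they reach legend_label. The following is only a minimal sketch: the helper name to_legend_label and the "(missing)" placeholder are hypothetical and not part of bokeh, and the helper only recognizes None and float NaN as missing (pd.NA is not handled).

```python
import math


def to_legend_label(value, missing="(missing)"):
    """Return a string that can be passed as legend_label.

    Both this helper and the `missing` placeholder are hypothetical;
    they are not provided by bokeh.
    """
    # Treat None and float NaN as missing values.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return missing
    # Convert everything else (int, str, ...) with str().
    return str(value)


labels = [to_legend_label(v) for v in ["X", "Y", float("nan"), 3]]
print(labels)  # ['X', 'Y', '(missing)', '3']
```

In sample0.py this would be used as legend_label=to_legend_label(value) inside the loop. A visible placeholder keeps the legend entry readable, whereas an empty string leaves the entry without text.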

When the legend values are numeric

The `type` column has dtype `int64`

When the type column is numeric, using the legend_group argument instead of legend_label let me draw the plot without converting the DataFrame.

sample02.py

df = pandas.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [1, 4, 9, 16],
        "type": [1, 2, 2, 3],
        "color": [colors[0], colors[1], colors[1], colors[2]],
    }
)
print(f"{df["type"]=}")
fig = figure()
source = ColumnDataSource(df)
fig.scatter(source=source, x="x", y="y", legend_group="type", color="color", size=10)

$ python sample02.py
df["type"]=0    1
1    2
2    2
3    3
Name: type, dtype: int64

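The `type` column has dtype `Int64` and contains `pd.NA`

The plot was drawn even though the column contains the missing value pd.NA. However, the legend values were shown in decimal notation rather than as integers. It looks as if bokeh converts pd.NA to np.nan, which would turn the whole column into floats.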

sample02.py

df = pandas.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [1, 4, 9, 16],
        "type": [1, 2, 2, None],
        "color": [colors[0], colors[1], colors[1], colors[2]],
    }
)
df = df.astype({"type": "Int64"})

$ python sample02.py
df["type"]=0       1
1       2
2       2
3    <NA>
Name: type, dtype: Int64
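The decimal notation is consistent with the column being converted to a NumPy float array. The sketch below only shows what pandas itself does in such a conversion; whether bokeh performs exactly this conversion internally is an assumption and was not checked against its source. It also shows one possible way to keep integer notation by building string labels first; the "(missing)" placeholder is hypothetical.

```python
import numpy as np
import pandas

types = pandas.Series([1, 2, 2, None], dtype="Int64")

# Converting to a NumPy float array turns pd.NA into nan
# and every integer into a float.
as_float = types.to_numpy(dtype="float64", na_value=np.nan)
float_labels = [str(v) for v in as_float]
print(float_labels)  # ['1.0', '2.0', '2.0', 'nan']

# Converting to strings before filling keeps the integer notation.
int_labels = types.astype("string").fillna("(missing)").tolist()
print(int_labels)  # ['1', '2', '2', '(missing)']
```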

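Notes

With the `legend_group` argument, individual legend items cannot be hidden or muted

The code below sets fig.legend.click_policy = "mute".
However, clicking legend item 1 simply mutes all of the points; it is not possible to mute only the points whose type is 1.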

sample04.py

from bokeh.io import output_file, save
import pandas
from bokeh.plotting import figure, ColumnDataSource

from bokeh.palettes import Category10

colors = Category10[3]
df = pandas.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [1, 4, 9, 16],
        "type": [1, 2, 2, 3],
        "color": [colors[0], colors[1], colors[1], colors[2]],
    }
)
df = df.astype({"type": "Int64"})
fig = figure()
source = ColumnDataSource(df)
fig.scatter(source=source, x="x", y="y", legend_group="type", color="color", size=10)

fig.legend.location = "top_left"
fig.legend.title = "Type"
fig.legend.click_policy = "mute"
fig.add_layout(fig.legend[0], "left")

output_file("output04.html")
save(fig)
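If per-item muting is needed, one option is to go back to the per-glyph approach of sample0.py, since bokeh's interactive legends operate on individual glyph renderers. The following is a sketch under that assumption: it draws one scatter glyph per type value and converts each value to str for legend_label. The muted_alpha value of 0.2 is an arbitrary choice.

```python
import pandas
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category10
from bokeh.plotting import figure

colors = Category10[3]
df = pandas.DataFrame(
    {"x": [1, 2, 3, 4], "y": [1, 4, 9, 16], "type": [1, 2, 2, 3]}
)

fig = figure()
# One glyph renderer per type value, so each legend entry controls only its own points.
for index, (type_value, group_df) in enumerate(df.groupby("type")):
    fig.scatter(
        source=ColumnDataSource(group_df),
        x="x",
        y="y",
        legend_label=str(type_value),  # legend_label must be a string
        color=colors[index],
        size=10,
        muted_alpha=0.2,  # how faint the points look while muted
    )

fig.legend.click_policy = "mute"
```

The trade-off is that the DataFrame has to be split per value, which is exactly what legend_group avoids.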

A `string`-dtype column containing missing values cannot be used with the `legend_group` argument

The type column has dtype string and contains the missing value pd.NA.
Passing type as the legend_group argument raised a TypeError.

sample02.py

df = pandas.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [1, 4, 9, 16],
        "type": ["X", "Y", "Y", None],
        "color": [colors[0], colors[1], colors[1], colors[2]],
    }
)
df = df.astype({"type": "string"})

$ python sample02.py
df["type"]=0       X
1       Y
2       Y
3    <NA>
Name: type, dtype: string
Traceback (most recent call last):
...
  File "/home/yuji/.pyenv/versions/3.12.1/lib/python3.12/site-packages/numpy/lib/arraysetops.py", line 333, in _unique1d
    perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "missing.pyx", line 392, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous

A TypeError was also raised when the type column had dtype object instead of string, although the error message was different.

sample02.py

df = pandas.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [1, 4, 9, 16],
        "type": ["X", "Y", "Y", None],
        "color": [colors[0], colors[1], colors[1], colors[2]],
    }
)

$ python sample02.py
df["type"]=0       X
1       Y
2       Y
3    None
Name: type, dtype: object
Traceback (most recent call last):
...
File "/home/yuji/.pyenv/versions/3.12.1/lib/python3.12/site-packages/numpy/lib/arraysetops.py", line 333, in _unique1d
    perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<' not supported between instances of 'str' and 'NoneType'

It may be worth reporting this as a bokeh issue.
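Both tracebacks end inside NumPy's unique-value routine, which sorts the column values. The sketch below reproduces that step in isolation with numpy.unique and shows that filling the missing entry with a string makes the values sortable again. That bokeh calls numpy.unique in exactly this way is inferred from the tracebacks, and the "(missing)" placeholder is hypothetical.

```python
import numpy as np

values = np.array(["X", "Y", "Y", None], dtype=object)

# Sorting a mix of str and None is expected to fail, as in the traceback above.
try:
    np.unique(values)
    sort_failed = False
except TypeError:
    sort_failed = True
print(sort_failed)

# After replacing None with a string, every element is comparable.
filled = np.array(
    ["(missing)" if v is None else v for v in values], dtype=object
)
unique_labels = np.unique(filled).tolist()
print(unique_labels)  # ['(missing)', 'X', 'Y']
```

Applied to the DataFrame, the same idea would be df.fillna({"type": "(missing)"}) before building the ColumnDataSource, in the same way as the empty-string replacement shown for legend_label.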

¹ https://docs.bokeh.org/en/latest/docs/reference/plotting/figure.html
