有限会社来栖川電算

pandas: Using `[key]` on the return value of a method with an `inplace` argument, such as `fillna()`, triggers Pylint's `unsubscriptable-object` error

Last updated at 2024-05-01. Posted at 2024-05-01.

Environment

Python 3.12.1
pylint 3.1.0

What happened

The code below reads a column from a pandas.DataFrame using [column].
When I checked it with pylint, it reported error E1136.

sample1.py

import pandas as pd

df = pd.DataFrame({"a": [1, None], "b": [None, 4]})
# OK
print(df["a"])

df2 = df.fillna(0)
# error:  Value 'df2' is unsubscriptable
print(df2["a"])

$ pylint sample1.py --disable W,C
************* Module sample1
sample1.py:9:6: E1136: Value 'df2' is unsubscriptable (unsubscriptable-object)

------------------------------------------------------------------
Your code has been rated at 0.00/10 (previous run: 0.00/10, +0.00)

Both df and df2 are of type pandas.DataFrame. However, the error "Value 'df2' is unsubscriptable" was raised only when reading a column with [column] from df2, the return value of fillna().

What I verified

Assigning to `[column]` on the return value of `fillna()`

sample2.py

import pandas as pd

df = pd.DataFrame({"a": [1, None], "b": [None, 4]})

df2 = df.fillna(0)
# error:  'df2' does not support item assignment
df2["c"] = 0

$ pylint sample2.py --disable C,W
************* Module sample2
sample2.py:7:0: E1137: 'df2' does not support item assignment (unsupported-assignment-operation)

------------------------------------------------------------------
Your code has been rated at 0.00/10 (previous run: 0.00/10, +0.00)

This produced an E1137 error.

Trying with `pandas.Series`

sample3.py

import pandas as pd

s = pd.Series([1,2], index=["a","b"])
# OK
print(s["a"])

s2 = s.fillna(0)
# error:  Value 's2' is unsubscriptable
print(s2["a"])

$ pylint sample3.py --disable C,W
************* Module sample3
sample3.py:9:6: E1136: Value 's2' is unsubscriptable (unsubscriptable-object)

------------------------------------------------------------------
Your code has been rated at 0.00/10 (previous run: 0.00/10, +0.00)

The E1136 error was also raised for pandas.Series.

Trying methods other than fillna

sample4.py

import pandas as pd

df = pd.DataFrame({"a": [1, None], "b": [None, 4]})

df2 = df.drop("a")
# error:  Value 'df2' is unsubscriptable
print(df2["a"])

df3 = df.reset_index()
# error:  Value 'df3' is unsubscriptable
print(df3["a"])

df4 = df.replace({1:111})
# error:  Value 'df4' is unsubscriptable
print(df4["a"])

df5 = df + df
# OK
print(df5["a"])

df6 = df.add(df)
# OK
print(df6["a"])

df7 = df.isna()
# OK
print(df7["a"])

$ pylint sample4.py --disable W,C
************* Module sample4
sample4.py:7:6: E1136: Value 'df2' is unsubscriptable (unsubscriptable-object)
sample4.py:11:6: E1136: Value 'df3' is unsubscriptable (unsubscriptable-object)
sample4.py:15:6: E1136: Value 'df4' is unsubscriptable (unsubscriptable-object)

------------------------------------------------------------------
Your code has been rated at 0.00/10 (previous run: 0.00/10, +0.00)

drop(), reset_index(), and replace() raised the same E1136 error.
However, isna(), add(), and df + df did not.

Reading from the return value of `fillna()` by means other than `[column]`

sample5.py

import pandas as pd

df = pd.DataFrame({"a": [1, None], "b": [None, 4]})
df2 = df.fillna(0)
# OK
print(df2.loc[:, "a"])

# OK
print(df2.fillna(0))

# OK
print(len(df2))

$ pylint sample5.py --disable W,C

-------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 6.67/10, +3.33)

pylint reported no errors.

Since the DataFrame class defines __len__(), I expected len(df2) to be flagged as well, but it was not.

Discussion

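The methods that triggered the error, fillna(), drop(), reset_index(), and replace(), all have an inplace argument. From this, I suspected that pylint raises an error when square brackets [] are used to read from or assign to the return value of a method that has an inplace argument.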
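To test a hypothesis like this, it can help to list which methods of a class accept an `inplace` parameter. The following is a minimal sketch using the standard `inspect` module. The helper `methods_with_inplace` and the class `Demo` are hypothetical names introduced only for illustration; they do not come from pandas or pylint.

```python
import inspect


def methods_with_inplace(cls: type) -> list:
    """Return the names of public methods of `cls` that accept `inplace`."""
    names = []
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        # Skip private and dunder methods.
        if name.startswith("_"):
            continue
        if "inplace" in inspect.signature(member).parameters:
            names.append(name)
    return names


class Demo:
    """Hypothetical class used only to exercise the helper."""

    def fill(self, value, *, inplace=False):
        return None if inplace else self

    def size(self):
        return 0


print(methods_with_inplace(Demo))
```

Passing `pd.DataFrame` instead of `Demo` should give the list of pandas methods that are candidates for this behaviour. That result is not reproduced here.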

Note that fillna() is defined using the overload decorator.

pandas/core/generic.py

    @overload
    def fillna(
        self,
        value: Hashable | Mapping | Series | DataFrame,
        *,
        axis: Axis | None = ...,
        inplace: Literal[False] = ...,
        limit: int | None = ...,
    ) -> Self: ...

    @overload
    def fillna(
        self,
        value: Hashable | Mapping | Series | DataFrame,
        *,
        axis: Axis | None = ...,
        inplace: Literal[True],
        limit: int | None = ...,
    ) -> None: ...

    @overload
    def fillna(
        self,
        value: Hashable | Mapping | Series | DataFrame,
        *,
        axis: Axis | None = ...,
        inplace: bool = ...,
        limit: int | None = ...,
    ) -> Self | None: ...

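In sample11.py below, I used the overload decorator to define a method Foo.get2() whose signature resembles that of fillna().
Checking sample11.py with pylint raised E1136 on the return value of Foo.get2().
Foo.get3() has the same body as Foo.get2() but does not use the overload decorator. No E1136 error was raised on the return value of Foo.get3().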

sample11.py

from typing import Self, overload, Literal


class Foo:
    def __init__(self, data: dict[str, int]) -> None:
        self.data = data

    def __getitem__(self, key: str) -> int:
        return self.data[key]

    def get1(self) -> Self:
        return self

    @overload
    def get2(
        self,
        *,
        inplace: Literal[False],
    ) -> Self: ...

    @overload
    def get2(
        self,
        *,
        inplace: Literal[True],
    ) -> None: ...

    def get2(
        self,
        *,
        inplace: bool = False,
    ) -> Self | None:
        if inplace:
            self.data = {key: value * 2 for key, value in self.data.items()}
            return None
        return self.__class__({key: value * 2 for key, value in self.data.items()})

    def get3(
        self,
        *,
        inplace: bool = False,
    ) -> Self | None:
        if inplace:
            self.data = {key: value * 2 for key, value in self.data.items()}
            return None
        return self.__class__({key: value * 2 for key, value in self.data.items()})


f = Foo({"a": 1})
# OK
print(f["a"])
f1 = f.get1()
# OK
print(f1["a"])

f2 = f.get2(inplace=False)
# error: Value 'f2' is unsubscriptable
print(f2["a"])

f3 = f.get3(inplace=False)
# OK
print(f3["a"])

$ pylint sample11.py --disable W,C
************* Module sample11
sample11.py:58:6: E1136: Value 'f2' is unsubscriptable (unsubscriptable-object)

------------------------------------------------------------------
Your code has been rated at 8.33/10 (previous run: 7.83/10, +0.51)

From the above, pylint does not seem to recognize the overload decorator correctly.

A comment on a pylint issue states:

'@overload' is not yet supported.
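For context, here is a small sketch of what `@overload` leaves behind at runtime. The function `doubled` is a hypothetical example, not pandas code, and the sketch makes no claim about how pylint processes it internally. It only shows that the per-overload return types exist solely in the stubs, so a tool has to understand `@overload` to know that `inplace=False` never returns `None`.

```python
from typing import Literal, overload


@overload
def doubled(values: list, *, inplace: Literal[False] = ...) -> list: ...


@overload
def doubled(values: list, *, inplace: Literal[True]) -> None: ...


def doubled(values: list, *, inplace: bool = False) -> "list | None":
    """Double every element, either in place or in a new list."""
    if inplace:
        values[:] = [v * 2 for v in values]
        return None
    return [v * 2 for v in values]


# Only the last definition stays bound to the name, so the runtime
# annotation is the broad union rather than the per-overload types.
print(doubled.__annotations__["return"])

data = [1, 2]
copy = doubled(data)
print(copy[0])                      # subscripting works at runtime
print(doubled(data, inplace=True))  # the in-place call returns None
print(data)
```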

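Summary

What I learned:

- Using square brackets [] to read (__getitem__()) from the return value of a pandas method that has an inplace argument, such as fillna() or drop(), triggers pylint's E1136.
  - Using square brackets [] to assign (__setitem__()) triggers E1137.
- fillna() and similar methods are defined with the overload decorator, and pylint does not support the overload decorator. This may be why E1136 and E1137 are raised even though __getitem__/__setitem__ are implemented.

What I could not figure out:

- Calling len() on the return value of pandas fillna() did not trigger a pylint error, even though the DataFrame class defines __len__(). Why?
- Why does defining a method with the overload decorator cause pylint's E1136 and E1137 errors?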

Supplementary notes

Specifying `inplace=True` is not recommended

Passing inplace=True to fillna() avoids pylint's E1136 error.

sample.py

import pandas as pd

df = pd.DataFrame({"a": [1, None], "b": [None, 4]})
# Specify `inplace=True` to avoid pylint's E1136 error
# df = df.fillna(0)
df.fillna(0, inplace=True)
print(df["a"])

However, specifying inplace=True trips Ruff's pandas-use-of-inplace-argument rule.

inplace=True does not appear to offer any performance benefit. It is therefore better to keep inplace=False and ignore the pylint error.
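One way to ignore the error is pylint's inline `# pylint: disable=...` comment, which silences a message by name on a single line. The sketch below uses `Table`, a hypothetical DataFrame-like stand-in modelled on the overloaded `Foo.get2()` above, so that it runs without pandas. With real pandas code the same comment would go on the line that subscripts the `fillna()` result.

```python
from typing import Literal, overload


class Table:
    """Hypothetical DataFrame-like stand-in; it is not pandas."""

    def __init__(self, columns: dict) -> None:
        self.columns = dict(columns)

    def __getitem__(self, key: str) -> list:
        return self.columns[key]

    def __setitem__(self, key: str, value: list) -> None:
        self.columns[key] = value

    @overload
    def fill(self, value: int, *, inplace: Literal[False] = ...) -> "Table": ...

    @overload
    def fill(self, value: int, *, inplace: Literal[True]) -> None: ...

    def fill(self, value: int, *, inplace: bool = False) -> "Table | None":
        # Replace missing cells (None) with `value` in every column.
        filled = {
            name: [value if cell is None else cell for cell in cells]
            for name, cells in self.columns.items()
        }
        if inplace:
            self.columns = filled
            return None
        return Table(filled)


table = Table({"a": [1, None], "b": [None, 4]})
table2 = table.fill(0)

# Silence the false positive on this line only, by message name.
print(table2["a"])  # pylint: disable=unsubscriptable-object

# The assignment counterpart (E1137) has its own message name.
table2["c"] = [0, 0]  # pylint: disable=unsupported-assignment-operation
```

Disabling per line keeps both checks active for the rest of the module, at the cost of repeating the comment wherever the pattern occurs.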

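Ruff's implementation of pylint rules

Ruff is gradually implementing pylint's rules.

As of 2024/05/01, E1136 and E1137 have not been implemented.
I am curious whether they will behave the same way as described here once they are.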

