
Getting tripped up by pandas

Python

Last updated at 2021-10-26, posted at 2021-10-24

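In Python (with NumPy and pandas) there are four array-like types, and each one is used in subtly different ways, which makes them very easy to confuse.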

list
numpy.ndarray
pandas.core.series.Series
pandas.core.frame.DataFrame
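As a rough sketch of how differently these four types behave, the snippet below builds each one from the same values and iterates over it. It assumes NumPy and pandas are installed, and the variable names are only for illustration. The result that usually surprises people is DataFrame: iterating over it yields column labels, not rows.

```python
import numpy as np
import pandas as pd

data = [10, 11, 12]

lst = list(data)                   # built-in list
arr = np.array(data)               # numpy.ndarray
ser = pd.Series(data)              # pandas Series with default index 0, 1, 2
frame = pd.DataFrame({"x": data})  # pandas DataFrame with one column "x"

# Iterating a list, an ndarray or a Series yields the values;
# iterating a DataFrame yields its column labels instead.
for obj in (lst, arr, ser, frame):
    print(type(obj).__name__, [v for v in obj])
```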

Here is an example that tripped me up recently:

>>> import pandas as pd
>>>
>>> df = pd.DataFrame()
>>> df['x'] = [10, 11, 12]
>>> df
    x
0  10
1  11
2  12
>>>
>>> 10 in df.x
False
>>> 10 == df.x[0]
True
>>> 10 in df.x.tolist()
True
>>> 10 in df.x.to_numpy()
True
>>> type(df.x)
<class 'pandas.core.series.Series'>

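Applied to a Series, the in operator does not check whether a value exists; it checks whether an index label exists (a Series behaves like a dict keyed by its index). Here df.x has the index 0, 1, 2, so 10 in df.x is False, whereas 0 in df.x would be True.
Isn't this effectively changing the meaning of Python's built-in syntax?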
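Here is a minimal sketch of the difference, using the same three values as the session above. Plain `in` tests index labels, so checking for a value means going through the underlying array or an element-wise comparison. The calls used (`to_numpy`, `isin`, `any`) are standard pandas methods, and which one to prefer is mostly a matter of taste. The last lines show that a DataFrame is similar, but there `in` checks column labels.

```python
import pandas as pd

s = pd.Series([10, 11, 12], name="x")  # default index 0, 1, 2

# `in` on a Series is dict-like: it looks at the index labels
print(10 in s)  # False: 10 is not an index label
print(0 in s)   # True: 0 is an index label

# Three ways to ask "is the value 10 present?"
print(10 in s.to_numpy())        # True: membership test on the ndarray
print(bool(s.isin([10]).any()))  # True: element-wise isin, then any
print(bool((s == 10).any()))     # True: element-wise comparison, then any

# On a DataFrame, `in` checks the column labels
df = pd.DataFrame({"x": [10, 11, 12]})
print("x" in df)  # True: "x" is a column label
print(10 in df)   # False
```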

Usage patterns for DataFrame and Series

import pandas as pd

df = pd.DataFrame()
df['i'] = range(10,15)
df['x'] = range(5)
df['y'] = list('abcde')
df.set_index('i', inplace=True, drop=False)
df

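df[0:2]  # row positions (2 excluded)
df['x']  # column name (returns a Series)
df[['x']]  # list of column names (returns a DataFrame)
df[['x','y']]  # list of column names
df.loc[10]  # index label
df.loc[10:12]  # index labels (12 included)
df.iloc[0]  # row position
df.iloc[0:2]  # row positions (2 excluded)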
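To check the comments above, this small sketch rebuilds the same frame and counts the rows each selector returns. It uses reassignment instead of `inplace=True`, which should give the same frame. The key point is that a label slice through `loc` includes the end label, while a position slice through `iloc` (or a plain integer slice with `[]`) excludes it.

```python
import pandas as pd

# Same frame as above: index labels 10..14, with column "i" kept
df = pd.DataFrame({"i": range(10, 15), "x": range(5), "y": list("abcde")})
df = df.set_index("i", drop=False)

print(len(df[0:2]))        # 2: integer slice with [] is positional, end excluded
print(len(df.loc[10:12]))  # 3: labels 10, 11, 12 (end included)
print(len(df.iloc[0:2]))   # 2: positions 0 and 1 (end excluded)

print(type(df["x"]).__name__)     # Series: a single column name
print(type(df[["x"]]).__name__)   # DataFrame: a list with one column name
print(type(df.loc[10]).__name__)  # Series: one row selected by label
```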

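df.x[10]  # index label (a scalar key is treated as a label)
df.x[0:2]  # row positions (an integer slice is positional; 2 excluded)
df.x.loc[10]  # index label
df.x.loc[10:12]  # index labels (12 included)
df.x.iloc[0]  # row position
df.x.iloc[0:2]  # row positions (2 excluded)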
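The Series rules above are easy to trip over because plain `[]` changes meaning with the key. On an integer index, a scalar is looked up as a label, but a slice is taken by position. The sketch below uses a standalone Series that mirrors df.x (values 0..4, labels 10..14) to make this visible. Using `loc` and `iloc` explicitly avoids the ambiguity.

```python
import pandas as pd

# Mirrors df.x above: values 0..4 with integer index labels 10..14
s = pd.Series(range(5), index=range(10, 15), name="x")

print(s[10])                  # 0: scalar key with [] is a label
print(s[0:2].tolist())        # [0, 1]: integer slice with [] is positional
print(s.loc[10:12].tolist())  # [0, 1, 2]: label slice, 12 included
print(s.iloc[0:2].tolist())   # [0, 1]: position slice, 2 excluded

# Because a scalar key is a label, position 0 is not reachable via []
try:
    s[0]
except KeyError:
    print("s[0] raises KeyError: 0 is not an index label")
print(s.iloc[0])              # 0: use iloc for positions
```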
