
Passing list-likes to .loc or [] with any missing label will raise KeyError in the future

Posted at 2019-06-28 (last updated at 2019-06-28)

Introduction

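While using pandas I got the warning below. It says the call will become an error in the future (the behavior has been deprecated since pandas 0.21.0), so I read the documentation properly.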

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
...
Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative.

The offending code

# n_train is just an integer
# train_vars is a list of strings naming the variables to pull out of df_train
X_train = df_train.loc[0:n_train, train_vars]

Cause

Here is the relevant passage from the documentation.

In prior versions, using .loc[list-of-labels] would work as long as at least 1 of the keys was found (otherwise it would raise a KeyError). This behavior is deprecated and will show a warning message pointing to this section. The recommended alternative is to use .reindex().

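In short: until now, .loc[list of labels] worked as long as at least one of the keys was found, and raised KeyError only when none of them were. That behavior is now deprecated, the warning points to this section of the documentation, and .reindex() is the recommended replacement.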

In other words:

# Given a DataFrame like this,
>>> df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6], 'c': [7,8,9]})
>>> df
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9

# this is OK,
>>> df.loc[:,['a']]
   a
0  1
1  2
2  3

# but this is not
>>> df.loc[:,['a','d']]
__main__:1: FutureWarning:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
/Users/chase0213/.pyenv/versions/3.6.2/lib/python3.6/site-packages/pandas/core/indexing.py:1367: FutureWarning:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  return self._getitem_tuple(key)
   a   d
0  1 NaN
1  2 NaN
2  3 NaN

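That is the point. When you referenced a key that does not exist, earlier versions quietly filled in NaN and returned the result as if nothing were wrong; the change turns this into a proper error.
If nothing is reported, you miss the problem and carry on processing, so this is a good change!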
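As a rough sketch of what the stricter behavior looks like in practice: to my understanding the warning became a real KeyError in pandas 1.0, so the same lookup can be wrapped in try/except. The helper name `select_columns_strict` is made up for illustration and is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})


def select_columns_strict(frame, labels):
    """Hypothetical helper: return frame.loc[:, labels], or None if a label is missing."""
    try:
        return frame.loc[:, labels]
    except KeyError as exc:
        # Recent pandas raises KeyError here instead of filling the column with NaN.
        print(f"KeyError: {exc}")
        return None


ok = select_columns_strict(df, ['a'])            # every label exists
missing = select_columns_strict(df, ['a', 'd'])  # 'd' does not exist
```

On an older pandas that still only warns, `missing` would instead be a DataFrame whose `d` column is all NaN.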

So, let's think about how to deal with it.

Fixes

1. Specify keys that actually exist

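As described above, there is no problem as long as you specify keys that actually exist.
However, after joining data and the like, there are times when you really do need rows or columns whose keys are absent; in that case use reindex(), described next.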
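One way to make sure only existing keys reach .loc is to compare the requested labels with the columns first. This is a minimal sketch; the list `wanted` and the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
wanted = ['a', 'd']  # hypothetical list of labels; 'd' is not in df

# Labels that were requested but are absent from the columns
missing = [label for label in wanted if label not in df.columns]

# Labels that are safe to pass to .loc, kept in the requested order
present = [label for label in wanted if label in df.columns]

subset = df.loc[:, present]
```

Inspecting `missing` before selecting makes a typo in a column name visible instead of letting it pass silently.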

2. Use `reindex()`

# Add a new key 'd'
>>> labels = ['a','b','c','d']

# Set labels as the new keys along axis=1
# (reindex() returns a new DataFrame, so assign the result back)
>>> df = df.reindex(labels, axis=1)
>>> df
   a  b  c   d
0  1  4  7 NaN
1  2  5  8 NaN
2  3  6  9 NaN

# No warning or error is raised any more
>>> df.loc[:, ['a','d']]
   a   d
0  1 NaN
1  2 NaN
2  3 NaN
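Applied to the snippet that triggered the warning in the first place, the same idea might look like the sketch below. The contents of `df_train`, `n_train`, and `train_vars` are made-up stand-ins, since the original data is not shown.

```python
import pandas as pd

# Made-up stand-ins for the variables in the original snippet
df_train = pd.DataFrame({'x1': [0.1, 0.2, 0.3, 0.4], 'x2': [1, 2, 3, 4]})
n_train = 2
train_vars = ['x1', 'x3']  # 'x3' is missing from df_train

# reindex() tolerates missing labels and fills them with NaN;
# the row slice then goes through .loc as before (label-based, end inclusive)
X_train = df_train.reindex(columns=train_vars).loc[0:n_train]
```

Here `reindex(columns=...)` handles the column selection, so .loc is only given a row slice and never sees a missing label.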

Conclusion

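There is a real reason behind every deprecation, so don't ignore a message just because it is only a warning; understand it properly before carrying on (a note to self).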
