How to deal with ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

When using scikit-learn, have you ever hit the error

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

and then struggled to track down which part of the data is causing it? (A familiar story.)

Problem setup

Suppose we are working with the following data in pandas.

X.shape, Y.shape
((1318, 400), (1318,))

Suppose that using scikit-learn in the usual way raises the following error.

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
model.fit(X, Y)
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-9-55fca59c1791> in <module>()
      2 
      3 model = RandomForestRegressor()
----> 4 model.fit(X, Y)


/usr/local/lib/python3.7/dist-packages/sklearn/ensemble/_forest.py in fit(self, X, y, sample_weight)
    293         """
    294         # Validate or convert input data
--> 295         X = check_array(X, accept_sparse="csc", dtype=DTYPE)
    296         y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)
    297         if sample_weight is not None:


/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    576         if force_all_finite:
    577             _assert_all_finite(array,
--> 578                                allow_nan=force_all_finite == 'allow-nan')
    579 
    580     if ensure_min_samples > 0:


/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
     58                     msg_err.format
     59                     (type_err,
---> 60                      msg_dtype if msg_dtype is not None else X.dtype)
     61             )
     62     # for object dtype data, we only check for NaNs (GH-13254)


ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
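
The message names three distinct causes: NaN, infinity, and a value too large for the dtype. The traceback shows `check_array` being called with `dtype=DTYPE`, and the error reports `float32`, so a value that is perfectly finite as `float64` can still overflow once it is cast. A minimal sketch with NumPy only (the sample values are made up for illustration):

```python
import numpy as np

# Hypothetical sample values: one ordinary, one NaN, one infinite,
# and one that is finite in float64 but exceeds the float32 range.
values = np.array([1.0, np.nan, np.inf, 1e39], dtype=np.float64)

# In float64 only NaN and inf are non-finite.
finite_as_float64 = np.isfinite(values)

# After casting to float32, 1e39 overflows to inf as well.
with np.errstate(over="ignore"):  # silence the overflow warning from the cast
    finite_as_float32 = np.isfinite(values.astype(np.float32))

print(finite_as_float64.tolist())  # [True, False, False, True]
print(finite_as_float32.tolist())  # [True, False, False, False]
print(np.finfo(np.float32).max)    # largest finite float32, roughly 3.4e38
```

So a column can trigger the error even when `isna()` and `np.isinf` on the original `float64` data both report nothing.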

Solution

Take the columns one at a time, fit a model on each, and collect only the columns that do not raise the error.

success_col = []
for i in range(X.shape[1]):
    try:
        # Fit on the i-th column alone; invalid values raise ValueError.
        model = RandomForestRegressor()
        model.fit(X.iloc[:, [i]], Y)
        success_col.append(i)
    except ValueError:
        continue

Keep only the columns that did not raise the error.

X = X.iloc[:, success_col]

The number of columns has gone down.

X.shape
(1318, 392)

This time the fit should succeed.

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
model.fit(X, Y)
model.score(X, Y)
0.897965478410951
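
Fitting one model per column works, but it can be slow and it does not say why a column failed. As a sketch of a more direct check, assuming every column of `X` is numeric, one can cast the data to `float32` and inspect it with NumPy. The helper name `find_bad_columns` and the small demo frame are hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd


def find_bad_columns(X: pd.DataFrame) -> pd.DataFrame:
    """Count, per column, the cells that are NaN, infinite, or too large for float32.

    Hypothetical helper; assumes all columns of X are numeric.
    """
    as_f64 = X.to_numpy(dtype=np.float64)
    with np.errstate(over="ignore"):  # the cast may overflow on purpose
        as_f32 = as_f64.astype(np.float32)
    is_nan = np.isnan(as_f64)
    is_inf = np.isinf(as_f64)
    # Finite in float64 but infinite once cast to float32.
    too_large = np.isinf(as_f32) & ~is_inf
    report = pd.DataFrame(
        {
            "nan": is_nan.sum(axis=0),
            "inf": is_inf.sum(axis=0),
            "too_large": too_large.sum(axis=0),
        },
        index=X.columns,
    )
    # Keep only the columns with at least one problematic cell.
    return report[report.sum(axis=1) > 0]


# Made-up demo data: columns "b", "c" and "d" each contain one bad value.
demo = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0],
        "b": [1.0, np.nan, 3.0],
        "c": [1.0, 2.0, np.inf],
        "d": [1e39, 2.0, 3.0],
    }
)
bad = find_bad_columns(demo)
print(bad)

# Drop the offending columns, mirroring what the loop above achieves.
clean = demo.drop(columns=bad.index)
print(clean.columns.tolist())  # ['a']
```

The report also shows which of the three causes applies to each column, which helps when deciding whether to drop a column or repair its values.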

For that matter, it may be more efficient to keep only the columns whose prediction score is reasonably high.

success_col = []
for i in range(X.shape[1]):
    try:
        model = RandomForestRegressor()
        model.fit(X.iloc[:, [i]], Y)
        # Keep the column only if its single-feature score exceeds 0.1.
        if model.score(X.iloc[:, [i]], Y) > 0.1:
            success_col.append(i)
    except ValueError:
        continue

X = X.iloc[:, success_col]
X.shape
(1318, 272)

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
model.fit(X, Y)
model.score(X, Y)
0.8996226954522588
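
Note that `model.score(X, Y)` here is R² computed on the same rows used for fitting, so it tends to be optimistic. Below is a hedged sketch of the same column filter scored on a held-out split instead. The synthetic data, the 70/30 split, and the reuse of the 0.1 threshold are assumptions for illustration, not values taken from the data above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: "signal" drives the target, "noise" does not.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    {"signal": rng.normal(size=300), "noise": rng.normal(size=300)}
)
y_demo = 3.0 * X_demo["signal"] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

selected = []
for col in X_demo.columns:
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X_train[[col]], y_train)
    # Score on rows the model has not seen during fitting.
    if model.score(X_test[[col]], y_test) > 0.1:
        selected.append(col)

print(selected)
```

With a held-out score, a column that the forest merely memorizes is less likely to pass the threshold than with the training-set score.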

That's all.
