
A pitfall when building an sklearn-compatible model and running GridSearchCV on it


Posted at 2020-02-15, last updated at 2020-12-28

A quick note on something that tripped me up for a while.

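In the __init__() of a custom estimator class, each instance attribute must have the same name as the argument it stores. Otherwise you get an error.
For example, suppose you write an sklearn-compatible model that does binary classification by tossing a biased coin. If the names do not match, as with self.p = theta in the code below, GridSearchCV fails with the error shown after the code.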


from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV
import numpy as np


class MyClassifer(BaseEstimator, ClassifierMixin):
    def __init__(self, theta=0.5):
        self.p = theta  # attribute name (p) differs from the argument name (theta)

    def fit(self, X, y):
        return self

    def predict(self, X):
        return np.random.binomial(size=X.shape[0], n=1, p=self.p)

    def score(self, X, y):
        return (sum(self.predict(X)==y)/len(y))

X = np.random.normal(0, 1, (100, 10))
y = np.ones(X.shape[0])

params = {'theta':list(np.arange(0,1.1,0.1))}

model = MyClassifer()
gridsearch = GridSearchCV(model, params, cv=3, verbose=0)
gridsearch.fit(X,y)

print("best theta:{}".format(gridsearch.best_params_))
print("score:{}".format(gridsearch.score(X,y)))

<ipython-input-9-e4c006ec5090> in predict(self, X)
     12 
     13     def predict(self, X):
---> 14         return np.random.binomial(size=X.shape[0], n=1, p=self.p)
     15 
     16     def score(self, X, y):

mtrand.pyx in mtrand.RandomState.binomial()

TypeError: must be real number, not NoneType

Checking the estimator directly shows the cause:


model = MyClassifer(theta=0.7)
print("score:{}".format(model.score(X,y)))
print("theta:{}".format(model.get_params()))


score:0.77
theta:{'theta': None}

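theta comes back as None. get_params() looks for an attribute with the same name as each __init__ argument, and no attribute named theta exists on this object. GridSearchCV rebuilds the estimator from these parameters for each candidate, so the copy is constructed with theta=None and self.p ends up as None. The grid values are then set on an attribute named theta, which predict() never reads. That is the None that reaches np.random.binomial in the traceback.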
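The lookup behind this result can be sketched without sklearn. The snippet below is only a simplified illustration, not sklearn's actual implementation: lookup_params and rebuild are hypothetical helpers standing in for get_params() and for the cloning that GridSearchCV performs, and Mismatched and Matched are made-up classes. The fallback to None mirrors the output shown above. Newer scikit-learn releases reportedly raise an AttributeError instead of returning None, so the exact symptom may depend on the version.

```python
import inspect


class Mismatched:
    def __init__(self, theta=0.5):
        self.p = theta  # stored under a different name than the argument


class Matched:
    def __init__(self, theta=0.5):
        self.theta = theta  # stored under the argument's own name


def lookup_params(obj):
    """Hypothetical stand-in for get_params(): one attribute lookup per __init__ argument."""
    names = [n for n in inspect.signature(type(obj).__init__).parameters if n != "self"]
    # A missing attribute falls back to None, matching the output shown above.
    return {name: getattr(obj, name, None) for name in names}


def rebuild(obj):
    """Hypothetical stand-in for cloning: build a fresh object from the looked-up parameters."""
    return type(obj)(**lookup_params(obj))


print(lookup_params(Mismatched(theta=0.7)))  # {'theta': None}
print(rebuild(Mismatched(theta=0.7)).p)      # None
print(lookup_params(Matched(theta=0.7)))     # {'theta': 0.7}
print(rebuild(Matched(theta=0.7)).theta)     # 0.7
```

With the mismatched class the value 0.7 is lost as soon as the object is rebuilt, while the matched class survives the round trip.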

Once the names match, the grid search works. Since every label is 1, it correctly selects theta=1:

from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV
import numpy as np


class MyClassifer(BaseEstimator, ClassifierMixin):
    def __init__(self, theta=0.5):
        self.theta = theta  # attribute name matches the argument name

    def fit(self, X, y):
        return self

    def predict(self, X):
        return np.random.binomial(size=X.shape[0], n=1, p=self.theta)

    def score(self, X, y):
        return (sum(self.predict(X)==y)/len(y))

X = np.random.normal(0, 1, (100, 10))
y = np.ones(X.shape[0])

params = {'theta':list(np.arange(0,1.1,0.1))}

model = MyClassifer()
gridsearch = GridSearchCV(model, params, cv=3, verbose=0)
gridsearch.fit(X,y)

print("best theta:{}".format(gridsearch.best_params_))
print("score:{}".format(gridsearch.score(X,y)))

best theta:{'theta': 1.0}
score:1.0

An sklearn-compatible model inherits from BaseEstimator, and the documentation (as well as the source code) says the following:

All estimators should specify all the parameters that can be set at the class level in their ``__init__`` as explicit keyword arguments (no ``*args`` or ``**kwargs``).

In addition to the above, make sure each instance attribute has the same name as its __init__ argument.
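Both rules can be checked mechanically before running a grid search. The following is a minimal sketch using only the standard library. check_init_convention is a hypothetical helper, not part of sklearn, and it assumes every __init__ argument has a default value so that the class can be instantiated with no arguments.

```python
import inspect


def check_init_convention(cls):
    """Return a list of problems with cls.__init__ (an empty list means none were found)."""
    params = [
        p for name, p in inspect.signature(cls.__init__).parameters.items()
        if name != "self"
    ]
    # Rule 1: only explicit arguments, no *args or **kwargs.
    problems = [
        f"{p.name}: *args/**kwargs are not allowed"
        for p in params
        if p.kind in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    ]
    if problems:
        return problems
    # Rule 2: every argument is stored in an attribute of the same name.
    # Assumption of this sketch: all arguments have defaults, so cls() works.
    instance = cls()
    for p in params:
        if not hasattr(instance, p.name):
            problems.append(f"{p.name}: no attribute named self.{p.name}")
    return problems


class Bad:
    def __init__(self, theta=0.5):
        self.p = theta


class Good:
    def __init__(self, theta=0.5):
        self.theta = theta


print(check_init_convention(Bad))   # ['theta: no attribute named self.theta']
print(check_init_convention(Good))  # []
```

The same function can be pointed at a real estimator class such as MyClassifer, since it only inspects __init__ and the attributes of a default-constructed instance.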
