60
41

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 3 years have passed since last update.

最適なパラメータを効率的に探索しちゃおっちゅうな(Optuna)

Last updated at Posted at 2019-12-18

グリッドサーチよりも良いと言われるハイパーパラメータ探索手法 Optuna を私も試しちゃおっちゅうな話です。以下のコードは全て Google Colaboratory 上で動かしましたっちゅうなことです。

Optuna のインストール

Google Colaboratory 上に Optuna はなかったので pip install しました。簡単。

!pip install optuna
Collecting optuna
[?25l  Downloading https://files.pythonhosted.org/packages/d4/6a/4d80b3014797cf318a5252afb27031e9e7502854fb7930f27db0ee10bb75/optuna-0.19.0.tar.gz (126kB)
[K     |████████████████████████████████| 133kB 4.9MB/s 
[?25hCollecting alembic
[?25l  Downloading https://files.pythonhosted.org/packages/84/64/493c45119dce700a4b9eeecc436ef9e8835ab67bae6414f040cdc7b58f4b/alembic-1.3.1.tar.gz (1.1MB)
[K     |████████████████████████████████| 1.1MB 42.4MB/s 
[?25hCollecting cliff
[?25l  Downloading https://files.pythonhosted.org/packages/f6/a9/e976ba91e57043c4b6add2c394e6d1ffc26712c694379c3fe72f942d2440/cliff-2.16.0-py2.py3-none-any.whl (78kB)
[K     |████████████████████████████████| 81kB 8.5MB/s 
[?25hCollecting colorlog
  Downloading https://files.pythonhosted.org/packages/68/4d/892728b0c14547224f0ac40884e722a3d00cb54e7a146aea0b3186806c9e/colorlog-4.0.2-py2.py3-none-any.whl
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from optuna) (1.17.4)
Requirement already satisfied: scipy in /usr/local/lib/python3.6/dist-packages (from optuna) (1.3.3)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from optuna) (1.12.0)
Requirement already satisfied: sqlalchemy>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from optuna) (1.3.11)
Requirement already satisfied: tqdm in /usr/local/lib/python3.6/dist-packages (from optuna) (4.28.1)
Requirement already satisfied: typing in /usr/local/lib/python3.6/dist-packages (from optuna) (3.6.6)
Collecting Mako
[?25l  Downloading https://files.pythonhosted.org/packages/b0/3c/8dcd6883d009f7cae0f3157fb53e9afb05a0d3d33b3db1268ec2e6f4a56b/Mako-1.1.0.tar.gz (463kB)
[K     |████████████████████████████████| 471kB 37.3MB/s 
[?25hCollecting python-editor>=0.3
  Downloading https://files.pythonhosted.org/packages/c6/d3/201fc3abe391bbae6606e6f1d598c15d367033332bd54352b12f35513717/python_editor-1.0.4-py3-none-any.whl
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.6/dist-packages (from alembic->optuna) (2.6.1)
Requirement already satisfied: PyYAML>=3.12 in /usr/local/lib/python3.6/dist-packages (from cliff->optuna) (3.13)
Collecting cmd2!=0.8.3,<0.9.0,>=0.8.0
[?25l  Downloading https://files.pythonhosted.org/packages/e9/40/a71caa2aaff10c73612a7106e2d35f693e85b8cf6e37ab0774274bca3cf9/cmd2-0.8.9-py2.py3-none-any.whl (53kB)
[K     |████████████████████████████████| 61kB 7.7MB/s 
[?25hCollecting pbr!=2.1.0,>=2.0.0
[?25l  Downloading https://files.pythonhosted.org/packages/7a/db/a968fd7beb9fe06901c1841cb25c9ccb666ca1b9a19b114d1bbedf1126fc/pbr-5.4.4-py2.py3-none-any.whl (110kB)
[K     |████████████████████████████████| 112kB 36.3MB/s 
[?25hRequirement already satisfied: pyparsing>=2.1.0 in /usr/local/lib/python3.6/dist-packages (from cliff->optuna) (2.4.5)
Collecting stevedore>=1.20.0
[?25l  Downloading https://files.pythonhosted.org/packages/b1/e1/f5ddbd83f60b03f522f173c03e406c1bff8343f0232a292ac96aa633b47a/stevedore-1.31.0-py2.py3-none-any.whl (43kB)
[K     |████████████████████████████████| 51kB 6.5MB/s 
[?25hRequirement already satisfied: PrettyTable<0.8,>=0.7.2 in /usr/local/lib/python3.6/dist-packages (from cliff->optuna) (0.7.2)
Requirement already satisfied: MarkupSafe>=0.9.2 in /usr/local/lib/python3.6/dist-packages (from Mako->alembic->optuna) (1.1.1)
Requirement already satisfied: wcwidth; sys_platform != "win32" in /usr/local/lib/python3.6/dist-packages (from cmd2!=0.8.3,<0.9.0,>=0.8.0->cliff->optuna) (0.1.7)
Collecting pyperclip
  Downloading https://files.pythonhosted.org/packages/2d/0f/4eda562dffd085945d57c2d9a5da745cfb5228c02bc90f2c74bbac746243/pyperclip-1.7.0.tar.gz
Building wheels for collected packages: optuna, alembic, Mako, pyperclip
  Building wheel for optuna (setup.py) ... [?25l[?25hdone
  Created wheel for optuna: filename=optuna-0.19.0-cp36-none-any.whl size=170198 sha256=fdc7777d7454f3419bc9acfd4f83f5cf6f23f0d6a6f392fc744afb597484f156
  Stored in directory: /root/.cache/pip/wheels/49/bf/47/090a43457caeff74397397da1c98a8aaed685257c16a5ba1f0
  Building wheel for alembic (setup.py) ... [?25l[?25hdone
  Created wheel for alembic: filename=alembic-1.3.1-py2.py3-none-any.whl size=144523 sha256=c66d5c3c4bd291757c2136352ac8d3cab450cccd0cb1005fe01211c5fa7576f4
  Stored in directory: /root/.cache/pip/wheels/b2/d4/19/5ab879d30af7cbc79e6dcc1d421795b1aa9d78f455b0412ef7
  Building wheel for Mako (setup.py) ... [?25l[?25hdone
  Created wheel for Mako: filename=Mako-1.1.0-cp36-none-any.whl size=75363 sha256=66ee5267f833ecdf52af2f6c7e8c93bb317a780d609f0765a8101383516ab29b
  Stored in directory: /root/.cache/pip/wheels/98/32/7b/a291926643fc1d1e02593e0d9e247c5a866a366b8343b7aa27
  Building wheel for pyperclip (setup.py) ... [?25l[?25hdone
  Created wheel for pyperclip: filename=pyperclip-1.7.0-cp36-none-any.whl size=8359 sha256=7e62cb6b9e2dcb8a323251caa96de1064b10e0a80baaf6b33b2052fafa34c08e
  Stored in directory: /root/.cache/pip/wheels/92/f0/ac/2ba2972034e98971c3654ece337ac61e546bdeb34ca960dc8c
Successfully built optuna alembic Mako pyperclip
Installing collected packages: Mako, python-editor, alembic, pyperclip, cmd2, pbr, stevedore, cliff, colorlog, optuna
Successfully installed Mako-1.1.0 alembic-1.3.1 cliff-2.16.0 cmd2-0.8.9 colorlog-4.0.2 optuna-0.19.0 pbr-5.4.4 pyperclip-1.7.0 python-editor-1.0.4 stevedore-1.31.0

インストールがうまくいったようなので、次の import 文を命令して、動作確認。

import optuna

準備体操

まずは Optuna の動作を理解するための準備体操から。

1変数の関数の最小化

試しに $f(x) = x^4 - 4x^3 - 36x^2$ を最小化してみます。

def f(x):
    return x**4 - 4 * x ** 3 - 36 * x ** 2

Optunaでは、最小化したい目的関数は次のように定義します。

def objective(trial):
    x = trial.suggest_uniform('x', -10, 10)
    return f(x)

次のようにすれば、10回だけ試行してくれます。

study = optuna.create_study()
study.optimize(objective, n_trials=10)
[32m[I 2019-12-13 00:32:08,911][0m Finished trial#0 resulted in value: 2030.566599827237. Current best value is 2030.566599827237 with parameters: {'x': 9.815207070259166}.[0m
[32m[I 2019-12-13 00:32:09,021][0m Finished trial#1 resulted in value: 1252.1813135138896. Current best value is 1252.1813135138896 with parameters: {'x': 9.366926766768199}.[0m
[32m[I 2019-12-13 00:32:09,147][0m Finished trial#2 resulted in value: -283.8813965725701. Current best value is -283.8813965725701 with parameters: {'x': 2.6795376432294855}.[0m
[32m[I 2019-12-13 00:32:09,278][0m Finished trial#3 resulted in value: 1258.1505983061907. Current best value is -283.8813965725701 with parameters: {'x': 2.6795376432294855}.[0m
[32m[I 2019-12-13 00:32:09,409][0m Finished trial#4 resulted in value: -59.988164166655146. Current best value is -283.8813965725701 with parameters: {'x': 2.6795376432294855}.[0m
[32m[I 2019-12-13 00:32:09,539][0m Finished trial#5 resulted in value: 6493.216295606622. Current best value is -283.8813965725701 with parameters: {'x': 2.6795376432294855}.[0m
[32m[I 2019-12-13 00:32:09,670][0m Finished trial#6 resulted in value: 233.47766027651414. Current best value is -283.8813965725701 with parameters: {'x': 2.6795376432294855}.[0m
[32m[I 2019-12-13 00:32:09,797][0m Finished trial#7 resulted in value: -32.56782816991587. Current best value is -283.8813965725701 with parameters: {'x': 2.6795376432294855}.[0m
[32m[I 2019-12-13 00:32:09,924][0m Finished trial#8 resulted in value: 9713.778056852296. Current best value is -283.8813965725701 with parameters: {'x': 2.6795376432294855}.[0m
[32m[I 2019-12-13 00:32:10,046][0m Finished trial#9 resulted in value: -499.577141711988. Current best value is -499.577141711988 with parameters: {'x': 3.6629193285453887}.[0m

試行回数を確認すると

len(study.trials)
10

目的関数を最小化するパラメータは次のようにして確認できます。

study.best_params
{'x': 3.6629193285453887}

目的関数の最小値は次のようにして得られます。

study.best_value
-499.577141711988

最小値を得た試行の情報はこのようにして得られます。

study.best_trial
FrozenTrial(number=9, state=TrialState.COMPLETE, value=-499.577141711988, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 926514), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 10, 46429), params={'x': 3.6629193285453887}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 9}, intermediate_values={}, trial_id=9)

試行の履歴はこのようにして見られます。

study.trials
[FrozenTrial(number=0, state=TrialState.COMPLETE, value=2030.566599827237, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 8, 821843), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 8, 911548), params={'x': 9.815207070259166}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 0}, intermediate_values={}, trial_id=0),
 FrozenTrial(number=1, state=TrialState.COMPLETE, value=1252.1813135138896, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 8, 912983), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 20790), params={'x': 9.366926766768199}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 1}, intermediate_values={}, trial_id=1),
 FrozenTrial(number=2, state=TrialState.COMPLETE, value=-283.8813965725701, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 22532), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 147430), params={'x': 2.6795376432294855}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 2}, intermediate_values={}, trial_id=2),
 FrozenTrial(number=3, state=TrialState.COMPLETE, value=1258.1505983061907, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 149953), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 277900), params={'x': 9.37074944280344}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 3}, intermediate_values={}, trial_id=3),
 FrozenTrial(number=4, state=TrialState.COMPLETE, value=-59.988164166655146, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 281543), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 409038), params={'x': -1.4636181925092284}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 4}, intermediate_values={}, trial_id=4),
 FrozenTrial(number=5, state=TrialState.COMPLETE, value=6493.216295606622, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 410813), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 539381), params={'x': -8.979003291609324}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 5}, intermediate_values={}, trial_id=5),
 FrozenTrial(number=6, state=TrialState.COMPLETE, value=233.47766027651414, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 542239), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 669699), params={'x': -5.01912242330347}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 6}, intermediate_values={}, trial_id=6),
 FrozenTrial(number=7, state=TrialState.COMPLETE, value=-32.56782816991587, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 671679), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 797441), params={'x': -1.027752432268013}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 7}, intermediate_values={}, trial_id=7),
 FrozenTrial(number=8, state=TrialState.COMPLETE, value=9713.778056852296, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 799124), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 9, 924648), params={'x': -9.843104909274034}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 8}, intermediate_values={}, trial_id=8),
 FrozenTrial(number=9, state=TrialState.COMPLETE, value=-499.577141711988, datetime_start=datetime.datetime(2019, 12, 13, 0, 32, 9, 926514), datetime_complete=datetime.datetime(2019, 12, 13, 0, 32, 10, 46429), params={'x': 3.6629193285453887}, distributions={'x': UniformDistribution(high=10, low=-10)}, user_attrs={}, system_attrs={'_number': 9}, intermediate_values={}, trial_id=9)]

では、追加で100回試行してみましょう。

study.optimize(objective, n_trials=100)
[32m[I 2019-12-13 00:32:10,303][0m Finished trial#10 resulted in value: -679.8303609251094. Current best value is -679.8303609251094 with parameters: {'x': 4.482235669344949}.[0m
[32m[I 2019-12-13 00:32:10,447][0m Finished trial#11 resulted in value: -664.7000624843927. Current best value is -679.8303609251094 with parameters: {'x': 4.482235669344949}.[0m
[32m[I 2019-12-13 00:32:10,579][0m Finished trial#12 resulted in value: -778.9261500173968. Current best value is -778.9261500173968 with parameters: {'x': 5.024746639127292}.[0m
...(中略)...
[32m[I 2019-12-13 00:32:22,591][0m Finished trial#107 resulted in value: -760.7787838740135. Current best value is -863.9855798856751 with parameters: {'x': 6.011542730094907}.[0m
[32m[I 2019-12-13 00:32:22,724][0m Finished trial#108 resulted in value: -773.0113811629133. Current best value is -863.9855798856751 with parameters: {'x': 6.011542730094907}.[0m
[32m[I 2019-12-13 00:32:22,862][0m Finished trial#109 resulted in value: -577.7178004902428. Current best value is -863.9855798856751 with parameters: {'x': 6.011542730094907}.[0m

これで、試行回数は

len(study.trials)
110

目的関数を最小化するパラメータと、そのときの目的関数の値は

study.best_params, study.best_value
({'x': 6.011542730094907}, -863.9855798856751)

目的関数の値の履歴は次のように可視化できます。

%matplotlib inline
import matplotlib.pyplot as plt

plt.plot([trial.value for trial in study.trials])
plt.grid()
plt.show()

output_15_0.png

パラメータの履歴は次のように可視化できます。

%matplotlib inline
import matplotlib.pyplot as plt

plt.plot([trial.params['x'] for trial in study.trials])
plt.grid()
plt.show()

output_16_0.png

パラメータの探索がどのように行われたのか図示してみましょう。

%matplotlib inline
import matplotlib.pyplot as plt

plt.grid()
plt.plot([trial.params['x'] for trial in study.trials], 
         [trial.value for trial in study.trials],
         marker='x', alpha=0.3)
plt.scatter(study.trials[0].params['x'], study.trials[0].value, 
         marker='>', label='start', s=100)
plt.scatter(study.trials[-1].params['x'], study.trials[-1].value, 
         marker='s', label='end', s=100)
plt.scatter(study.best_params['x'], study.best_value,
         marker='o', label='best', s=100)
plt.xlabel('x')
plt.ylabel('y (value)')
plt.legend()
plt.show()

output_17_0.png

最小値が期待できなさそうな領域はそこそこにして、期待できそうな領域を重点的に探索したことが分かります。

2変数の関数の最小化

次は $f(x, y) = (x - 2.5)^2 + 2 (y + 2.5) ^ 2$ を最小化してみましょう。

def f(x, y):
    return (x - 2.5)**2 + 2 * (y + 2.5) ** 2

最小化したい関数の定義

def objective(trial):
    x = trial.suggest_uniform('x', -10, 10)
    y = trial.suggest_uniform('y', -10, 10)
    return f(x, y)

こうして100回試行して

study = optuna.create_study()
study.optimize(objective, n_trials=100)
[32m[I 2019-12-13 00:32:24,001][0m Finished trial#0 resulted in value: 31.229461850588567. Current best value is 31.229461850588567 with parameters: {'x': 7.975371679174145, 'y': -3.290495675347522}.[0m
[32m[I 2019-12-13 00:32:24,131][0m Finished trial#1 resulted in value: 158.84900024337801. Current best value is 31.229461850588567 with parameters: {'x': 7.975371679174145, 'y': -3.290495675347522}.[0m
[32m[I 2019-12-13 00:32:24,252][0m Finished trial#2 resulted in value: 118.67648241872055. Current best value is 31.229461850588567 with parameters: {'x': 7.975371679174145, 'y': -3.290495675347522}.[0m
...(中略)...
[32m[I 2019-12-13 00:32:37,321][0m Finished trial#97 resulted in value: 24.46020780084274. Current best value is 0.2114497716311141 with parameters: {'x': 2.286816304129357, 'y': -2.788099360851467}.[0m
[32m[I 2019-12-13 00:32:37,471][0m Finished trial#98 resulted in value: 15.832787347997524. Current best value is 0.2114497716311141 with parameters: {'x': 2.286816304129357, 'y': -2.788099360851467}.[0m
[32m[I 2019-12-13 00:32:37,625][0m Finished trial#99 resulted in value: 0.6493005675217599. Current best value is 0.2114497716311141 with parameters: {'x': 2.286816304129357, 'y': -2.788099360851467}.[0m

目的関数を最小化するパラメータと、そのときの最小値

study.best_params, study.best_value
({'x': 2.286816304129357, 'y': -2.788099360851467}, 0.2114497716311141)

目的関数の値とパラメータの履歴

%matplotlib inline
import matplotlib.pyplot as plt

plt.plot([trial.value for trial in study.trials], label='value')
plt.grid()
plt.legend()
plt.show()

plt.plot([trial.params['x'] for trial in study.trials], label='x')
plt.plot([trial.params['y'] for trial in study.trials], label='y')
plt.grid()
plt.legend()
plt.show()

output_23_0.png

output_23_1.png

パラメータの履歴を二次元平面上に図示してみましょう。

%matplotlib inline
import matplotlib.pyplot as plt

plt.plot([trial.params['x'] for trial in study.trials], 
         [trial.params['y'] for trial in study.trials],
         alpha=0.4, marker='x')
plt.scatter(study.trials[0].params['x'], study.trials[0].params['y'], 
         marker='>', label='start', s=100)
plt.scatter(study.trials[-1].params['x'], study.trials[-1].params['y'], 
         marker='s', label='end', s=100)
plt.scatter(study.best_params['x'], study.best_params['y'],
         marker='o', label='best', s=100)
plt.grid()
plt.legend()
plt.show()

output_24_0.png

ここでも、最小値が期待できなさそうな領域はそこそこにして、期待できそうな領域を重点的に探索したことが分かります。

以上、Optunaの動作を簡単に把握するための準備体操でした。

教師あり機械学習への応用

ここから、これを教師あり機械学習のハイパーパラメーターチューニングに応用します。

乳がんデータセット

機械学習ライブラリ Scikit-learn の breast cancer datasets を例に用い、説明変数 $X$ と目的変数 $y$ を次のように得ます。

# https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html
from sklearn.datasets import load_breast_cancer
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target.ravel()

訓練データとテストデータに分割します。

from sklearn.model_selection import train_test_split 
# 訓練データ・テストデータへ6:4の比でランダムに分割
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4) 

比較対象としてのグリッドサーチ

Optuna との比較として、グリッドサーチの復習をします。グリッドサーチでは、与えられた「パラメータの値の候補」の全組み合わせを試行し、その中から性能が最も良いものを選びます。教師あり機械学習の例として lightGBM を使うと、こんな感じです。

%%time
from sklearn.model_selection import GridSearchCV

# LightGBM
import lightgbm as lgb

# グリッドサーチを行うためのパラメーター
parameters = [{
    'learning_rate':[0.1,0.2],
    'n_estimators':[20,100,200],
    'max_depth':[3,5,7,9],
    'min_child_weight':[0.5,1,2],
    'min_child_samples':[5,10,20],
    'subsample':[0.8],
    'colsample_bytree':[0.8],
    'verbose':[-1],
    'num_leaves':[80]
}]

#グリッドサーチ実行
classifier = GridSearchCV(lgb.LGBMClassifier(), parameters, cv=3, n_jobs=-1)
classifier.fit(X_train, y_train)
print("Accuracy score (train): ", classifier.score(X_train, y_train))
print("Accuracy score (test): ", classifier.score(X_test, y_test))
print(classifier.best_estimator_) # ベストのパラメーター
Accuracy score (train):  1.0
Accuracy score (test):  0.9517543859649122
LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=0.8,
               importance_type='split', learning_rate=0.1, max_depth=3,
               min_child_samples=20, min_child_weight=0.5, min_split_gain=0.0,
               n_estimators=100, n_jobs=-1, num_leaves=80, objective=None,
               random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
               subsample=0.8, subsample_for_bin=200000, subsample_freq=0,
               verbose=-1)
CPU times: user 1.15 s, sys: 109 ms, total: 1.26 s
Wall time: 15.3 s

グリッドサーチの特徴は

  • あらかじめ与えられた値のみを使う。
  • 全組み合わせを試す。

で、期待できるところを重点的に探索、などはしません。

LightGBM + Optuna

では、このLightGBMをグリッドサーチではなくOptunaでチューニングしてみましょう。

import numpy as np
# 目的関数
def objective(trial):
    learning_rate = trial.suggest_loguniform('learning_rate', 0.1,0.2),
    n_estimators, = trial.suggest_int('n_estimators', 20, 200),
    max_depth, = trial.suggest_int('max_depth', 3, 9),
    min_child_weight = trial.suggest_loguniform('min_child_weight', 0.5, 2),
    min_child_samples, = trial.suggest_int('min_child_samples', 5, 20),
    classifier = lgb.LGBMClassifier(learning_rate=learning_rate, 
                                    n_estimators=n_estimators,
                                    max_depth=max_depth, 
                                    min_child_weight=min_child_weight,
                                    min_child_samples=min_child_samples,
                                    subsample=0.8, colsample_bytree=0.8,
                                    verbose=-1, num_leaves=80)
    classifier.fit(X_train, y_train)
    #return classifier.score(X_train, y_train) # 正答率(train) の最適化
    return np.linalg.norm(y_train - classifier.predict_proba(X_train)[:, 1], ord=1) # 尤度の最適化

上の関数で、何を最適化すればよいかは選択の余地があると思います。用いた lightGBMは「回帰」ではなく「分類」をする学習器で、

classifier.score(X_train, y_train) を用いると、正答率(train) を最適化することになります。分類における正答率は、何個のデータ中何個が正しく分類されたかという数字ですので、連続値に見えて、実質、離散値に近い数字です。例えば、10個中8個が正しく分類されたとして、その分類が「余裕の正解」だろうと「ギリギリ正解」だろうと正答率(train)は変わりません。つまり、「ギリギリ正解」から「余裕の正解」に近づける力が働きにくいのです。

np.linalg.norm(y_train - classifier.predict_proba(X_train)[:, 1], ord=1) を用いると、これが回避できます。この中の y_train は、分類結果が0か1かという教師セット、classifier.predict_proba(X_train)[:, 1] は、予測が1であるという自身の強さ(確率に相当するもの)です。この2つの値の差の L1ノルム (L2ノルムでも構わないと思います)を最小化することで、「不正解」を「正解」に、「ギリギリ正解」から「余裕の正解」に近づける力を働きやすくします。

では、学習を開始しましょう。もし classifier.score(X_train, y_train) を用いた場合はその最大化を選びます。もし np.linalg.norm(y_train - classifier.predict_proba(X_train)[:, 1], ord=1) を用いた場合はその最小化を選びます。

#study = optuna.create_study(direction='maximize') # 最大化
study = optuna.create_study(direction='minimize') # 最小化

こうして100回試行して

study.optimize(objective, n_trials=100)
[32m[I 2019-12-13 00:32:54,913][0m Finished trial#0 resulted in value: 1.5655193925176527. Current best value is 1.5655193925176527 with parameters: {'learning_rate': 0.11563458547060446, 'n_estimators': 155, 'max_depth': 7, 'min_child_weight': 0.7324812463494225, 'min_child_samples': 12}.[0m
[32m[I 2019-12-13 00:32:55,103][0m Finished trial#1 resulted in value: 1.3810988452320123. Current best value is 1.3810988452320123 with parameters: {'learning_rate': 0.15351688726053717, 'n_estimators': 83, 'max_depth': 6, 'min_child_weight': 0.5802652538400225, 'min_child_samples': 8}.[0m
[32m[I 2019-12-13 00:32:55,287][0m Finished trial#2 resulted in value: 3.519787362063691. Current best value is 1.3810988452320123 with parameters: {'learning_rate': 0.15351688726053717, 'n_estimators': 83, 'max_depth': 6, 'min_child_weight': 0.5802652538400225, 'min_child_samples': 8}.[0m
...(中略)...
[32m[I 2019-12-13 00:33:17,608][0m Finished trial#97 resulted in value: 1.0443245090791662. Current best value is 1.0230542364962214 with parameters: {'learning_rate': 0.11851649444429455, 'n_estimators': 176, 'max_depth': 9, 'min_child_weight': 0.50006741615294, 'min_child_samples': 8}.[0m
[32m[I 2019-12-13 00:33:17,871][0m Finished trial#98 resulted in value: 1.3997762969822483. Current best value is 1.0230542364962214 with parameters: {'learning_rate': 0.11851649444429455, 'n_estimators': 176, 'max_depth': 9, 'min_child_weight': 0.50006741615294, 'min_child_samples': 8}.[0m
[32m[I 2019-12-13 00:33:18,187][0m Finished trial#99 resulted in value: 1.1059309199723422. Current best value is 1.0230542364962214 with parameters: {'learning_rate': 0.11851649444429455, 'n_estimators': 176, 'max_depth': 9, 'min_child_weight': 0.50006741615294, 'min_child_samples': 8}.[0m

目的関数を最適化するパラメーターは

study.best_params
{'learning_rate': 0.11851649444429455,
 'max_depth': 9,
 'min_child_samples': 8,
 'min_child_weight': 0.50006741615294,
 'n_estimators': 176}

こんなキリの悪そうな値を、グリッドサーチでは、求められそうにないですね。

そしてそのときの最適値は

study.best_value
1.0230542364962214

得られた最適パラメータは、**study.best_params を用いて代入して、チューニング後の分類器を作れます。

classifier = lgb.LGBMClassifier(**study.best_params,
                                subsample=0.8, colsample_bytree=0.8,
                                verbose=-1, num_leaves=80)
classifier
LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=0.8,
               importance_type='split', learning_rate=0.11851649444429455,
               max_depth=9, min_child_samples=8,
               min_child_weight=0.50006741615294, min_split_gain=0.0,
               n_estimators=176, n_jobs=-1, num_leaves=80, objective=None,
               random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
               subsample=0.8, subsample_for_bin=200000, subsample_freq=0,
               verbose=-1)

ベストな分類器で学習

classifier.fit(X_train, y_train)
LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=0.8,
               importance_type='split', learning_rate=0.11851649444429455,
               max_depth=9, min_child_samples=8,
               min_child_weight=0.50006741615294, min_split_gain=0.0,
               n_estimators=176, n_jobs=-1, num_leaves=80, objective=None,
               random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
               subsample=0.8, subsample_for_bin=200000, subsample_freq=0,
               verbose=-1)

そして予測

classifier.score(X_train, y_train)
1.0
classifier.score(X_test, y_test)
0.9473684210526315

目的関数の値の履歴

plt.plot([trial.value for trial in study.trials], label='value')
plt.grid()
plt.legend()
plt.show()

output_40_0.png

パラメータ探索の履歴

for key in study.trials[0].params.keys():
    plt.plot([trial.params[key] for trial in study.trials], label=key)
    plt.grid()
    plt.legend()
    plt.show()

output_41_0.png

output_41_1.png

output_41_2.png

output_41_3.png

output_41_4.png

という具合です。

scikit-learn/MLP + Optuna

同様に、scikit-learnの多層パーセプトロンをチューニングしてみましょう。まずはグリッドサーチから。

%%time
from sklearn.model_selection import GridSearchCV

# 多層パーセプトロン
from sklearn.neural_network import MLPClassifier
# グリッドサーチを行うためのパラメーター
parameters = [{'hidden_layer_sizes': [8, 16, 32, (8, 8), (8, 8, 8)], 
               'solver': ['adam'], 'activation': ['relu'],
              'learning_rate_init': [0.1, 0.01, 0.001]}]
#グリッドサーチ実行
classifier = GridSearchCV(MLPClassifier(max_iter=10000, early_stopping=True), 
                          parameters, cv=3, n_jobs=-1)
classifier.fit(X_train, y_train)
print("Accuracy score (train): ", classifier.score(X_train, y_train))
print("Accuracy score (test): ", classifier.score(X_test, y_test))
print(classifier.best_estimator_) # ベストのパラメーターを持つ分類器
Accuracy score (train):  0.9090909090909091
Accuracy score (test):  0.8728070175438597
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=32, learning_rate='constant',
              learning_rate_init=0.1, max_iter=10000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=None, shuffle=True, solver='adam', tol=0.0001,
              validation_fraction=0.1, verbose=False, warm_start=False)
CPU times: user 166 ms, sys: 30.3 ms, total: 196 ms
Wall time: 3.07 s

多層パーセプトロンでは、

  • 隠れ層を何層にするか
  • それぞれの層のサイズをどのくらいにするか

というファクターがあるので、ちょっと大変です。

隠れ層を1層に固定

# 目的関数
def objective(trial):
    hidden_layer_sizes, = trial.suggest_int('hidden_layer_sizes', 8, 100),
    learning_rate_init, = trial.suggest_loguniform('learning_rate_init', 0.001, 0.1),
    classifier = MLPClassifier(max_iter=10000, early_stopping=True,
                                    hidden_layer_sizes=hidden_layer_sizes,
                                    learning_rate_init=learning_rate_init, 
                                    solver='adam', activation='relu')
    classifier.fit(X_train, y_train)
    #return classifier.score(X_train, y_train)
    #return classifier.score(X_test, y_test)
    return np.linalg.norm(y_train - classifier.predict_proba(X_train)[:, 1], ord=1)
#study = optuna.create_study(direction='maximize')
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)
[32m[I 2019-12-13 00:33:23,314][0m Finished trial#0 resulted in value: 33.67375867333378. Current best value is 33.67375867333378 with parameters: {'hidden_layer_sizes': 73, 'learning_rate_init': 0.004548472515805296}.[0m
[32m[I 2019-12-13 00:33:23,538][0m Finished trial#1 resulted in value: 35.17385235930611. Current best value is 33.67375867333378 with parameters: {'hidden_layer_sizes': 73, 'learning_rate_init': 0.004548472515805296}.[0m
[32m[I 2019-12-13 00:33:23,716][0m Finished trial#2 resulted in value: 52.815452458627675. Current best value is 33.67375867333378 with parameters: {'hidden_layer_sizes': 73, 'learning_rate_init': 0.004548472515805296}.[0m
...(中略)...
[32m[I 2019-12-13 00:33:47,631][0m Finished trial#97 resulted in value: 150.15953891394736. Current best value is 23.844866313445344 with parameters: {'hidden_layer_sizes': 79, 'learning_rate_init': 0.010242027297662661}.[0m
[32m[I 2019-12-13 00:33:47,894][0m Finished trial#98 resulted in value: 32.56506872305802. Current best value is 23.844866313445344 with parameters: {'hidden_layer_sizes': 79, 'learning_rate_init': 0.010242027297662661}.[0m
[32m[I 2019-12-13 00:33:48,172][0m Finished trial#99 resulted in value: 38.57363524502563. Current best value is 23.844866313445344 with parameters: {'hidden_layer_sizes': 79, 'learning_rate_init': 0.010242027297662661}.[0m
study.best_params
{'hidden_layer_sizes': 79, 'learning_rate_init': 0.010242027297662661}
study.best_value
23.844866313445344
classifier = MLPClassifier(**study.best_params)
classifier.fit(X_train, y_train)
classifier.score(X_train, y_train), classifier.score(X_test, y_test)
(0.9472140762463344, 0.9122807017543859)
plt.plot([trial.value for trial in study.trials], label='score')
plt.grid()
plt.legend()
plt.show()

output_50_0.png

for key in study.trials[0].params.keys():
    plt.plot([trial.params[key] for trial in study.trials], label=key)
    plt.grid()
    plt.legend()
    plt.show()

output_51_0.png

output_51_1.png

隠れ層を2層に固定

# 目的関数
def objective(trial):
    h1, = trial.suggest_int('h1', 8, 100),
    h2, = trial.suggest_int('h2', 8, 100),
    learning_rate_init, = trial.suggest_loguniform('learning_rate_init', 0.001, 0.1),
    classifier = MLPClassifier(max_iter=10000, early_stopping=True,
                                    hidden_layer_sizes=(h1, h2),
                                    learning_rate_init=learning_rate_init, 
                                    solver='adam', activation='relu')
    classifier.fit(X_train, y_train)
    #return classifier.score(X_train, y_train)
    #return classifier.score(X_test, y_test)
    return np.linalg.norm(y_train - classifier.predict_proba(X_train)[:, 1], ord=1)
#study = optuna.create_study(direction='maximize')
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)
[32m[I 2019-12-13 00:33:49,555][0m Finished trial#0 resulted in value: 44.26353774856942. Current best value is 44.26353774856942 with parameters: {'h1': 15, 'h2': 99, 'learning_rate_init': 0.003018305556292618}.[0m
[32m[I 2019-12-13 00:33:49,851][0m Finished trial#1 resulted in value: 29.450960862380153. Current best value is 29.450960862380153 with parameters: {'h1': 81, 'h2': 79, 'learning_rate_init': 0.01344672244443261}.[0m
[32m[I 2019-12-13 00:33:50,073][0m Finished trial#2 resulted in value: 38.96850500173973. Current best value is 29.450960862380153 with parameters: {'h1': 81, 'h2': 79, 'learning_rate_init': 0.01344672244443261}.[0m
...(中略)...    [32m[I 2019-12-13 00:34:19,151][0m Finished trial#97 resulted in value: 34.73946747640069. Current best value is 22.729638264213385 with parameters: {'h1': 73, 'h2': 91, 'learning_rate_init': 0.005367313373989512}.[0m
[32m[I 2019-12-13 00:34:19,472][0m Finished trial#98 resulted in value: 38.708695477563566. Current best value is 22.729638264213385 with parameters: {'h1': 73, 'h2': 91, 'learning_rate_init': 0.005367313373989512}.[0m
[32m[I 2019-12-13 00:34:19,801][0m Finished trial#99 resulted in value: 42.20352641425415. Current best value is 22.729638264213385 with parameters: {'h1': 73, 'h2': 91, 'learning_rate_init': 0.005367313373989512}.[0m
study.best_params
{'h1': 73, 'h2': 91, 'learning_rate_init': 0.005367313373989512}
study.best_value
22.729638264213385

ベストのパラメータを **study.best_params で使おうと思ったら次のようにエラーが出ます。簡単な解決方法は今のところわからないので、得られたパラメータを愚直に代入するしか思いつきません。

classifier = MLPClassifier(**study.best_params)
classifier.fit(X_train, y_train)
classifier.score(X_train, y_train), classifier.score(X_test, y_test)
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-49-ee91971a40bc> in <module>()
----> 1 classifier = MLPClassifier(**study.best_params)
      2 classifier.fit(X_train, y_train)
      3 classifier.score(X_train, y_train), classifier.score(X_test, y_test)


TypeError: __init__() got an unexpected keyword argument 'h1'

各種履歴

plt.plot([trial.value for trial in study.trials], label='value')
plt.grid()
plt.legend()
plt.show()

output_58_0.png

for key in study.trials[0].params.keys():
    plt.plot([trial.params[key] for trial in study.trials], label=key)
    plt.grid()
    plt.legend()
    plt.show()

output_59_0.png

output_59_1.png

output_59_2.png

層の深さを固定しない

# 目的関数
def objective(trial):
    h1, = trial.suggest_int('h1', 8, 100),
    h2, = trial.suggest_int('h2', 8, 100),
    h3, = trial.suggest_int('h3', 8, 100),
    h4, = trial.suggest_int('h4', 8, 100),
    h5, = trial.suggest_int('h5', 8, 100),
    hidden_layer_sizes = []
    n = trial.suggest_int('n', 1, 5)
    for h in [h1, h2, h3, h4, h5]:
        hidden_layer_sizes.append(h)
        if len(hidden_layer_sizes) == n:
            break
    learning_rate_init, = trial.suggest_loguniform('learning_rate_init', 0.001, 0.1),
    classifier = MLPClassifier(max_iter=10000, early_stopping=True,
                                    hidden_layer_sizes=hidden_layer_sizes,
                                    learning_rate_init=learning_rate_init, 
                                    solver='adam', activation='relu')
    classifier.fit(X_train, y_train)
    #return classifier.score(X_train, y_train)
    #return classifier.score(X_test, y_test)
    return np.linalg.norm(y_train - classifier.predict_proba(X_train)[:, 1], ord=1)
#study = optuna.create_study(direction='maximize')
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)
[32m[I 2019-12-13 00:34:37,028][0m Finished trial#0 resulted in value: 117.6950339936551. Current best value is 117.6950339936551 with parameters: {'h1': 44, 'h2': 90, 'h3': 75, 'h4': 51, 'h5': 87, 'n': 3, 'learning_rate_init': 0.043829528929494495}.[0m
[32m[I 2019-12-13 00:34:37,247][0m Finished trial#1 resulted in value: 107.63845860162616. Current best value is 107.63845860162616 with parameters: {'h1': 16, 'h2': 51, 'h3': 13, 'h4': 36, 'h5': 27, 'n': 3, 'learning_rate_init': 0.04986625228277607}.[0m
[32m[I 2019-12-13 00:34:37,513][0m Finished trial#2 resulted in value: 198.86827020586986. Current best value is 107.63845860162616 with parameters: {'h1': 16, 'h2': 51, 'h3': 13, 'h4': 36, 'h5': 27, 'n': 3, 'learning_rate_init': 0.04986625228277607}.[0m
...(中略)...
[32m[I 2019-12-13 00:35:10,424][0m Finished trial#97 resulted in value: 31.485260318520005. Current best value is 23.024826770529504 with parameters: {'h1': 62, 'h2': 60, 'h3': 58, 'h4': 77, 'h5': 27, 'n': 1, 'learning_rate_init': 0.011342241271350882}.[0m
[32m[I 2019-12-13 00:35:10,801][0m Finished trial#98 resulted in value: 27.752591077771235. Current best value is 23.024826770529504 with parameters: {'h1': 62, 'h2': 60, 'h3': 58, 'h4': 77, 'h5': 27, 'n': 1, 'learning_rate_init': 0.011342241271350882}.[0m
[32m[I 2019-12-13 00:35:11,199][0m Finished trial#99 resulted in value: 81.29419572506973. Current best value is 23.024826770529504 with parameters: {'h1': 62, 'h2': 60, 'h3': 58, 'h4': 77, 'h5': 27, 'n': 1, 'learning_rate_init': 0.011342241271350882}.[0m
study.best_params
{'h1': 62,
 'h2': 60,
 'h3': 58,
 'h4': 77,
 'h5': 27,
 'learning_rate_init': 0.011342241271350882,
 'n': 1}
study.best_value
23.024826770529504
plt.plot([trial.value for trial in study.trials], label='score')
plt.grid()
plt.legend()
plt.show()

output_65_0.png

for key in study.trials[0].params.keys():
    plt.plot([trial.params[key] for trial in study.trials], label=key)
    plt.grid()
    plt.legend()
    plt.show()

output_66_0.png

output_66_1.png

output_66_2.png

output_66_3.png

output_66_4.png

output_66_5.png

output_66_6.png

PyTorch + Optuna

上記と同じように、多層パーセプトロンの最適化を PyTorch + Optuna で試してみました。

import torch
from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader

X_train = torch.from_numpy(X_train).float()
X_test = torch.from_numpy(X_test).float()
y_train = torch.from_numpy(y_train).float()
y_test = torch.from_numpy(y_test).float()

train = TensorDataset(X_train, y_train)

train_loader = DataLoader(train, batch_size=10, shuffle=True)
import torch
class MLPC(torch.nn.Module):
    def __init__(self, n_input, n_hidden1, n_output):
        super(MLPC, self).__init__()
        self.l1 = torch.nn.Linear(n_input, n_hidden1)
        self.l2 = torch.nn.Linear(n_hidden1, n_output)

    def forward(self, x):
        h1 = self.l1(x)
        h2 = torch.sigmoid(h1)
        h3 = self.l2(h2)
        h4 = torch.sigmoid(h3)
        return h4
    
    def score(self, x, y, threshold=0.5):
        accum = 0
        for y_pred, y1 in zip(self.forward(x), y):
            if y1 == 1:
                if y_pred >= threshold:
                    accum += 1
            else:
                if y_pred < threshold:
                    accum += 1
        return accum / len(y)
# 目的関数
from torch.autograd import Variable
def objective(trial):
    n_h1, = trial.suggest_int('n_hidden1', 1, 100),
    lr, = trial.suggest_loguniform('lr', 0.001, 0.1),
    model = MLPC(len(train[0][0]), n_h1, 1)
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    #loss_history = []
    n_epoch = 2000
    for epoch in range(n_epoch):
        total_loss = 0
        for x, y in train_loader:
            x = Variable(x)
            y = Variable(y)
            optimizer.zero_grad()
            y_pred = model(x)
            loss = criterion(y_pred, y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        #loss_history.append(total_loss)
        #if (epoch +1) % (n_epoch / 10) == 0:
        #    print(epoch + 1, total_loss)
    score_train_history.append(model.score(X_train, y_train))
    score_test_history.append(model.score(X_test, y_test))
    return total_loss # model.score(X_test, y_test) にすると学習が進まない?

失敗編

n_trials=100
score_train_history = []
score_test_history = []
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=n_trials)
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py:431: UserWarning:

Using a target size (torch.Size([10])) that is different to the input size (torch.Size([10, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py:431: UserWarning:

Using a target size (torch.Size([1])) that is different to the input size (torch.Size([1, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.

[32m[I 2019-12-13 00:36:09,957][0m Finished trial#0 resulted in value: 8.354197099804878. Current best value is 8.354197099804878 with parameters: {'n_hidden1': 50, 'lr': 0.008643921209550006}.[0m
[32m[I 2019-12-13 00:37:02,256][0m Finished trial#1 resulted in value: 8.542565807700157. Current best value is 8.354197099804878 with parameters: {'n_hidden1': 50, 'lr': 0.008643921209550006}.[0m
[32m[I 2019-12-13 00:37:54,087][0m Finished trial#2 resulted in value: 8.721126735210419. Current best value is 8.354197099804878 with parameters: {'n_hidden1': 50, 'lr': 0.008643921209550006}.[0m
...(中略)...
[32m[I 2019-12-13 01:59:43,405][0m Finished trial#97 resulted in value: 8.414046227931976. Current best value is 8.206612035632133 with parameters: {'n_hidden1': 82, 'lr': 0.0010109929013465883}.[0m
[32m[I 2019-12-13 02:00:36,203][0m Finished trial#98 resulted in value: 8.469094559550285. Current best value is 8.206612035632133 with parameters: {'n_hidden1': 82, 'lr': 0.0010109929013465883}.[0m
[32m[I 2019-12-13 02:01:28,698][0m Finished trial#99 resulted in value: 8.296677514910698. Current best value is 8.206612035632133 with parameters: {'n_hidden1': 82, 'lr': 0.0010109929013465883}.[0m

まずは失敗編です。なにか Warning が出ましたね。気にせず続けてみましたが、警告通り、精度は上がりませんでした。

study.best_params
{'lr': 0.0010109929013465883, 'n_hidden1': 82}
study.best_value
8.206612035632133
plt.plot([trial.value for trial in study.trials], label='loss')
plt.grid()
plt.legend()
plt.show()

output_74_0.png

plt.plot(score_train_history, label='score (train)')
plt.plot(score_test_history, label='score (test)')
plt.grid()
plt.legend()
plt.show()

output_75_0.png

for key in study.trials[0].params.keys():
    plt.plot([trial.params[key] for trial in study.trials], label=key)
    plt.grid()
    plt.legend()
    plt.show()

output_76_0.png

output_76_1.png

成功編

上の「失敗編」は、何がまずかったのでしょうか? 警告メッセージ

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py:431: UserWarning:

Using a target size (torch.Size([10])) that is different to the input size (torch.Size([10, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py:431: UserWarning:

Using a target size (torch.Size([1])) that is different to the input size (torch.Size([1, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.

の意味は、目的変数の行列の形が良くないということです。今回は、

# https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html
from sklearn.datasets import load_breast_cancer
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

のようにして説明変数と目的変数を作成しましたが、このときの y を reshape する必要があります。

y = y.reshape((len(y), 1))

失敗の原因はこれだけでした。それ以外は、さっきと全く同じコードで精度が上がりました。失敗の時と比べてみましょう。

# 訓練データとテストデータに分割するメソッドのインポート
from sklearn.model_selection import train_test_split 
# 訓練データ・テストデータへ6:4の比でランダムに分割
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4) 
import torch
from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader

X_train = torch.from_numpy(X_train).float()
X_test = torch.from_numpy(X_test).float()
y_train = torch.from_numpy(y_train).float()
y_test = torch.from_numpy(y_test).float()

train = TensorDataset(X_train, y_train)

train_loader = DataLoader(train, batch_size=10, shuffle=True)
import torch
class MLPC(torch.nn.Module):
    def __init__(self, n_input, n_hidden1, n_output):
        super(MLPC, self).__init__()
        self.l1 = torch.nn.Linear(n_input, n_hidden1)
        self.l2 = torch.nn.Linear(n_hidden1, n_output)

    def forward(self, x):
        h1 = self.l1(x)
        h2 = torch.sigmoid(h1)
        h3 = self.l2(h2)
        h4 = torch.sigmoid(h3)
        return h4
    
    def score(self, x, y, threshold=0.5):
        accum = 0
        for y_pred, y1 in zip(self.forward(x), y):
            if y1 == 1:
                if y_pred >= threshold:
                    accum += 1
            else:
                if y_pred < threshold:
                    accum += 1
        return accum / len(y)
# 目的関数
from torch.autograd import Variable
def objective(trial):
    n_h1, = trial.suggest_int('n_hidden1', 1, 100),
    lr, = trial.suggest_loguniform('lr', 0.001, 0.1),
    model = MLPC(len(train[0][0]), n_h1, 1)
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    #loss_history = []
    n_epoch = 2000
    for epoch in range(n_epoch):
        total_loss = 0
        for x, y in train_loader:
            #if x.shape[0] == 1:
            #    continue
            #print(x.shape, y.shape)
            x = Variable(x)
            y = Variable(y)
            optimizer.zero_grad()
            y_pred = model(x)
            loss = criterion(y_pred, y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        #loss_history.append(total_loss)
        #if (epoch +1) % (n_epoch / 10) == 0:
        #    print(epoch + 1, total_loss)
    score_train_history.append(model.score(X_train, y_train))
    score_test_history.append(model.score(X_test, y_test))
    return total_loss # model.score(X_test, y_test) にすると学習が進まない?
n_trials=100
score_train_history = []
score_test_history = []
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=n_trials)
[32m[I 2019-12-13 07:58:42,273][0m Finished trial#0 resulted in value: 7.991558387875557. Current best value is 7.991558387875557 with parameters: {'n_hidden1': 100, 'lr': 0.001719688534454947}.[0m
[32m[I 2019-12-13 07:59:29,221][0m Finished trial#1 resulted in value: 8.133784644305706. Current best value is 7.991558387875557 with parameters: {'n_hidden1': 100, 'lr': 0.001719688534454947}.[0m
[32m[I 2019-12-13 08:00:16,849][0m Finished trial#2 resulted in value: 8.075047567486763. Current best value is 7.991558387875557 with parameters: {'n_hidden1': 100, 'lr': 0.001719688534454947}.[0m
...(中略)...
[32m[I 2019-12-13 09:14:47,236][0m Finished trial#97 resulted in value: 8.02999284863472. Current best value is 2.8610200360417366 with parameters: {'n_hidden1': 38, 'lr': 0.0010151912634053866}.[0m
[32m[I 2019-12-13 09:15:34,106][0m Finished trial#98 resulted in value: 5.849344417452812. Current best value is 2.8610200360417366 with parameters: {'n_hidden1': 38, 'lr': 0.0010151912634053866}.[0m
[32m[I 2019-12-13 09:16:20,332][0m Finished trial#99 resulted in value: 8.052950218319893. Current best value is 2.8610200360417366 with parameters: {'n_hidden1': 38, 'lr': 0.0010151912634053866}.[0m
study.best_params
{'lr': 0.0010151912634053866, 'n_hidden1': 38}
study.best_value
2.8610200360417366
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([trial.value for trial in study.trials], label='loss')
plt.grid()
plt.legend()
plt.show()

output_13_0.png

plt.plot(score_train_history, label='score (train)')
plt.plot(score_test_history, label='score (test)')
plt.grid()
plt.legend()
plt.show()

output_14_0.png

for key in study.trials[0].params.keys():
    plt.plot([trial.params[key] for trial in study.trials], label=key)
    plt.grid()
    plt.legend()
    plt.show()

output_15_0.png

output_15_1.png

60
41
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
60
41

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?