Trying out scikit-learn's ensemble learning "VotingClassifier"

Posted at 2019-02-06

Introduction

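Recently I started using scikit-learn as my introduction to machine learning.
However....

"I see, so regression problems are solved with a Regressor and classification problems with a Classifier."

"The Titanic problem is classification, so I should use a Classifier."

"The Classifier algorithms I can use are...."

"??? I have no idea which one to use."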

What is VotingClassifier?

The idea behind the VotingClassifier is to combine conceptually different machine learning classifiers and use a majority vote or the average predicted probabilities (soft vote) to predict the class labels. Such a classifier can be useful for a set of equally well performing model in order to balance out their individual weaknesses.
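(In short: the VotingClassifier combines conceptually different classifiers and predicts the class label either by majority vote (hard voting) or by averaging the predicted probabilities (soft voting).
It can be useful for a set of models that perform about equally well, because it balances out their individual weaknesses.)
'1.11.5. Voting Classifier'. scikit-learn.org (accessed 2019-02-06)
The paraphrase is by the author of this post.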
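To make the difference between a majority vote and a soft vote concrete, here is a minimal sketch in plain NumPy. The three classifiers and their probabilities are made-up numbers for illustration, not output from any real model; the point is only that the two voting rules can disagree on the same sample.

```python
import numpy as np

# Hypothetical predicted probabilities for ONE sample from three classifiers.
# Columns are [P(class 0), P(class 1)]; the numbers are invented for illustration.
probas = np.array([
    [0.55, 0.45],  # classifier A leans slightly toward class 0
    [0.55, 0.45],  # classifier B leans slightly toward class 0
    [0.10, 0.90],  # classifier C is very confident in class 1
])

# Hard voting: each classifier casts one vote for its most likely class,
# and the class with the most votes wins.
votes = probas.argmax(axis=1)
hard_label = np.bincount(votes, minlength=2).argmax()

# Soft voting: average the probabilities first, then pick the most likely class.
mean_proba = probas.mean(axis=0)
soft_label = mean_proba.argmax()

print("hard vote:", hard_label)
print("soft vote:", soft_label, "averaged probabilities:", mean_proba)
```

Here two lukewarm votes for class 0 win the hard vote, while the one confident classifier pulls the averaged probability toward class 1 in the soft vote.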

"Stacking predictions so that they cover each other's weaknesses..."
"Isn't that the ultimate algorithm?!"
　* This is just my personal impression.

Trying it on Titanic

Titanic is a competition published on Kaggle, set up for learning the basics of machine learning.

Source

sklearn_test.ipynb

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# ~~~ Data processing omitted (it produces the X_train and y_train used below) ~~~
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier

# Instantiate the classifiers used for voting
# All parameters are left at their defaults
ada_clf = AdaBoostClassifier()
bag_clf = BaggingClassifier()
et_clf = ExtraTreesClassifier()
gb_clf = GradientBoostingClassifier()
rf_clf = RandomForestClassifier()

sklearn_test.ipynb

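# voting is not specified, so scikit-learn's default applies
# (voting='hard', i.e. a majority vote over the predicted labels)
vote_clf = VotingClassifier(estimators=[('rf', rf_clf),
                                        ('gb', gb_clf),
                                        ('et', et_clf),
                                        ('bag', bag_clf),
                                        ('ada', ada_clf)
                                        ])
vote_clf = vote_clf.fit(X_train, y_train)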
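Because the data processing is omitted above, the snippet cannot be run on its own. The following is a self-contained sketch of the same setup, assuming a synthetic dataset from `make_classification` as a stand-in for the preprocessed Titanic features. The dataset, the split, and the `random_state` values are my assumptions for reproducibility and are not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split

# Assumption: synthetic binary-classification data stands in for the Titanic features.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Same five estimators as in the post; random_state is added only for reproducibility.
vote_clf = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("et", ExtraTreesClassifier(random_state=0)),
        ("bag", BaggingClassifier(random_state=0)),
        ("ada", AdaBoostClassifier(random_state=0)),
    ],
    voting="hard",  # the default, written out explicitly: majority vote on labels
)
vote_clf.fit(X_train, y_train)

y_pred = vote_clf.predict(X_test)           # labels chosen by majority vote
accuracy = vote_clf.score(X_test, y_test)   # mean accuracy on the held-out split
print(f"hold-out accuracy: {accuracy:.3f}")
```

The accuracy printed here describes the synthetic data only and says nothing about the Titanic scores quoted in this post.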

Full source code

"Yes! The score went up to 0.765!"

The punchline

Used on its own, the "BaggingClassifier" called in the source scored 0.770
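A result like this is easier to catch early if each member is scored alongside the ensemble before submitting. The sketch below does that with `cross_val_score`, again assuming synthetic data in place of the Titanic features; the dataset, the 5-fold setting, and the `random_state` values are my assumptions, and the numbers it prints are unrelated to the 0.765 and 0.770 above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import cross_val_score

# Assumption: synthetic data stands in for the preprocessed Titanic features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

estimators = [
    ("rf", RandomForestClassifier(random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("et", ExtraTreesClassifier(random_state=0)),
    ("bag", BaggingClassifier(random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
]

# Score every member on its own, plus the voting ensemble built from all of them.
candidates = dict(estimators)
candidates["vote"] = VotingClassifier(estimators=estimators)

# cross_val_score clones each estimator per fold, so sharing instances is safe.
scores = {
    name: cross_val_score(clf, X, y, cv=5).mean()
    for name, clf in candidates.items()
}

# Print from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>4}: {score:.3f}")
```

If one member consistently beats the ensemble, as BaggingClassifier did here, that suggests the other members are not contributing much, which matches the quoted note that voting suits models that perform about equally well.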

References

scikit-learn
Titanic: Machine Learning from Disaster
