
What I learned about AI and machine learning with Python (4)

Last updated at 2020-01-05 / Posted at 2020-01-05

Introduction

I am studying with this book:
Pythonによる AI・機械学習・深層学習アプリのつくり方 (How to Build AI, Machine Learning, and Deep Learning Apps with Python)

2-2 Classifying irises

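Let's try iris classification, a classic machine-learning example.
Download the CSV file from the following URL:
https://github.com/pandas-dev/pandas/blob/master/pandas/tests/data/iris.csv
Press the "Raw" button and save the file with the browser's save function.
The file is structured as follows.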

Column	Column name	Meaning	Example value
1	SepalLength	Sepal length (cm)	5.1
2	SepalWidth	Sepal width (cm)	3.5
3	PetalLength	Petal length (cm)	1.4
4	PetalWidth	Petal width (cm)	0.2
5	Name	Iris species	Iris-setosa

Iris species (the values that appear in the Name column)
Iris-setosa
Iris-versicolor
Iris-virginica
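To see this structure in code without the downloaded file, here is a small sketch that parses a few rows in the same format from a string. The three inlined rows are the first setosa, versicolor and virginica samples of the iris dataset, included only for illustration; columns 1 to 4 come out as numeric features and column 5 as the text label.

```python
import io

import pandas as pd

# Three rows in the same format as iris.csv, inlined so no file is needed.
sample_csv = """SepalLength,SepalWidth,PetalLength,PetalWidth,Name
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

df = pd.read_csv(io.StringIO(sample_csv))

# Columns 1-4 are parsed as floats, column 5 (Name) as strings.
print(df.dtypes)
print(df["Name"].unique().tolist())
```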

Downloading directly from the site

Instead of saving the file with the browser, you can also download it directly with Python.

import urllib.request as req
import pandas as pd

# Download the file
url = "https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv"  # the raw-file URL, not the github.com page URL above
savefile = "iris.csv"
req.urlretrieve(url, savefile)

# Display the contents of the downloaded file
csv = pd.read_csv(savefile, encoding="utf-8")
csv  # in Jupyter/Colab the DataFrame is displayed; in a plain script use print(csv)

A DataFrame with 150 rows of data is displayed.
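If the URL above stops working, one alternative (not the book's method) is the copy of the iris dataset that ships with scikit-learn. The sketch below, assuming scikit-learn is installed, rebuilds a DataFrame with the same column names and "Iris-..." labels as iris.csv. Note that scikit-learn's copy may differ from iris.csv in a few individual values.

```python
import pandas as pd
from sklearn.datasets import load_iris

# scikit-learn bundles the iris dataset, so no network access is needed.
bunch = load_iris()

# Use the same column names as iris.csv.
columns = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
iris_data = pd.DataFrame(bunch.data, columns=columns)

# target holds 0/1/2; map them to the "Iris-..." labels used in iris.csv.
names = ["Iris-" + str(name) for name in bunch.target_names]
iris_data["Name"] = [names[t] for t in bunch.target]

print(iris_data.shape)
print(iris_data["Name"].value_counts())
```

The rest of this article's code can then use iris_data in place of the DataFrame read from iris.csv.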

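Setting the goal

The goal is to classify the iris species from the lengths and widths of the sepals and petals.
We implement the machine-learning program in the following order.

Load iris.csv, the iris data downloaded above
Split the iris data into the sepal/petal length and width features and the species information (the labels)
Split the data so that 80% is used for training and the remaining 20% for testing
Train on the training data, then evaluate whether the model classifies the test data correctly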

Implementing the program

iris.py

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the iris data
iris_data = pd.read_csv("iris.csv", encoding="utf-8")

# Split the iris data into labels and input data
y = iris_data.loc[:, "Name"]
x = iris_data.loc[:, ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]]

# Split into training data (80%) and test data (20%)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, train_size=0.8, shuffle=True)

# Train
clf = SVC()
clf.fit(x_train, y_train)

# Evaluate
y_pred = clf.predict(x_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 0.9333333333333333
/usr/local/lib/python3.6/dist-packages/sklearn/svm/base.py:193: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.
  "avoid this warning.", FutureWarning)

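The accuracy itself changes from run to run, because train_test_split shuffles the data without a fixed random_state. Along with the result, some kind of warning appears.
The FutureWarning says that in a future version (0.22) the default value of SVC's gamma will change from 'auto' to 'scale'.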

clf = SVC(gamma="scale")

Specifying gamma explicitly like this makes the warning go away.
In addition, regarding this part:
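What the two settings mean: according to the scikit-learn documentation, gamma='auto' uses 1 / n_features and gamma='scale' uses 1 / (n_features * X.var()), so 'scale' adapts to how spread out the (unscaled) feature values are. The sketch below computes both for the iris features, loading them with load_iris so it runs without iris.csv, and checks that passing the computed number explicitly gives the same predictions as gamma='scale'.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

# gamma="auto": 1 / n_features
gamma_auto = 1.0 / n_features
# gamma="scale": 1 / (n_features * variance of all feature values)
gamma_scale = 1.0 / (n_features * X.var())

print("auto :", gamma_auto)
print("scale:", gamma_scale)

# Passing the computed value explicitly should behave exactly like gamma="scale".
pred_scale = SVC(gamma="scale").fit(X, y).predict(X)
pred_manual = SVC(gamma=gamma_scale).fit(X, y).predict(X)
print("same predictions:", np.array_equal(pred_scale, pred_manual))
```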

# Split into training data and test data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, train_size=0.8, shuffle=True)

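going forward it is probably better to also specify the stratify option, following the article

scikit-learn でトレーニングデータとテストデータを作成する (Creating training and test data with scikit-learn)

so that each species appears in the same proportion in the training data and the test data.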
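A minimal sketch of the split with stratify=y, using load_iris so it runs on its own. random_state=0 is an arbitrary value chosen here to make the shuffle reproducible; it is not from the original code. Because the 150 samples contain 50 of each species, a stratified 20% test split should hold exactly 10 of each.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the 50:50:50 class ratio in both splits;
# random_state=0 (an arbitrary choice) fixes the shuffle for reproducibility.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=0
)

print(len(x_train), len(x_test))
print("train:", Counter(y_train.tolist()))
print("test :", Counter(y_test.tolist()))
```

In iris.py the same option can be passed as stratify=y, where y is the Name column selected with iris_data.loc.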
