Playing Around with The Iris Dataset

Last updated at 2021-03-01, posted at 2021-02-27

I wanted to practice Python through machine learning, so I tried out a few things.

Environment

Colaboratory

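What is Colaboratory
Colaboratory (Colab for short) is a service that lets you write and run Python from your browser. It offers the following features:
・No environment setup required
・Free access to GPUs
・Easy sharing

The last time I studied machine learning I used Jupyter Notebook, so this time I decided to try Colaboratory.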

Material

The Iris Dataset
The task is to classify three kinds of iris (Setosa, Versicolour, Virginica) by the measurements of their petals and sepals.
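As a quick orientation, here is a minimal sketch of what `load_iris()` returns. The variable name `iris` matches the example used later; the printed attributes are standard scikit-learn ones.

```python
from sklearn import datasets

# Load the Iris data set bundled with scikit-learn (no download needed).
iris = datasets.load_iris()

# 150 samples with 4 features each.
print(iris.data.shape)

# Column order of iris.data:
# sepal length, sepal width, petal length, petal width.
print(iris.feature_names)

# Class names; iris.target stores them as the integers 0, 1, 2.
print(list(iris.target_names))
```

Knowing the column order helps when slicing `iris.data` by column index later on.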

Running it as is

plot_iris_dataset.py

print(__doc__)


# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import datasets
from sklearn.decomposition import PCA

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
y = iris.target

x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5

plt.figure(2, figsize=(8, 6))
plt.clf()

# Plot the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1,
            edgecolor='k')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')

plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())

# To get a better understanding of the interaction of the dimensions,
# plot the first three PCA dimensions
fig = plt.figure(1, figsize=(8, 6))
# Create the 3D axes through the figure so they are attached to it
ax = fig.add_subplot(111, projection="3d", elev=-150, azim=110)
X_reduced = PCA(n_components=3).fit_transform(iris.data)
ax.scatter(X_reduced[:, 0], X_reduced[:, 1], X_reduced[:, 2], c=y,
           cmap=plt.cm.Set1, edgecolor='k', s=40)
ax.set_title("First three PCA directions")
ax.set_xlabel("1st eigenvector")
ax.xaxis.set_ticklabels([])
ax.set_ylabel("2nd eigenvector")
ax.yaxis.set_ticklabels([])
ax.set_zlabel("3rd eigenvector")
ax.zaxis.set_ticklabels([])

plt.show()

This time I will only look at the first half.
(The 3D plot looks hard to read no matter how I tweak it, so I am skipping it for now.)

plot_iris_dataset.py

print(__doc__)


# Code source: Gaël Varoquaux
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

import matplotlib.pyplot as plt
from sklearn import datasets

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
y = iris.target

x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5

plt.figure(2, figsize=(8, 6))
plt.clf()

# Plot the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1,
            edgecolor='k')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')

plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())

plt.show()

Changing the data being plotted

First, check the contents

import seaborn as sns
iris = sns.load_dataset("iris")
# Note: this iris is a pandas DataFrame.
iris.head(20)
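seaborn's `load_dataset` fetches the CSV over the network on first use. If you would rather stay with the scikit-learn copy, a sketch like the following should give a comparable table. It assumes scikit-learn 0.23 or later (for `as_frame=True`) with pandas installed; `iris_bunch` and `iris_df` are names chosen here for illustration.

```python
from sklearn import datasets

# as_frame=True asks scikit-learn to return pandas objects.
iris_bunch = datasets.load_iris(as_frame=True)

# .frame holds the four feature columns plus a numeric "target" column.
iris_df = iris_bunch.frame

# Add a readable species column next to the numeric target.
iris_df["species"] = iris_df["target"].map(
    dict(enumerate(iris_bunch.target_names))
)

print(iris_df.head(20))
```

Using a separate name such as `iris_df` also avoids overwriting the `iris` object that the plotting code indexes with `iris.data`.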

Change the X = iris.data[:, :2] part to X = iris.data[:, 2:4].
Also change the axis labels from Sepal to Petal.

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, 2:4]  # take the third and fourth features (petal length, petal width)
y = iris.target

plt.xlabel('Petal length')
plt.ylabel('Petal width')

Running it produced a scatter plot of petal length against petal width.
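To read the same information off as numbers, a small sketch like this prints the petal-length range for each species. It only uses the columns selected above; the loop variable names are my own.

```python
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:, 2:4]  # columns 2 and 3: petal length, petal width
y = iris.target

# Print the minimum and maximum petal length for each class.
for label, name in enumerate(iris.target_names):
    lengths = X[y == label, 0]
    print(f"{name}: {lengths.min():.1f} to {lengths.max():.1f} cm")
```

The boolean mask `y == label` picks the rows that belong to one species, which is the same grouping the scatter plot shows through color.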

iris.data[:, 2:4]

Introduction to NumPy # Selecting elements of a multidimensional array

NumPy-specific notation

Given a matrix x

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

to get all the values in the first column, [1, 4, 7], write

x[:, 0]

To get all the values in the second through third columns, write

x[:, 1:3]  # the stop index is exclusive
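The slicing rules above can be checked with a short, self-contained sketch. The point to watch is that the stop index of a slice is exclusive, so `1:3` covers columns 1 and 2 (the second and third columns), while `1:2` covers only column 1.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

first_col = x[:, 0]      # every row, column 0 -> 1-D array [1, 4, 7]
last_two = x[:, 1:3]     # every row, columns 1 and 2 -> shape (3, 2)
only_second = x[:, 1:2]  # stop index is exclusive -> column 1 only, shape (3, 1)

print(first_col)
print(last_two)
print(only_second)
```

The same reasoning explains `iris.data[:, 2:4]`: it selects columns 2 and 3, which hold petal length and petal width.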

Changing how the plot looks

matplotlib.pyplot.scatter — Matplotlib 3.3.4.post2472+g1ec609a3f documentation

# Plot the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1, marker="*",
            edgecolor='none')

The markers turned into stars.
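To compare a few marker styles at once, something like the following sketch draws the petal data in three panels. The choice of markers and the panel layout are my own; `marker` and `edgecolor` are regular `scatter` parameters.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:, 2:4]  # petal length, petal width
y = iris.target

# One panel per marker style so they can be compared side by side.
markers = ["o", "*", "^"]  # circle, star, triangle
fig, axes = plt.subplots(1, len(markers), figsize=(12, 4))

for ax, marker in zip(axes, markers):
    # edgecolor="none" draws the markers without an outline.
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1,
               marker=marker, edgecolor="none")
    ax.set_title(f"marker={marker!r}")
    ax.set_xlabel("Petal length")
    ax.set_ylabel("Petal width")

fig.tight_layout()
```

In a notebook the figure is rendered at the end of the cell; when running this as a script, call plt.show() afterwards.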

Thoughts

Colaboratory turned out to be better than I expected.
It seems handy when you just want to run something and see the result as a graphic.
