
[Python / Machine Learning] Training on data from 1,600 wines and rating the taste of an unknown wine on a 10-point scale

Last updated at 2023-07-16, posted at 2023-02-19

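What this article covers

Walking through a machine learning analysis from start to finish
In 2018 I wrote an article about deep learning. This time the goal is a simple hands-on introduction to machine learning, mainly using pandas (with scikit-learn for the model)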

What this article does not cover

Handling missing values
Splitting the data into training and test sets

Development environment

Item	Details	Notes
PC	MacBook Air (2020, M1)
Language	Python	ver 3.11.2
Editor	Visual Studio Code

Scenario

Train on data from 1,600 wines and rate the taste of an unknown wine on a 10-point scale

Directory and file layout

wine
├─ main.py
└─ winequality-red.csv

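CSV used

Columns and their contents

Column	Contents	Notes
fixed acidity	Tartaric acid concentration
volatile acidity	Acetic acid concentration
citric acid	Citric acid concentration
residual sugar	Residual sugar concentration
chlorides	Sodium chloride concentration
free sulfur dioxide	Free SO2 (sulfur dioxide) concentration
total sulfur dioxide	Total SO2 (sulfur dioxide) concentration
density	Density
pH	Hydrogen ion concentration index	Below 7: acidic, 7: neutral, above 7: alkaline
sulphates	Potassium sulphate concentration
alcohol	Alcohol content
quality	Rating	Scored on a 10-point scale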
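The `sep=";"` argument in main.py implies that the file is delimited by semicolons rather than commas. The following is a minimal sketch of why that argument matters. It uses a small inline excerpt instead of the real file; the excerpt is hypothetical in that it keeps only four of the twelve columns, with values copied from rows 0 and 3 of the execution result.

```python
import io

import pandas as pd

# Hypothetical excerpt: four columns only, laid out the way sep=";" implies.
csv_text = (
    "fixed acidity;volatile acidity;alcohol;quality\n"
    "7.4;0.7;9.4;5\n"
    "11.2;0.28;9.8;6\n"
)

# With the default comma separator, every line is read as one single column.
wrong = pd.read_csv(io.StringIO(csv_text))

# With sep=";", the line is split into the intended columns.
right = pd.read_csv(io.StringIO(csv_text), sep=";")

print(wrong.shape)
print(right.shape)
print(list(right.columns))
```

If the shape printed after loading the real file shows only one column, the separator is the first thing to check.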

Steps

Download the CSV
Edit main.py
Run the code written in main.py

Code

main.py

import pandas as pd
from sklearn import tree
import pickle

# Load the CSV (the file is semicolon-separated)
df = pd.read_csv('./winequality-red.csv', sep=";")

# Print the DataFrame
print(df)

# Print the column names
print(list(df.columns.values))

# Explanatory variables (features)
xcol = ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH', 'sulphates', 'alcohol']
x = df[xcol]

# Target variable
t = df['quality']

# Prepare the model (a decision tree classifier)
model = tree.DecisionTreeClassifier(random_state=0)

# Train the model
model.fit(x, t)

# Sample data (one row, in the same column order as xcol)
sample = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 7.0, 0.0, 0.0]]

# Predict the quality of the sample
print(model.predict(sample))

# Evaluate the model (accuracy on the data it was trained on)
print(model.score(x, t))

Execution result

      fixed acidity  volatile acidity  ...  alcohol  quality
0               7.4             0.700  ...      9.4        5
1               7.8             0.880  ...      9.8        5
2               7.8             0.760  ...      9.8        5
3              11.2             0.280  ...      9.8        6
4               7.4             0.700  ...      9.4        5
...             ...               ...  ...      ...      ...
1594            6.2             0.600  ...     10.5        5
1595            5.9             0.550  ...     11.2        6
1596            6.3             0.510  ...     11.0        6
1597            5.9             0.645  ...     10.2        5
1598            6.0             0.310  ...     11.0        6

[1599 rows x 12 columns]
['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH', 'sulphates', 'alcohol', 'quality']
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/sklearn/base.py:420: UserWarning: X does not have valid feature names, but DecisionTreeClassifier was fitted with feature names
  warnings.warn(
[4]
1.0
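The final `1.0` is the score on the same rows the model was fitted on. A decision tree with no depth limit can usually reproduce its training labels exactly, so this value says little about wines the model has not seen. The sketch below illustrates this on synthetic random data, not on the wine dataset; the array sizes and label range are assumptions chosen only for the demonstration.

```python
import numpy as np
from sklearn import tree

# Synthetic data for illustration only: random features and random labels,
# so there is no real pattern to learn.
rng = np.random.default_rng(0)
x_train = rng.random((200, 11))
t_train = rng.integers(3, 9, size=200)

model = tree.DecisionTreeClassifier(random_state=0)
model.fit(x_train, t_train)

# Score on the rows the tree was fitted on.
train_score = model.score(x_train, t_train)

# Score on fresh random rows the tree has never seen.
x_new = rng.random((200, 11))
t_new = rng.integers(3, 9, size=200)
new_score = model.score(x_new, t_new)

print(train_score)
print(new_score)
```

The tree memorizes even meaningless labels, which is why a score taken on the training rows alone is not evidence of predictive quality.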

Execution result

For this sample data, the model predicted a quality of [4].
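The UserWarning in the output appears because the model was fitted on a DataFrame with column names, while the sample was passed as a plain list. One way to avoid it, sketched here on a tiny made-up training set (the values and the two-column layout are assumptions for illustration), is to wrap the sample in a DataFrame that uses the same column names.

```python
import warnings

import pandas as pd
from sklearn import tree

# Tiny made-up training set using two of the article's column names.
xcol = ["pH", "alcohol"]
x = pd.DataFrame([[3.5, 9.4], [3.2, 9.8], [3.3, 11.0], [3.1, 11.2]], columns=xcol)
t = pd.Series([5, 5, 6, 6], name="quality")

model = tree.DecisionTreeClassifier(random_state=0)
model.fit(x, t)


def feature_name_warnings(sample):
    """Predict once and return any warnings that mention feature names."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.predict(sample)
    return [w for w in caught if "feature names" in str(w.message)]


list_sample = [[3.4, 10.0]]
# Same values, but labelled with the column names used in fit().
frame_sample = pd.DataFrame(list_sample, columns=xcol)

print(len(feature_name_warnings(list_sample)))
print(len(feature_name_warnings(frame_sample)))
```

In main.py the equivalent change would be `sample = pd.DataFrame([[...]], columns=xcol)`; the prediction itself is unaffected, only the warning goes away.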

Saving the model

The model's accuracy is 1.0 (measured on the training data itself), so we save this model.
The code for saving it is as follows.

# Save the model
with open('winequality-red.pkl','wb') as f:
    pickle.dump(model, f)
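A saved model can be read back with `pickle.load`. The following is a minimal round-trip sketch; the stand-in model, its made-up training data, and the file name `example-model.pkl` are all assumptions so that the example runs on its own without the wine CSV.

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn import tree

# Stand-in model trained on made-up data so the sketch is self-contained.
x = pd.DataFrame({"pH": [3.5, 3.2, 3.3, 3.1], "alcohol": [9.4, 9.8, 11.0, 11.2]})
t = pd.Series([5, 5, 6, 6])
model = tree.DecisionTreeClassifier(random_state=0).fit(x, t)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name inside a temporary directory.
    path = os.path.join(tmp, "example-model.pkl")

    # Save the model in binary mode.
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # Load it back, also in binary mode.
    with open(path, "rb") as f:
        loaded = pickle.load(f)

# The loaded model should give the same predictions as the original.
same = bool((loaded.predict(x) == model.predict(x)).all())
print(same)
```

Unpickling can execute arbitrary code, so only load pickle files from a source you trust.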

Notes

If you want to try this without cluttering your PC's environment, consider running it with Docker (see the article below).

References

Qiita

Books

スッキリわかるPythonによる機械学習入門 (スッキリわかる入門シリーズ)
