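Introduction
I tried using a neural network to predict wine quality, so here is a short write-up.
Program
The program downloads a dataset of wine features and quality scores from a URL,
then trains a model to predict each wine's quality from its features.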
# Import the required libraries
import numpy as np  # Efficient numerical computing
import pandas as pd  # DataFrame handling
from tensorflow import keras  # Building and training neural networks
from tensorflow.keras import layers  # Layer classes for the network
from sklearn.model_selection import train_test_split  # Splits data into training and test sets
from sklearn.preprocessing import StandardScaler  # Standardizes features
from sklearn.metrics import accuracy_score  # Computes accuracy
# Load the dataset
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv', sep=';')
# Preprocess the data
df = df.replace('?', np.nan).dropna()  # Treat '?' as missing and drop rows with missing values
for column in df.columns:
    df[column] = df[column].astype(float)  # Convert every column to float
# Convert quality to a categorical variable
df['quality'] = pd.Categorical(df['quality'])  # Make the 'quality' column categorical
# Select the features and the target
X = df.drop('quality', axis=1)  # Every column except 'quality' is a feature
y = df['quality'].cat.codes  # Integer category codes of 'quality' are the target
# Standardize the data
scaler = StandardScaler()  # Create a StandardScaler instance
X_scaled = scaler.fit_transform(X)  # Standardize the features
# Split into training and test data
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=0)  # 80% for training, 20% for testing
# Build the model
model = keras.Sequential([
    layers.Dense(512, activation='relu', input_shape=[X_train.shape[1]]),  # First fully connected layer (512 units), fed by the input features
    layers.Dense(256, activation='relu'),  # Fully connected layer, 256 units
    layers.Dense(128, activation='relu'),  # Fully connected layer, 128 units
    layers.Dense(64, activation='relu'),  # Fully connected layer, 64 units
    layers.Dense(32, activation='relu'),  # Fully connected layer, 32 units
    layers.Dense(y.nunique(), activation='softmax')  # Output layer with one unit per distinct quality value
])
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])  # Adam optimizer, cross-entropy loss on integer labels
# Train the model
model.fit(X_train, y_train, validation_split=0.2, epochs=50, batch_size=10)  # 50 epochs, with 20% of the training data held out for validation
# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)  # Evaluate on the test data
print(f'テストセットの損失: {loss:.4f}')  # Print the test-set loss
print(f'テストセットの正解率: {accuracy:.4f}')  # Print the test-set accuracy
# Predict on the training and test data
train_predictions = model.predict(X_train)  # Class probabilities for the training data
test_predictions = model.predict(X_test)  # Class probabilities for the test data
# Compute accuracy
train_accuracy = accuracy_score(y_train, np.argmax(train_predictions, axis=1))  # Training accuracy from the most probable class
test_accuracy = accuracy_score(y_test, np.argmax(test_predictions, axis=1))  # Test accuracy from the most probable class
print(f'正解率(train): {train_accuracy:.3f}')  # Print the training accuracy
print(f'正解率(test): {test_accuracy:.3f}')  # Print the test accuracy
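One detail of the program that is easy to miss: `pd.Categorical(...)` followed by `.cat.codes` replaces each quality score with its position among the sorted distinct scores, so the network predicts indices starting at 0 rather than the scores themselves. Below is a minimal sketch of that round trip. The names `toy_quality` and `toy_probs` and all of their values are made up for illustration; they are not taken from the dataset or from the run in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical quality scores, for illustration only
toy_quality = pd.Series([5.0, 6.0, 5.0, 7.0, 3.0])

# Same conversion as in the program: scores -> categorical -> integer codes
toy_categorical = pd.Series(pd.Categorical(toy_quality))
categories = toy_categorical.cat.categories  # sorted distinct scores
codes = toy_categorical.cat.codes            # index of each score within `categories`

print(categories.tolist())  # [3.0, 5.0, 6.0, 7.0]
print(codes.tolist())       # [1, 2, 1, 3, 0]

# Hypothetical softmax outputs: one row per sample, one column per class
toy_probs = np.array([
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.5, 0.3, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.0, 0.1, 0.2, 0.7],
    [0.1, 0.7, 0.1, 0.1],
])

# argmax picks the most probable class index for each row
predicted_codes = np.argmax(toy_probs, axis=1)

# Map the class indices back to the original quality scores
predicted_scores = categories[predicted_codes].tolist()

# Accuracy is the fraction of rows where the predicted index matches the true code
accuracy = float(np.mean(predicted_codes == codes.to_numpy()))

print(predicted_scores)  # [5.0, 5.0, 5.0, 7.0, 5.0]
print(accuracy)          # 0.6
```

To report predictions from the real model as quality scores, the same lookup into the categories of the `quality` column should work on `np.argmax(test_predictions, axis=1)`.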
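Results
The output of the run is shown below.
Test accuracy came out at 62.8% while training accuracy reached 91.5%, so the model is overfitting and the result is pretty underwhelming...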
C:\Python\wine>python nn7.py
2024-06-01 20:38:32.204349: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE SSE2 SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Epoch 1/50
103/103 [==============================] - 2s 7ms/step - loss: 1.1503 - accuracy: 0.5103 - val_loss: 0.9736 - val_accuracy: 0.5664
Epoch 2/50
103/103 [==============================] - 0s 3ms/step - loss: 1.0040 - accuracy: 0.5718 - val_loss: 1.1034 - val_accuracy: 0.5391
Epoch 3/50
103/103 [==============================] - 0s 3ms/step - loss: 0.9654 - accuracy: 0.5836 - val_loss: 0.9643 - val_accuracy: 0.5430
Epoch 4/50
103/103 [==============================] - 0s 3ms/step - loss: 0.9268 - accuracy: 0.6129 - val_loss: 0.9858 - val_accuracy: 0.5938
Epoch 5/50
103/103 [==============================] - 0s 3ms/step - loss: 0.8943 - accuracy: 0.6266 - val_loss: 0.9449 - val_accuracy: 0.5938
Epoch 6/50
103/103 [==============================] - 0s 3ms/step - loss: 0.8839 - accuracy: 0.6305 - val_loss: 0.9369 - val_accuracy: 0.5625
Epoch 7/50
103/103 [==============================] - 0s 4ms/step - loss: 0.8650 - accuracy: 0.6383 - val_loss: 0.9243 - val_accuracy: 0.5898
Epoch 8/50
103/103 [==============================] - 0s 3ms/step - loss: 0.8407 - accuracy: 0.6491 - val_loss: 0.8854 - val_accuracy: 0.6133
Epoch 9/50
103/103 [==============================] - 0s 3ms/step - loss: 0.8131 - accuracy: 0.6647 - val_loss: 0.9356 - val_accuracy: 0.5859
Epoch 10/50
103/103 [==============================] - 0s 3ms/step - loss: 0.8038 - accuracy: 0.6569 - val_loss: 1.0093 - val_accuracy: 0.5625
Epoch 11/50
103/103 [==============================] - 0s 3ms/step - loss: 0.7739 - accuracy: 0.6735 - val_loss: 0.9509 - val_accuracy: 0.5820
Epoch 12/50
103/103 [==============================] - 0s 3ms/step - loss: 0.7499 - accuracy: 0.6745 - val_loss: 0.9690 - val_accuracy: 0.5781
Epoch 13/50
103/103 [==============================] - 0s 3ms/step - loss: 0.7178 - accuracy: 0.6940 - val_loss: 1.0237 - val_accuracy: 0.6172
Epoch 14/50
103/103 [==============================] - 0s 3ms/step - loss: 0.6940 - accuracy: 0.7097 - val_loss: 1.0317 - val_accuracy: 0.5898
Epoch 15/50
103/103 [==============================] - 0s 3ms/step - loss: 0.6759 - accuracy: 0.7077 - val_loss: 1.0377 - val_accuracy: 0.6289
Epoch 16/50
103/103 [==============================] - 0s 3ms/step - loss: 0.6209 - accuracy: 0.7283 - val_loss: 1.0669 - val_accuracy: 0.5938
Epoch 17/50
103/103 [==============================] - 0s 3ms/step - loss: 0.6248 - accuracy: 0.7370 - val_loss: 1.0806 - val_accuracy: 0.5703
Epoch 18/50
103/103 [==============================] - 0s 3ms/step - loss: 0.5565 - accuracy: 0.7625 - val_loss: 1.1467 - val_accuracy: 0.5625
Epoch 19/50
103/103 [==============================] - 0s 3ms/step - loss: 0.5564 - accuracy: 0.7625 - val_loss: 1.0899 - val_accuracy: 0.5977
Epoch 20/50
103/103 [==============================] - 0s 3ms/step - loss: 0.5266 - accuracy: 0.7742 - val_loss: 1.1610 - val_accuracy: 0.6016
Epoch 21/50
103/103 [==============================] - 0s 3ms/step - loss: 0.5038 - accuracy: 0.7918 - val_loss: 1.1768 - val_accuracy: 0.5938
Epoch 22/50
103/103 [==============================] - 0s 3ms/step - loss: 0.4779 - accuracy: 0.8074 - val_loss: 1.2377 - val_accuracy: 0.6016
Epoch 23/50
103/103 [==============================] - 0s 3ms/step - loss: 0.4151 - accuracy: 0.8192 - val_loss: 1.4133 - val_accuracy: 0.5781
Epoch 24/50
103/103 [==============================] - 0s 3ms/step - loss: 0.4273 - accuracy: 0.8240 - val_loss: 1.4031 - val_accuracy: 0.6211
Epoch 25/50
103/103 [==============================] - 0s 3ms/step - loss: 0.3884 - accuracy: 0.8319 - val_loss: 1.4533 - val_accuracy: 0.5898
Epoch 26/50
103/103 [==============================] - 0s 3ms/step - loss: 0.3570 - accuracy: 0.8573 - val_loss: 1.3862 - val_accuracy: 0.6445
Epoch 27/50
103/103 [==============================] - 0s 3ms/step - loss: 0.3153 - accuracy: 0.8690 - val_loss: 1.4960 - val_accuracy: 0.6172
Epoch 28/50
103/103 [==============================] - 0s 3ms/step - loss: 0.3038 - accuracy: 0.8778 - val_loss: 1.5178 - val_accuracy: 0.6094
Epoch 29/50
103/103 [==============================] - 0s 3ms/step - loss: 0.2652 - accuracy: 0.9062 - val_loss: 1.8303 - val_accuracy: 0.5938
Epoch 30/50
103/103 [==============================] - 0s 3ms/step - loss: 0.2647 - accuracy: 0.8935 - val_loss: 1.9841 - val_accuracy: 0.6094
Epoch 31/50
103/103 [==============================] - 0s 3ms/step - loss: 0.2518 - accuracy: 0.9022 - val_loss: 1.6865 - val_accuracy: 0.5859
Epoch 32/50
103/103 [==============================] - 0s 3ms/step - loss: 0.2193 - accuracy: 0.9198 - val_loss: 2.0315 - val_accuracy: 0.5938
Epoch 33/50
103/103 [==============================] - 0s 3ms/step - loss: 0.2294 - accuracy: 0.9120 - val_loss: 1.8131 - val_accuracy: 0.6133
Epoch 34/50
103/103 [==============================] - 0s 3ms/step - loss: 0.1786 - accuracy: 0.9247 - val_loss: 1.9996 - val_accuracy: 0.6016
Epoch 35/50
103/103 [==============================] - 0s 3ms/step - loss: 0.1633 - accuracy: 0.9482 - val_loss: 2.0034 - val_accuracy: 0.6250
Epoch 36/50
103/103 [==============================] - 0s 3ms/step - loss: 0.1438 - accuracy: 0.9501 - val_loss: 2.0639 - val_accuracy: 0.6289
Epoch 37/50
103/103 [==============================] - 0s 3ms/step - loss: 0.2709 - accuracy: 0.9150 - val_loss: 2.0477 - val_accuracy: 0.6094
Epoch 38/50
103/103 [==============================] - 0s 3ms/step - loss: 0.1980 - accuracy: 0.9218 - val_loss: 1.9655 - val_accuracy: 0.6094
Epoch 39/50
103/103 [==============================] - 0s 3ms/step - loss: 0.1841 - accuracy: 0.9335 - val_loss: 2.0923 - val_accuracy: 0.6172
Epoch 40/50
103/103 [==============================] - 0s 3ms/step - loss: 0.2185 - accuracy: 0.9267 - val_loss: 2.2261 - val_accuracy: 0.5977
Epoch 41/50
103/103 [==============================] - 0s 3ms/step - loss: 0.1308 - accuracy: 0.9570 - val_loss: 2.3560 - val_accuracy: 0.6289
Epoch 42/50
103/103 [==============================] - 0s 3ms/step - loss: 0.0938 - accuracy: 0.9785 - val_loss: 2.3621 - val_accuracy: 0.5977
Epoch 43/50
103/103 [==============================] - 0s 3ms/step - loss: 0.1035 - accuracy: 0.9629 - val_loss: 2.7461 - val_accuracy: 0.5859
Epoch 44/50
103/103 [==============================] - 0s 3ms/step - loss: 0.1418 - accuracy: 0.9541 - val_loss: 2.3777 - val_accuracy: 0.6328
Epoch 45/50
103/103 [==============================] - 0s 3ms/step - loss: 0.0961 - accuracy: 0.9658 - val_loss: 2.6022 - val_accuracy: 0.6055
Epoch 46/50
103/103 [==============================] - 0s 3ms/step - loss: 0.0672 - accuracy: 0.9746 - val_loss: 2.5878 - val_accuracy: 0.6133
Epoch 47/50
103/103 [==============================] - 0s 3ms/step - loss: 0.0751 - accuracy: 0.9775 - val_loss: 2.5179 - val_accuracy: 0.6055
Epoch 48/50
103/103 [==============================] - 0s 3ms/step - loss: 0.1291 - accuracy: 0.9521 - val_loss: 2.3628 - val_accuracy: 0.6328
Epoch 49/50
103/103 [==============================] - 0s 3ms/step - loss: 0.0695 - accuracy: 0.9746 - val_loss: 2.6980 - val_accuracy: 0.6172
Epoch 50/50
103/103 [==============================] - 0s 3ms/step - loss: 0.0392 - accuracy: 0.9873 - val_loss: 2.8444 - val_accuracy: 0.6328
10/10 [==============================] - 0s 2ms/step - loss: 3.1516 - accuracy: 0.6281
テストセットの損失: 3.1516
テストセットの正解率: 0.6281
40/40 [==============================] - 0s 2ms/step
10/10 [==============================] - 0s 2ms/step
正解率(train): 0.915
正解率(test): 0.628
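In the log above, the training loss keeps falling while `val_loss` bottoms out early and then climbs. The sketch below is one way to locate that turning point from Keras-style log text. `LOG_EXCERPT` holds a few lines copied from the output above, and `best_val_epoch` is a hypothetical helper name, not part of Keras.

```python
import re

# A few lines copied from the training log above
LOG_EXCERPT = """\
Epoch 7/50
103/103 [==============================] - 0s 4ms/step - loss: 0.8650 - accuracy: 0.6383 - val_loss: 0.9243 - val_accuracy: 0.5898
Epoch 8/50
103/103 [==============================] - 0s 3ms/step - loss: 0.8407 - accuracy: 0.6491 - val_loss: 0.8854 - val_accuracy: 0.6133
Epoch 9/50
103/103 [==============================] - 0s 3ms/step - loss: 0.8131 - accuracy: 0.6647 - val_loss: 0.9356 - val_accuracy: 0.5859
Epoch 50/50
103/103 [==============================] - 0s 3ms/step - loss: 0.0392 - accuracy: 0.9873 - val_loss: 2.8444 - val_accuracy: 0.6328
"""


def best_val_epoch(log_text):
    """Return (epoch, val_loss) for the epoch with the lowest val_loss, or None."""
    epoch = None
    best = None
    for line in log_text.splitlines():
        # "Epoch N/M" lines tell us which epoch the next progress line belongs to
        header = re.match(r"Epoch (\d+)/\d+", line)
        if header:
            epoch = int(header.group(1))
            continue
        # Progress lines carry the validation loss for the current epoch
        found = re.search(r"val_loss: ([0-9.]+)", line)
        if found and epoch is not None:
            val_loss = float(found.group(1))
            if best is None or val_loss < best[1]:
                best = (epoch, val_loss)
    return best


result = best_val_epoch(LOG_EXCERPT)
print(result)  # (8, 0.8854)
```

On this excerpt the helper reports epoch 8 (val_loss 0.8854), which is also the lowest value in the full log. By epoch 50 val_loss has grown to 2.8444, which suggests the later epochs mostly fit the training set.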
Summary
・I wrote a program that predicts wine quality with a neural network.
・The results were underwhelming: 62.8% accuracy on the test set.