Part 1: Supervised Learning
🔹【Classification】
- Logistic regression
  Goal: predict whether a Pokémon is Legendary from its base stats
  🔧 LogisticRegression (scikit-learn)
- k-nearest neighbors (k-NN)
  Goal: classify Pokémon by type
  🔧 KNeighborsClassifier
- Decision tree, random forest, XGBoost
  Goal: predict a Pokémon's type or evolution stage
  🔧 DecisionTreeClassifier, RandomForestClassifier, XGBClassifier
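To make the k-NN item concrete, here is a minimal NumPy sketch of the majority vote that a k-nearest-neighbors classifier performs. The type labels and the query row are assumptions added for illustration (the scripts in this document do not carry type labels), and the features are left unscaled for brevity, whereas the supervised script standardizes them first.

```python
from collections import Counter

import numpy as np

# Base stats in the order HP, Attack, Defense, Sp. Atk, Sp. Def, Speed.
# The type labels are an assumption added for this example.
X_train = np.array([
    [45, 49, 49, 65, 65, 45],    # Bulbasaur
    [60, 62, 63, 80, 80, 60],    # Ivysaur
    [80, 82, 83, 100, 100, 80],  # Venusaur
    [39, 52, 43, 60, 50, 65],    # Charmander
    [78, 84, 78, 109, 85, 100],  # Charizard
    [44, 48, 65, 50, 64, 43],    # Squirtle
], dtype=float)
y_train = np.array(["Grass", "Grass", "Grass", "Fire", "Fire", "Water"])


def knn_predict(X, y, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    distances = np.linalg.norm(X - query, axis=1)  # Euclidean distance to each row
    nearest = np.argsort(distances)[:k]            # indices of the k closest rows
    votes = Counter(y[nearest].tolist())
    return votes.most_common(1)[0][0]


query = np.array([58, 64, 58, 80, 65, 80], dtype=float)  # hypothetical unseen Pokémon
print(knn_predict(X_train, y_train, query, k=3))
```

With these six rows the three nearest neighbors of the query are two Grass types and one Fire type, so the vote returns Grass. Such a small table says little about whether base stats really determine type; the sketch only shows the mechanics of the vote.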
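🔹【Regression】
- Linear regression, ridge regression, lasso regression
  Goal: predict total base stats
  🔧 LinearRegression, Ridge, Lasso
- Support vector machine (SVM)
  Goal: predict high-performing Pokémon from their type
  🔧 SVR (regression), SVC (classification)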
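The difference between plain linear regression and ridge regression can be sketched with the closed-form ridge solution. The example below is an illustration under assumptions: it predicts total base stats from HP and Attack only (a feature choice made here, not in the original), and `fit_ridge` is a hypothetical helper rather than a library function. Lasso uses an L1 penalty and has no comparable closed form, so it is not shown.

```python
import numpy as np

# HP and Attack for the ten Pokémon in the supervised script, with the sum of
# all six base stats as the regression target.
hp = np.array([45, 60, 80, 78, 35, 60, 160, 91, 106, 100], dtype=float)
attack = np.array([49, 62, 82, 84, 55, 90, 110, 134, 110, 100], dtype=float)
total = np.array([318, 405, 525, 534, 320, 485, 540, 600, 680, 600], dtype=float)


def fit_ridge(X, y, alpha):
    """Closed-form ridge fit on centered data; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


X = np.column_stack([hp, attack])
for alpha in (0.0, 100.0, 10000.0):  # alpha = 0 is ordinary least squares
    w, b = fit_ridge(X, total, alpha)
    print(f"alpha={alpha:>8}: coef={np.round(w, 3)}, intercept={b:.1f}")
```

As `alpha` grows, the coefficient vector shrinks toward zero, which is the regularizing effect that distinguishes `Ridge` from `LinearRegression`.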
🔹【Evaluation Metrics】
- Classification
  Accuracy, confusion matrix, precision, recall, F1 score
  🔧 classification_report, confusion_matrix
- Regression
  MSE (mean squared error), RMSE (root mean squared error), R² score
  🔧 mean_squared_error, r2_score
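The metrics listed above can be written out directly from their definitions. The following NumPy sketch uses two hypothetical helpers, `classification_metrics` and `regression_metrics`, and tiny made-up label arrays; it is meant to show what the scikit-learn functions report, not to replace them.

```python
import numpy as np


def classification_metrics(y_true, y_pred):
    """Binary metrics with 1 as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: actual, columns: predicted
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def regression_metrics(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(np.mean((y_true - y_pred) ** 2))
    ss_res = float(np.sum((y_true - y_pred) ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {"mse": mse, "rmse": float(np.sqrt(mse)), "r2": 1.0 - ss_res / ss_tot}


cls = classification_metrics([0, 0, 0, 1, 1], [0, 1, 0, 1, 0])  # made-up labels
reg = regression_metrics([3, 5, 7], [2, 5, 9])                  # made-up values
print(cls)
print(reg)
```

RMSE is the square root of MSE, so it is expressed in the same units as the target, which makes it easier to read than MSE when the target is something like HP.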
Part 2: Unsupervised Learning
🔹【Clustering】
- k-means clustering
  Goal: group Pokémon by their stat profiles
  🔧 KMeans
- Hierarchical clustering with dendrogram visualization
  🔧 scipy.cluster.hierarchy
- DBSCAN (density-based clustering)
  Goal: find outliers and rare types
  🔧 DBSCAN
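The hierarchical clustering item names `scipy.cluster.hierarchy`, so a short sketch of that route may help. It builds a Ward linkage over the standardized base stats, cuts the tree into at most three clusters, and computes the dendrogram layout without drawing it. The choice of Ward linkage and of three clusters is an assumption made for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

names = ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "Charizard",
         "Squirtle", "Pikachu", "Snorlax", "Mewtwo", "Dragonite"]
# Columns: HP, Attack, Defense, Sp. Atk, Sp. Def, Speed
stats = np.array([
    [45, 49, 49, 65, 65, 45],
    [60, 62, 63, 80, 80, 60],
    [80, 82, 83, 100, 100, 80],
    [39, 52, 43, 60, 50, 65],
    [78, 84, 78, 109, 85, 100],
    [44, 48, 65, 50, 64, 43],
    [35, 55, 40, 50, 50, 90],
    [160, 110, 65, 65, 110, 30],
    [106, 110, 90, 154, 90, 130],
    [91, 134, 95, 100, 100, 80],
], dtype=float)

# Standardize each column so no single stat dominates the distances
X = (stats - stats.mean(axis=0)) / stats.std(axis=0)

Z = linkage(X, method="ward")                     # (n - 1) merge steps
labels = fcluster(Z, t=3, criterion="maxclust")   # cut into at most 3 clusters
tree = dendrogram(Z, labels=names, no_plot=True)  # layout only, nothing is drawn

print(dict(zip(names, labels.tolist())))
print(tree["ivl"])  # leaf order from left to right
```

Dropping `no_plot=True` draws the dendrogram on the current matplotlib axes, which gives the visualization the outline refers to.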
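🔹【Dimensionality Reduction】
- PCA (principal component analysis)
  Goal: extract and visualize the principal components of the base stats
  🔧 PCA (sklearn)
- t-SNE / UMAP
  Goal: map high-dimensional features onto a 2D map
  🔧 TSNE, UMAP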
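As a sketch of what PCA does to the base stats, the projection can be computed with a singular value decomposition in NumPy. This example standardizes the columns first, as the unsupervised script does, and reports how much variance each component explains. The signs of the components may differ from those returned by a library implementation.

```python
import numpy as np

# Columns: HP, Attack, Defense, Sp. Atk, Sp. Def, Speed (same ten Pokémon as
# in the unsupervised script)
stats = np.array([
    [45, 49, 49, 65, 65, 45],
    [60, 62, 63, 80, 80, 60],
    [80, 82, 83, 100, 100, 80],
    [39, 52, 43, 60, 50, 65],
    [78, 84, 78, 109, 85, 100],
    [44, 48, 65, 50, 64, 43],
    [35, 55, 40, 50, 50, 90],
    [160, 110, 65, 65, 110, 30],
    [106, 110, 90, 154, 90, 130],
    [91, 134, 95, 100, 100, 80],
], dtype=float)

# Center and scale each column
X = (stats - stats.mean(axis=0)) / stats.std(axis=0)

# Rows of Vt are the principal directions, ordered by decreasing variance
U, S, Vt = np.linalg.svd(X, full_matrices=False)
explained_variance_ratio = S ** 2 / np.sum(S ** 2)

# Project onto the first two principal components
X_2d = X @ Vt[:2].T

print(np.round(explained_variance_ratio, 3))
print(X_2d.shape)
```

If the first two ratios add up to most of the variance, a 2D scatter plot of `X_2d` is a reasonable summary of the six stats; if not, the plot hides a good deal of structure.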
Part 3: Deep Learning
🔹【Neural Network Basics】
- Perceptron and MLP
  Goal: classify type and evolution stage; regress base stats
  🔧 Sequential, Dense (Keras)
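To show what a stack of `Dense` layers computes, here is a NumPy sketch of the forward pass of an MLP with layer sizes 6, 64, 32, 2, the same sizes used by the MLP in the deep learning script. The weights are random and untrained, the input batch is synthetic, and dropout is left out because it is inactive at inference time.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


layer_sizes = [6, 64, 32, 2]  # six base stats in, two classes out
rng = np.random.default_rng(0)
params = [
    (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]


def forward(x, params):
    """Apply each Dense layer: ReLU on hidden layers, softmax on the output."""
    a = x
    for W, b in params[:-1]:
        a = relu(a @ W + b)
    W, b = params[-1]
    return softmax(a @ W + b)


batch = rng.normal(size=(4, 6))  # four synthetic, already standardized samples
probs = forward(batch, params)
n_params = sum(W.size + b.size for W, b in params)
print(probs.shape, n_params)
```

The parameter count is (6·64 + 64) + (64·32 + 32) + (32·2 + 2) = 2594, which is large relative to a ten-row table, so results on such data should be read as a demonstration of the workflow only.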
🔹【Activation Functions】
- When to use ReLU, Sigmoid, and Softmax, with a comparison of their effects
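One way to compare the three activations is to look at their gradients and at how sigmoid relates to softmax. The NumPy sketch below, with input values chosen arbitrarily for illustration, shows that the sigmoid gradient never exceeds 0.25 and fades for large inputs, that the ReLU gradient is 1 for positive inputs, and that a two-class softmax over the logits `[z, 0]` gives the same probability as `sigmoid(z)`.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])  # arbitrary pre-activation values

relu_grad = (z > 0).astype(float)              # 0 for z <= 0, 1 for z > 0
sigmoid_grad = sigmoid(z) * (1.0 - sigmoid(z)) # peaks at 0.25 when z = 0

# Softmax over the pair of logits [z, 0] reduces to sigmoid(z)
two_class = softmax(np.stack([z, np.zeros_like(z)], axis=1))[:, 0]

print("ReLU gradient:   ", relu_grad)
print("Sigmoid gradient:", np.round(sigmoid_grad, 4))
print("Two-class softmax equals sigmoid:", np.allclose(two_class, sigmoid(z)))
```

This is the usual reasoning behind the split: ReLU in hidden layers, a single sigmoid unit for a binary output such as Legendary or not, and softmax when there are several mutually exclusive classes such as types.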
🔹【Frameworks】
- Choose TensorFlow / Keras or PyTorch
  Build image-classification and numeric-prediction models
🔹【CNN (Convolutional Neural Network)】
- Classify a Pokémon's type from its image
  Example: predict Type1 from a Pokémon image dataset
  🔧 Conv2D, MaxPooling2D, Flatten, Dropout
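The operations behind `Conv2D` and 2×2 max pooling can be sketched in plain NumPy to make the shape arithmetic visible. The 8×8 ramp image and the 3×3 gradient kernel are stand-ins chosen for this example, and `conv2d_valid` and `max_pool2d` are hypothetical helpers; a real model would learn its kernels and operate on multi-channel images.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop rows/columns that do not fill a block
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))


image = np.arange(64, dtype=float).reshape(8, 8)  # stand-in for a grayscale image
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)      # responds to left-to-right increase

features = np.maximum(conv2d_valid(image, kernel), 0.0)  # convolution followed by ReLU
pooled = max_pool2d(features)
print(features.shape, pooled.shape)
```

An 8×8 input with a 3×3 kernel and no padding gives a 6×6 feature map, and 2×2 pooling halves that to 3×3; the same arithmetic determines the size of the vector that `Flatten` passes to the dense layers.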
# Program Name: pokemon_supervised_learning_analysis.py
# Creation Date: 20250507
# Overview: Perform classification and regression using Pokémon base stats from Gen 1
# Usage: To run the program, use the command `python pokemon_supervised_learning_analysis.py` in the terminal
# --- Install required libraries (uncomment in a notebook) ---
# !pip install -q scikit-learn xgboost matplotlib pandas numpy
# --- Import libraries ---
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression, LinearRegression, Ridge, Lasso
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.svm import SVC
from sklearn.metrics import (confusion_matrix, classification_report, accuracy_score,
                             mean_squared_error, r2_score)
# --- Define Pokémon base stats ---
data = {
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charizard", "Pikachu",
             "Raichu", "Snorlax", "Dragonite", "Mewtwo", "Mew"],
    "HP": [45, 60, 80, 78, 35, 60, 160, 91, 106, 100],
    "Attack": [49, 62, 82, 84, 55, 90, 110, 134, 110, 100],
    "Defense": [49, 63, 83, 78, 40, 55, 65, 95, 90, 100],
    "Sp. Atk": [65, 80, 100, 109, 50, 90, 65, 100, 154, 100],
    "Sp. Def": [65, 80, 100, 85, 50, 80, 110, 100, 90, 100],
    "Speed": [45, 60, 80, 100, 90, 110, 30, 80, 130, 100],
    "Legendary": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data)
# --- Prepare features and targets ---
X = df.drop(columns=["Name", "Legendary"])
y_class = df["Legendary"]
# HP is the regression target, so drop it from the regression features;
# otherwise the model would receive the answer as an input.
X_reg = X.drop(columns=["HP"])
y_reg = df["HP"]
# --- Split data ---
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(X, y_class, test_size=0.3, random_state=42)
X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(X_reg, y_reg, test_size=0.3, random_state=42)
# --- Standardization (one scaler per task, fitted on the training data only) ---
scaler_c = StandardScaler()
X_train_c = scaler_c.fit_transform(X_train_c)
X_test_c = scaler_c.transform(X_test_c)
scaler_r = StandardScaler()
X_train_r = scaler_r.fit_transform(X_train_r)
X_test_r = scaler_r.transform(X_test_r)
# --- Classification models ---
classifiers = {
    "Logistic Regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    # use_label_encoder is deprecated in recent XGBoost releases, so it is omitted
    "XGBoost": XGBClassifier(eval_metric='logloss'),
    "SVM": SVC()
}
# --- Evaluate classification models ---
print("\nClassification Results (Legendary Prediction):")
for name, model in classifiers.items():
    model.fit(X_train_c, y_train_c)
    y_pred = model.predict(X_test_c)
    print(f"\nModel: {name}")
    print(f"Accuracy: {accuracy_score(y_test_c, y_pred):.2f}")
    # labels=[0, 1] keeps the matrix 2x2 even if a class is missing from the test split
    print("Confusion Matrix:\n", confusion_matrix(y_test_c, y_pred, labels=[0, 1]))
    # zero_division=0 avoids warnings when a class is never predicted
    print("Classification Report:\n", classification_report(y_test_c, y_pred, zero_division=0))
# --- Regression models ---
regressors = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(),
    "Lasso Regression": Lasso()
}
# --- Evaluate regression models ---
print("\nRegression Results (HP Prediction):")
for name, model in regressors.items():
    model.fit(X_train_r, y_train_r)
    y_pred = model.predict(X_test_r)
    mse = mean_squared_error(y_test_r, y_pred)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_test_r, y_pred)
    print(f"\nModel: {name}")
    print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}, R²: {r2:.2f}")
# --- Plot: Actual vs Predicted HP ---
plt.figure(figsize=(8, 6))
# Draw the actual values once so the legend has a single "Actual" entry
plt.plot(y_test_r.values, label="Actual", marker='o')
for name, model in regressors.items():
    # The models were already fitted in the evaluation loop above
    y_pred = model.predict(X_test_r)
    plt.plot(y_pred, label=f"Predicted ({name})", linestyle='--')
plt.title("Actual vs Predicted HP")
plt.xlabel("Sample Index")
plt.ylabel("HP Value")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
# Program Name: pokemon_unsupervised_learning.py
# Creation Date: 20250507
# Overview: Unsupervised learning (clustering and dimensionality reduction) on dummy Pokémon base stats
# Usage: To run the program, use the command `python pokemon_unsupervised_learning.py` in the terminal
# --- Install required libraries ---
# !pip install pandas numpy matplotlib seaborn scikit-learn umap-learn --quiet
# --- Import libraries ---
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
# --- Parameter settings ---
n_clusters = 3
tsne_perplexity = 3  # must be smaller than the number of samples (10)
umap_neighbors = 3
umap_min_dist = 0.1
# --- Dummy Pokémon stat data ---
data = {
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "Charizard",
             "Squirtle", "Pikachu", "Snorlax", "Mewtwo", "Dragonite"],
    "HP": [45, 60, 80, 39, 78, 44, 35, 160, 106, 91],
    "Attack": [49, 62, 82, 52, 84, 48, 55, 110, 110, 134],
    "Defense": [49, 63, 83, 43, 78, 65, 40, 65, 90, 95],
    "Sp. Atk": [65, 80, 100, 60, 109, 50, 50, 65, 154, 100],
    "Sp. Def": [65, 80, 100, 50, 85, 64, 50, 110, 90, 100],
    "Speed": [45, 60, 80, 65, 100, 43, 90, 30, 130, 80]
}
df = pd.DataFrame(data)
# --- Feature extraction and scaling ---
features = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]
X = df[features]
X_scaled = StandardScaler().fit_transform(X)
# --- PCA ---
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
# --- t-SNE ---
tsne = TSNE(n_components=2, random_state=0, perplexity=tsne_perplexity)
X_tsne = tsne.fit_transform(X_scaled)
# --- UMAP ---
X_umap = umap.UMAP(n_neighbors=umap_neighbors, min_dist=umap_min_dist, random_state=0).fit_transform(X_scaled)
# --- KMeans clustering ---
# n_init is set explicitly so the result does not depend on the library's default
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
df["KMeans"] = kmeans.fit_predict(X_scaled)
# --- Agglomerative (hierarchical) clustering ---
agg = AgglomerativeClustering(n_clusters=n_clusters)
df["Hierarchical"] = agg.fit_predict(X_scaled)
# --- DBSCAN (label -1 marks points treated as noise) ---
dbscan = DBSCAN(eps=1.2, min_samples=2)
df["DBSCAN"] = dbscan.fit_predict(X_scaled)
# --- Plot the PCA projection ---
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=df["KMeans"], cmap='viridis', s=100)
for i, name in enumerate(df["Name"]):
    plt.text(X_pca[i, 0]+0.1, X_pca[i, 1]+0.1, name)
plt.title("PCA Projection with KMeans Clusters")
plt.xlabel("PCA 1")
plt.ylabel("PCA 2")
plt.grid(True)
plt.tight_layout()
plt.show()
# --- t-SNE plot ---
plt.figure(figsize=(8, 6))
plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=df["Hierarchical"], cmap='tab10', s=100)
plt.title("t-SNE Projection with Hierarchical Clusters")
plt.xlabel("t-SNE 1")
plt.ylabel("t-SNE 2")
plt.grid(True)
plt.tight_layout()
plt.show()
# --- UMAP plot ---
plt.figure(figsize=(8, 6))
plt.scatter(X_umap[:, 0], X_umap[:, 1], c=df["DBSCAN"], cmap='Set1', s=100)
plt.title("UMAP Projection with DBSCAN Clusters")
plt.xlabel("UMAP 1")
plt.ylabel("UMAP 2")
plt.grid(True)
plt.tight_layout()
plt.show()
# --- Show final DataFrame ---
print("\nClustering Results:")
print(df[["Name", "KMeans", "Hierarchical", "DBSCAN"]].to_string(index=False))
# Program Name: pokemon_dl_classification.py
# Creation Date: 20250507
# Overview: Deep learning classification of Pokémon base stats using MLP and CNN
# Usage: Run the script to train MLP and CNN models to classify Pokémon by legendary status using dummy base stats
# --- Install Required Packages (uncomment in a notebook) ---
# !pip install tensorflow pandas scikit-learn matplotlib --quiet
# --- Import Libraries ---
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Conv1D, Flatten, MaxPooling1D
from tensorflow.keras.utils import to_categorical
# --- Dummy Data Preparation ---
data = {
    'HP': [45, 60, 80, 39, 58, 78, 44, 59, 79, 130],
    'Attack': [49, 62, 82, 52, 64, 84, 48, 63, 83, 85],
    'Defense': [49, 63, 83, 43, 58, 78, 65, 80, 100, 80],
    'Sp_Atk': [65, 80, 100, 60, 80, 109, 50, 65, 85, 95],
    'Sp_Def': [65, 80, 100, 50, 65, 85, 64, 80, 105, 95],
    'Speed': [45, 60, 80, 65, 80, 100, 43, 58, 78, 60],
    'Legendary': [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]
}
df = pd.DataFrame(data)
# --- Feature and Label Preparation ---
X = df.drop('Legendary', axis=1).values
y = df['Legendary'].values
# --- Train-Test Split ---
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# --- Standardization ---
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# --- Convert labels to one-hot vectors ---
# num_classes=2 keeps two columns even if a split happens to contain a single class
y_train_cat = to_categorical(y_train, num_classes=2)
y_test_cat = to_categorical(y_test, num_classes=2)
# --- MLP Model (multilayer perceptron) ---
mlp_model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    Dropout(0.3),
    Dense(32, activation='relu'),
    Dense(2, activation='softmax')
])
mlp_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
mlp_model.fit(X_train_scaled, y_train_cat, epochs=50, batch_size=4, verbose=0)
mlp_eval = mlp_model.evaluate(X_test_scaled, y_test_cat)
# --- CNN Model (1D convolution over the six stats) ---
# Add a channel axis so each sample has shape (6, 1), as Conv1D expects
X_train_cnn = X_train_scaled[..., np.newaxis]
X_test_cnn = X_test_scaled[..., np.newaxis]
cnn_model = Sequential([
    Conv1D(32, 3, activation='relu', input_shape=(X_train_cnn.shape[1], 1)),
    MaxPooling1D(2),
    Flatten(),
    Dense(32, activation='relu'),
    Dense(2, activation='softmax')
])
cnn_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
cnn_model.fit(X_train_cnn, y_train_cat, epochs=50, batch_size=4, verbose=0)
cnn_eval = cnn_model.evaluate(X_test_cnn, y_test_cat)
# --- Evaluation Results ---
print(f"\nMLP Accuracy: {mlp_eval[1]:.4f}")
print(f"CNN Accuracy: {cnn_eval[1]:.4f}")
# --- Plotting (optional): accuracy comparison ---
plt.bar(['MLP', 'CNN'], [mlp_eval[1], cnn_eval[1]])
plt.title("Model Accuracy Comparison")
plt.ylabel("Accuracy")
plt.show()