PyTorch Basics

Last updated at 2024-09-23, posted at 2024-09-23

PyTorch Basics: Building and Training a Neural Network on the Iris Dataset

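This article explains how to build and train a simple neural network on the Iris dataset with PyTorch. Iris is widely used as an introductory machine learning dataset, which makes it well suited to learning the overall training workflow.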

Environment Setup

First, import PyTorch and the other required libraries.

import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.datasets import load_iris

Loading the Iris Dataset

Load the Iris dataset from scikit-learn.

iris = load_iris()

# Get the input data and the labels
x = iris['data']
t = iris['target']

The loaded data are NumPy arrays, so convert them to PyTorch tensors.

x = torch.tensor(x, dtype=torch.float32)
t = torch.tensor(t, dtype=torch.int64)

print(x.shape, t.shape)  # Output: torch.Size([150, 4]) torch.Size([150])
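The dtype choices above matter later on: F.cross_entropy expects floating-point inputs and integer (int64) class indices as targets. The following is a minimal sketch of the same conversion, assuming a few hand-typed rows in place of the real Iris arrays. The names features, labels, x_demo, t_demo, and loss_demo are illustrative and do not appear in the original code.

```python
import math

import torch
import torch.nn.functional as F

# Hypothetical stand-in for three Iris-like rows (4 features each) and their class labels.
features = [[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4], [6.3, 3.3, 6.0, 2.5]]
labels = [0, 1, 2]

x_demo = torch.tensor(features, dtype=torch.float32)
t_demo = torch.tensor(labels, dtype=torch.int64)

print(x_demo.shape, x_demo.dtype)  # torch.Size([3, 4]) torch.float32
print(t_demo.shape, t_demo.dtype)  # torch.Size([3]) torch.int64

# With all-zero logits every class gets probability 1/3,
# so the cross-entropy loss equals ln(3) regardless of the labels.
uniform_logits = torch.zeros(len(labels), 3)
loss_demo = F.cross_entropy(uniform_logits, t_demo)
print(math.isclose(loss_demo.item(), math.log(3), rel_tol=1e-5))  # True
```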

Splitting the Dataset

Split the data into training, validation, and test sets.

# Number of samples in each split (60% / 20% / 20%)
n_train = int(len(x) * 0.6)
n_val = int(len(x) * 0.2)
n_test = len(x) - n_train - n_val

train, val, test = torch.utils.data.random_split(torch.utils.data.TensorDataset(x, t), [n_train, n_val, n_test])

# Define the DataLoaders
batch_size = 10
train_loader = torch.utils.data.DataLoader(train, batch_size, shuffle=True, drop_last=True)
val_loader = torch.utils.data.DataLoader(val, batch_size)
test_loader = torch.utils.data.DataLoader(test, batch_size)
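random_split shuffles the indices randomly, so the split above differs from run to run. One possible way to make it repeatable is to pass a seeded torch.Generator. The sketch below assumes random stand-in data with the Iris shape; the split helper and the seed value are hypothetical. It also shows how many mini-batches each loader yields: with 90 training samples, a batch size of 10, and drop_last=True, the training loader produces 9 batches.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Hypothetical stand-in with the same shape as Iris: 150 samples, 4 features, 3 classes.
x_demo = torch.randn(150, 4)
t_demo = torch.randint(0, 3, (150,))
dataset = TensorDataset(x_demo, t_demo)

n_train = int(len(dataset) * 0.6)        # 90
n_val = int(len(dataset) * 0.2)          # 30
n_test = len(dataset) - n_train - n_val  # 30


def split(seed):
    # A seeded generator makes the random permutation repeatable.
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)


train_a, val_a, test_a = split(seed=0)
train_b, _, _ = split(seed=0)
print(train_a.indices == train_b.indices)  # True: same seed, same split

batch_size = 10
train_loader_demo = DataLoader(train_a, batch_size, shuffle=True, drop_last=True)
val_loader_demo = DataLoader(val_a, batch_size)
test_loader_demo = DataLoader(test_a, batch_size)
print(len(train_loader_demo), len(val_loader_demo), len(test_loader_demo))  # 9 3 3
```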

Defining the Network

Next, define the network by subclassing PyTorch's nn.Module. It is a simple network made of two fully-connected layers, with 4 input nodes, 4 hidden nodes, and 3 output nodes.

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 4)
        self.fc2 = nn.Linear(4, 3)

    def forward(self, x):
        h = F.relu(self.fc1(x))
        h = self.fc2(h)
        return h
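A quick sanity check for a network like this is to pass a dummy batch through it and count the parameters. The sketch below repeats the Net class so that it runs on its own; net_demo and dummy_batch are illustrative names. The first layer has 4 x 4 weights plus 4 biases, and the second has 4 x 3 weights plus 3 biases, for 35 parameters in total.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 4)  # 4 inputs -> 4 hidden nodes
        self.fc2 = nn.Linear(4, 3)  # 4 hidden nodes -> 3 class scores

    def forward(self, x):
        h = F.relu(self.fc1(x))
        return self.fc2(h)


net_demo = Net()
dummy_batch = torch.randn(10, 4)  # a batch of 10 samples with 4 features each
logits = net_demo(dummy_batch)
print(logits.shape)  # torch.Size([10, 3])

n_params = sum(p.numel() for p in net_demo.parameters())
print(n_params)  # 35
```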

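Model Initialization and Device Transfer

If a GPU is available, the model is moved to it. Each mini-batch is moved to the same device inside the training loop.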

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
net = Net().to(device)

# Optimizer: stochastic gradient descent (SGD)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
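To make the role of the optimizer concrete, here is a minimal sketch of a single plain SGD step on a toy parameter. It assumes a hand-picked loss (the sum of squares) rather than the network above, and w and optimizer_demo are illustrative names. Without momentum or weight decay, the update is w <- w - lr * grad.

```python
import torch

# A toy parameter vector in place of the network weights.
w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer_demo = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()  # d(loss)/dw = 2 * w = [2.0, 4.0]
optimizer_demo.zero_grad()
loss.backward()
grad_before_step = w.grad.clone()
optimizer_demo.step()  # w <- w - 0.1 * grad

print(grad_before_step)  # tensor([2., 4.])
print(w.detach())        # tensor([0.8000, 1.6000])
```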

Training Loop

Train for one epoch.

max_epoch = 1

for epoch in range(max_epoch):
    for batch in train_loader:
        x_batch, t_batch = batch
        x_batch, t_batch = x_batch.to(device), t_batch.to(device)

        # Forward pass
        y = net(x_batch)
        loss = F.cross_entropy(y, t_batch)

        # Reset the gradients, backpropagate, and update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Compute the accuracy on this mini-batch
        y_label = torch.argmax(y, dim=1)
        accuracy = (y_label == t_batch).sum().float() / len(t_batch)
        print(f'Epoch: {epoch+1}, Loss: {loss.item()}, Accuracy: {accuracy:.2f}')
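The loop passes the raw outputs of the network straight to F.cross_entropy, which is why forward() has no softmax: the function applies log-softmax internally and then averages the negative log-probability of the correct class. The sketch below checks this on two hand-picked rows of logits. The values and variable names are illustrative only.

```python
import torch
import torch.nn.functional as F

# Two hypothetical rows of raw network outputs (logits) and their true classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

# Manual version: log-softmax, pick the log-probability of the true class, average.
log_probs = F.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(len(targets)), targets].mean()

builtin_loss = F.cross_entropy(logits, targets)
print(torch.allclose(manual_loss, builtin_loss))  # True
```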

Model Evaluation

Once training is finished, evaluate the model on the validation and test data.

def calc_accuracy(data_loader):
    correct = 0
    total = 0
    with torch.no_grad():
        for batch in data_loader:
            x_batch, t_batch = batch
            x_batch, t_batch = x_batch.to(device), t_batch.to(device)
            y = net(x_batch)
            y_label = torch.argmax(y, dim=1)
            correct += (y_label == t_batch).sum().item()
            total += len(t_batch)
    return correct / total

# Accuracy on the validation data
val_accuracy = calc_accuracy(val_loader)
print(f'Validation Accuracy: {val_accuracy:.2f}')

# Accuracy on the test data
test_accuracy = calc_accuracy(test_loader)
print(f'Test Accuracy: {test_accuracy:.2f}')
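The arithmetic inside calc_accuracy can be followed on a tiny hand-made batch. In this sketch the logits and targets are made-up values, not model outputs: argmax picks the class with the highest score in each row, and the accuracy is the fraction of rows where that class matches the target.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes, with their true labels.
logits = torch.tensor([[2.0, 1.0, 0.0],
                       [0.0, 3.0, 1.0],
                       [1.0, 0.0, 2.0],
                       [5.0, 0.0, 0.0]])
targets = torch.tensor([0, 1, 1, 0])

predicted = torch.argmax(logits, dim=1)         # highest-scoring class per row
correct = (predicted == targets).sum().item()   # 3 of the 4 predictions match
accuracy_demo = correct / len(targets)
print(predicted.tolist(), accuracy_demo)  # [0, 1, 2, 0] 0.75
```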

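Results

After one epoch of training, the script prints the accuracy on the validation and test data. Changing settings such as the number of epochs or the batch size may lead to a more accurate model.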
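As one way to experiment with the number of epochs, the sketch below wraps the training loop in a hypothetical helper, train_epochs, that returns the mean loss of each epoch. It assumes random stand-in data with the Iris shape and an nn.Sequential model with the same layer sizes as Net, so the printed losses say nothing about real Iris performance. No particular loss values are claimed here.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in data with the Iris shape (150 x 4, 3 classes).
x_demo = torch.randn(150, 4)
t_demo = torch.randint(0, 3, (150,))
loader_demo = DataLoader(TensorDataset(x_demo, t_demo), batch_size=10, shuffle=True)

# Same layer sizes as Net, written with nn.Sequential for brevity.
net_demo = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 3))
optimizer_demo = torch.optim.SGD(net_demo.parameters(), lr=0.1)


def train_epochs(net, loader, optimizer, max_epoch):
    """Train for max_epoch epochs and return the mean loss of each epoch."""
    history = []
    for _ in range(max_epoch):
        total_loss, n_samples = 0.0, 0
        for x_batch, t_batch in loader:
            loss = F.cross_entropy(net(x_batch), t_batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # Weight each batch loss by its size to get a per-sample mean.
            total_loss += loss.item() * len(t_batch)
            n_samples += len(t_batch)
        history.append(total_loss / n_samples)
    return history


loss_history = train_epochs(net_demo, loader_demo, optimizer_demo, max_epoch=5)
for epoch, mean_loss in enumerate(loss_history, start=1):
    print(f'Epoch: {epoch}, Mean loss: {mean_loss:.4f}')
```

Tracking the mean loss per epoch, rather than printing every mini-batch, makes it easier to compare runs with different epoch counts or batch sizes.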
