Notebook summary
Memo-style notes on tutorial 2. Not needed if you can read the documentation yourself.
2. Node Classification with Graph Neural Networks
- A tutorial on node classification
- Only a very small fraction of ground-truth labels is given => we want to infer the labels of the remaining nodes!
Code
Dataset
We use Cora as the dataset. Its characteristics are as follows (a loading sketch follows this list):
- Each node has a 1433-dimensional bag-of-words (BoW) feature vector
- Two documents are connected by an edge if a citation link exists between them
- The goal is to infer the category of each document (7 categories in total)
- Of the 2708 nodes, only 140 (about 5%) can be used for training
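The snippets below assume that dataset and data are already defined. For reference, here is a minimal loading sketch along the lines of the tutorial's standard Planetoid loader (the root path is just an arbitrary choice):

from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures

# Download Cora and row-normalize the bag-of-words node features.
dataset = Planetoid(root='data/Planetoid', name='Cora', transform=NormalizeFeatures())
data = dataset[0]  # The dataset contains a single graph object.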
Comparison with an MLP
In theory, given the BoW features, we should be able to infer the document category from them alone.
For comparison, we build a simple MLP and check the results.
import torch
from torch.nn import Linear
import torch.nn.functional as F

class MLP(torch.nn.Module):
    def __init__(self, hidden_channels):
        super().__init__()
        torch.manual_seed(12345)
        self.lin1 = Linear(dataset.num_features, hidden_channels)
        self.lin2 = Linear(hidden_channels, dataset.num_classes)

    def forward(self, x):
        x = self.lin1(x)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.lin2(x)
        return x
model = MLP(hidden_channels=16)
print(model)
>>> MLP(
  (lin1): Linear(in_features=1433, out_features=16, bias=True)
  (lin2): Linear(in_features=16, out_features=7, bias=True)
)
We train using CrossEntropyLoss as the loss function and Adam as the optimizer.
criterion = torch.nn.CrossEntropyLoss()  # Define loss criterion.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)  # Define optimizer.

def train():
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

def test():
    model.eval()
    out = model(data.x)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Check against ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Derive ratio of correct predictions.
    return test_acc
# Run training
for epoch in range(1, 201):
    loss = train()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
>>> Epoch: 001, Loss: 1.9615
Epoch: 002, Loss: 1.9557
Epoch: 003, Loss: 1.9505
Epoch: 004, Loss: 1.9423
...
Epoch: 197, Loss: 0.3985
Epoch: 198, Loss: 0.4664
Epoch: 199, Loss: 0.3714
Epoch: 200, Loss: 0.3810
# Run test
test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')
>>> Test Accuracy: 0.5900
The MLP reaches only 59% accuracy, a rather underwhelming result.
This indicates that with so few nodes available for training, the model overfits and fails to generalize to unseen node representations.
Training with a GNN
It can be implemented easily by swapping out the torch.nn.Linear layers of the MLP.
Here we use GCNConv as the GNN layer. For the detailed math and the concept of graph convolution, see the Notebook or the paper [1].
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, hidden_channels):
        super().__init__()
        torch.manual_seed(1234567)
        self.conv1 = GCNConv(dataset.num_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return x
model = GCN(hidden_channels=16)
print(model)
>>> GCN(
  (conv1): GCNConv(1433, 16)
  (conv2): GCNConv(16, 7)
)
The training procedure is largely the same, except that this time edge_index is also passed as an input to the GCN model.
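As a side note, edge_index is a tensor of shape [2, num_edges] in COO format, holding the source and target node index of every edge. A toy illustration with made-up values (not from Cora):

import torch

# Edges 0->1, 1->0, 1->2, 2->1: sources in the first row, targets in the second.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
print(edge_index.shape)  # torch.Size([2, 4])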
from IPython.display import Javascript # Restrict height of output cell.
display(Javascript('''google.colab.output.setIframeHeight(0, true, {maxHeight: 300})'''))
model = GCN(hidden_channels=16)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
def train():
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x, data.edge_index)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

def test():
    model.eval()
    out = model(data.x, data.edge_index)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Check against ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Derive ratio of correct predictions.
    return test_acc
# Run training
for epoch in range(1, 101):
    loss = train()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
>>> Epoch: 001, Loss: 1.9463
Epoch: 002, Loss: 1.9409
Epoch: 003, Loss: 1.9343
Epoch: 004, Loss: 1.9275
...
Epoch: 097, Loss: 0.5547
Epoch: 098, Loss: 0.5989
Epoch: 099, Loss: 0.6021
Epoch: 100, Loss: 0.5799
# Run test
test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')
>>> Test Accuracy: 0.8150
Simply swapping in GNN layers raised the accuracy to 81.5%.
=> This shows that the graph connectivity information contributes to better performance.
Exercises
- To achieve better model performance and to avoid overfitting, it is usually a good idea to select the best model based on an additional validation set. The Cora dataset provides a validation node set as data.val_mask, but we haven't used it yet. Can you modify the code to select and test the model with the highest validation performance? This should bring test performance to 82% accuracy.
- How does GCN behave when increasing the hidden feature dimensionality or the number of layers? Does increasing the number of layers help at all?
- You can try to use different GNN layers to see how model performance changes. What happens if you swap out all GCNConv instances with GATConv layers that make use of attention? Try to write a 2-layer GAT model that makes use of 8 attention heads in the first layer and 1 attention head in the second layer, uses a dropout ratio of 0.6 inside and outside each GATConv call, and uses hidden_channels of 8 per head.
The code below is an attempt at this third exercise.
from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    def __init__(self, hidden_channels, heads):
        super().__init__()
        torch.manual_seed(1234567)
        self.conv1 = GATConv(dataset.num_features, hidden_channels, heads=heads)  # 8 attention heads in the first layer.
        self.conv2 = GATConv(hidden_channels * heads, dataset.num_classes, heads=1)  # A single attention head in the second layer.

    def forward(self, x, edge_index):
        x = F.dropout(x, p=0.6, training=self.training)
        x = self.conv1(x, edge_index)
        x = F.elu(x)
        x = F.dropout(x, p=0.6, training=self.training)
        x = self.conv2(x, edge_index)
        return x
model = GAT(hidden_channels=8, heads=8)
print(model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
def train():
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x, data.edge_index)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

def test(mask):
    model.eval()
    out = model(data.x, data.edge_index)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    correct = pred[mask] == data.y[mask]  # Check against ground-truth labels.
    acc = int(correct.sum()) / int(mask.sum())  # Derive ratio of correct predictions.
    return acc

for epoch in range(1, 201):
    loss = train()
    val_acc = test(data.val_mask)
    test_acc = test(data.test_mask)
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Val: {val_acc:.4f}, Test: {test_acc:.4f}')
Since the output of the first GATConv layer has its dimension multiplied by heads (hidden_channels per head, concatenated), the input dimension of the second layer must be hidden_channels * heads.
>>> GAT(
  (conv1): GATConv(1433, 8, heads=8)
  (conv2): GATConv(64, 7, heads=1)
)
Epoch: 001, Loss: 1.9449, Val: 0.2240, Test: 0.2340
Epoch: 002, Loss: 1.9351, Val: 0.4080, Test: 0.4180
Epoch: 003, Loss: 1.9258, Val: 0.5160, Test: 0.5330
Epoch: 004, Loss: 1.9158, Val: 0.6320, Test: 0.6370
Epoch: 005, Loss: 1.9073, Val: 0.7320, Test: 0.7250
...
Epoch: 197, Loss: 0.2526, Val: 0.7720, Test: 0.8070
Epoch: 198, Loss: 0.2608, Val: 0.7760, Test: 0.8080
Epoch: 199, Loss: 0.2505, Val: 0.7740, Test: 0.8090
Epoch: 200, Loss: 0.2025, Val: 0.7760, Test: 0.8060
Apparently the accuracy is supposed to reach 82%, but hey, it runs, so good enough! lol
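If you want to chase that 82%, the first exercise suggests selecting the model by validation accuracy (data.val_mask) rather than simply taking the last epoch. A minimal sketch of one way to do that, reusing the train/test functions above (best_val_acc and best_state are names I made up, and hitting exactly 82% is not guaranteed):

import copy

best_val_acc = 0.0
best_state = None

for epoch in range(1, 201):
    loss = train()
    val_acc = test(data.val_mask)
    if val_acc > best_val_acc:  # Keep the parameters with the best validation accuracy so far.
        best_val_acc = val_acc
        best_state = copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)  # Restore the best checkpoint before the final test evaluation.
print(f'Best Val: {best_val_acc:.4f}, Test: {test(data.test_mask):.4f}')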