Notebook summary
Memo-style notes on tutorial 2. Not needed if you can read the documentation yourself.
2. Node Classification with Graph Neural Networks
- A tutorial on node classification
- Only a very small fraction of ground-truth labels is given => we want to infer the labels of the remaining nodes!
Code
Dataset
We use Cora as the dataset. Its characteristics are as follows (a loading sketch follows this list):
- Each node has a 1433-dimensional bag-of-words (BoW) feature vector
- Two documents are connected by an edge if a citation link exists between them
- The goal is to infer the category of each document (7 categories in total)
- Of the 2708 nodes, only 140 (about 5%) can be used for training
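The snippets below assume that dataset and data are already defined. For reference, here is a minimal loading sketch along the lines of the tutorial's standard Planetoid loader (the root path is just an arbitrary choice):

from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures

# Download Cora and row-normalize the bag-of-words node features.
dataset = Planetoid(root='data/Planetoid', name='Cora', transform=NormalizeFeatures())
data = dataset[0]  # The dataset contains a single graph object.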
Comparison with an MLP
In theory, given the BoW features, we should be able to infer the document category from them alone.
For comparison, we build a simple MLP and check the results.
import torch
from torch.nn import Linear
import torch.nn.functional as F

class MLP(torch.nn.Module):
    def __init__(self, hidden_channels):
        super().__init__()
        torch.manual_seed(12345)
        self.lin1 = Linear(dataset.num_features, hidden_channels)
        self.lin2 = Linear(hidden_channels, dataset.num_classes)

    def forward(self, x):
        x = self.lin1(x)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.lin2(x)
        return x
model = MLP(hidden_channels=16)
print(model)
>>> MLP(
  (lin1): Linear(in_features=1433, out_features=16, bias=True)
  (lin2): Linear(in_features=16, out_features=7, bias=True)
)
We train using CrossEntropyLoss as the loss function and Adam as the optimizer.
criterion = torch.nn.CrossEntropyLoss()  # Define loss criterion.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)  # Define optimizer.

def train():
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

def test():
    model.eval()
    out = model(data.x)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Check against ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Derive ratio of correct predictions.
    return test_acc
# Run training
for epoch in range(1, 201):
    loss = train()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
>>> Epoch: 001, Loss: 1.9615
Epoch: 002, Loss: 1.9557
Epoch: 003, Loss: 1.9505
Epoch: 004, Loss: 1.9423
...
Epoch: 197, Loss: 0.3985
Epoch: 198, Loss: 0.4664
Epoch: 199, Loss: 0.3714
Epoch: 200, Loss: 0.3810
# Run test
test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')
>>> Test Accuracy: 0.5900
The MLP reaches only 59% accuracy, a rather underwhelming result.
This indicates that with so few nodes available for training, the model overfits and fails to generalize to unseen node representations.
Training with a GNN
It can be implemented easily by swapping out the torch.nn.Linear layers of the MLP.
Here we use GCNConv as the GNN layer. For the detailed math and the concept of graph convolution, see the Notebook or the paper [1].
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, hidden_channels):
        super().__init__()
        torch.manual_seed(1234567)
        self.conv1 = GCNConv(dataset.num_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)
        return x
model = GCN(hidden_channels=16)
print(model)
>>> GCN(
  (conv1): GCNConv(1433, 16)
  (conv2): GCNConv(16, 7)
)
The training procedure is largely the same, except that this time edge_index is also passed as an input to the GCN model.
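As a side note, edge_index is a tensor of shape [2, num_edges] in COO format, holding the source and target node index of every edge. A toy illustration with made-up values (not from Cora):

import torch

# Edges 0->1, 1->0, 1->2, 2->1: sources in the first row, targets in the second.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
print(edge_index.shape)  # torch.Size([2, 4])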
from IPython.display import Javascript # Restrict height of output cell.
display(Javascript('''google.colab.output.setIframeHeight(0, true, {maxHeight: 300})'''))
model = GCN(hidden_channels=16)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
def train():
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x, data.edge_index)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

def test():
    model.eval()
    out = model(data.x, data.edge_index)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    test_correct = pred[data.test_mask] == data.y[data.test_mask]  # Check against ground-truth labels.
    test_acc = int(test_correct.sum()) / int(data.test_mask.sum())  # Derive ratio of correct predictions.
    return test_acc
# Run training
for epoch in range(1, 101):
    loss = train()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')
>>> Epoch: 001, Loss: 1.9463
Epoch: 002, Loss: 1.9409
Epoch: 003, Loss: 1.9343
Epoch: 004, Loss: 1.9275
...
Epoch: 097, Loss: 0.5547
Epoch: 098, Loss: 0.5989
Epoch: 099, Loss: 0.6021
Epoch: 100, Loss: 0.5799
# Run test
test_acc = test()
print(f'Test Accuracy: {test_acc:.4f}')
>>> Test Accuracy: 0.8150
Simply swapping in GNN layers raised the accuracy to 81.5%.
=> This shows that the graph connectivity information contributes to better performance.
Exercises
- To achieve better model performance and to avoid overfitting, it is usually a good idea to select the best model based on an additional validation set. The Cora dataset provides a validation node set as data.val_mask, but we haven't used it yet. Can you modify the code to select and test the model with the highest validation performance? This should bring test performance to 82% accuracy.
- How does GCN behave when increasing the hidden feature dimensionality or the number of layers? Does increasing the number of layers help at all?
- You can try to use different GNN layers to see how model performance changes. What happens if you swap out all GCNConv instances with GATConv layers that make use of attention? Try to write a 2-layer GAT model that makes use of 8 attention heads in the first layer and 1 attention head in the second layer, uses a dropout ratio of 0.6 inside and outside each GATConv call, and uses hidden_channels of 8 per head.
The code below is an attempt at this third exercise.
from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    def __init__(self, hidden_channels, heads):
        super().__init__()
        torch.manual_seed(1234567)
        self.conv1 = GATConv(dataset.num_features, hidden_channels, heads=heads)  # 8 attention heads in the first layer.
        self.conv2 = GATConv(hidden_channels * heads, dataset.num_classes, heads=1)  # A single attention head in the second layer.

    def forward(self, x, edge_index):
        x = F.dropout(x, p=0.6, training=self.training)
        x = self.conv1(x, edge_index)
        x = F.elu(x)
        x = F.dropout(x, p=0.6, training=self.training)
        x = self.conv2(x, edge_index)
        return x
model = GAT(hidden_channels=8, heads=8)
print(model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
def train():
    model.train()
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x, data.edge_index)  # Perform a single forward pass.
    loss = criterion(out[data.train_mask], data.y[data.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

def test(mask):
    model.eval()
    out = model(data.x, data.edge_index)
    pred = out.argmax(dim=1)  # Use the class with highest probability.
    correct = pred[mask] == data.y[mask]  # Check against ground-truth labels.
    acc = int(correct.sum()) / int(mask.sum())  # Derive ratio of correct predictions.
    return acc

for epoch in range(1, 201):
    loss = train()
    val_acc = test(data.val_mask)
    test_acc = test(data.test_mask)
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Val: {val_acc:.4f}, Test: {test_acc:.4f}')
Since the output of the first GATConv layer has its dimension multiplied by heads (hidden_channels per head, concatenated), the input dimension of the second layer must be hidden_channels * heads.
>>> GAT(
  (conv1): GATConv(1433, 8, heads=8)
  (conv2): GATConv(64, 7, heads=1)
)
Epoch: 001, Loss: 1.9449, Val: 0.2240, Test: 0.2340
Epoch: 002, Loss: 1.9351, Val: 0.4080, Test: 0.4180
Epoch: 003, Loss: 1.9258, Val: 0.5160, Test: 0.5330
Epoch: 004, Loss: 1.9158, Val: 0.6320, Test: 0.6370
Epoch: 005, Loss: 1.9073, Val: 0.7320, Test: 0.7250
...
Epoch: 197, Loss: 0.2526, Val: 0.7720, Test: 0.8070
Epoch: 198, Loss: 0.2608, Val: 0.7760, Test: 0.8080
Epoch: 199, Loss: 0.2505, Val: 0.7740, Test: 0.8090
Epoch: 200, Loss: 0.2025, Val: 0.7760, Test: 0.8060
Apparently the accuracy is supposed to reach 82%, but hey, it runs, so good enough! lol
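If you want to chase that 82%, the first exercise suggests selecting the model by validation accuracy (data.val_mask) rather than simply taking the last epoch. A minimal sketch of one way to do that, reusing the train/test functions above (best_val_acc and best_state are names I made up, and hitting exactly 82% is not guaranteed):

import copy

best_val_acc = 0.0
best_state = None

for epoch in range(1, 201):
    loss = train()
    val_acc = test(data.val_mask)
    if val_acc > best_val_acc:  # Keep the parameters with the best validation accuracy so far.
        best_val_acc = val_acc
        best_state = copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)  # Restore the best checkpoint before the final test evaluation.
print(f'Best Val: {best_val_acc:.4f}, Test: {test(data.test_mask):.4f}')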