PyTorch Basics (Personal Notes)

Posted 2025-01-05, last updated 2025-01-05

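Overview

PyTorch is a deep learning framework that is hard to do without when implementing deep learning. It has been developed since 2016 by Facebook's (now Meta's) AI Research lab (FAIR) and provides the following features:

Tensor operations: intuitive, NumPy-like tensor operations, with support for fast numerical computation on GPUs.
Automatic differentiation (Autograd): computation graphs are built and differentiated automatically, which makes it easier to optimize complex models.
Neural network modules (torch.nn): the layers, activation functions, loss functions, and other building blocks needed to construct neural networks.
Optimization algorithms (torch.optim): a variety of optimizers, including stochastic gradient descent (SGD) and Adam.
Data loaders (torch.utils.data): utilities that streamline data handling, such as dataset management, mini-batch creation, and shuffling.
GPU support: computation can be accelerated on CUDA-capable GPUs.
Saving and loading models: trained models are easy to save and reload, which simplifies reuse and deployment.
ONNX support: models can be exported in the Open Neural Network Exchange format for interoperability with other frameworks.

A comparable alternative is TensorFlow + Keras, developed by Google.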

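Tensor operations

You can do almost everything you can do in NumPy (PyTorch has its own implementation). tensor.numpy() and torch.from_numpy(numpy_array) convert between the two.
Deep learning workloads are often spread across several devices. You choose where a tensor lives with .to(device), .cpu(), .cuda(), and so on, and an operation runs on the device that holds its argument tensors. Trying to combine tensors that live on different devices raises an error.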
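The conversion and device handling described above can be sketched as follows. This is a minimal example, not the only way to do it; the variable names are made up for illustration, and it falls back to the CPU when no CUDA device is available.

```python
import numpy as np
import torch

arr = np.arange(6, dtype=np.float32).reshape(2, 3)
t = torch.from_numpy(arr)   # CPU tensor that shares memory with arr
arr[0, 0] = 10.0            # the change is visible through t as well
back = t.numpy()            # back to NumPy (CPU tensors only)

# Pick a device; fall back to the CPU if CUDA is unavailable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t_dev = t.to(device)        # copy to the chosen device (no copy if already there)
doubled = (t_dev * 2).cpu() # compute on that device, then bring the result back
print(doubled)
```

Note that torch.from_numpy shares memory with the source array, so in-place changes on one side show up on the other.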

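Automatic differentiation

Deep learning models are trained with gradient-based methods, so a powerful automatic differentiation framework is essential.
"Automatic differentiation" here means that the operations you perform are recorded in memory as a computation graph and then differentiated with the chain rule. Recording everything would consume a lot of RAM, so only the tensors that need gradients are created with requires_grad=True.
Autograd in PyTorch works as follows.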

import torch

# Create a tensor with requires_grad set to True
x = torch.tensor([2.0], requires_grad=True)

# Apply some operation (here: squaring)
y = x ** 2

# Apply another operation to y
z = 2 * y

At this point every operation has been recorded in the computation graph.
You can inspect the graph with a function like the following, for example.

def show_autograd_graph(grad_fn, indent=0):
    print((' '*indent) + grad_fn.__class__.__name__)
    if not hasattr(grad_fn, 'next_functions'):
        return
    for i in grad_fn.next_functions:
        show_autograd_graph(i[0], indent+1)

show_autograd_graph(x.grad_fn)
# NoneType

show_autograd_graph(y.grad_fn)
#PowBackward0
# AccumulateGrad

show_autograd_graph(z.grad_fn)
#MulBackward0
# PowBackward0
#  AccumulateGrad
# NoneType

From this state, ∂z/∂x can be computed as follows.

z.backward()
print(x.grad)
print(y.grad) # note: not stored (None), because y is not a leaf node

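Calling .backward() on the quantity to be differentiated computes its gradient with respect to every upstream tensor that requires gradients, and the result is available through .grad. A tensor such as x here, which sits at the root of the computation graph, is called a leaf node (see .is_leaf). To save RAM, gradients are stored only for leaf nodes by default. If you also want the gradient with respect to an intermediate node, call .retain_grad() on that tensor before calling .backward().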

x = torch.tensor([2.0], requires_grad=True)
y = x ** 2
z = 2 * y
y.retain_grad()
z.backward()
print(y.grad) # now the gradient is stored: tensor([2.])
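As a sanity check on the example above, the gradients can be compared with the hand-derived values: since z = 2x², we expect ∂z/∂x = 4x = 8 at x = 2 and ∂z/∂y = 2. The sketch below also illustrates that .grad accumulates across backward passes, which is why training loops clear it each iteration. The names dz_dx, dz_dy, and accumulated are only for illustration.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = x ** 2
y.retain_grad()              # keep the gradient of the intermediate node
z = 2 * y
z.backward()

dz_dx = x.grad.clone()       # analytically 4 * x = 8
dz_dy = y.grad.clone()       # analytically 2
print(x.is_leaf, y.is_leaf)  # x is a leaf node, y is not

# A second backward pass on a freshly built graph ADDS to x.grad.
z2 = 2 * x ** 2
z2.backward()
accumulated = x.grad.clone() # 8 + 8
x.grad.zero_()               # reset before the next pass
print(dz_dx, dz_dy, accumulated)
```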

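Probability distributions and the NN framework

PyTorch supports a large number of probability distributions. They share an abstract base class that provides methods for entropy, log-probability (log_prob), sampling, and so on.
There is also a module for computing the KL divergence between them.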
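A minimal sketch of that common interface, using two normal distributions chosen purely for illustration (the means and standard deviations are arbitrary assumptions, not values from this article):

```python
import torch
from torch.distributions import Normal, kl_divergence

p = Normal(loc=torch.tensor(0.0), scale=torch.tensor(1.0))
q = Normal(loc=torch.tensor(1.0), scale=torch.tensor(2.0))

samples = p.sample((5,))       # draw 5 samples from p
log_p = p.log_prob(samples)    # log-density of each sample under p
entropy = p.entropy()          # 0.5 * log(2 * pi * e) for a standard normal
kl_pq = kl_divergence(p, q)    # closed-form KL(p || q)
print(samples.shape, entropy, kl_pq)
```

For two normals the KL divergence has the closed form log(σ_q/σ_p) + (σ_p² + (μ_p − μ_q)²) / (2σ_q²) − 1/2, which is about 0.443 for the values chosen here.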

As for the NN framework, the major layers are cleanly implemented on top of the base class torch.nn.Module.

input = torch.randn(128, 20)
model = torch.nn.Linear(20, 30)
output = model(input)
print(output.shape) # torch.Size([128, 30])

A torch.nn.Module exposes its own parameters and those of all its submodules through .parameters() and .named_parameters():

list(model.named_parameters())
# ->
[('weight',
  Parameter containing:
  tensor([[-0.1417, -0.2189,  0.0489,  0.1321, -0.0539, -0.2198,  0.0393, -0.0040,
            0.0168,  0.0241, -0.0063, -0.1249, -0.0658, -0.1669,  0.1462, -0.0304,
            0.1256,  0.2016, -0.1822, -0.2147],
          [ 0.1996, -0.1317,  0.0757,  0.1489,  0.1014,  0.0193, -0.1287, -0.0736,
            0.0273,  0.1579, -0.1506, -0.1618,  0.0555,  0.0248,  0.2015, -0.0297,
           -0.0895,  0.1225, -0.1029, -0.0689],
          [ 0.1770, -0.2215, -0.0763,  0.1582, -0.1178, -0.0898, -0.1060,  0.0135,
            0.0704,  0.1623,  0.0459, -0.0369,  0.1454, -0.1104, -0.1334,  0.1643,
            0.0949,  0.0347,  0.1470, -0.1872],
          [ 0.0193,  0.1208, -0.2013, -0.1545, -0.0810, -0.0479, -0.1371, -0.1655,
            0.0614,  0.1984, -0.0357, -0.0388,  0.0354,  0.1424,  0.1927,  0.0150,
           -0.1938,  0.0293,  0.1088,  0.1068],
          [ 0.0323, -0.0921, -0.1476,  0.0256,  0.1881,  0.0446,  0.1665, -0.0318,
           -0.1904, -0.0606, -0.1775,  0.2043, -0.1493, -0.0733, -0.0452,  0.0963,
           -0.1284, -0.0971,  0.1340,  0.2048],
          [-0.1453,  0.0116,  0.1992,  0.1121,  0.1088,  0.1271,  0.1486, -0.0722,
            0.1785,  0.1444, -0.1263, -0.1869, -0.0345, -0.1770,  0.0373, -0.0796,
           -0.0281,  0.1057,  0.1203, -0.1346],
          [ 0.1855, -0.0679,  0.1829, -0.0630,  0.0770, -0.0183, -0.1956,  0.1273,
           -0.0902,  0.1487, -0.2176,  0.0278, -0.1547, -0.0128, -0.0898, -0.1446,
            0.0406,  0.0652, -0.0792,  0.1923],
          [ 0.0382, -0.1074, -0.2124,  0.1678,  0.0422,  0.1228,  0.0961, -0.1988,
           -0.1097, -0.0166,  0.0179, -0.0793, -0.0761, -0.0647,  0.0938,  0.0590,
            0.1401,  0.0172,  0.1807, -0.1942],
          [-0.0209,  0.0240,  0.1917,  0.2076,  0.2231, -0.1523,  0.1808, -0.0898,
           -0.0162, -0.1010,  0.0864, -0.1774, -0.2206,  0.1903, -0.0962,  0.0350,
           -0.2056, -0.2038,  0.0957,  0.0157],
          [-0.1627,  0.1927, -0.1726, -0.2098, -0.0271,  0.0627, -0.0992,  0.1930,
           -0.0621,  0.1418,  0.1300, -0.0096,  0.0104,  0.1379,  0.0695,  0.0286,
           -0.1397, -0.2046,  0.0898, -0.2032],
          [ 0.2225, -0.1266, -0.1108,  0.0117,  0.1471,  0.1522,  0.2197, -0.0477,
           -0.0605, -0.1793, -0.2229, -0.1059, -0.1648, -0.0073, -0.1098,  0.0537,
            0.0424, -0.1608,  0.1615,  0.1264],
          [ 0.1595, -0.0664, -0.1752, -0.0777,  0.1630,  0.1174, -0.0724, -0.0606,
           -0.2071, -0.1993,  0.1862, -0.2000, -0.1658,  0.0730, -0.2036,  0.0284,
            0.1152, -0.0936, -0.0349, -0.0523],
          [-0.1935, -0.0947,  0.1504, -0.1812, -0.2202, -0.0649, -0.0546,  0.1493,
           -0.0245,  0.2148,  0.1155, -0.1243,  0.2178, -0.1287, -0.1023, -0.1964,
           -0.0493, -0.0200, -0.1039,  0.1157],
          [-0.0266, -0.1513,  0.0554, -0.2018,  0.2163, -0.0298, -0.0117,  0.0631,
           -0.1055, -0.0090,  0.2158,  0.0003, -0.1049,  0.1166, -0.1228,  0.0814,
           -0.1036, -0.1552, -0.1869, -0.1261],
          [-0.0310,  0.1762, -0.0843, -0.0661,  0.0147,  0.1580,  0.1883,  0.0220,
           -0.1943,  0.1670,  0.2054, -0.2150, -0.1808, -0.0141, -0.1819, -0.1971,
           -0.1238, -0.0732,  0.1602,  0.1137],
          [-0.1552, -0.1776, -0.2132,  0.1086,  0.1497,  0.0784,  0.1303,  0.0654,
            0.0937, -0.0448,  0.0427, -0.1801,  0.0898,  0.0698, -0.0970, -0.0213,
            0.2190,  0.1965,  0.1400,  0.1825],
          [-0.0749,  0.0845,  0.1774, -0.0929, -0.1100, -0.1287, -0.0514, -0.0479,
           -0.0355,  0.0571, -0.0614,  0.2099, -0.0828, -0.1461,  0.0902, -0.0537,
            0.2099, -0.1330, -0.0582, -0.1275],
          [-0.0923,  0.0337,  0.0859,  0.0443, -0.1372,  0.0741,  0.2179, -0.1156,
            0.1603,  0.2227,  0.0493, -0.1245,  0.1434, -0.0353, -0.0755,  0.0843,
           -0.1828, -0.1882,  0.1324,  0.0798],
          [ 0.0458, -0.1570, -0.1827, -0.0925,  0.0785,  0.0906,  0.0090, -0.1528,
            0.0878,  0.0473, -0.0719, -0.0207, -0.2149, -0.0270,  0.2165,  0.0125,
           -0.0628,  0.0463, -0.1500,  0.0714],
          [-0.0648,  0.1768, -0.1337, -0.0223,  0.1909,  0.1878, -0.1807,  0.0112,
            0.1152, -0.2148, -0.0055,  0.1939,  0.1498,  0.1343,  0.1830,  0.0983,
            0.0383,  0.1103, -0.1610, -0.0694],
          [ 0.0273,  0.1916, -0.0087,  0.1104,  0.0054, -0.0122, -0.1999, -0.1414,
           -0.0515, -0.0697,  0.0514, -0.1084, -0.2107,  0.2180, -0.0512,  0.0760,
           -0.1911,  0.1280,  0.0409, -0.1463],
          [ 0.0924, -0.1746, -0.0340,  0.1037, -0.1558, -0.1311,  0.1929, -0.1353,
           -0.1559,  0.1421, -0.1079,  0.0985,  0.2169, -0.0020, -0.0295,  0.0883,
           -0.0272,  0.0235,  0.0639,  0.0639],
          [-0.1715, -0.1146,  0.1594, -0.0205, -0.0604, -0.1130,  0.0627, -0.0485,
           -0.1838,  0.1420, -0.2047, -0.1510,  0.1869,  0.0698,  0.1321,  0.1490,
           -0.1300, -0.1582,  0.0763, -0.0788],
          [ 0.1425,  0.0046, -0.1173, -0.1869, -0.0994, -0.1022,  0.0148, -0.0282,
            0.0447, -0.1640, -0.1015, -0.1940,  0.1905,  0.1990, -0.0181,  0.2213,
            0.0700, -0.2048, -0.1179,  0.0491],
          [-0.0915,  0.0111, -0.1933,  0.1084, -0.0300, -0.1323,  0.0011, -0.1654,
           -0.0192,  0.1324, -0.0557,  0.1895, -0.1980, -0.0128, -0.0378, -0.0216,
           -0.0513, -0.1884, -0.0434, -0.1417],
          [-0.2092, -0.1516, -0.1897, -0.2026,  0.0471, -0.0884, -0.0540, -0.1361,
           -0.1812,  0.1294, -0.1404,  0.2155,  0.1436,  0.0714,  0.1171, -0.1140,
            0.1815,  0.2129,  0.0208,  0.0083],
          [ 0.1987,  0.1475,  0.1262,  0.0473,  0.0707,  0.0308, -0.1310,  0.1792,
            0.0065, -0.2075, -0.0207,  0.1781, -0.0612,  0.0483, -0.0628, -0.1835,
            0.1966, -0.2190,  0.1165, -0.1167],
          [-0.0226, -0.0311, -0.0677,  0.1746, -0.0882,  0.1120,  0.2035,  0.0986,
            0.1080, -0.2120,  0.1965,  0.1715, -0.0905,  0.1058, -0.2076,  0.2141,
           -0.2231, -0.0458, -0.1116, -0.1760],
          [ 0.0831,  0.1371,  0.1361, -0.2187,  0.0413, -0.1586,  0.1899,  0.1861,
            0.1684,  0.1244,  0.0163,  0.0938,  0.1387, -0.2152,  0.0675, -0.0555,
            0.0143, -0.0313,  0.1695,  0.1120],
          [ 0.1869,  0.0070,  0.0404,  0.1169,  0.1053, -0.0722,  0.1349, -0.1627,
           -0.2080, -0.1297, -0.0230, -0.0730, -0.1077, -0.0452,  0.0573,  0.1118,
            0.0003,  0.0223, -0.2074, -0.1279]], requires_grad=True)),
 ('bias',
  Parameter containing:
  tensor([ 0.0064,  0.0922,  0.1868, -0.1162, -0.1262, -0.0270,  0.0707, -0.1872,
           0.1732, -0.0157,  0.1196,  0.0974,  0.0526,  0.0241, -0.1264,  0.0469,
           0.1535, -0.2032, -0.2062, -0.0379, -0.0265, -0.1760,  0.1326,  0.1707,
          -0.2097,  0.0541, -0.1413,  0.1921,  0.2187,  0.1972],
         requires_grad=True))]
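When the full values are too long to read, it is often enough to look at names and shapes. The sketch below uses a hypothetical two-layer module (SmallNet, with made-up submodule names encoder and head) to show that parameters of submodules are collected recursively.

```python
import torch
from torch import nn

class SmallNet(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(20, 30)
        self.head = nn.Linear(30, 5)

    def forward(self, x):
        return self.head(torch.relu(self.encoder(x)))

net = SmallNet()
# Parameter names are prefixed with the attribute name of the submodule.
shapes = {name: tuple(p.shape) for name, p in net.named_parameters()}
n_params = sum(p.numel() for p in net.parameters())
print(shapes, n_params)
```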

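To see what a layer actually computes, you can attach callbacks with register_forward_hook or register_forward_pre_hook.
As with torch.Tensor, you can also use .to(device) and similar methods to choose where the weights live, which in practice determines where the computation runs.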
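As a rough illustration of the hooks mentioned above, the following sketch records the input and output shapes of a single Linear layer. The dictionary captured and the two hook function names are assumptions made for this example.

```python
import torch
from torch import nn

layer = nn.Linear(20, 30)
captured = {}

def pre_hook(module, args):
    # Called before forward(); args is the tuple of positional inputs.
    captured["input_shape"] = tuple(args[0].shape)

def forward_hook(module, args, output):
    # Called after forward() with the inputs and the computed output.
    captured["output_shape"] = tuple(output.shape)

h_pre = layer.register_forward_pre_hook(pre_hook)
h_fwd = layer.register_forward_hook(forward_hook)

layer(torch.randn(128, 20))

# Remove the hooks once they are no longer needed.
h_pre.remove()
h_fwd.remove()
print(captured)
```

Both registration calls return a handle, and calling .remove() on it detaches the hook so it does not fire on later forward passes.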

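The ecosystem is also rich: there are extension libraries such as PyTorch Geometric for GNNs, as well as domain-specific processing libraries.

These convenience features are implemented in a differentiable way, so they can be combined with automatic differentiation. For example, for the KL divergence: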

a = torch.tensor([1.0,2.0,3.0], requires_grad=True)
b = torch.tensor([4.0,5.0,6.0], requires_grad=True)
pd1 = torch.distributions.Categorical(logits=a) # discrete (categorical) distribution
pd2 = torch.distributions.Categorical(logits=b)
kl = torch.distributions.kl.kl_divergence(pd1,pd2)
kl.backward()
print(a.grad)
# tensor([-7.4506e-09, -1.4901e-08, -5.9605e-08])

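The gradient comes out as numerically zero here because the logits a and b differ only by a constant offset, so the two categorical distributions are identical and the KL divergence is already at its minimum.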

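Optimization

On top of this automatic differentiation machinery, several gradient-based optimization algorithms are implemented.
The base class is torch.optim.Optimizer. Its constructor takes the parameters to optimize, either as an iterable of tensors or as a list of dicts (parameter groups, which let you set options such as the learning rate per group, as shown below). Once a backward pass has filled in the gradients, calling .step() performs one parameter update.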

optim.SGD([
    {'params': model.base.parameters(), 'lr': 1e-2},
    {'params': model.classifier.parameters()}
], lr=1e-3, momentum=0.9)
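The snippet above assumes a model that has base and classifier submodules. A self-contained sketch with a hypothetical module of that shape (TwoPartNet is a made-up name, and the layer sizes are arbitrary) shows how the per-group learning rates end up in optimizer.param_groups:

```python
import torch
from torch import nn, optim

class TwoPartNet(nn.Module):  # hypothetical model with the two parts used above
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(20, 30)
        self.classifier = nn.Linear(30, 5)

    def forward(self, x):
        return self.classifier(torch.relu(self.base(x)))

model = TwoPartNet()
optimizer = optim.SGD([
    {'params': model.base.parameters(), 'lr': 1e-2},  # group-specific lr
    {'params': model.classifier.parameters()},        # falls back to the default lr
], lr=1e-3, momentum=0.9)

group_lrs = [group['lr'] for group in optimizer.param_groups]

# One update step: clear old gradients, backpropagate, then step.
before = model.classifier.weight.detach().clone()
loss = model(torch.randn(8, 20)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
changed = not torch.equal(before, model.classifier.weight)
print(group_lrs, changed)
```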

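For parameter regularization, L2 regularization can usually be specified through the weight_decay constructor argument. If you want a different regularizer, you can build the penalty term from module.parameters() and add it to the loss so that it becomes part of the computation graph.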

a = torch.tensor([1.0,2.0,3.0], requires_grad=True)
b = torch.tensor([4.0,5.0,20.0], requires_grad=True)
# Pass the tensors to optimize as a list; only b is updated here
optimizer = torch.optim.SGD([b], lr=0.1, momentum=0.9)
for _ in range(100):
    optimizer.zero_grad()
    pd1 = torch.distributions.Categorical(logits=a) # discrete (categorical) distribution
    pd2 = torch.distributions.Categorical(logits=b)
    kl = torch.distributions.kl.kl_divergence(pd1,pd2)
    kl.backward() # populate the gradients through the computation graph
    optimizer.step() # one optimizer step; b is updated
    print(kl, pd1.probs, pd2.probs, b.grad)

Result of optimizing b so that the KL divergence shrinks:

# tensor(4.2790, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([1.1254e-07, 3.0590e-07, 1.0000e+00], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3348])
# tensor(4.2610, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([1.1742e-07, 3.2415e-07, 1.0000e+00], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3348])
# tensor(4.2268, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([1.2729e-07, 3.6188e-07, 1.0000e+00], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3348])
# tensor(4.1780, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([1.4282e-07, 4.2342e-07, 1.0000e+00], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3348])
# tensor(4.1161, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([1.6528e-07, 5.1679e-07, 1.0000e+00], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3348])
# tensor(4.0423, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([1.9669e-07, 6.5521e-07, 1.0000e+00], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3348])
# tensor(3.9580, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([2.4000e-07, 8.5960e-07, 1.0000e+00], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3348])
# tensor(3.8640, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([2.9954e-07, 1.1630e-06, 1.0000e+00], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3348])
# tensor(3.7615, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([3.8153e-07, 1.6178e-06, 1.0000e+00], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3348])
# tensor(3.6512, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([4.9493e-07, 2.3072e-06, 1.0000e+00], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3348])
# tensor(3.5339, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([6.5268e-07, 3.3652e-06, 1.0000e+00], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3348])
# tensor(3.4104, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([8.7355e-07, 5.0084e-06, 9.9999e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3348])
# tensor(3.2812, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([1.1849e-06, 7.5908e-06, 9.9999e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3347])
# tensor(3.1469, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([1.6265e-06, 1.1694e-05, 9.9999e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3347])
# tensor(3.0080, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([2.2570e-06, 1.8284e-05, 9.9998e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3347])
# tensor(2.8650, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([3.1624e-06, 2.8968e-05, 9.9997e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3347])
# tensor(2.7184, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([4.4700e-06, 4.6444e-05, 9.9995e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3347])
# tensor(2.5684, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([6.3681e-06, 7.5267e-05, 9.9992e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2447,  0.3347])
# tensor(2.4154, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([9.1363e-06, 1.2316e-04, 9.9987e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2446,  0.3346])
# tensor(2.2598, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([1.3191e-05, 2.0327e-04, 9.9978e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2445,  0.3345])
# tensor(2.1018, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([1.9154e-05, 3.3810e-04, 9.9964e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2444,  0.3344])
# tensor(1.9417, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([2.7952e-05, 5.6624e-04, 9.9941e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2442,  0.3342])
# tensor(1.7798, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([4.0972e-05, 9.5411e-04, 9.9900e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2438,  0.3338])
# tensor(1.6165, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([6.0288e-05, 1.6162e-03, 9.9832e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0900, -0.2431,  0.3331])
# tensor(1.4522, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([8.8987e-05, 2.7495e-03, 9.9716e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0899, -0.2420,  0.3319])
# tensor(1.2874, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([1.3165e-04, 4.6933e-03, 9.9518e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0899, -0.2400,  0.3299])
# tensor(1.1230, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([1.9500e-04, 8.0263e-03, 9.9178e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0898, -0.2367,  0.3265])
# tensor(0.9603, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([2.8872e-04, 1.3725e-02, 9.8599e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0897, -0.2310,  0.3207])
# tensor(0.8014, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([4.2622e-04, 2.3394e-02, 9.7618e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0896, -0.2213,  0.3109])
# tensor(0.6496, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([6.2494e-04, 3.9558e-02, 9.5982e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0894, -0.2052,  0.2946])
# tensor(0.5102, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([9.0471e-04, 6.5865e-02, 9.3323e-01], grad_fn=<SoftmaxBackward0>) tensor([-0.0891, -0.1789,  0.2680])
# tensor(0.3906, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0013, 0.1068, 0.8919], grad_fn=<SoftmaxBackward0>) tensor([-0.0887, -0.1379,  0.2267])
# tensor(0.3001, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0018, 0.1663, 0.8319], grad_fn=<SoftmaxBackward0>) tensor([-0.0883, -0.0784,  0.1667])
# tensor(0.2473, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0023, 0.2447, 0.7530], grad_fn=<SoftmaxBackward0>) tensor([-8.7718e-02, -5.2556e-05,  8.7771e-02])
# tensor(0.2360, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0029, 0.3357, 0.6614], grad_fn=<SoftmaxBackward0>) tensor([-0.0871,  0.0909, -0.0038])
# tensor(0.2610, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0034, 0.4274, 0.5692], grad_fn=<SoftmaxBackward0>) tensor([-0.0866,  0.1827, -0.0961])
# tensor(0.3081, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0040, 0.5076, 0.4885], grad_fn=<SoftmaxBackward0>) tensor([-0.0861,  0.2628, -0.1768])
# tensor(0.3591, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0045, 0.5686, 0.4269], grad_fn=<SoftmaxBackward0>) tensor([-0.0856,  0.3239, -0.2383])
# tensor(0.3984, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0050, 0.6085, 0.3865], grad_fn=<SoftmaxBackward0>) tensor([-0.0850,  0.3638, -0.2788])
# tensor(0.4160, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0056, 0.6283, 0.3661], grad_fn=<SoftmaxBackward0>) tensor([-0.0844,  0.3836, -0.2992])
# tensor(0.4084, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0064, 0.6300, 0.3636], grad_fn=<SoftmaxBackward0>) tensor([-0.0836,  0.3853, -0.3016])
# tensor(0.3774, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0073, 0.6154, 0.3773], grad_fn=<SoftmaxBackward0>) tensor([-0.0827,  0.3707, -0.2880])
# tensor(0.3290, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0084, 0.5863, 0.4053], grad_fn=<SoftmaxBackward0>) tensor([-0.0816,  0.3416, -0.2600])
# tensor(0.2715, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0097, 0.5448, 0.4455], grad_fn=<SoftmaxBackward0>) tensor([-0.0803,  0.3000, -0.2197])
# tensor(0.2136, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0111, 0.4939, 0.4950], grad_fn=<SoftmaxBackward0>) tensor([-0.0790,  0.2492, -0.1702])
# tensor(0.1627, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0124, 0.4379, 0.5497], grad_fn=<SoftmaxBackward0>) tensor([-0.0776,  0.1932, -0.1156])
# tensor(0.1236, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0138, 0.3815, 0.6047], grad_fn=<SoftmaxBackward0>) tensor([-0.0762,  0.1368, -0.0605])
# tensor(0.0976, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0151, 0.3289, 0.6560], grad_fn=<SoftmaxBackward0>) tensor([-0.0749,  0.0842, -0.0093])
# tensor(0.0833, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0164, 0.2830, 0.7006], grad_fn=<SoftmaxBackward0>) tensor([-0.0736,  0.0383,  0.0354])
# tensor(0.0779, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0177, 0.2451, 0.7372], grad_fn=<SoftmaxBackward0>) tensor([-0.0724,  0.0004,  0.0720])
# tensor(0.0777, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0190, 0.2152, 0.7658], grad_fn=<SoftmaxBackward0>) tensor([-0.0710, -0.0296,  0.1005])
# tensor(0.0799, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0206, 0.1925, 0.7869], grad_fn=<SoftmaxBackward0>) tensor([-0.0695, -0.0522,  0.1217])
# tensor(0.0820, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0223, 0.1762, 0.8015], grad_fn=<SoftmaxBackward0>) tensor([-0.0677, -0.0685,  0.1362])
# tensor(0.0824, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0244, 0.1651, 0.8105], grad_fn=<SoftmaxBackward0>) tensor([-0.0656, -0.0796,  0.1452])
# tensor(0.0803, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0269, 0.1585, 0.8146], grad_fn=<SoftmaxBackward0>) tensor([-0.0632, -0.0862,  0.1493])
# tensor(0.0755, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0298, 0.1557, 0.8145], grad_fn=<SoftmaxBackward0>) tensor([-0.0602, -0.0890,  0.1492])
# tensor(0.0681, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0332, 0.1562, 0.8105], grad_fn=<SoftmaxBackward0>) tensor([-0.0568, -0.0885,  0.1453])
# tensor(0.0588, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0372, 0.1596, 0.8032], grad_fn=<SoftmaxBackward0>) tensor([-0.0528, -0.0851,  0.1379])
# tensor(0.0481, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0418, 0.1655, 0.7926], grad_fn=<SoftmaxBackward0>) tensor([-0.0482, -0.0792,  0.1274])
# tensor(0.0371, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0470, 0.1737, 0.7792], grad_fn=<SoftmaxBackward0>) tensor([-0.0430, -0.0710,  0.1140])
# tensor(0.0265, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0528, 0.1839, 0.7633], grad_fn=<SoftmaxBackward0>) tensor([-0.0372, -0.0609,  0.0981])
# tensor(0.0171, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0591, 0.1956, 0.7453], grad_fn=<SoftmaxBackward0>) tensor([-0.0309, -0.0492,  0.0801])
# tensor(0.0096, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0658, 0.2084, 0.7258], grad_fn=<SoftmaxBackward0>) tensor([-0.0242, -0.0364,  0.0606])
# tensor(0.0042, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0728, 0.2217, 0.7055], grad_fn=<SoftmaxBackward0>) tensor([-0.0173, -0.0230,  0.0403])
# tensor(0.0011, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0798, 0.2351, 0.6851], grad_fn=<SoftmaxBackward0>) tensor([-0.0102, -0.0096,  0.0198])
# tensor(8.0516e-05, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0868, 0.2479, 0.6654], grad_fn=<SoftmaxBackward0>) tensor([-0.0033,  0.0031,  0.0001])
# tensor(0.0007, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0934, 0.2595, 0.6471], grad_fn=<SoftmaxBackward0>) tensor([ 0.0034,  0.0147, -0.0181])
# tensor(0.0026, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0996, 0.2694, 0.6310], grad_fn=<SoftmaxBackward0>) tensor([ 0.0096,  0.0247, -0.0342])
# tensor(0.0049, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1052, 0.2774, 0.6175], grad_fn=<SoftmaxBackward0>) tensor([ 0.0151,  0.0326, -0.0478])
# tensor(0.0074, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1100, 0.2831, 0.6069], grad_fn=<SoftmaxBackward0>) tensor([ 0.0200,  0.0383, -0.0583])
# tensor(0.0094, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1140, 0.2865, 0.5995], grad_fn=<SoftmaxBackward0>) tensor([ 0.0240,  0.0417, -0.0657])
# tensor(0.0107, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1172, 0.2876, 0.5953], grad_fn=<SoftmaxBackward0>) tensor([ 0.0271,  0.0428, -0.0700])
# tensor(0.0113, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1194, 0.2866, 0.5940], grad_fn=<SoftmaxBackward0>) tensor([ 0.0294,  0.0418, -0.0712])
# tensor(0.0110, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1208, 0.2837, 0.5955], grad_fn=<SoftmaxBackward0>) tensor([ 0.0308,  0.0390, -0.0697])
# tensor(0.0102, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1213, 0.2793, 0.5994], grad_fn=<SoftmaxBackward0>) tensor([ 0.0313,  0.0346, -0.0659])
# tensor(0.0089, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1211, 0.2737, 0.6052], grad_fn=<SoftmaxBackward0>) tensor([ 0.0311,  0.0290, -0.0600])
# tensor(0.0073, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1202, 0.2672, 0.6125], grad_fn=<SoftmaxBackward0>) tensor([ 0.0302,  0.0225, -0.0527])
# tensor(0.0058, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1188, 0.2603, 0.6209], grad_fn=<SoftmaxBackward0>) tensor([ 0.0287,  0.0156, -0.0443])
# tensor(0.0045, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1169, 0.2533, 0.6299], grad_fn=<SoftmaxBackward0>) tensor([ 0.0268,  0.0085, -0.0354])
# tensor(0.0034, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1146, 0.2464, 0.6390], grad_fn=<SoftmaxBackward0>) tensor([ 0.0246,  0.0016, -0.0262])
# tensor(0.0026, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1122, 0.2399, 0.6479], grad_fn=<SoftmaxBackward0>) tensor([ 0.0222, -0.0049, -0.0173])
# tensor(0.0022, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1097, 0.2340, 0.6564], grad_fn=<SoftmaxBackward0>) tensor([ 0.0196, -0.0108, -0.0089])
# tensor(0.0020, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1071, 0.2288, 0.6641], grad_fn=<SoftmaxBackward0>) tensor([ 0.0171, -0.0159, -0.0012])
# tensor(0.0020, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1047, 0.2245, 0.6708], grad_fn=<SoftmaxBackward0>) tensor([ 0.0146, -0.0202,  0.0056])
# tensor(0.0021, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1023, 0.2212, 0.6765], grad_fn=<SoftmaxBackward0>) tensor([ 0.0123, -0.0235,  0.0112])
# tensor(0.0022, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.1002, 0.2188, 0.6810], grad_fn=<SoftmaxBackward0>) tensor([ 0.0101, -0.0259,  0.0158])
# tensor(0.0023, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0982, 0.2173, 0.6845], grad_fn=<SoftmaxBackward0>) tensor([ 0.0082, -0.0274,  0.0192])
# tensor(0.0023, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0965, 0.2168, 0.6867], grad_fn=<SoftmaxBackward0>) tensor([ 0.0065, -0.0280,  0.0215])
# tensor(0.0022, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0950, 0.2171, 0.6879], grad_fn=<SoftmaxBackward0>) tensor([ 0.0050, -0.0277,  0.0227])
# tensor(0.0020, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0937, 0.2181, 0.6881], grad_fn=<SoftmaxBackward0>) tensor([ 0.0037, -0.0266,  0.0229])
# tensor(0.0017, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0926, 0.2199, 0.6874], grad_fn=<SoftmaxBackward0>) tensor([ 0.0026, -0.0248,  0.0222])
# tensor(0.0014, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0917, 0.2223, 0.6859], grad_fn=<SoftmaxBackward0>) tensor([ 0.0017, -0.0224,  0.0207])
# tensor(0.0011, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0910, 0.2252, 0.6838], grad_fn=<SoftmaxBackward0>) tensor([ 0.0010, -0.0195,  0.0186])
# tensor(0.0008, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0904, 0.2284, 0.6812], grad_fn=<SoftmaxBackward0>) tensor([ 0.0004, -0.0163,  0.0159])
# tensor(0.0005, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0899, 0.2319, 0.6781], grad_fn=<SoftmaxBackward0>) tensor([-0.0001, -0.0128,  0.0129])
# tensor(0.0002, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0895, 0.2355, 0.6749], grad_fn=<SoftmaxBackward0>) tensor([-0.0005, -0.0092,  0.0097])
# tensor(9.8544e-05, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0892, 0.2391, 0.6716], grad_fn=<SoftmaxBackward0>) tensor([-0.0008, -0.0056,  0.0064])
# tensor(2.2543e-05, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0890, 0.2426, 0.6684], grad_fn=<SoftmaxBackward0>) tensor([-0.0010, -0.0021,  0.0031])
# tensor(1.1019e-05, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0888, 0.2459, 0.6653], grad_fn=<SoftmaxBackward0>) tensor([-0.0012,  0.0011,  0.0001])
# tensor(4.9602e-05, grad_fn=<SumBackward1>) tensor([0.0900, 0.2447, 0.6652], grad_fn=<SoftmaxBackward0>) tensor([0.0886, 0.2488, 0.6626], grad_fn=<SoftmaxBackward0>) tensor([-0.0014,  0.0041, -0.0026])

In a gradient-descent loop, every iteration therefore repeats the same pattern: build the computation graph, then differentiate it.
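The same pattern applies to an nn.Module, and it is also where a hand-built regularizer from module.parameters() fits in. The following is only a sketch under made-up assumptions: synthetic linear-regression data, an arbitrary l1_coeff, and an L1 penalty chosen as an example of a regularizer that weight_decay does not cover.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
# weight_decay adds L2 regularization inside the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
l1_coeff = 1e-3  # hypothetical strength of the extra L1 penalty

# Synthetic data: y is an exact linear function of x.
x = torch.randn(64, 3)
y = x @ torch.tensor([[1.0], [-2.0], [0.5]])

losses = []
for _ in range(200):
    optimizer.zero_grad()                    # clear gradients from the last step
    mse = nn.functional.mse_loss(model(x), y)
    # Build the L1 penalty from the parameters so it is part of the graph.
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    loss = mse + l1_coeff * l1_penalty
    loss.backward()                          # differentiate the fresh graph
    optimizer.step()                         # one parameter update
    losses.append(loss.item())

print(losses[0], losses[-1])
```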

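Things to watch out for in implementation

For reproducibility, the following are important:

Pinning versions exactly with miniconda or uv
Controlling random number generation
Keeping track of which device each tensor lives on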

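The random state is kept separately by the standard library random module, numpy, and torch (CPU and CUDA), so each one has to be checked individually: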

import random
import numpy as np
import torch

random.getstate()
torch.get_rng_state()
torch.cuda.get_rng_state(device='cuda')  # requires a CUDA device
np.random.get_state()

You may well discover that random numbers were being used somewhere you did not expect.
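One common way to deal with the separate generators is to seed all of them in one place. The helper below, seed_everything, is a hypothetical name used for this sketch; it covers only the four generators listed above and does not by itself guarantee bit-for-bit reproducibility on GPUs.

```python
import random

import numpy as np
import torch

def seed_everything(seed: int) -> None:
    """Seed the stdlib, NumPy, and PyTorch (CPU and CUDA) generators."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

def draw():
    # One number from each CPU-side generator.
    return (random.random(), float(np.random.rand()), torch.rand(1).item())

seed_everything(0)
first = draw()
seed_everything(0)
second = draw()
print(first == second)
```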
