
On the speed improvement of torch.bmm

PyTorch

Posted at 2020-08-17

What I want to do

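Check the current status of the article "[PyTorch] torch.bmmよりも速く、batchごとに内積を計算する方法があった話" (roughly, "There was a way to compute batch-wise inner products faster than torch.bmm"; referred to as "the other article" in the Conclusion).
The issue "bmm 5x slower on Maxwell" is still open and even carries the triaged tag, so I checked for myself.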

Conclusion

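torch.bmm has become considerably faster.
In particular, if you only compute the inner product on the GPU, torch.bmm is now slightly faster than elementwise product + sum (17.9 µs vs. 18.7 µs). On the CPU, and on the GPU once the backward pass is included, elementwise product + sum is still faster.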

CPU

Method	t (inner product, this article)	t (inner product, other article)
Elementwise product + sum	43.4 µs	26.4 µs
torch.bmm	199 µs	964 µs

GPU

Method	t (inner product)	t (inner product + backward)	t (backward)
Elementwise product + sum	18.7 µs	262 µs	243.3 µs
torch.bmm	17.9 µs	336 µs	318.1 µs
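
The last column of the table above is consistent with subtracting the forward-only time from the combined forward and backward time. The sketch below reproduces that arithmetic; that the column was derived this way is an assumption inferred from the numbers, and the dictionary keys are names chosen for illustration.

```python
# Timings in microseconds, copied from the GPU table above.
forward_only = {"product_sum": 18.7, "bmm": 17.9}
forward_backward = {"product_sum": 262.0, "bmm": 336.0}

# Assumption: backward time = (forward + backward) time - forward time.
backward_only = {
    name: round(forward_backward[name] - forward_only[name], 1)
    for name in forward_only
}
print(backward_only)
```

A value obtained this way also absorbs any measurement noise from both timings, so it is best read as an estimate.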

GPU (other article)

Method	t (inner product)	t (inner product + backward)	t (backward)
Elementwise product + sum	25.9 µs	163 µs	137.1 µs
torch.bmm	608 µs	1.13 ms	552 µs
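
To put "considerably faster" into numbers, the sketch below divides the torch.bmm forward timings from the other article by the ones measured here. The figures are copied from the tables above. The hardware used in the other article is not stated here, so the ratios may mix software and hardware differences and are only a rough indication.

```python
# torch.bmm forward timings in microseconds, copied from the tables above.
bmm_this_article = {"cpu": 199.0, "gpu": 17.9}
bmm_other_article = {"cpu": 964.0, "gpu": 608.0}

# Ratio > 1 means the timing measured in this article is smaller.
speedup = {
    device: bmm_other_article[device] / bmm_this_article[device]
    for device in bmm_this_article
}
for device, ratio in speedup.items():
    print(f"{device}: about {ratio:.1f}x faster than in the other article")
```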

Environment

python: 3.7.2
pytorch: 1.6.0

Scripts executed

Everything below was run in a Jupyter notebook.

Setup

import torch
a = torch.randn(500, 500, dtype=torch.float, device='cpu')
b = torch.randn(500, 500, dtype=torch.float, device='cpu')

c = torch.randn(500, 500, dtype=torch.float, device='cuda')
d = torch.randn(500, 500, dtype=torch.float, device='cuda')

e = torch.randn(500, 500, dtype=torch.float, device='cuda', requires_grad=True)
f = torch.randn(500, 500, dtype=torch.float, device='cuda', requires_grad=True)

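Equivalent operations

# The two expressions below compute the same thing:
# the row-wise inner product of two (500, 500) tensors, with shape (500, 1).
(a*b).sum(1, keepdim=True)
torch.bmm(a.unsqueeze(1), b.unsqueeze(2)).squeeze(2)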
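
As a quick sanity check that the two expressions agree, the sketch below compares them with torch.allclose on small CPU tensors. The tensor sizes, the seed, and the tolerance are assumptions chosen for illustration and are not part of the measurements.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make this sketch reproducible

# Small CPU tensors; the article itself uses 500 x 500.
a = torch.randn(8, 5)
b = torch.randn(8, 5)

# Row-wise inner product via elementwise product + sum: shape (8, 1).
via_sum = (a * b).sum(1, keepdim=True)

# Same result via bmm: (8, 1, 5) @ (8, 5, 1) -> (8, 1, 1), squeezed to (8, 1).
via_bmm = torch.bmm(a.unsqueeze(1), b.unsqueeze(2)).squeeze(2)

shapes_match = via_sum.shape == via_bmm.shape
values_match = torch.allclose(via_sum, via_bmm, atol=1e-5)
print(shapes_match, values_match)
```

The comparison uses a tolerance because the two code paths may sum the products in a different order, which can change the last bits of a float32 result.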

Computation on the CPU (elementwise product + sum)

%timeit (a*b).sum(1, keepdim=True)
43.4 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Computation on the CPU (bmm)

%timeit torch.bmm(a.unsqueeze(1), b.unsqueeze(2)).squeeze(2)
199 µs ± 2.61 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Computation on the GPU (elementwise product + sum)

%timeit (c*d).sum(1, keepdim=True)
18.7 µs ± 39.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Computation on the GPU (bmm)

%timeit torch.bmm(c.unsqueeze(1), d.unsqueeze(2)).squeeze(2)
17.9 µs ± 33.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
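
CUDA kernels are launched asynchronously, so a GPU timing can depend on when the host waits for the device. A minimal sketch of an explicitly synchronized measurement follows. The helper name `time_op`, the repetition count, and the CPU fallback are assumptions for illustration; this code was not used to produce the numbers in this article.

```python
import time
import torch

def time_op(fn, device, n_iter=100):
    """Return the mean wall-clock time per call of fn() in seconds (hypothetical helper)."""
    use_cuda = device.type == "cuda"
    fn()  # warm-up call so one-time setup cost is not measured
    if use_cuda:
        torch.cuda.synchronize()  # wait for pending kernels before starting the clock
    start = time.perf_counter()
    for _ in range(n_iter):
        fn()
    if use_cuda:
        torch.cuda.synchronize()  # wait for all launched kernels to finish
    return (time.perf_counter() - start) / n_iter

# Falls back to the CPU when no GPU is present, so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
c = torch.randn(500, 500, device=device)
d = torch.randn(500, 500, device=device)

t_sum = time_op(lambda: (c * d).sum(1, keepdim=True), device)
t_bmm = time_op(lambda: torch.bmm(c.unsqueeze(1), d.unsqueeze(2)).squeeze(2), device)
print(f"product + sum: {t_sum * 1e6:.1f} us, bmm: {t_bmm * 1e6:.1f} us")
```

Synchronizing once before and once after the loop keeps the synchronization overhead out of the per-call average.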

Computation + backward pass on the GPU (elementwise product + sum)

%timeit (e*f).sum(1, keepdim=True).sum().backward()
262 µs ± 840 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Computation + backward pass on the GPU (bmm)

%timeit torch.bmm(e.unsqueeze(1), f.unsqueeze(2)).squeeze(2).sum().backward()
336 µs ± 270 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
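
Since both formulations compute the same scalar, they should also give the same gradients: the gradient with respect to e is f, and vice versa. The sketch below checks this on small CPU tensors, using torch.autograd.grad so that nothing accumulates in .grad between calls. Sizes and seed are assumptions for illustration.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make this sketch reproducible

e = torch.randn(8, 5, requires_grad=True)
f = torch.randn(8, 5, requires_grad=True)

# The same scalar loss, built in the two ways timed above.
loss_sum = (e * f).sum(1, keepdim=True).sum()
loss_bmm = torch.bmm(e.unsqueeze(1), f.unsqueeze(2)).squeeze(2).sum()

# torch.autograd.grad returns the gradients instead of accumulating into .grad.
ge_sum, gf_sum = torch.autograd.grad(loss_sum, (e, f))
ge_bmm, gf_bmm = torch.autograd.grad(loss_bmm, (e, f))

# Both paths should agree with each other and with the analytic gradients.
grads_match = torch.allclose(ge_sum, ge_bmm) and torch.allclose(gf_sum, gf_bmm)
analytic_match = torch.allclose(ge_sum, f.detach()) and torch.allclose(gf_sum, e.detach())
print(grads_match, analytic_match)
```

Calling .backward() repeatedly, as the %timeit loops do, keeps adding to e.grad and f.grad. That affects the stored gradient values, not what is being timed.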

I had wanted to trace through the source code from here, but it differed so much from master that following it was hard, so I gave up.

Other related pages
