
Memory-related errors when using a GPU with PyTorch

Posted at 2019-10-29

These are notes on how to deal with memory-related RuntimeErrors that come up when using a GPU with PyTorch.
I hope this page helps anyone who runs into the same errors.

Applying += to a PyTorch variable inside a for loop prevents its memory from being freed

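The code example below adds the loss computed for each mini-batch to a running total, loss_avg, so that the average can be taken afterwards.
Using the commented-out line, loss_avg += loss, raises a RuntimeError saying that no more memory can be allocated. This happens because loss is a tensor that carries its autograd history, so the running total keeps every batch's computation graph alive. Convert it to a plain Python float with float(loss), as in the line below it.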

import torch
import torch.nn.functional as F

# net and test_loader are assumed to be defined elsewhere
loss_avg = 0.0
correct = 0
for batch_idx, (data, target) in enumerate(test_loader):
    data, target = data.cuda(), target.cuda()

    # forward
    output = net(data)
    loss = F.cross_entropy(output, target)

    # accuracy
    pred = output.argmax(dim=1)
    correct += pred.eq(target).sum().item()

    # test loss average
    # loss_avg += loss       # keeps each batch's autograd graph alive -> out of memory
    loss_avg += float(loss)  # plain Python float, no graph attached
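
The difference between the two accumulation styles can be checked on the CPU with a small sketch. The tiny Linear model, the random data, and the accumulate helper are hypothetical stand-ins for illustration, not part of the original code. Accumulating loss directly yields a tensor that still has a grad_fn (it is attached to the graphs of all batches), while float(loss) yields an ordinary Python float.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
net = torch.nn.Linear(4, 3)  # hypothetical stand-in for the real model


def accumulate(as_float):
    """Sum the loss over a few random mini-batches, either as float or as tensor."""
    total = 0.0
    for _ in range(3):
        data = torch.randn(8, 4)
        target = torch.randint(0, 3, (8,))
        loss = F.cross_entropy(net(data), target)
        # float(loss) drops the autograd history; adding loss itself keeps it
        total += float(loss) if as_float else loss
    return total


tensor_total = accumulate(as_float=False)
float_total = accumulate(as_float=True)

# The tensor total is still connected to every batch's computation graph
print(type(tensor_total).__name__, tensor_total.grad_fn is not None)
# The float total holds no reference to any graph
print(type(float_total).__name__)
```

loss.item() gives the same Python float as float(loss) for a scalar loss, so either works here.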

Reference
https://pytorch.org/docs/stable/notes/faq.html

When starting a Docker container

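When starting a Docker container that runs PyTorch on a GPU, pass either --ipc=host or --shm-size=16G so that the container has enough shared memory. PyTorch uses shared memory to pass data between processes (for example, DataLoader workers), and the container's default shared memory size is too small for that.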
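
To confirm from inside the container that the option took effect, one way is to look at the size of /dev/shm. The following is a minimal sketch; shm_size_gib is a hypothetical helper name, and it assumes a Linux container where shared memory is mounted at /dev/shm. Docker's default is 64 MB, so a value around 0.06 GiB suggests neither option was applied.

```python
import os
import shutil


def shm_size_gib(path="/dev/shm"):
    """Return the total size of the filesystem mounted at path, in GiB."""
    return shutil.disk_usage(path).total / 2**30


SHM_PATH = "/dev/shm"  # assumption: standard shared memory mount on Linux
if os.path.isdir(SHM_PATH):
    print(f"{SHM_PATH}: {shm_size_gib(SHM_PATH):.2f} GiB")
else:
    print(f"{SHM_PATH} not found; this check only applies to Linux containers")
```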
