Freeing the GPU memory occupied by a model transferred with model.to(device)

Posted at 2024-08-06

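Introduction

When training a neural network on a GPU with PyTorch, you move the model to the GPU with the model.to(device) method. If you train only a single model, the program exits once training finishes and the GPU memory is released with it. When you train several models one after another, however, you may want to delete a model that was transferred to the GPU and free its memory while the program is still running. This article explains how to delete a model that has been transferred to the GPU and free the GPU memory it occupied.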

Environment

Python 3.11
torch==2.3.0+cu118

This article assumes a GPU is available. Without one, you can still watch memory being released by checking the PC's memory usage in Task Manager.

Method

First, import the required libraries and define the class and function used below.

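import subprocess

import torch
import torch.nn as nn


class MLP(nn.Module):
    """Two large linear layers, used here only to occupy GPU memory."""

    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(20000, 20000)
        self.linear2 = nn.Linear(20000, 20000)


def print_GPU_usage():
    """Print the two nvidia-smi output lines that summarize GPU 0."""
    result = subprocess.run(['nvidia-smi'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    # Lines 8-9 hold the GPU 0 summary in the nvidia-smi layout used here;
    # other driver versions may need different indices.
    lines = result.stdout.split('\n')[8:10]

    for line in lines:
        print(line)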

Instantiate the neural network model.
Before the model is transferred to the GPU, GPU memory usage is 1846MiB.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = MLP()

print("Before transferring the model")
print_GPU_usage()
# Before transferring the model
# |   0  NVIDIA GeForce GTX 1660 Ti   WDDM  | 00000000:0B:00.0  On |                  N/A |
# | 24%   41C    P0              36W / 120W |   1846MiB /  6144MiB |     29%      Default |

Transfer the model to the GPU, then check GPU memory usage again.
Usage has increased to 4954MiB.

model.to(device)

print("The model transferred to GPU.")
print_GPU_usage()
# The model transferred to GPU.
# |   0  NVIDIA GeForce GTX 1660 Ti   WDDM  | 00000000:0B:00.0  On |                  N/A |
# | 25%   43C    P2              39W / 120W |   4954MiB /  6144MiB |     30%      Default |
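As a sanity check on the numbers above, the size of the parameters can be estimated by hand. The sketch below assumes float32 parameters (4 bytes each, PyTorch's default) and counts the weights and biases of the two nn.Linear(20000, 20000) layers. The helper names are hypothetical and not part of the original program.

```python
# Rough estimate of the memory taken by the MLP parameters.
# Assumption: parameters are float32, i.e. 4 bytes each.
BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2


def linear_param_count(in_features, out_features):
    """Number of parameters in an nn.Linear layer with bias."""
    return in_features * out_features + out_features


def estimate_param_mib(layer_shapes, bytes_per_param=BYTES_PER_FLOAT32):
    """Total parameter size in MiB for a list of (in, out) layer shapes."""
    total = sum(linear_param_count(i, o) for i, o in layer_shapes)
    return total * bytes_per_param / MIB


mlp_mib = estimate_param_mib([(20000, 20000), (20000, 20000)])
print(f"{mlp_mib:.0f} MiB")  # prints "3052 MiB"
```

The measured increase is 4954MiB - 1846MiB = 3108MiB, so the parameters account for almost all of it. The remaining 56MiB or so is plausibly the CUDA context created by the first GPU operation, although that is an interpretation rather than something measured here. Note also that nvidia-smi reports usage for the whole GPU, including other processes, so the figures are approximate.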

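Delete the model and check GPU memory usage.
Deleting the model does not by itself return the memory to the GPU, because PyTorch's caching allocator keeps the freed blocks reserved for reuse. Usage therefore stays at 4954MiB.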

del model

print("The model was deleted.")
print_GPU_usage()
# The model was deleted.
# |   0  NVIDIA GeForce GTX 1660 Ti   WDDM  | 00000000:0B:00.0  On |                  N/A |
# | 24%   42C    P2              39W / 120W |   4954MiB /  6144MiB |     19%      Default |

Calling torch.cuda.empty_cache() releases the cached memory, and GPU memory usage drops to 1900MiB.

torch.cuda.empty_cache()

print("Freed GPU memory.")
print_GPU_usage()
# Freed GPU memory.
# |   0  NVIDIA GeForce GTX 1660 Ti   WDDM  | 00000000:0B:00.0  On |                  N/A |
# | 24%   42C    P2              39W / 120W |   1900MiB /  6144MiB |     19%      Default |
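The four measurements above can also be observed from inside PyTorch, without parsing nvidia-smi. The following is a minimal sketch that uses torch.cuda.memory_allocated() (memory held by live tensors) and torch.cuda.memory_reserved() (memory held by the caching allocator). The helper names cuda_memory_mib and trace_memory are hypothetical, and a small nn.Linear stands in for the MLP so that the sketch runs on modest hardware. Without a GPU it simply reports zeros.

```python
import torch
import torch.nn as nn


def cuda_memory_mib(device):
    """Return (allocated, reserved) in MiB, or (0.0, 0.0) without CUDA."""
    if device.type != "cuda":
        return (0.0, 0.0)
    mib = 1024 ** 2
    return (torch.cuda.memory_allocated(device) / mib,
            torch.cuda.memory_reserved(device) / mib)


def trace_memory(hidden=1024):
    """Record memory counters at the same four stages as in the article."""
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    stages = {}

    model = nn.Linear(hidden, hidden)  # small stand-in for the MLP
    stages["before"] = cuda_memory_mib(device)

    model.to(device)
    stages["transferred"] = cuda_memory_mib(device)

    del model  # tensors are freed, but the allocator may keep the blocks
    stages["deleted"] = cuda_memory_mib(device)

    if device.type == "cuda":
        torch.cuda.empty_cache()  # hand cached blocks back to the driver
    stages["emptied"] = cuda_memory_mib(device)
    return stages


stages = trace_memory()
for name, (allocated, reserved) in stages.items():
    print(f"{name:12s} allocated={allocated:8.1f} MiB  reserved={reserved:8.1f} MiB")
```

On a GPU, the allocated value should fall right after del model, while the reserved value should fall only after torch.cuda.empty_cache(). These counters cover only memory managed by PyTorch's allocator in the current process, so they will not match the nvidia-smi totals, which include the CUDA context and other processes.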

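This is how GPU memory can be freed while the program keeps running.
Note that deleting model alone is not enough if other variables still hold values related to it (for example its parameters or outputs): the memory for those values is not released.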
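One way to avoid lingering references when training several models in sequence is to keep everything tied to a model inside a function, so that the references disappear when the function returns. The sketch below is only an illustration under assumptions: train_once and train_sequentially are hypothetical names, the model is a small nn.Linear trained on random data, and the optimizer (which holds references to the model's parameters) is created and discarded together with the model.

```python
import gc

import torch
import torch.nn as nn


def train_once(hidden, device, steps=3):
    """Train a small stand-in model for a few steps; return the last loss."""
    model = nn.Linear(hidden, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(32, hidden, device=device)
    y = torch.randn(32, 1, device=device)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
    # .item() returns a plain Python float, so no GPU tensor leaves the function.
    return loss.item()


def train_sequentially(hidden_sizes):
    """Train one model per entry, freeing GPU memory between runs."""
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    results = []
    for hidden in hidden_sizes:
        results.append(train_once(hidden, device))
        # model, optimizer, x, y and loss went out of scope with train_once.
        gc.collect()  # also clear any objects kept alive by reference cycles
        if device.type == "cuda":
            torch.cuda.empty_cache()
    return results


results = train_sequentially([8, 16, 32])
print(results)
```

Returning loss.item() instead of the loss tensor matters here: a returned GPU tensor would itself be a reference that keeps memory allocated.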

The program is available on GitHub for reference.
https://github.com/1taroh/free-GPU-memory/tree/main

Conclusion

Run del model, then call torch.cuda.empty_cache().

References

This article draws on the following threads from the official PyTorch forum.

Pytorch Cuda Free GPU Memory https://discuss.pytorch.org/t/pytorch-cuda-free-gpu-memory/74946
How to clear GPU memory after using model? https://discuss.pytorch.org/t/how-to-clear-gpu-memory-after-using-model/137408
Delete model from GPU/CPU https://discuss.pytorch.org/t/delete-model-from-gpu-cpu/123287
How to free the pytorch transformers model from GPU memory https://discuss.pytorch.org/t/how-to-free-the-pytorch-transformers-model-from-gpu-memory/132968
