Environment
・Ubuntu 16.04
・Python 3.8.3
・PyTorch 1.7.0
PyTorch can't use the GPU
I wanted to do some deep learning, but it was slow, so I checked in ipython whether CUDA is visible.
In [1]: import torch
In [2]: torch.cuda.is_available()
Out[2]: False
Why?
One possible cause I considered was that the NVIDIA driver was not running, but:
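Before blaming any one component, it helps to dump a few more details from inside Python. The sketch below is my own quick check (not part of the original session); in particular, torch.version.cuda being None would indicate a CPU-only build rather than a driver problem.

import torch

# Quick overview of what this PyTorch build knows about CUDA.
print("torch version      :", torch.__version__)
print("built with CUDA    :", torch.version.cuda)   # None => CPU-only wheel
print("cuda.is_available():", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device count       :", torch.cuda.device_count())
    print("device 0           :", torch.cuda.get_device_name(0))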
(base) user@user:~$ nvidia-smi
Tue Dec  3 11:29:10 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:01:00.0  On |                  N/A |
| 23%   34C    P8    13W / 250W |    461MiB / 12192MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   32C    P8     9W / 250W |      2MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1313      G   /usr/lib/xorg/Xorg                            24MiB |
|    0      1357      G   /usr/bin/gnome-shell                          50MiB |
|    0      2168      G   /usr/lib/xorg/Xorg                           213MiB |
|    0      2305      G   /usr/bin/gnome-shell                         128MiB |
|    0      2735      G   ...quest-channel-token=6733556068912485413    40MiB |
+-----------------------------------------------------------------------------+
It is running.
Apparently the cause lies elsewhere.
Continuing to investigate, I found reports that PyTorch fails when the NVIDIA driver is too old.
Indeed, typing the following in ipython:
In [1]: import torch
In [2]: torch.cuda.current_device()
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-2-3380d2c12118> in <module>()
----> 1 torch.cuda.current_device()
~/.local/lib/python3.7/site-packages/torch/cuda/__init__.py in current_device()
365 def current_device():
366 r"""Returns the index of a currently selected device."""
--> 367 _lazy_init()
368 return torch._C._cuda_getDevice()
369
~/.local/lib/python3.7/site-packages/torch/cuda/__init__.py in _lazy_init()
176 raise RuntimeError(
177 "Cannot re-initialize CUDA in forked subprocess. " + msg)
--> 178 _check_driver()
179 torch._C._cuda_init()
180 _cudart = _load_cudart()
~/.local/lib/python3.7/site-packages/torch/cuda/__init__.py in _check_driver()
106 Alternatively, go to: https://pytorch.org to install
107 a PyTorch version that has been compiled with your version
--> 108 of the CUDA driver.""".format(str(torch._C._cuda_getDriverVersion())))
109
110
AssertionError:
The NVIDIA driver on your system is too old (found version 9010).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.
A message like this appeared, so updating the NVIDIA driver should solve the problem.
Updating the NVIDIA driver
I referred to:
CUDAをInstallする
Unable to install nvidia drivers
Note that the driver version to install has to be looked up in the CUDA-to-driver compatibility table. Depending on the CUDA version you have locally, your torch may also be too new, so check the list of previous torch versions to see whether one of them is compatible.
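As a small aid for that lookup, the snippet below (a hypothetical helper, assuming nvidia-smi is on the PATH) just collects the two numbers that have to be compared against NVIDIA's table: the CUDA version the installed torch wheel was built against, and the driver version currently installed.

import subprocess
import torch

# CUDA version this torch wheel was compiled against (None for CPU-only builds).
print("torch built for CUDA:", torch.version.cuda)

# Driver version reported by nvidia-smi; compare it by hand against
# NVIDIA's CUDA-to-driver compatibility table.
driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
).stdout.strip()
print("installed driver    :", driver)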
1. (If necessary) Remove the old drivers
(base) user@user:~$ sudo apt-get --purge remove nvidia-*
(base) user@user:~$ sudo apt-get --purge remove cuda-*
2. Register the repository
Register the [repository](https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa) that provides drivers for Ubuntu.
(base) user@user:~$ sudo add-apt-repository ppa:graphics-drivers/ppa
(base) user@user:~$ sudo apt-get update
3. Install the driver
(base) user@user:~$ sudo apt install nvidia-driver-410
If an error like the following appears:
E: Unable to locate package nvidia-381
it can be resolved with a command like this:
(base) user@user:~$ sudo apt-get -o Dpkg::Options::="--force-overwrite" install --fix-broken
4. Reboot
(base) user@user:~$ sudo reboot
5. Verify
(base) user@user:~$ nvidia-smi
Tue Dec  3 14:01:41 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:01:00.0  On |                  N/A |
| 23%   37C    P8    10W / 250W |    479MiB / 12192MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:02:00.0 Off |                  N/A |
| 23%   34C    P8    10W / 250W |      2MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1294      G   /usr/lib/xorg/Xorg                            24MiB |
|    0      1332      G   /usr/bin/gnome-shell                          50MiB |
|    0      1583      G   /usr/lib/xorg/Xorg                           232MiB |
|    0      1716      G   /usr/bin/gnome-shell                          95MiB |
|    0      2918      G   ...uest-channel-token=17611144113784425579    72MiB |
+-----------------------------------------------------------------------------+
(base) user@user:~$ ipython
In [1]: import torch
In [2]: torch.cuda.is_available()
Out[2]: True
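As a final sanity check (my own addition, not part of the original log), a small computation on each GPU confirms that kernels actually launch, not just that the devices are detected:

import torch

# Run a small matrix multiplication on every visible GPU.
for i in range(torch.cuda.device_count()):
    device = torch.device(f"cuda:{i}")
    x = torch.randn(1000, 1000, device=device)
    y = x @ x
    torch.cuda.synchronize(device)
    print(device, torch.cuda.get_device_name(i), "OK", tuple(y.shape))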