
Notes on installing Docker with GPU support on Debian 11.0 (bullseye)

Last updated at 2021-09-20. Posted at 2021-08-20.

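Summary

- Tested whether PyTorch (1.8.1+cu111) installed inside Docker on Debian 11.0 (bullseye) can use the GPU.
- With my RTX 3060, it failed with errors until I changed the host driver to the 460.x series.
- Several other errors also needed workarounds along the way.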

Host environment

Debian 11.0 bullseye
NVIDIA RTX 3060
- Driver Version: 465.31 -> 460.84 (changed later)
- CUDA Version: 11.3 -> 11.2 (changed later)

Installing Docker

Just follow the official instructions:
- https://docs.docker.com/engine/install/debian/

Getting the GPU working

Getting the device recognized

Launch:

$ docker run -it --rm --gpus all -v `pwd`:"/home/jovyan/work" -p 8888:8888 jupyter/scipy-notebook

Run this notebook:
- https://github.com/stockmarkteam/bert-book/blob/master/Chapter4.ipynb
However, scipy-notebook does not include PyTorch, so I modified the install cell slightly:

!pip install transformers==4.5.0 fugashi==1.1.0 ipadic==1.0.0 torch==1.8.1

The container failed to start with this error, so I dealt with it:

Could not select device driver "" with capabilities: [[gpu]].

Installing nvidia-docker2 made the error go away.
- Reference: https://medium.com/nvidiajapan/nvidia-docker-%E3%81%A3%E3%81%A6%E4%BB%8A%E3%81%A9%E3%81%86%E3%81%AA%E3%81%A3%E3%81%A6%E3%82%8B%E3%81%AE-20-09-%E7%89%88-558fae883f44
- However, packages for Debian 11 were not available yet, so I used the Debian 10 ones.

distribution="debian10"
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
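As a quick sanity check after the install, the sketch below looks for an `nvidia` entry under `runtimes` in `/etc/docker/daemon.json`. This assumes the nvidia-docker2 package wrote that entry to the default path, which may not hold if the file was customized; the helper names `has_nvidia_runtime` and `check_daemon_config` are hypothetical and only for illustration.

```python
import json
from pathlib import Path


def has_nvidia_runtime(daemon_json_text):
    """Return True if the given daemon.json text registers an 'nvidia' runtime."""
    config = json.loads(daemon_json_text)
    if not isinstance(config, dict):
        return False
    runtimes = config.get("runtimes", {})
    return isinstance(runtimes, dict) and "nvidia" in runtimes


def check_daemon_config(path="/etc/docker/daemon.json"):
    """Read daemon.json (assumed default path) and report whether 'nvidia' is registered."""
    try:
        text = Path(path).read_text()
        return has_nvidia_runtime(text)
    except (OSError, ValueError):
        # Missing file, unreadable file, or invalid JSON: treat as "not registered".
        return False


if __name__ == "__main__":
    print("nvidia runtime registered:", check_daemon_config())
```

A `False` result here does not prove that `--gpus all` will fail; it only suggests the package's configuration is not where this sketch expects it.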

Fixing the "devices not found" problem (I forget at which point I did this)

Error:

cgroup subsystem devices not found:

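Fixed by adding a kernel parameter in the GRUB configuration, which switches systemd back to the cgroup v1 hierarchy.
- Reference: https://github.com/NVIDIA/nvidia-docker/issues/1447#issuecomment-801479573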

$ sudo nano /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0"

$ sudo update-grub
$ sudo shutdown -r now
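After rebooting, it can help to confirm which cgroup layout the host actually booted with. The following is a minimal heuristic sketch, assuming the usual mount point `/sys/fs/cgroup`: a `cgroup.controllers` file at the root indicates the unified (v2) hierarchy, while a `devices` directory indicates v1. The function name `cgroup_layout` is hypothetical.

```python
from pathlib import Path


def cgroup_layout(root="/sys/fs/cgroup"):
    """Guess the cgroup layout under `root`: 'v2', 'v1', or 'unknown'."""
    root = Path(root)
    # The unified (v2) hierarchy exposes cgroup.controllers at its root.
    if (root / "cgroup.controllers").is_file():
        return "v2"
    # The legacy (v1) layout mounts one directory per controller, including 'devices'.
    if (root / "devices").is_dir():
        return "v1"
    return "unknown"


if __name__ == "__main__":
    print("cgroup layout:", cgroup_layout())
```

If this still reports `v2` after the reboot, the kernel parameter was probably not picked up; `cat /proc/cmdline` shows what the kernel was booted with.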

Getting nvidia-smi to work

Launch:

$ docker run -it --rm --gpus all -v `pwd`:"/home/jovyan/work" -p 8888:8888 jupyter/scipy-notebook

The notebook fails at bert.cuda(), so the GPU still cannot be used:

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Launch again, this time as root with sudo enabled:

$ docker run -it --rm --gpus all -v `pwd`:"/home/jovyan/work" -p 8888:8888 -e GRANT_SUDO=yes --user root jupyter/scipy-notebook

Add the following as the very first cell, using a Jupyter magic, and check the output:

Reference: https://github.com/NVIDIA/nvidia-docker/issues/155

%%bash
sudo ldconfig
nvidia-smi
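To see whether the dynamic linker can locate the driver library from inside the container, a small check like the one below can be run from a notebook cell. It is only a sketch: it assumes the driver library is exposed under the name `cuda` (that is, `libcuda.so`), and the function name `find_cuda_driver_library` is hypothetical.

```python
import ctypes.util


def find_cuda_driver_library():
    """Return the library name the linker resolves for 'cuda', or None if not found."""
    return ctypes.util.find_library("cuda")


if __name__ == "__main__":
    found = find_cuda_driver_library()
    if found is None:
        print("libcuda not found; running ldconfig as root may be needed")
    else:
        print("driver library found:", found)
```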

nvidia-smi worked, but then a different error appeared:

/opt/conda/lib/python3.9/site-packages/torch/cuda/__init__.py:104: UserWarning: 
NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
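The warning boils down to comparing the GPU's compute capability with the architectures the installed wheel was compiled for. The sketch below illustrates that comparison under a simplifying assumption: a binary built for `sm_XY` runs on a device with the same major version and an equal or higher minor version, and PTX forward compatibility is ignored. This is not PyTorch's own check; `parse_sm` and `device_is_supported` are hypothetical helpers, and the second architecture list is made up for illustration.

```python
def parse_sm(arch):
    """Turn an architecture string such as 'sm_86' into a (major, minor) tuple."""
    digits = arch.split("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])


def device_is_supported(capability, arch_list):
    """Check a (major, minor) capability against a list of 'sm_XY' strings.

    Assumption: same major version and arch minor <= device minor means compatible.
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX entries such as 'compute_80'
        arch_major, arch_minor = parse_sm(arch)
        if arch_major == major and arch_minor <= minor:
            return True
    return False


# Architecture list quoted in the warning above; the RTX 3060 reports sm_86.
arch_from_warning = ["sm_37", "sm_50", "sm_60", "sm_70"]
print(device_is_supported((8, 6), arch_from_warning))  # False

# Hypothetical list that includes an Ampere target.
print(device_is_supported((8, 6), ["sm_70", "sm_80"]))  # True

try:
    import torch
except ImportError:
    torch = None

# With a working PyTorch install, the same check can use live values.
if torch is not None and torch.cuda.is_available():
    cap = torch.cuda.get_device_capability(0)
    print(cap, device_is_supported(cap, torch.cuda.get_arch_list()))
```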

In the notebook cell that installs the packages, reinstall a torch build that targets CUDA 11.1:

!pip uninstall -y torch
!pip install transformers==4.5.0 fugashi==1.1.0 ipadic==1.0.0 torch==1.8.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html

A little further on, another error:

RuntimeError: CUDA error: no kernel image is available for execution on the device

I tried the 1.10.0 nightly build and also tried conda instead of pip, but neither worked.
- Reference: https://github.com/pytorch/pytorch/issues/49161
The conda attempt:

!conda install -y pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c conda-forge

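Downgrading the driver

The 465.x series may be too new. However, because the RTX 3060 was released fairly recently, there is no 455.x series driver for it, which looked like the series most compatible with PyTorch.
https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html
For now, I installed the driver from recovery mode and dropped down to the 460.x series.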

Driver Version: 460.84       CUDA Version: 11.2
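To check the driver branch from a script rather than by reading the nvidia-smi banner, something like the following can be used. It is a sketch that assumes `nvidia-smi` is on the PATH and supports the `--query-gpu=driver_version` query; `driver_branch` and `query_driver_version` are hypothetical helper names.

```python
import shutil
import subprocess


def driver_branch(version):
    """Return the major branch of a driver version string, e.g. '460.84' -> 460."""
    return int(version.strip().split(".")[0])


def query_driver_version():
    """Ask nvidia-smi for the driver version; return None if it cannot be queried."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()  # first GPU only


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("driver version unavailable")
    else:
        print(version, "-> branch", driver_branch(version))
```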

pip
In the end I installed with the following, and the notebook appears to have run through to the end:

!pip install transformers==4.5.0 fugashi==1.1.0 ipadic==1.0.0 torch==1.8.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
