
The fastest way to build a GPU-enabled machine learning environment with Docker Desktop for Windows + WSL2

Posted at 2021-05-28

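Introduction

The tedious part of machine learning is setting up CUDA and the Python environment around it, but the spread of Docker has made this much easier.

This time, I tried using a GPU from a Docker container with Docker Desktop for Windows and WSL2.

Along the way I hit a problem whose cause I could not identify, and it took several detours to get through (see the tweet below).
I finally have the environment I wanted, so this article summarizes how to build it.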

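When I try to use the GPU with Docker Desktop for Windows + WSL2 + NVIDIA Drivers for CUDA on WSL (470.14), I get an "OCI runtime create failed" error.

I have already been fighting it for several hours, but the cause is still unknown. This is bad; time just keeps slipping away...

Is anyone else running into the same error?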
— Masahiro Tatsumi (@masatatsu8) May 26, 2021

Conclusion

As of this writing (2021/5/28), the key points for using a GPU from a Docker container on Windows are as follows.

Switch to the Windows 10 build published in the Dev channel of the Windows Insider Program
Install the NVIDIA driver (NVIDIA CUDA on WSL Public Preview)
Enable WSL2
Install Docker Desktop for Windows ver. 3.3.0 (not the latest version!)
Build the image from a Dockerfile based on nvidia/cuda

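Setup procedure

For the overall flow, I referred to the article below.

However, the software and driver versions available now differ from those available when that article was written, so simply installing the latest version of everything leads to problems.

The specific versions and key points are given below.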

Joining the Windows Insider Program

To use WSL2 with GPU support, apply the latest Windows 10 build published in the Dev channel of the Windows Insider Program. The build I installed was 21387.1.

Installing the NVIDIA driver

To use the GPU from WSL, you need the CUDA driver for Windows published on the NVIDIA site. Downloading it requires joining the Developer Zone (registration is free).

The version I installed was 470.14.

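Enabling WSL2

To enable WSL2, open a Command Prompt with administrator privileges and run the following command.
If you want to install the latest Ubuntu-20.04 instead, omit the -d option and its argument.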

wsl.exe --install -d Ubuntu-18.04

Confirm that version 2 is enabled.

C:\Users\hoge>wsl --status
Default Distribution: Ubuntu-18.04
Default Version: 2

Windows Subsystem for Linux was last updated on 2021/05/26
WSL automatic updates are on.

Kernel version: 5.10.16
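
The same check can be scripted. The following is a minimal sketch and not part of the original procedure: `parse_default_version` and `read_wsl_status` are hypothetical helper names, and the code assumes that `wsl.exe --status` prints a `Default Version:` line (or `既定のバージョン:` on a Japanese system) and writes its output as UTF-16LE.

```python
import re
import subprocess
from typing import Optional

# Labels for the "default version" line. The English label is an assumption
# about English-locale systems; the Japanese one is what a Japanese-locale
# system prints.
_VERSION_LINE = re.compile(r"(?:Default Version|既定のバージョン)\s*[:：]\s*(\d+)")


def parse_default_version(status_text: str) -> Optional[int]:
    """Return the default WSL version found in `wsl --status` output, or None."""
    match = _VERSION_LINE.search(status_text)
    return int(match.group(1)) if match else None


def read_wsl_status() -> str:
    """Run `wsl.exe --status` (Windows only) and return its output as text."""
    raw = subprocess.run(
        ["wsl.exe", "--status"], capture_output=True, check=True
    ).stdout
    # Assumption: wsl.exe writes UTF-16LE; drop any stray NUL characters.
    return raw.decode("utf-16-le", errors="ignore").replace("\x00", "")
```

On Windows, `parse_default_version(read_wsl_status())` would be expected to return 2 when WSL2 is the default; a `None` result means the line was not found and the label assumption should be revisited.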

Installing Docker Desktop for Windows

This was the biggest pitfall.
The latest version, 3.3.3, fails with an error when a container starts, so install 3.3.0 instead to avoid it.

For this workaround, I referred to the following information.

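Creating the Docker image

When I pulled the TensorFlow 2 and PyTorch Docker images registered on NGC and tried to use them, the NVIDIA driver was not recognized for some reason, and as a result the GPU could not be used.

So, using the nvidia/cuda container as a reference, I built my own image from the following Dockerfile.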

# Base image: CUDA 11.3 development image on Ubuntu 20.04
FROM nvidia/cuda:11.3.0-devel-ubuntu20.04
LABEL maintainer="NVIDIA CORPORATION <cudatools@nvidia.com>"

ENV CUDNN_VERSION=8.2.0.53

LABEL com.nvidia.cudnn.version="${CUDNN_VERSION}"

# Install cuDNN 8 pinned to the version above and hold it to prevent upgrades
RUN apt-get update && apt-get install -y --no-install-recommends \
    libcudnn8=$CUDNN_VERSION-1+cuda11.3 \
    libcudnn8-dev=$CUDNN_VERSION-1+cuda11.3 \
    && apt-mark hold libcudnn8 && \
    rm -rf /var/lib/apt/lists/*

# Install pip, then TensorFlow
RUN apt-get update && apt-get install -y python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN python3 -m pip install --upgrade pip
RUN python3 -m pip install tensorflow

docker build -t hoge/cuda:11.3.0-cudnn8-ubuntu20.04-tf2-py3 .

Confirming that the GPU is usable from Docker

Start a Docker container with the following command.

docker run -it --rm --gpus all hoge/cuda:11.3.0-cudnn8-ubuntu20.04-tf2-py3
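
For a non-interactive check, the same `docker run --gpus all` invocation can be driven from a script. This is a rough sketch under stated assumptions, not a step from the original procedure: `build_gpu_check_command` and `run_gpu_check` are hypothetical helpers, and the image tag is the one built in this article.

```python
import shutil
import subprocess
from typing import List

# Image tag built in this article; replace it with your own tag.
IMAGE = "hoge/cuda:11.3.0-cudnn8-ubuntu20.04-tf2-py3"

# The two-line TensorFlow check from this article, collapsed into one command.
CHECK = (
    "from tensorflow.python.client import device_lib; "
    "print(device_lib.list_local_devices())"
)


def build_gpu_check_command(image: str = IMAGE) -> List[str]:
    """Assemble a non-interactive `docker run --gpus all` command line."""
    return ["docker", "run", "--rm", "--gpus", "all", image, "python3", "-c", CHECK]


def run_gpu_check(image: str = IMAGE) -> str:
    """Run the check in a throwaway container and return its standard output."""
    if shutil.which("docker") is None:
        raise RuntimeError("docker CLI not found on PATH")
    result = subprocess.run(
        build_gpu_check_command(image), capture_output=True, text=True, check=True
    )
    return result.stdout
```

A GPU entry such as `/device:GPU:0` in the text returned by `run_gpu_check()` would indicate that the container can see the GPU.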

To check whether the GPU is recognized, run the following two lines with TensorFlow 2.

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

Starting Python 3 and running the lines above at the prompt confirmed that the GPU is recognized.

root@09e7ef2cdbc4:/# python3
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
2021-05-28 03:15:27.648707: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
>>> device_lib.list_local_devices()
2021-05-28 03:15:31.431094: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-28 03:15:31.442788: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-05-28 03:15:31.812429: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:923] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-05-28 03:15:31.812766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3070 Laptop GPU computeCapability: 8.6
coreClock: 1.62GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-05-28 03:15:31.812837: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-05-28 03:15:31.827289: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-05-28 03:15:31.827367: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-05-28 03:15:31.834853: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-05-28 03:15:31.837438: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-05-28 03:15:31.840207: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-05-28 03:15:31.846492: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-05-28 03:15:31.846630: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-05-28 03:15:31.847068: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:923] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-05-28 03:15:31.847615: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:923] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-05-28 03:15:31.848162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-05-28 03:15:31.848492: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-05-28 03:15:32.955840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-28 03:15:32.955883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0
2021-05-28 03:15:32.955936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N
2021-05-28 03:15:32.956845: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:923] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-05-28 03:15:32.957639: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:923] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-05-28 03:15:32.957914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1501] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2021-05-28 03:15:32.958498: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:923] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-05-28 03:15:32.958815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:0 with 5454 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3070 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 578308800421540296
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 5719326720
locality {
  bus_id: 1
  links {
  }
}
incarnation: 17531962265323874837
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 3070 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6"
]
>>>
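
The check above relies on the internal `device_lib` module. As an alternative sketch, TensorFlow 2 also offers the public `tf.config.list_physical_devices` API for the same purpose. The helper name `visible_gpus` is hypothetical, and returning `None` when TensorFlow is not installed is a choice made only for this example.

```python
from typing import List, Optional


def visible_gpus() -> Optional[List[str]]:
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf  # imported lazily so the helper loads without it
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed")
elif gpus:
    print("GPUs visible to TensorFlow:", gpus)
else:
    print("No GPU visible; check the driver and the --gpus option")
```

An empty list here, inside a container started with `--gpus all`, would point to the driver or Docker Desktop version issues described in this article.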

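Summary

This article summarized the fastest way to build a GPU-enabled machine learning environment with Docker Desktop for Windows + WSL2.

With Docker, you can build a machine learning environment easily without worrying about CUDA versions.

As of this writing, the one thing to watch out for is that Docker Desktop for Windows must be version 3.3.0; with that in place, the environment should be straightforward to build.

Note that Windows 10 build 21390.1 was released today, so I retested with it. The results were:

3.3.0: works
3.3.3: fails

For the time being, it is best not to upgrade Docker Desktop for Windows beyond 3.3.0.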
