#1. What this article covers
Up through part #3 of this web application development study series, we integrated Bootstrap into the frontend server.
The web application planned for this series needs to run machine learning (including deep learning), so this article describes how to make the backend server that runs the web application recognize the GPU.
We will use TensorFlow as the deep learning framework.
At the time of writing this series, TensorFlow and PyTorch are the mainstream frameworks; we adopt TensorFlow, weighing ease of analysis and extensibility toward edge AI. Concretely, the selection rests on two points: TensorBoard is published as an analysis tool for TensorFlow, and microcontrollers that support TensorFlow Lite are commercially available. The comparison is summarized in the table below.
Framework | Ease of analysis | Edge AI extensibility |
---|---|---|
TensorFlow | TensorBoard | TensorFlow Lite, TensorFlow Lite for Microcontrollers |
PyTorch | TensorBoard | PyTorch Mobile |
#2. GPU integration steps
Change the base image of the web application server to the TensorFlow image published on NVIDIA NGC.
Simply swapping the base image leaves the container short of packages that Django needs, so we also add installation steps for the packages the TensorFlow image lacks.
```shell
$ cd <path/to/project>
$ vim docker-compose.yml
$ vim django_project/Dockerfile
$ vim django_project/entrypoint.sh
```
Changes to docker-compose.yml
```diff
@@ -12,7 +12,13 @@ services:
- ./.env.dev
depends_on:
- db
-
+ deploy:
+ resources:
+ reservations:
+ devices:
+ - driver: nvidia
+ capabilities: [utility, compute, video]
+
db:
image: postgres:13.4-alpine
volumes:
```
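For reference, after this hunk is applied the GPU reservation sits under the web service as follows (a sketch reassembled from the diff; surrounding keys are unchanged). The `capabilities` entries correspond to the container's NVIDIA_DRIVER_CAPABILITIES: `utility` exposes tools such as nvidia-smi, `compute` enables CUDA, and `video` enables the video codec libraries.

```yaml
# Excerpt of the web service in docker-compose.yml (sketch).
# Requires the NVIDIA Container Toolkit on the Docker host.
    depends_on:
      - db
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [utility, compute, video]
```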
Changes to django_project/Dockerfile
```diff
@@ -1,12 +1,14 @@
-FROM python:3.9.7-alpine as builder
+FROM nvcr.io/nvidia/tensorflow:21.03-tf2-py3 as builder
+RUN mkdir -p /home/app
+RUN groupadd app ; useradd app -g app
WORKDIR /usr/src/app
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
-RUN apk update \
- && apk add postgresql-dev gcc python3-dev musl-dev
+RUN apt update ; \
+ apt install -y postgresql gcc python3-dev musl-dev
RUN pip install --upgrade pip
RUN pip install flake8
@@ -17,19 +19,19 @@ COPY ./requirements.txt .
RUN pip wheel --no-cache-dir --no-deps --wheel-dir /usr/src/app/wheels -r requirements.txt
-FROM python:3.9.7-alpine
+FROM nvcr.io/nvidia/tensorflow:21.03-tf2-py3
RUN mkdir -p /home/app
-RUN addgroup -S app && adduser -S app -G app
+RUN groupadd app ; useradd app -g app
ENV HOME=/home/app
ENV APP_HOME=/home/app/web
-RUN mkdir ${APP_HOME}
-RUN mkdir ${APP_HOME}/static
+RUN mkdir -p ${APP_HOME}
+RUN mkdir -p ${APP_HOME}/static
WORKDIR ${APP_HOME}
-RUN apk update && apk add libpq
+RUN apt update ; apt -y install libpq-dev netcat
COPY --from=builder /usr/src/app/wheels /wheels
COPY --from=builder /usr/src/app/requirements.txt .
RUN pip install --no-cache /wheels/*
```
Changes to django_project/entrypoint.sh
```diff
@@ -11,4 +11,5 @@ then
echo "PostgreSQL started"
fi
+/usr/local/bin/nvidia_entrypoint.sh
exec "$@"
```
With the changes so far, after starting the server and logging in to the xxx_web_1 container, you can confirm that the GPU is recognized with nvidia-smi and with tensorflow.python.client.device_lib.list_local_devices().
nvidia-smi
```shell
$ docker exec -it test_web_1 /bin/bash
app@a762dd474640:~/web$ nvidia-smi
Sun Nov 14 00:53:11 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.00 Driver Version: 510.06 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| 24% 28C P8 26W / 235W | 1198MiB / 8192MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
tensorflow.python.client.device_lib.list_local_devices()
```shell
app@a762dd474640:~/web$ python3
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
2021-11-14 00:53:32.427258: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
>>> device_lib.list_local_devices()
2021-11-14 00:53:35.799926: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-11-14 00:53:35.801311: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-11-14 00:53:35.955063: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:35.955134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-11-14 00:53:35.955154: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-11-14 00:53:35.958504: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-11-14 00:53:35.958589: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-11-14 00:53:35.959266: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-11-14 00:53:35.959725: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-11-14 00:53:35.960583: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-11-14 00:53:35.961104: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-11-14 00:53:35.961220: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-11-14 00:53:35.961656: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:35.961954: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:35.961990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1888] Adding visible gpu devices: 0
2021-11-14 00:53:35.962040: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-11-14 00:53:36.684075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1287] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-11-14 00:53:36.684129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1293] 0
2021-11-14 00:53:36.684139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] 0: N
2021-11-14 00:53:36.684652: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:36.685012: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:36.685068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1515] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-11-14 00:53:36.685395: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:36.685455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Created TensorFlow device (/device:GPU:0 with 5958 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8186840601267269810
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 6247874560
locality {
bus_id: 1
links {
}
}
incarnation: 16877793378302607449
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5"
]
>>>
```
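As a shorter check, TensorFlow 2 also provides tf.config.list_physical_devices, which returns the visible devices without the verbose log walk above. A minimal sketch, assuming it is run inside the same container (where TensorFlow is installed):

```python
# Quick GPU visibility check using the TF2 public API.
import tensorflow as tf

# Returns one PhysicalDevice entry per GPU TensorFlow can see;
# an empty list means the container has no GPU access.
gpus = tf.config.list_physical_devices("GPU")
print(gpus)
```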
#3. Closing remarks
We have described the steps to make the web application server recognize the GPU.
We also confirmed that TensorFlow can see the GPU, so by implementing a training program we now have an environment in which GPU-based training can be driven from the browser.
As of this writing, training can be run with the code on GitHub, but a run cannot be interrupted once started, which makes it awkward to use (commit: ee7eb2f4a52f4909a80df8553a53fdcb494af8d4).
The next installments will be written up once these issues are resolved, so there is no fixed schedule for updates.
#4. Related links