Studying Web Application Development #4

1. About This Article

Up to Studying Web Application Development #3, the series had gotten as far as incorporating Bootstrap into the frontend server.

The web application to be implemented in this series should be able to run machine learning (including deep learning), so this article describes the steps for making the backend server that runs the web application recognize the GPU.
TensorFlow will be used as the deep learning framework.

As of this installment, TensorFlow and PyTorch are the mainstream frameworks; weighing ease of analysis and extensibility toward edge AI, TensorFlow is adopted here. The two concrete selection criteria are that TensorBoard is available as an analysis tool for TensorFlow, and that microcontrollers supporting TensorFlow Lite are commercially available. The comparison is summarized in the table below.

Framework    Ease of analysis   Extensibility to edge AI
TensorFlow   TensorBoard        TensorFlow Lite / TensorFlow Lite for Microcontrollers
PyTorch      TensorBoard        PyTorch Mobile
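
For reference, getting values into TensorBoard takes only a few lines with the tf.summary API. The sketch below is a minimal, self-contained example with made-up scalar values; the log directory name "logs" is an arbitrary choice, not something from this project.

import tensorflow as tf

# Write a few scalar values under ./logs; they can then be inspected
# by pointing TensorBoard at the directory: `tensorboard --logdir logs`.
writer = tf.summary.create_file_writer("logs")
with writer.as_default():
    for step in range(100):
        tf.summary.scalar("example/loss", 1.0 / (step + 1), step=step)
writer.flush()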

2. GPU Integration Procedure

Change the base image of the web application server to the TensorFlow image published on NVIDIA NGC.
Simply swapping the base image is not enough, because the TensorFlow image lacks some packages that Django needs, so the edits below also add the installation of those missing packages.

$ cd <path/to/project>
$ vim docker-compose.yml
$ vim django_project/Dockerfile
$ vim django_project/entrypoint.sh

Edits to docker-compose.yml
@@ -12,7 +12,13 @@ services:
       - ./.env.dev
     depends_on:
       - db
-
+    deploy:
+      resources:
+        reservations:
+          devices:
+           - driver: nvidia
+             capabilities: [utility, compute, video]
+
   db:
     image: postgres:13.4-alpine
     volumes:


Edits to django_project/Dockerfile
@@ -1,12 +1,14 @@
-FROM python:3.9.7-alpine as builder
+FROM nvcr.io/nvidia/tensorflow:21.03-tf2-py3 as builder

+RUN mkdir -p /home/app
+RUN groupadd app ; useradd app -g app
 WORKDIR /usr/src/app

 ENV PYTHONDONTWRITEBYTECODE 1
 ENV PYTHONUNBUFFERED 1

-RUN apk update \
-    && apk add postgresql-dev gcc python3-dev musl-dev
+RUN apt update ; \
+      apt install -y postgresql gcc python3-dev musl-dev

 RUN pip install --upgrade pip
 RUN pip install flake8
@@ -17,19 +19,19 @@ COPY ./requirements.txt .
 RUN pip wheel --no-cache-dir --no-deps --wheel-dir /usr/src/app/wheels -r requirements.txt


-FROM python:3.9.7-alpine
+FROM nvcr.io/nvidia/tensorflow:21.03-tf2-py3

 RUN mkdir -p /home/app

-RUN addgroup -S app && adduser -S app -G app
+RUN groupadd app ; useradd app -g app

 ENV HOME=/home/app
 ENV APP_HOME=/home/app/web
-RUN mkdir ${APP_HOME}
-RUN mkdir ${APP_HOME}/static
+RUN mkdir -p ${APP_HOME}
+RUN mkdir -p ${APP_HOME}/static
 WORKDIR ${APP_HOME}

-RUN apk update && apk add libpq
+RUN apt update ; apt -y install libpq-dev netcat
 COPY --from=builder /usr/src/app/wheels /wheels
 COPY --from=builder /usr/src/app/requirements.txt .
 RUN pip install --no-cache /wheels/*


Edits to django_project/entrypoint.sh
@@ -11,4 +11,5 @@ then
     echo "PostgreSQL started"
 fi

+/usr/local/bin/nvidia_entrypoint.sh
 exec "$@"


With these changes, you can start the server, log into the xxx_web_1 container, and confirm that the GPU is recognized with nvidia-smi and tensorflow.python.client.device_lib.list_local_devices().

nvidia-smi
$ docker exec -it test_web_1 /bin/bash
app@a762dd474640:~/web$ nvidia-smi
Sun Nov 14 00:53:11 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.00       Driver Version: 510.06       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| 24%   28C    P8    26W / 235W |   1198MiB /  8192MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+


tensorflow.python.client.device_lib.list_local_devices()
app@a762dd474640:~/web$ python3
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
2021-11-14 00:53:32.427258: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
>>> device_lib.list_local_devices()
2021-11-14 00:53:35.799926: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-11-14 00:53:35.801311: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-11-14 00:53:35.955063: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:35.955134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-11-14 00:53:35.955154: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-11-14 00:53:35.958504: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-11-14 00:53:35.958589: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-11-14 00:53:35.959266: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-11-14 00:53:35.959725: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-11-14 00:53:35.960583: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-11-14 00:53:35.961104: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-11-14 00:53:35.961220: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-11-14 00:53:35.961656: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:35.961954: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:35.961990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1888] Adding visible gpu devices: 0
2021-11-14 00:53:35.962040: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-11-14 00:53:36.684075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1287] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-11-14 00:53:36.684129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1293]      0
2021-11-14 00:53:36.684139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] 0:   N
2021-11-14 00:53:36.684652: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:36.685012: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:36.685068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1515] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2021-11-14 00:53:36.685395: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:36.685455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Created TensorFlow device (/device:GPU:0 with 5958 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8186840601267269810
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 6247874560
locality {
  bus_id: 1
  links {
  }
}
incarnation: 16877793378302607449
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5"
]
>>>
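
As an aside, the same check can be written more briefly against the public TF 2.x API. This is a minimal sketch using tf.config; an empty list means the container cannot see the GPU, in which case the deploy section added to docker-compose.yml is worth rechecking.

import tensorflow as tf

# GPUs visible to TensorFlow inside the container.
gpus = tf.config.list_physical_devices("GPU")
print(gpus)

# Optionally run a small op to confirm that work is actually placed on the GPU.
if gpus:
    with tf.device("/GPU:0"):
        x = tf.random.normal((1024, 1024))
        y = tf.matmul(x, x)
    print("matmul ran on:", y.device)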


3. Conclusion

This article covered the steps for making the web application server recognize the GPU.
Since TensorFlow was confirmed to recognize the GPU, once a training program is implemented we have an environment in which GPU-based training can be driven from the browser.
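
To make that concrete, the following is a hypothetical sketch of the kind of training entry point a Django view could invoke; the function name train_once and the dummy data are placeholders, not the code in the repository mentioned below.

import numpy as np
import tensorflow as tf

def train_once(epochs: int = 1) -> float:
    """Illustrative Keras training routine; returns the final training loss."""
    # Dummy data standing in for a real dataset.
    x = np.random.rand(512, 32).astype("float32")
    y = np.random.randint(0, 10, size=(512,))

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # Keras places ops on /GPU:0 automatically when TensorFlow sees the GPU.
    history = model.fit(x, y, epochs=epochs, verbose=0)
    return float(history.history["loss"][-1])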

As of this writing, training can already be run with the code on GitHub, but a run cannot be interrupted once started, which makes it awkward to use (commit: ee7eb2f4a52f4909a80df8553a53fdcb494af8d4).

The next installments will be written up once these issues are resolved, so the timing of the next update is undecided.
