
Studying Web Application Development #4


#1. Contents of this article

Up through Studying Web Application Development #3, we got as far as integrating Bootstrap into the frontend server.

The web application to be implemented in this series should be able to run machine learning (including deep learning), so this article describes the steps for making the GPU visible to the backend server that runs the web application.
TensorFlow is used as the deep learning framework.

As of this writing, TensorFlow and PyTorch are the mainstream frameworks; TensorFlow is adopted here for its ease of analysis and its extensibility to edge AI. Concretely, there are two selection criteria: TensorBoard is available as an analysis tool for TensorFlow, and microcontrollers that support TensorFlow Lite are commercially available. The comparison is summarized in the table below.

| Framework | Ease of analysis | Extensibility to edge AI |
|:--|:--|:--|
| TensorFlow | TensorBoard | TensorFlow Lite / TensorFlow Lite for Microcontrollers |
| PyTorch | TensorBoard | PyTorch Mobile |

#2. GPU integration procedure

Change the base image of the web application server to the TensorFlow image published on NVIDIA NGC.
Simply swapping the base image leaves out packages that Django needs to keep working, so the installation of the packages missing from the TensorFlow image is also added.

$ cd <path/to/project>
$ vim docker-compose.yml
$ vim django_project/Dockerfile
$ vim django_project/entrypoint.sh
Edits to docker-compose.yml
@@ -12,7 +12,13 @@ services:
       - ./.env.dev
     depends_on:
       - db
-
+    deploy:
+      resources:
+        reservations:
+          devices:
+           - driver: nvidia
+             capabilities: [utility, compute, video]
+
   db:
     image: postgres:13.4-alpine
     volumes:
Edits to django_project/Dockerfile
@@ -1,12 +1,14 @@
-FROM python:3.9.7-alpine as builder
+FROM nvcr.io/nvidia/tensorflow:21.03-tf2-py3 as builder

+RUN mkdir -p /home/app
+RUN groupadd app ; useradd app -g app
 WORKDIR /usr/src/app

 ENV PYTHONDONTWRITEBYTECODE 1
 ENV PYTHONUNBUFFERED 1

-RUN apk update \
-    && apk add postgresql-dev gcc python3-dev musl-dev
+RUN apt update ; \
+      apt install -y postgresql gcc python3-dev musl-dev

 RUN pip install --upgrade pip
 RUN pip install flake8
@@ -17,19 +19,19 @@ COPY ./requirements.txt .
 RUN pip wheel --no-cache-dir --no-deps --wheel-dir /usr/src/app/wheels -r requirements.txt


-FROM python:3.9.7-alpine
+FROM nvcr.io/nvidia/tensorflow:21.03-tf2-py3

 RUN mkdir -p /home/app

-RUN addgroup -S app && adduser -S app -G app
+RUN groupadd app ; useradd app -g app

 ENV HOME=/home/app
 ENV APP_HOME=/home/app/web
-RUN mkdir ${APP_HOME}
-RUN mkdir ${APP_HOME}/static
+RUN mkdir -p ${APP_HOME}
+RUN mkdir -p ${APP_HOME}/static
 WORKDIR ${APP_HOME}

-RUN apk update && apk add libpq
+RUN apt update ; apt -y install libpq-dev netcat
 COPY --from=builder /usr/src/app/wheels /wheels
 COPY --from=builder /usr/src/app/requirements.txt .
 RUN pip install --no-cache /wheels/*
Edits to django_project/entrypoint.sh
@@ -11,4 +11,5 @@ then
     echo "PostgreSQL started"
 fi

+/usr/local/bin/nvidia_entrypoint.sh
 exec "$@"

With the changes above, you can start the server, log in to the xxx_web_1 container, and confirm that the GPU is recognized with nvidia-smi and with tensorflow.python.client.device_lib.list_local_devices().

nvidia-smi
$ docker exec -it test_web_1 /bin/bash
app@a762dd474640:~/web$ nvidia-smi
Sun Nov 14 00:53:11 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.00       Driver Version: 510.06       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| 24%   28C    P8    26W / 235W |   1198MiB /  8192MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
tensorflow.python.client.device_lib.list_local_devices()
app@a762dd474640:~/web$ python3
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
2021-11-14 00:53:32.427258: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
>>> device_lib.list_local_devices()
2021-11-14 00:53:35.799926: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-11-14 00:53:35.801311: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-11-14 00:53:35.955063: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:35.955134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 SUPER computeCapability: 7.5
coreClock: 1.785GHz coreCount: 40 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-11-14 00:53:35.955154: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-11-14 00:53:35.958504: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-11-14 00:53:35.958589: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-11-14 00:53:35.959266: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-11-14 00:53:35.959725: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-11-14 00:53:35.960583: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2021-11-14 00:53:35.961104: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-11-14 00:53:35.961220: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-11-14 00:53:35.961656: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:35.961954: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:35.961990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1888] Adding visible gpu devices: 0
2021-11-14 00:53:35.962040: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-11-14 00:53:36.684075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1287] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-11-14 00:53:36.684129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1293]      0
2021-11-14 00:53:36.684139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] 0:   N
2021-11-14 00:53:36.684652: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:36.685012: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:36.685068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1515] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2021-11-14 00:53:36.685395: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1024] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2021-11-14 00:53:36.685455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Created TensorFlow device (/device:GPU:0 with 5958 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8186840601267269810
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 6247874560
locality {
  bus_id: 1
  links {
  }
}
incarnation: 16877793378302607449
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 2070 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5"
]
>>>
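
For a quicker check, TensorFlow 2.x also exposes the `tf.config` API, which avoids the internal `device_lib` module. Below is a minimal sketch, assuming it is run inside the same container as above:

```python
# Minimal GPU check with the public TF 2.x API (alternative to device_lib).
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
print("GPUs visible to TensorFlow:", gpus)

if gpus:
    # Run a tiny matmul on the first GPU to confirm compute actually works.
    with tf.device('/GPU:0'):
        a = tf.random.uniform((1024, 1024))
        b = tf.random.uniform((1024, 1024))
        c = tf.matmul(a, b)
    print("Matmul executed on:", c.device)
```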

#3. Closing remarks

This article described the steps for making the GPU visible to the web application server.
We also confirmed that TensorFlow recognizes the GPU, so once a training program is implemented, we will have an environment in which GPU-based training can be run from browser operations.
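
To give a concrete picture of what such a training program might call into, here is a minimal, self-contained Keras sketch; the dataset (MNIST), the model, and the `run_training` name are illustrative assumptions for this series, not the code in the repository:

```python
# Illustrative sketch only: a tiny Keras training job the backend could run.
# With the NGC image and the compose GPU reservation above, this runs on the
# GPU automatically when one is visible.
import tensorflow as tf

def run_training(epochs: int = 1) -> float:
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    model.fit(x_train, y_train, epochs=epochs, batch_size=128, verbose=2)
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    return acc

if __name__ == "__main__":
    print("test accuracy:", run_training())
```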

As of this writing, training can already be run with the code on GitHub, but it cannot be interrupted once started, so usability is still poor (commit: ee7eb2f4a52f4909a80df8553a53fdcb494af8d4).

The next installments will be written up once this issue is resolved, so the timing of the next update is undecided.

#4. Related links
