Hosting a TensorFlow deep learning model with TensorFlow Serving

Posted at 2020-07-17

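Introduction

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. With it you can easily host a model built in TensorFlow and expose it through an API.

See the TensorFlow Serving documentation for details.

In this article I host a TensorFlow deep learning model with TensorFlow Serving on AWS EC2. At the end of the article I also try the same thing with Docker.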

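Steps

Creating the EC2 instance

Type "Deep Learning AMI" into the AMI search bar to find the AMI to use. I used "Deep Learning AMI (Ubuntu 18.04) Version 30.0 - ami-0b1b56cbf0f8fcea3" with the "p2.xlarge" instance type. The security group allows SSH and HTTP connections from my development environment; every other setting is left at its default.

Setting up the environment

Log in to the EC2 instance and set up the environment.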

~$ ls
LICENSE                README     examples  tools
Nvidia_Cloud_EULA.pdf  anaconda3  src       tutorials

The installation steps are described on the official site.

First, add the TensorFlow Serving distribution URI to sources.list.d as a package source.

~$ echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -

deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2943  100  2943    0     0  18166      0 --:--:-- --:--:-- --:--:-- 18166
OK

Run the installation. Both apt-get commands need root privileges, so each one is prefixed with sudo.

~$ sudo apt-get update && sudo apt-get install tensorflow-model-server
~$ tensorflow_model_server --version
TensorFlow ModelServer: 1.15.0-rc2+dev.sha.1ab7d59
TensorFlow Library: 1.15.2

Installation is now complete.

Building the model

Next, build the model to deploy. First, create a working directory.

~$ mkdir tfexample
~$ cd tfexample

Start jupyter-lab to build the model.
Note: the command below listens on all IP addresses, so use the security group to restrict access to your development environment.

~/tfexample$ jupyter-lab --no-browser --port=8888 --ip=0.0.0.0 --allow-root

...
http://127.0.0.1:8888/?token=b92a7ceefb20c7ab3e475474dbde66a771870de1d8f5bd70
...

The URL appears in the standard output. Replace 127.0.0.1 with the instance's IP address and open it in a browser.
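The host substitution can also be done programmatically. The following is a minimal sketch using Python's standard urllib.parse; the function name replace_host and the address 203.0.113.10 (a documentation-only IP) are placeholders, not values from this setup.

```python
from urllib.parse import urlsplit, urlunsplit


def replace_host(url: str, new_host: str) -> str:
    """Return url with its host swapped for new_host, keeping port, path and query."""
    parts = urlsplit(url)
    # Preserve the original port (8888 here) if one was given.
    netloc = f"{new_host}:{parts.port}" if parts.port else new_host
    return urlunsplit(parts._replace(netloc=netloc))


# URL in the form printed by jupyter-lab (token shortened for the example).
local_url = "http://127.0.0.1:8888/?token=abc123"
# 203.0.113.10 is a placeholder; use your instance's public IP.
print(replace_host(local_url, "203.0.113.10"))
```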

Once JupyterLab is up, select the conda_tensorflow2_py36 kernel, open a notebook, and rename it to tfmodel.ipynb.

This time the model is trained on Fashion MNIST.

tfmodel.ipynb

import sys
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import os
import tempfile

print('TensorFlow version: {}'.format(tf.__version__))
# TensorFlow version: 2.1.0

tfmodel.ipynb

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# scale the values to 0.0 to 1.0
train_images = train_images / 255.0
test_images = test_images / 255.0

# reshape for feeding into the model
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1)
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1)

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

print('\ntrain_images.shape: {}, of {}'.format(train_images.shape, train_images.dtype))
print('test_images.shape: {}, of {}'.format(test_images.shape, test_images.dtype))
# train_images.shape: (60000, 28, 28, 1), of float64
# test_images.shape: (10000, 28, 28, 1), of float64

tfmodel.ipynb

model = keras.Sequential([
  keras.layers.Conv2D(input_shape=(28,28,1), filters=8, kernel_size=3, 
                      strides=2, activation='relu', name='Conv1'),
  keras.layers.Flatten(),
  keras.layers.Dense(10, activation=tf.nn.softmax, name='Softmax')
])
model.summary()

testing = False
epochs = 5

model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=epochs)

test_loss, test_acc = model.evaluate(test_images, test_labels)
print('\nTest accuracy: {}'.format(test_acc))

# Model: "sequential"
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #   
# =================================================================
# Conv1 (Conv2D)               (None, 13, 13, 8)         80        
# _________________________________________________________________
# flatten (Flatten)            (None, 1352)              0         
# _________________________________________________________________
# Softmax (Dense)              (None, 10)                13530     
# =================================================================
# Total params: 13,610
# Trainable params: 13,610
# Non-trainable params: 0
# _________________________________________________________________
# Train on 60000 samples
# Epoch 1/5
# 60000/60000 [==============================] - 46s 770us/sample - loss: 0.5398 - accuracy: 0.8182
# Epoch 2/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3849 - accuracy: 0.8643
# Epoch 3/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3513 - accuracy: 0.8751
# Epoch 4/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3329 - accuracy: 0.8820
# Epoch 5/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3204 - accuracy: 0.8847
# 10000/10000 [==============================] - 1s 78us/sample - loss: 0.3475 - accuracy: 0.8779

# Test accuracy: 0.8779000043869019

tfmodel.ipynb

MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
print('export_path = {}\n'.format(export_path))

tf.keras.models.save_model(
    model,
    export_path,
    overwrite=True,
    include_optimizer=True,
    save_format=None,
    signatures=None,
    options=None
)

print('\nSaved model:')
!ls -l {export_path}

# export_path = /tmp/1

# WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
# Instructions for updating:
# If using Keras pass *_constraint arguments to layers.
# INFO:tensorflow:Assets written to: /tmp/1/assets

# Saved model:
# total 84
# drwxr-xr-x 2 ubuntu ubuntu  4096 Jul 17 10:49 assets
# -rw-rw-r-- 1 ubuntu ubuntu 74970 Jul 17 10:49 saved_model.pb
# drwxr-xr-x 2 ubuntu ubuntu  4096 Jul 17 10:49 variables

The save location is built from the temporary directory returned by the tempfile module. Here the model was saved to /tmp/1.
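Before starting the server, it can be useful to confirm that the export really produced a numbered version directory containing saved_model.pb and variables/, as in the listing above. The following is a rough sketch using only the standard library; the helper names is_saved_model_dir and list_model_versions are made up for this example, and the check is a simple file-existence test, not a validation of the model itself.

```python
import os
import tempfile


def is_saved_model_dir(path: str) -> bool:
    """True if path contains saved_model.pb and a variables/ directory."""
    return (os.path.isfile(os.path.join(path, "saved_model.pb"))
            and os.path.isdir(os.path.join(path, "variables")))


def list_model_versions(base_path: str) -> list:
    """Return numeric version directories under base_path that look like SavedModels."""
    versions = []
    for name in os.listdir(base_path):
        full = os.path.join(base_path, name)
        if name.isdigit() and os.path.isdir(full) and is_saved_model_dir(full):
            versions.append(int(name))
    return sorted(versions)


if __name__ == "__main__":
    # In this article the base path is the system temp directory (/tmp).
    print(list_model_versions(tempfile.gettempdir()))
```

If the export above succeeded, the printed list should include version 1.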

Hosting the model

Open another terminal, log in to the instance, and start the server.

~$ export MODEL_DIR=/tmp
~$ tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=fashion_model \
  --model_base_path="${MODEL_DIR}"

The directory given as model_base_path apparently has to contain one subdirectory per version, with the model saved inside each version directory.

model_base_path/
　├ 1/
　│　├ assets/
　│　├ variables/
　│　└ saved_model.pb
　├ 2/
　│　├ (rest omitted)

Send a request to check that it works. Go back to the notebook and build the request.

tfmodel.ipynb

def show(idx, title):
    plt.figure()
    plt.imshow(test_images[idx].reshape(28,28), cmap = "gray")
    plt.axis('off')
    plt.title('\n\n{}'.format(title), fontdict={'size': 16})

tfmodel.ipynb

import json

data = json.dumps({"signature_name": "serving_default", "instances": test_images[0:3].tolist()})
print('Data: {} ... {}'.format(data[:50], data[len(data)-52:]))
# Data: {"signature_name": "serving_default", "instances": ...  [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]]]}

tfmodel.ipynb

import requests

headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/fashion_model:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)['predictions']

show(0, 'The model thought this was a {} (class {}), and it was actually a {} (class {})'.format(
  class_names[np.argmax(predictions[0])], np.argmax(predictions[0]), class_names[test_labels[0]], test_labels[0]))

Note: if you get an error, try restarting the server or redoing the CUDA setup.
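When troubleshooting, it may help to first ask the server whether the model loaded at all. TensorFlow Serving's REST API exposes a model status endpoint at /v1/models/<model name>; the sketch below queries it with the standard library. The helper names are made up for this example, and the host and port assume the server started earlier in this article. A loaded model normally reports the state AVAILABLE in the response.

```python
import json
import urllib.request


def model_status_url(host: str, port: int, model_name: str) -> str:
    """Build the URL of TensorFlow Serving's REST model status endpoint."""
    return f"http://{host}:{port}/v1/models/{model_name}"


def fetch_model_status(url: str, timeout: float = 5.0) -> dict:
    """GET the status endpoint and return the parsed JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    url = model_status_url("localhost", 8501, "fashion_model")
    try:
        print(json.dumps(fetch_model_status(url), indent=2))
    except OSError as exc:  # URLError is a subclass of OSError
        print(f"could not reach {url}: {exc}")
```

If the request cannot connect, the server is probably not running or is listening on a different port.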

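The data is sent as JSON in a POST request. The input goes under the instances key, and because prediction runs on a batch, you need to be careful about its shape (here, a list of 28x28x1 images).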
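To make the shape requirement concrete, here is a small sketch that checks the nested-list shape before serializing the request body. It uses plain Python lists in place of NumPy arrays; the helper names nested_shape and build_predict_payload are hypothetical, and the expected image shape (28, 28, 1) is taken from the model defined earlier.

```python
import json


def nested_shape(value) -> tuple:
    """Shape of a rectangular nested list, e.g. [[0.0], [0.0]] -> (2, 1)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


def build_predict_payload(instances: list, image_shape=(28, 28, 1)) -> str:
    """Serialize a batch of images as a TensorFlow Serving predict request body."""
    shape = nested_shape(instances)
    # The leading dimension is the batch; the rest must match the model input.
    if shape[1:] != tuple(image_shape):
        raise ValueError(f"expected (N, {image_shape}), got {shape}")
    return json.dumps({"signature_name": "serving_default", "instances": instances})


# Two all-zero dummy images stand in for test_images[0:2].tolist().
dummy = [[[[0.0] for _ in range(28)] for _ in range(28)] for _ in range(2)]
payload = build_predict_payload(dummy)
print(nested_shape(dummy), len(payload))
```

Passing a single image without the batch dimension (for example dummy[0]) raises ValueError here, which is the kind of mistake this check is meant to catch before the request is sent.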

For reference, the contents of predictions look like this.

predictions[0]

# [7.71279588e-07,
#  4.52205953e-08,
#  5.55571035e-07,
#  1.59779923e-08,
#  2.27421737e-07,
#  0.00600787532,
#  8.29056205e-07,
#  0.0466650613,
#  0.00145569211,
#  0.945868969]

The list holds the probability for each class. This is the same output as the following code.

model.predict(test_images[0:3]).tolist()[0]
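Turning such a probability list into a label is just an argmax. The sketch below does this without NumPy, using the predictions[0] values printed above as sample input; top_class is a name chosen for this example.

```python
CLASS_NAMES = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']


def top_class(probabilities: list) -> tuple:
    """Return (index, name, probability) of the most likely class."""
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return index, CLASS_NAMES[index], probabilities[index]


# predictions[0] as printed above.
sample = [7.71279588e-07, 4.52205953e-08, 5.55571035e-07, 1.59779923e-08,
          2.27421737e-07, 0.00600787532, 8.29056205e-07, 0.0466650613,
          0.00145569211, 0.945868969]
print(top_class(sample))
```

For this sample the largest probability is at index 9, which corresponds to 'Ankle boot'.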

Hosting with Docker

~$ docker --version
Docker version 19.03.11, build 42e35e61f3

~$ docker pull tensorflow/serving
~$ docker run -d -t --rm -p 8501:8501 -v "/tmp:/models/fashion_model" -e MODEL_NAME=fashion_model tensorflow/serving

The entrypoint looks like the following. The RESTful API port is 8501, the gRPC port is 8500, and model_base_path is ${MODEL_BASE_PATH}/${MODEL_NAME}.

tensorflow_model_server --port=8500 --rest_api_port=8501 \
  --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME}

The entrypoint file is stored at /usr/bin/tf_serving_entrypoint.sh and actually contains the following code.

#!/bin/bash

tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

So when using Docker, all you need to do is mount the host's model directory onto the container's model_base_path.
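The relationship between MODEL_NAME and the mount target can be sketched as follows. This helper only assembles the same docker run command used above; the function name docker_run_command is hypothetical, and it assumes the image's MODEL_BASE_PATH is left at /models, which is what the mount in the command above relies on.

```python
import shlex


def docker_run_command(host_model_dir: str, model_name: str,
                       rest_port: int = 8501,
                       image: str = "tensorflow/serving") -> list:
    """Build the docker run argv that mounts host_model_dir at /models/<model_name>."""
    # /models is assumed to be the image's MODEL_BASE_PATH (left unchanged here).
    container_path = f"/models/{model_name}"
    return [
        "docker", "run", "-d", "-t", "--rm",
        "-p", f"{rest_port}:8501",
        "-v", f"{host_model_dir}:{container_path}",
        "-e", f"MODEL_NAME={model_name}",
        image,
    ]


argv = docker_run_command("/tmp", "fashion_model")
print(" ".join(shlex.quote(a) for a in argv))
```

The key point is that the container path and the MODEL_NAME variable are derived from the same name, so the entrypoint's model_base_path resolves to the mounted directory.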

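Other notes

The gRPC interface is also supported.
The model path, maximum batch size, number of threads, and timeouts can be specified in a config file.
The model's input and output format, called a Signature, can apparently be customized as well.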
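As an illustration of the config file note, the sketch below writes a model config in the protobuf text format that tensorflow_model_server can read through its --model_config_file flag. The file name models.config and the helper name write_model_config are hypothetical; the model name and base path reuse the values from this article. It covers only the model path, not batching settings.

```python
import os
import tempfile

CONFIG_TEMPLATE = """model_config_list {{
  config {{
    name: '{name}'
    base_path: '{base_path}'
    model_platform: 'tensorflow'
  }}
}}
"""


def write_model_config(path: str, name: str, base_path: str) -> str:
    """Write a single-model config file and return its contents."""
    text = CONFIG_TEMPLATE.format(name=name, base_path=base_path)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text


# models.config is a hypothetical file name.
config_path = os.path.join(tempfile.mkdtemp(), "models.config")
print(write_model_config(config_path, "fashion_model", "/tmp"))
```

The server would then be started with --model_config_file pointing at this file in place of the --model_name and --model_base_path flags.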
