
Calling a machine learning model served on Databricks from Streamlit

Last updated at 2022-03-07, posted at 2022-03-04

This article walks through the steps for running the Streamlit application described in the following article. With this approach, a model served on Databricks can be called easily from a GUI.

The application described here is intended as a demo. If you use it in a production environment, make sure to put appropriate security measures in place.

Model serving

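Run the notebook linked from the article above to build the similar-image recommendation model. Run it through Cmd79 so that the model is logged with MLflow, then register the logged model in the Model Registry. In this example the model is registered under the name sim_model_taka. For details on the Model Registry, see MLflow Model Registry on Databricks.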
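
Registration can also be done from code instead of the UI. The following is a minimal sketch, assuming the `mlflow` package is available and that the model was logged under an artifact path named `model` (a hypothetical value; check the notebook's logging call for the real one). The helper names `build_model_uri` and `register_logged_model` are illustrative and do not appear in the original notebook.

```python
def build_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build a runs:/ URI pointing at a model logged in an MLflow run.

    artifact_path="model" is an assumed default, not taken from the notebook.
    """
    if not run_id:
        raise ValueError("run_id must not be empty")
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


def register_logged_model(run_id: str,
                          name: str = "sim_model_taka",
                          artifact_path: str = "model"):
    """Register the logged model under `name` in the Model Registry."""
    # Imported lazily so build_model_uri can be used without mlflow installed.
    import mlflow

    return mlflow.register_model(build_model_uri(run_id, artifact_path), name)
```

Calling `register_logged_model("<run id>")` from a Databricks notebook would create a new version of sim_model_taka, which is the same result as registering the model through the UI.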

Once registered, the model appears as shown below. To reach this screen, select Machine Learning in the persona switcher in the sidebar, then click Models > sim_model_taka.

Click the Serving tab, then click Enable Serving. This starts a cluster for model serving and deploys the model. For details on model serving, see Model serving on Databricks.

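Wait until Status (the status of the model serving cluster) changes from Pending to Running, and the model version listed under Model Versions (the machine learning model being deployed) also changes from Pending to Running.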

Once both are Running, copy the Model URL shown on the screen. You will use it in a later step.

Charges are incurred while the model serving cluster is running. When you are not using the model, click Stop to the right of Status to stop the model serving cluster.

Getting a personal access token

Calling a served model requires a Databricks personal access token.

Go to Settings > User Settings > Access Tokens in the side menu, click Generate New Token, and specify a name and lifetime for the token. Click Generate to display the token, then copy it.

Note: Handle the personal access token with strict care. Do not share it with third parties.
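
One way to keep the token out of source code is to read it from an environment variable and fail early when it is missing. The following is a minimal sketch using only the standard library. The variable name `DATABRICKS_TOKEN` matches the one app.py reads, while the helper name `auth_headers` and its `env` parameter are assumptions made for illustration.

```python
import os


def auth_headers(env=None):
    """Return the Authorization header built from DATABRICKS_TOKEN.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    token = env.get("DATABRICKS_TOKEN", "").strip()
    if not token:
        # Failing here gives a clearer message than a rejected request later.
        raise RuntimeError("DATABRICKS_TOKEN is not set")
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as the `headers` argument of the HTTP request that calls the Model URL.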

Setting up the Streamlit application

Download the files published at the link below to your local machine. This sample application runs on Docker, so make sure Docker is installed locally, and install it if it is not.

The app.py included in the repository above raises an error, so replace it with the app.py below. Paste the Model URL you copied earlier in place of <Model URL>.

recommender_app/app.py

import base64
import io
import os

import numpy as np
import pandas as pd
import requests
import streamlit as st
from PIL import Image

st.header('類似画像レコメンデーションシステム')
st.write('[eコマース向け類似画像レコメンデーションシステムの構築 - Qiita](https://qiita.com/taka_yayoi/items/173d5228c1d08c5130ef)') # markdown

# Copy and paste this code from the MLflow real-time inference UI. Make sure to save Bearer token from 
def create_tf_serving_json(data):
  return {'inputs': {name: data[name].tolist() for name in data.keys()} if isinstance(data, dict) else data.tolist()}

def score_model(dataset):
  # Read the personal access token once and reuse it in the header
  token = os.environ.get("DATABRICKS_TOKEN")
  url = '<Model URL>'
  headers = {'Authorization': f'Bearer {token}'}
  
  data_json = dataset.to_dict(orient='split') if isinstance(dataset, pd.DataFrame) else create_tf_serving_json(dataset)
  response = requests.request(method='POST', headers=headers, url=url, json=data_json)
  if response.status_code != 200:
    raise Exception(f'Request failed with status {response.status_code}, {response.text}')
  return response.json()

def render_response_image(i):
  """response is the returned JSON object. We can loop through this object and return the reshaped numpy array for each recommended image which can then be rendered"""

  single_image_string = response[i]["0"]
  image_array = np.frombuffer(bytes.fromhex(single_image_string), dtype=np.float32)
  image_reshaped = np.reshape(image_array, (28,28))
  st.image(image_reshaped, width=100)
  st.image("images/heading.png")

# Source: https://discuss.streamlit.io/t/png-bytes-io-numpy-conversion-using-file-uploader/1409
img_file_buffer = st.file_uploader('Upload a PNG image', type='png')

if img_file_buffer is not None:
  image = Image.open(img_file_buffer)
  img_array = np.array(image)
  st.write("Uploaded Image: ")
  st.image(image, width = 100)
  # To convert to a string based IO:
  byteIO = io.BytesIO()
  image.save(byteIO, format='PNG')
  img_str = base64.b64encode(byteIO.getvalue())

  model_input  =  img_str.decode("utf-8")

  df = pd.DataFrame.from_dict({"input": [model_input] })

  response = score_model(df) 

  st.write("Recommended Items")

  col0, col1, col2, col3, col4 = st.columns(5)

  with col0:
    render_response_image(0)
  with col1:
    render_response_image(1)
  with col2:
    render_response_image(2)
  with col3:
    render_response_image(3)
  with col4:
    render_response_image(4)
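
The request body that `score_model` sends is the one-row DataFrame converted with `to_dict(orient='split')`. The following is a minimal sketch of that payload built with the standard library only, so the shape can be inspected without pandas. The helper name `build_split_payload` is hypothetical, and the sample bytes are just the PNG signature rather than a full image.

```python
import base64
import json


def build_split_payload(png_bytes: bytes) -> dict:
    """Mirror pd.DataFrame({"input": [b64]}).to_dict(orient="split")."""
    # app.py base64-encodes the PNG bytes and sends them as a UTF-8 string.
    encoded = base64.b64encode(png_bytes).decode("utf-8")
    return {"index": [0], "columns": ["input"], "data": [[encoded]]}


# Placeholder input: only the 8-byte PNG signature, not a real image.
payload = build_split_payload(b"\x89PNG\r\n\x1a\n")
print(json.dumps(payload))
```

Printing the payload this way can help when comparing what the application sends against what the served model expects.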

The steps below run the application with Docker, but it can also be hosted on Share Streamlit.

Building and running the Docker image

In this example, the personal access token above is passed to Docker as an environment variable. The following article was used as a reference.

Bash

export DATABRICKS_TOKEN=<personal access token>

cd recommender_app
docker build -f Dockerfile -t app:latest . 
docker run -p 8501:8501 --env DATABRICKS_TOKEN app:latest

When a message like the following appears, open http://localhost:8501/ in a browser.

A GUI for uploading an image is displayed, as shown here.

The image you upload must be 28x28 pixels.
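
The size requirement can be checked before the image is sent to the model. The following is a minimal sketch that reads the width and height from the IHDR chunk at the start of a PNG file, using only the standard library. The function names `png_dimensions` and `is_expected_size` are hypothetical and are not part of the original app.py.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_expected_size(data: bytes, size=(28, 28)) -> bool:
    """True when the PNG has the dimensions the model expects."""
    return png_dimensions(data) == size
```

Inside app.py, where the upload is already opened with Pillow, comparing `image.size` against `(28, 28)` would give the same information.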

Images can be obtained from here.

Uploading an image displays images similar to it. This makes it easy to put a machine learning model to use. Give it a try.
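
Each recommended image comes back as a hex string that app.py turns into a 28x28 array with `np.frombuffer(..., dtype=np.float32)`. The following is a minimal sketch of the same decoding with the standard library, assuming the response format is exactly what app.py expects (784 float32 values in native byte order). The helper name `decode_image_hex` is hypothetical.

```python
import struct


def decode_image_hex(hex_string: str, side: int = 28):
    """Decode a hex string of float32 values into `side` rows of `side` floats."""
    raw = bytes.fromhex(hex_string)
    count = side * side
    if len(raw) != count * 4:  # 4 bytes per float32 value
        raise ValueError(f"expected {count * 4} bytes, got {len(raw)}")
    # "=" selects native byte order with standard sizes, matching how
    # np.frombuffer interprets dtype=np.float32 on the same machine.
    values = struct.unpack(f"={count}f", raw)
    return [list(values[r * side:(r + 1) * side]) for r in range(side)]
```

The length check makes a malformed response fail with a clear message instead of a reshape error.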

Streamlit really is nice: you can put together a web app with very little effort. I plan to study it further. The following article was used as a reference.

Databricks free trial
