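RAKUS Advent Calendar 2024

Choosing and Using a Hugging Face Model on Amazon SageMaker

Posted at 2024-12-09, last updated at 2024-12-16

This article is the day 10 entry in the RAKUS Advent Calendar 2024.
Yesterday's article was by @h_nki.

Hello, I'm a2c_sugar, now in my seventh year as a working professional.
I had been interested in AI/LLMs and machine learning for a while, and about six months ago I started studying them in earnest.
This article summarizes what I have been learning recently about Amazon SageMaker.
It walks through the steps for hosting a model published on Hugging Face on SageMaker.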

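Introduction

What is Amazon SageMaker?

Amazon SageMaker is a fully managed service that supports the whole machine learning lifecycle, including building, training, deploying, and tuning models and building pipelines. Its automation features and tooling help you work through each stage efficiently.

The tutorials are collected below; if you are interested, working through them hands-on should deepen your understanding!

What is Hugging Face?

Hugging Face is a platform that hosts a large number of machine learning and natural language processing models, something like a "GitHub" for AI. A wide range of models is published and shared there, from well-known open-source models such as BERT to the large language models (LLMs) that have drawn attention in recent years. Another attraction is that the Transformers library makes these models easy to use and customize.

SageMaker provides a mechanism for deploying models published on Hugging Face with little effort. The next chapter looks at how to do that in concrete terms.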

Trying it out

As an example, this article hosts a question answering model based on DistilBERT.

The Python code is run in JupyterLab on SageMaker Studio.

Creating the model

First, import the libraries and obtain the session and the execution role.

# Imports
import sagemaker
from sagemaker import get_execution_role
from sagemaker.huggingface import HuggingFaceModel

# Create a SageMaker session
sagemaker_session = sagemaker.Session()

# Get the execution role
role = get_execution_role()

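Specifying the model_id of a model published on Hugging Face is enough to deploy that model on SageMaker.
When you specify transformers_version, pytorch_version, and py_version, the SDK automatically selects the matching official image for SageMaker and builds the model from it.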

# Model configuration
hub = {
    'HF_MODEL_ID': 'distilbert-base-uncased-distilled-squad',  # ID of the model to use
    'HF_TASK': 'question-answering'  # task: answer a question from a context
}

# Create the Hugging Face Model class
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.26",  # Transformers version
    pytorch_version="1.13",       # PyTorch version
    py_version="py39",            # Python version
)

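Example: using a custom image

Some models need additional libraries or extra environment setup. In that case, you can build a custom image on top of the official image, push it to ECR, and then deploy by specifying image_uri.

First, retrieve the URI of the official image with image_uris.retrieve.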

# SageMaker framework information
framework = "huggingface"  # framework name
region = sagemaker.Session().boto_region_name  # current AWS region
framework_version = "4.26"  # Transformers version
pytorch_version = "1.13"  # PyTorch version
python_version = "py39"  # Python version

image_uri = sagemaker.image_uris.retrieve(
    framework=framework,
    region=region,
    version=framework_version,
    py_version=python_version,
    instance_type='ml.m5.xlarge',
    image_scope="inference",
    base_framework_version=f'pytorch{pytorch_version}'
)

print(image_uri)

Example output:
763104351884.dkr.ecr.ap-northeast-1.amazonaws.com/huggingface-pytorch-inference:1.13-transformers4.26-cpu-py39-ubuntu20.04
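The URI encodes the account, region, repository, and framework versions, so it can be handy to split it apart when checking which image you actually got. The following is a minimal sketch using only the standard library. The helper name parse_image_uri is hypothetical, and the tag layout (PyTorch version, Transformers version, device, Python version, then the OS as the last field) is an assumption inferred from the example output above rather than a documented contract.

```python
import re

# Assumed shape: <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_URI_PATTERN = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>.+)$"
)


def parse_image_uri(image_uri):
    """Split an ECR image URI into its parts (hypothetical helper)."""
    match = ECR_URI_PATTERN.match(image_uri)
    if match is None:
        raise ValueError(f"not an ECR image URI: {image_uri}")
    parts = match.groupdict()
    # Tag field order is inferred from the example output, not from a spec.
    fields = parts["tag"].split("-")
    parts["pytorch_version"] = fields[0]
    parts["transformers_version"] = fields[1].replace("transformers", "", 1)
    parts["device"] = fields[2]
    parts["py_version"] = fields[3]
    parts["os"] = fields[-1]
    return parts


example = parse_image_uri(
    "763104351884.dkr.ecr.ap-northeast-1.amazonaws.com/"
    "huggingface-pytorch-inference:1.13-transformers4.26-cpu-py39-ubuntu20.04"
)
print(example["region"], example["transformers_version"], example["device"])
# → ap-northeast-1 4.26 cpu
```

Comparing the parsed versions with the values passed to image_uris.retrieve is a quick way to notice a mismatch before building on top of the image.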

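Create a Dockerfile that uses this image as the base and installs the additional libraries. For the FROM line, use the exact URI printed in your own environment.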

# Use the official huggingface-pytorch image as the base image
FROM 763104351884.dkr.ecr.ap-northeast-1.amazonaws.com/huggingface-pytorch-inference:1.13-transformers4.26.0-cpu-py39-ubuntu20.04

# Install the required system packages
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install fugashi, unidic-lite, and sentencepiece (example for using distilbert-base-japanese)
RUN pip install --no-cache-dir \
    fugashi==1.4.0 \
    unidic-lite==1.0.8 \
    sentencepiece==0.2.0

Create an ECR repository, then push the image by following the steps under "View push commands".

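# Create the repository (when using the CLI)
aws ecr create-repository --repository-name <your_ecr_repo> --region <your_region>

# Log in to the AWS registry that hosts the base image so that docker build can pull it
# (use the region that appears in the FROM line of the Dockerfile)
aws ecr get-login-password --region <your_region> | docker login --username AWS --password-stdin 763104351884.dkr.ecr.<your_region>.amazonaws.com

# Docker login, image build, and push (from here on, the commands listed under "View push commands")
aws ecr get-login-password --region <your_region> | docker login --username AWS --password-stdin <your_account_id>.dkr.ecr.<your_region>.amazonaws.com

docker build -t <your_ecr_repo>:<tag> .
docker tag <your_ecr_repo>:<tag> <your_account_id>.dkr.ecr.<your_region>.amazonaws.com/<your_ecr_repo>:<tag>
docker push <your_account_id>.dkr.ecr.<your_region>.amazonaws.com/<your_ecr_repo>:<tag>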

After pushing to ECR, you can create the model by passing image_uri to HuggingFaceModel.
When image_uri is specified, transformers_version, pytorch_version, and py_version are not needed.

# Create the Hugging Face Model class
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    image_uri='<your_account_id>.dkr.ecr.<your_region>.amazonaws.com/<your_ecr_repo>:<tag>'
)
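Typing the image_uri by hand invites typos, so one option is to assemble it from its parts. This is only a sketch: build_ecr_image_uri is a hypothetical helper, the account ID, repository name, and tag in the usage lines are made-up placeholders, and the amazonaws.com suffix assumes a standard commercial AWS region.

```python
def build_ecr_image_uri(account_id, region, repository, tag):
    """Compose <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag> (hypothetical helper)."""
    # AWS account IDs are 12-digit numbers; catch obvious mistakes early.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit string")
    for name, value in (("region", region), ("repository", repository), ("tag", tag)):
        if not value:
            raise ValueError(f"{name} must not be empty")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder values for illustration only.
custom_image_uri = build_ecr_image_uri(
    account_id="123456789012",
    region="ap-northeast-1",
    repository="my-hf-inference",
    tag="v1",
)
print(custom_image_uri)
# → 123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/my-hf-inference:v1
```

The resulting string can then be passed as image_uri to HuggingFaceModel in place of the literal.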

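Deploying the model

Next, deploy the model to an endpoint so that inference can be requested from outside. (With ml.m5.xlarge, the first 125 hours are free of charge under the free tier; check the SageMaker pricing page for the exact conditions.)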

endpoint_name = 'distilbert-base-uncased-distilled-squad'

# Deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
    endpoint_name=endpoint_name
)

print(f"Finished deploying endpoint '{endpoint_name}'.")

The deployment has succeeded if it completes without errors and
SageMaker Studio shows the status "✅ In service" under Deployment > Endpoints.
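The same status can also be checked from code, for example from a different notebook or script than the one that called deploy. The sketch below polls describe_endpoint on a boto3 SageMaker client and reads the EndpointStatus field. The function name wait_until_in_service and its default polling interval and timeout are assumptions chosen for illustration; boto3 also ships a built-in endpoint_in_service waiter that covers the same need.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll describe_endpoint until the endpoint is InService (hypothetical helper).

    sm_client is expected to behave like boto3.client("sagemaker").
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"endpoint '{endpoint_name}' failed to deploy")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"endpoint '{endpoint_name}' is still {status}")
        time.sleep(poll_seconds)


# Usage against a real account (requires AWS credentials):
# import boto3
# wait_until_in_service(boto3.client("sagemaker"), "distilbert-base-uncased-distilled-squad")
```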

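Inference

Send a request with boto3 to the endpoint created above and retrieve the inference result. In the question-answering task, you supply a context and a question, and the model returns an answer.
As an example, this article supplies a context about Mount Fuji and asks questions about it.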

import boto3
import json

# Prepare the input data
context = "Mount Fuji is the highest mountain in Japan, with a height of 3,776 meters. It is located on Honshu Island, near the border of Shizuoka and Yamanashi prefectures. Mount Fuji is an active stratovolcano that last erupted in 1707."
question = "How tall is Mount Fuji?"

payload = {
    "inputs": {
        "question": question,
        "context": context
    }
}

Next comes the part that sends the request to the endpoint.

# Create a SageMaker Runtime client
sagemaker_runtime = boto3.client('sagemaker-runtime')

# Specify the endpoint name
endpoint_name = 'distilbert-base-uncased-distilled-squad'

# Send the inference request
response = sagemaker_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload)
)

# Read and print the result
result = json.loads(response['Body'].read().decode('utf-8'))
print(result)

Example output:
{'score': 0.9305776357650757, 'start': 62, 'end': 74, 'answer': '3,776 meters'}

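The fields mean the following.

score: the model's confidence in the selected answer (a value from 0 to 1)
start: the character offset in the context where the answer starts
end: the character offset in the context where the answer ends (exclusive)
answer: the answer the model extracted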
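Because start and end are character offsets into the context, slicing the context with them reproduces the answer. The short check below uses the context and the example output from this article; extract_span is a hypothetical helper name.

```python
context = (
    "Mount Fuji is the highest mountain in Japan, with a height of 3,776 meters. "
    "It is located on Honshu Island, near the border of Shizuoka and Yamanashi prefectures. "
    "Mount Fuji is an active stratovolcano that last erupted in 1707."
)

# Example output shown above.
result = {'score': 0.9305776357650757, 'start': 62, 'end': 74, 'answer': '3,776 meters'}


def extract_span(context, result):
    """Return the slice of the context that the start/end offsets point at."""
    return context[result["start"]:result["end"]]


span = extract_span(context, result)
print(span)  # → 3,776 meters
print(span == result["answer"])  # → True
```

This is useful when you want to highlight the answer inside the original text rather than just print it.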

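The question asked for the height of Mount Fuji, and the answer came back as 3,776 meters.
The word "tall" does not appear in the context, which shows that the model answers based on meaning and context rather than on exact wording.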

Next, let's ask where Mount Fuji is located.

question2 = "Where is Mount Fuji located between?"

payload2 = {
    "inputs": {
        "question": question2,
        "context": context
    }
}

# Send the inference request
response2 = sagemaker_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload2)
)

# Read and print the result
result2 = json.loads(response2['Body'].read().decode('utf-8'))
print(result2)

Example output:
{'score': 0.809676468372345, 'start': 130, 'end': 164, 'answer': 'Shizuoka and Yamanashi prefectures'}

This one also returned the correct answer!
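The two requests above differ only in the question, so the shared steps can be wrapped in one function. The sketch below is one way to do that under some assumptions: ask is a hypothetical helper that takes the runtime client as an argument, and StubRuntime is an offline stand-in that returns a canned answer (not real model output) so the example runs without AWS access.

```python
import io
import json


def ask(runtime_client, endpoint_name, question, context):
    """Send one question-answering request and return the decoded result (hypothetical helper)."""
    payload = {"inputs": {"question": question, "context": context}}
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read().decode("utf-8"))


class StubRuntime:
    """Offline stand-in for boto3.client('sagemaker-runtime'); returns a canned answer."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        # Remember the request so it can be inspected afterwards.
        self.last_request = json.loads(Body)
        canned = {"answer": "3,776 meters"}  # canned value, not produced by a model
        return {"Body": io.BytesIO(json.dumps(canned).encode("utf-8"))}


stub = StubRuntime()
stub_result = ask(
    stub,
    "distilbert-base-uncased-distilled-squad",
    "How tall is Mount Fuji?",
    "Mount Fuji is the highest mountain in Japan, with a height of 3,776 meters.",
)
print(stub_result["answer"])  # → 3,776 meters
```

With a real endpoint, pass boto3.client('sagemaker-runtime') instead of the stub; the rest of the call stays the same.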

Cleanup

Delete resources you no longer need. Leaving the model in place does not incur charges, but the endpoint is billed for the time it is running, so take care.

predictor.delete_model()
predictor.delete_endpoint()

print("Deleted the model and the endpoint.")
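Since a forgotten endpoint keeps costing money, it may be worth making the cleanup hard to skip. The following is a minimal sketch of one approach: a context manager that runs the same two delete calls, in the same order as above, even when the code inside the block raises. The name temporary_endpoint is hypothetical, and the only assumption about predictor is that it offers delete_model and delete_endpoint, as the predictor returned by deploy does.

```python
from contextlib import contextmanager


@contextmanager
def temporary_endpoint(predictor):
    """Yield the predictor and always clean up afterwards (hypothetical helper)."""
    try:
        yield predictor
    finally:
        try:
            # Same order as the cleanup code above: model first.
            predictor.delete_model()
        finally:
            # The endpoint is the billed resource, so delete it even if
            # deleting the model raised an error.
            predictor.delete_endpoint()


# Usage with the predictor from this article (requires a deployed endpoint):
# with temporary_endpoint(predictor) as p:
#     print(p.predict({"inputs": {"question": question, "context": context}}))
```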

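Conclusion

As a simple example, this article deployed a lightweight Hugging Face model to SageMaker and tried question answering.
The question-answering task derives an appropriate answer to a question from the information it is given. The context understanding and information extraction that support this task underlie the natural language processing capabilities of LLMs.

Models published on Hugging Face can be deployed and tried with little effort using the method above, so I would like to try other models (such as Japanese-language models) and other tasks (such as sentiment analysis).
Note: running the question-answering task used here requires a model that has been trained specifically for question answering.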
