Running ONNX model inference on an integrated GPU

Posted at 2023-01-16

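I wanted to see whether an ONNX model could run faster than on the CPU even without a discrete graphics card, so I tried running inference on the integrated GPU.

Environment setup

PC requirements

This uses the onnxruntime-directml package, so the PC must meet the following requirements.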

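A DirectX 12 capable GPU (for Intel, the integrated graphics of 4th-generation Core (Haswell) processors or later)
Windows 10, version 1903 or later

See the following for details.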
https://onnxruntime.ai/docs/execution-providers/DirectML-ExecutionProvider.html#requirements
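The Windows version requirement can be checked from Python. The following is a minimal sketch, assuming that Windows 10 version 1903 corresponds to build 18362; `windows_build_ok` is a hypothetical helper name, and the check says nothing about whether the GPU itself supports DirectX 12.

```python
import platform
import sys

MIN_BUILD = 18362  # Assumption: build number of Windows 10, version 1903


def windows_build_ok(build, min_build=MIN_BUILD):
    """Return True if the given Windows build number is at least version 1903."""
    return build >= min_build


if platform.system() == "Windows":
    # sys.getwindowsversion() only exists on Windows.
    build = sys.getwindowsversion().build
    print(f"Windows build {build}: meets requirement = {windows_build_ok(build)}")
else:
    print("Not running on Windows.")
```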

Python packages

Pillow (for loading images)
numpy (for preprocessing)
onnxruntime-directml

After installation, pip freeze gave the following (using Python 3.10.9).

requirements.txt

coloredlogs==15.0.1
flatbuffers==22.12.6
humanfriendly==10.0
mpmath==1.2.1
numpy==1.24.1
onnxruntime-directml==1.13.1
packaging==22.0
Pillow==9.3.0
protobuf==4.21.12
pyreadline3==3.4.1
sympy==1.11.1

To install from requirements.txt, run a command like the following.

pip install --no-deps -r requirements.txt

Pretrained model

I borrowed ResNet50 from the following.
https://github.com/onnx/models/tree/main/vision/classification/resnet#model

Labels

Borrowed from the following.
https://github.com/onnx/models/blob/main/vision/classification/synset.txt

Test image

Borrowed from the following.
https://gahag.net/010175-french-bulldog/
Note: The image shown above is the version produced by the preprocessing step.

Inference code

from PIL import Image

import numpy as np
import onnxruntime as ort


def preprocess(pil_img):
    # Shrink the image to 224x224.
    # To keep the aspect ratio, crop the largest centered square first, then resize.
    crop_length = min(pil_img.height, pil_img.width)
    crop_left = (pil_img.width - crop_length) // 2
    crop_upper = (pil_img.height - crop_length) // 2
    crop_right = crop_left + crop_length
    crop_lower = crop_upper + crop_length
    pil_img = pil_img.crop((crop_left, crop_upper, crop_right, crop_lower))
    pil_img = pil_img.resize((224, 224))

    # PIL -> numpy
    img = np.asarray(pil_img)

    # Normalize with per-channel mean and standard deviation
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]
    img = (img / 255 - mean) / std

    # HWC -> NCHW
    img = np.transpose(img, axes=[2, 0, 1])
    img = np.expand_dims(img, axis=0)

    # float64 -> float32
    img = np.float32(img)

    return img


pil_img = Image.open("gahag-0101751929.jpg")
img = preprocess(pil_img)

providers = ["DmlExecutionProvider"]
ort_sess = ort.InferenceSession("resnet50-v1-7.onnx", providers=providers)
outputs = ort_sess.run(None, {"data": img})

with open("synset.txt", "r") as f:
    labels = [line.rstrip() for line in f]

print(labels[outputs[0][0].argmax()])

The output was as follows, so the image was classified correctly.

n02108915 French bulldog
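The script prints only the single best label. To see how confident the model is, the scores can be converted to probabilities with softmax and the top few classes listed. This is a minimal sketch, assuming the model outputs raw scores (logits) rather than probabilities; `top_k` is a hypothetical helper that is not part of the original code, and the toy scores and labels are made up for illustration.

```python
import math


def top_k(scores, labels, k=5):
    """Return the k most likely (label, probability) pairs.

    Assumes `scores` are raw logits; softmax turns them into probabilities.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Sort class indices by descending probability and keep the first k.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in order[:k]]


# Toy example with made-up scores for three classes.
toy_labels = ["cat", "dog", "bird"]
toy_scores = [1.0, 3.0, 0.5]
for label, prob in top_k(toy_scores, toy_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the inference script above, the call would look like `top_k(outputs[0][0].tolist(), labels)`.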

Incidentally, specifying CPUExecutionProvider in providers runs inference on the CPU instead.

(preceding code omitted)
# providers = ["DmlExecutionProvider"]
providers = ["CPUExecutionProvider"]
ort_sess = ort.InferenceSession("resnet50-v1-7.onnx", providers=providers)
outputs = ort_sess.run(None, {"data": img})
(remaining code omitted)
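Rather than commenting lines in and out, the provider list can be built from what the installed onnxruntime actually offers, since `providers` is a list in priority order. The following is a minimal sketch; `choose_providers` is a hypothetical helper, while `ort.get_available_providers()` is the onnxruntime call that lists the providers available in the installed build.

```python
def choose_providers(available, preferred=("DmlExecutionProvider", "CPUExecutionProvider")):
    """Keep the preferred providers that are actually available, in priority order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError(f"None of {preferred} is available: {available}")
    return chosen


try:
    import onnxruntime as ort
except ImportError:
    print("onnxruntime is not installed")
else:
    # DirectML comes first when present; the CPU provider acts as the fallback.
    providers = choose_providers(ort.get_available_providers())
    print(providers)
```

After creating the session, `ort_sess.get_providers()` should show which providers the session ended up using.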

Inference time

The code used to measure inference time is shown below.

from PIL import Image

import numpy as np
import onnxruntime as ort

import timeit


def preprocess(pil_img):
    # Shrink the image to 224x224.
    # To keep the aspect ratio, crop the largest centered square first, then resize.
    crop_length = min(pil_img.height, pil_img.width)
    crop_left = (pil_img.width - crop_length) // 2
    crop_upper = (pil_img.height - crop_length) // 2
    crop_right = crop_left + crop_length
    crop_lower = crop_upper + crop_length
    pil_img = pil_img.crop((crop_left, crop_upper, crop_right, crop_lower))
    pil_img = pil_img.resize((224, 224))

    # PIL -> numpy
    img = np.asarray(pil_img)

    # Normalize with per-channel mean and standard deviation
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]
    img = (img / 255 - mean) / std

    # HWC -> NCHW
    img = np.transpose(img, axes=[2, 0, 1])
    img = np.expand_dims(img, axis=0)

    # float64 -> float32
    img = np.float32(img)

    return img


pil_img = Image.open("gahag-0101751929.jpg")
img = preprocess(pil_img)

providers = ["DmlExecutionProvider"]
# providers = ["CPUExecutionProvider"]
ort_sess = ort.InferenceSession("resnet50-v1-7.onnx", providers=providers)

inference_time = timeit.timeit(
    lambda: ort_sess.run(None, {"data": img}), number=1
)
print(f"First inference: {inference_time * 1000:.2f} ms")

inference_times = timeit.repeat(
    lambda: ort_sess.run(None, {"data": img}), repeat=100, number=1
)
mean = np.mean(inference_times)
std = np.std(inference_times)
print(f"Subsequent inferences: mean {mean * 1000:.2f} ms (std: {std * 1000:.2f} ms)")
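The same measurement pattern (time the first call separately, then summarize the remaining calls) can be wrapped in a reusable helper. This is a minimal sketch using only the standard library; `benchmark` and `summarize` are hypothetical names, the median is an extra statistic not reported in this article, and the workload below is a stand-in so the snippet runs without a model.

```python
import statistics
import time


def benchmark(fn, warmup=1, repeat=100):
    """Time `fn` and return (warmup_times, steady_times) in milliseconds."""

    def timed_call():
        start = time.perf_counter()
        fn()
        return (time.perf_counter() - start) * 1000

    # The first calls are kept separate because they include one-time setup cost.
    warmup_times = [timed_call() for _ in range(warmup)]
    steady_times = [timed_call() for _ in range(repeat)]
    return warmup_times, steady_times


def summarize(times_ms):
    """Return mean, median, and population standard deviation of the timings."""
    return {
        "mean": statistics.fmean(times_ms),
        "median": statistics.median(times_ms),
        # pstdev matches np.std's default (population standard deviation).
        "std": statistics.pstdev(times_ms),
    }


# Stand-in workload; for the model, pass lambda: ort_sess.run(None, {"data": img}).
warmup_times, steady_times = benchmark(lambda: sum(range(10_000)), warmup=1, repeat=20)
stats = summarize(steady_times)
print(f"First call: {warmup_times[0]:.2f} ms")
print(f"Steady state: mean {stats['mean']:.2f} ms, median {stats['median']:.2f} ms, std {stats['std']:.2f} ms")
```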

I measured on the following CPUs (integrated GPUs).

AMD Ryzen 5 PRO 3500U (AMD Radeon(TM) Vega 8 Graphics)
- This is a laptop, so I measured both on battery and on AC power
Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz (Intel(R) UHD Graphics 630)

The results were as follows.

Machine  Power     Device          First inference  Subsequent inferences
AMD      Battery   CPU             99.67 ms         mean 108.84 ms (std: 21.70 ms)
AMD      Battery   Integrated GPU  3032.95 ms       mean 105.70 ms (std: 0.54 ms)
AMD      AC power  CPU             84.48 ms         mean 62.72 ms (std: 14.13 ms)
AMD      AC power  Integrated GPU  1667.75 ms       mean 26.08 ms (std: 2.14 ms)
Intel    -         CPU             17.39 ms         mean 16.86 ms (std: 5.78 ms)
Intel    -         Integrated GPU  2361.29 ms       mean 89.61 ms (std: 1.89 ms)

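Note: On the Intel machine, using the integrated GPU actually made inference slower, possibly because the processor is an older generation. If you have a recent generation, please give it a try.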
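To make the comparison concrete, the steady-state means from the results table above can be turned into speedup ratios (CPU time divided by integrated GPU time). A small sketch; `speedup` is a hypothetical helper, and the numbers are copied from the table.

```python
# Steady-state mean latencies in ms, copied from the results table.
results = {
    "AMD (battery)": {"cpu": 108.84, "igpu": 105.70},
    "AMD (AC power)": {"cpu": 62.72, "igpu": 26.08},
    "Intel": {"cpu": 16.86, "igpu": 89.61},
}


def speedup(cpu_ms, igpu_ms):
    """How many times faster the integrated GPU is than the CPU (>1 means faster)."""
    return cpu_ms / igpu_ms


for name, row in results.items():
    print(f"{name}: {speedup(row['cpu'], row['igpu']):.2f}x")
```

This works out to roughly 1.03x for AMD on battery, 2.40x for AMD on AC power, and 0.19x for Intel, that is, about a fifth of the CPU's speed.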
