Elixir Nx.Tensorのminとmaxを求める

Last updated at 2023-04-10Posted at 2023-04-08

Tensorflow LiteとMoveNetモデルを用いた姿勢推定の勉強をしていたときに学んだ内容をメモ。姿勢推定のコードについてはtflite_elixirに寄贈しました。

https://github.com/cocoa-xu/tflite_elixir/blob/main/notebooks/pose_estimation_image_sequence.livemd

ここではNx.Tensorの値の最小値と最大値を求める技について考えます。

以下のもくもく会での成果です。

依存パッケージ

基本的にNxの関数に取り組むのですが、OpenCVでも求めることができるので後で試せるように予めEvisionも使えるようにしておきます。

Mix.install([
  {:nx, "~> 0.5.0"},
  {:evision, "0.1.30"}
])

# Evision作者が好んで用いるエイリアス
alias Evision, as: Cv

データ

点群データを用意します。17個の二次元座標値（X,Y）のNx.Tensorを作ります。

keypoints =
  Nx.tensor(
    [
      [341.7188653945923, 43.22674134373665],
      [347.2751133441925, 38.838579922914505],
      [337.92927277088165, 38.542279705405235],
      [353.93250143527985, 44.37031477689743],
      [332.8073402643204, 43.509576231241226],
      [358.5170496702194, 66.75018179416656],
      [326.14933770895004, 69.04044127464294],
      [346.01376926898956, 90.88503938913345],
      [303.05064821243286, 94.19108891487122],
      [313.96379578113556, 87.70558965206146],
      [302.41149455308914, 88.45037072896957],
      [348.3349735736847, 134.9865512251854],
      [326.03289169073105, 134.1936908364296],
      [351.33337795734406, 197.84249007701874],
      [319.55440336465836, 196.71163403987885],
      [355.83895242214203, 247.53425693511963],
      [340.4989422559738, 243.13054251670837]
    ],
    names: [:points, :values]
  )

Nx.reduce/4

最初Nx.reduce/4を使ってみようと検討したのですが、ドキュメント読んでみるとどうもこれは避けた方がよさそうです。

Given this function relies on anonymous functions, it may not be available or efficient on all Nx backends. Therefore, you should avoid using reduce/4 whenever possible. Instead, use functions sum/2, reduce_max/2, all/1, and so forth.

Nxには最小値と最大値を求めるための専用の関数があります。それらを使ってみます。

Nx.reduce_min/2

Nx.reduce_min/2 はオプションが何も指定されてしない場合は、すべての値の最小値が返ります。

keypoints |> Nx.reduce_min() |> Nx.to_number()
38.54227828979492

軸を指定すると、xとyの最小値を別々に求められます。

keypoints |> Nx.reduce_min(axes: [:points]) |> Nx.to_list()
[302.4114990234375, 38.54227828979492]

Nx.reduce_max/2

Nx.reduce_max/2 はオプションが何も指定されてしない場合は、すべての値の最大値が返ります。

keypoints |> Nx.reduce_max() |> Nx.to_number()
358.5170593261719

軸を指定すると、xとyの最大値を別々に求められます。

keypoints |> Nx.reduce_max(axes: [:points]) |> Nx.to_list()
[358.5170593261719, 247.5342559814453]

Evision.boundingRect/1

Evision.boundingRect/1でも最小値、最大値を求めることができます。画像処理等の目的で既にEvisionを使っている場合にはこれが便利かもしれません。

minmax_by_cv = fn %Nx.Tensor{} = points_nx ->
  {x, y, w, h} = Cv.boundingRect(points_nx)
  {x, y, x + w, y + h}
end

minmax_by_cv.(keypoints)
{302, 38, 359, 248}

しかしながら、Nxでやってもそんなに記述量は変わりません。

minmax_by_nx = fn %Nx.Tensor{} = points_nx ->
  [x_min, y_min] = points_nx |> Nx.reduce_min(axes: [:points]) |> Nx.to_list()
  [x_max, y_max] = points_nx |> Nx.reduce_max(axes: [:points]) |> Nx.to_list()
  {floor(x_min), floor(y_min), ceil(x_max), ceil(y_max)}
end

minmax_by_nx.(keypoints)
{302, 38, 359, 248}

@zacky1972 さんからアドバイスをいただきました。ありがとうーーーーッございます！

確かに性能の方が気になります。早速、ベンチマークと取ってみます。

ベンチマーク

まず、ダミーデータの準備をします。数値は、:rand.uniform/1を用いて浮動小数点数をランダムに生成しようと思います。

入力データの大きさを簡単に変えられるように要素N個のテンソルを生成する関数を用意します。

gen_random_points = fn n ->
  1..n
  |> Enum.map(fn _ -> [:rand.uniform(), :rand.uniform()] end)
  |> Nx.tensor(names: [:points, :values])
end

gen_random_points.(10)

複数パターンの入力データを作ります。

inputs = %{
  "small tensor" => gen_random_points.(100),
  "medium tensor" => gen_random_points.(10_00),
  "large tensor" => gen_random_points.(1_000_00)
}

Benchee.run/2を用いてベンチマークを実施します。

Benchee.run(
  %{
    "Nx" => minmax_by_nx,
    "Cv" => minmax_by_cv
  },
  inputs: inputs
)

結果発表！

Nx.BinaryBackend

手元にたまたまある環境でなんの最適化もされていないNxとOpenCVとで比較した結果、入力テンソルの大きさを問わずOpenCVの方が速いことがわかりました。その開きは入力テンソルが大きくなるにつれてより顕著になるようです。

Operating System: macOS
CPU Information: Apple M1 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.14.4
Erlang 25.3

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: large tensor, medium tensor, small tensor
Estimated total run time: 42 s

Benchmarking Cv with input large tensor ...
Benchmarking Cv with input medium tensor ...
Benchmarking Cv with input small tensor ...
Benchmarking Nx with input large tensor ...
Benchmarking Nx with input medium tensor ...
Benchmarking Nx with input small tensor ...

##### With input large tensor #####
Name           ips        average  deviation         median         99th %
Cv         13.36 K      0.0749 ms    ±72.57%      0.0666 ms        0.24 ms
Nx        0.0148 K       67.39 ms     ±6.63%       65.15 ms       76.86 ms

Comparison: 
Cv         13.36 K
Nx        0.0148 K - 900.00x slower +67.31 ms

##### With input medium tensor #####
Name           ips        average  deviation         median         99th %
Cv         38.50 K       25.97 μs   ±211.91%       19.04 μs      190.59 μs
Nx          1.21 K      828.07 μs    ±17.13%      810.15 μs     1112.49 μs

Comparison: 
Cv         38.50 K
Nx          1.21 K - 31.88x slower +802.10 μs

##### With input small tensor #####
Name           ips        average  deviation         median         99th %
Cv         34.67 K       28.85 μs   ±201.08%       19.92 μs      219.39 μs
Nx          6.83 K      146.46 μs    ±56.00%      153.88 μs      282.96 μs

Comparison: 
Cv         34.67 K
Nx          6.83 K - 5.08x slower +117.61 μs

EXLA.Backend

NxのBackendをEXLA.Backendにしたらかなり高速化されました。まだOpenCVより遅いですがその差は縮まりました。テンソルの大きさを問わず2倍ほどの差です。

Operating System: macOS
CPU Information: Apple M1 Pro
Number of Available Cores: 10
Available memory: 32 GB
Elixir 1.14.4
Erlang 25.3

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: large tensor, medium tensor, small tensor
Estimated total run time: 42 s

Benchmarking Cv with input large tensor ...
Benchmarking Cv with input medium tensor ...
Benchmarking Cv with input small tensor ...
Benchmarking Nx with input large tensor ...
Benchmarking Nx with input medium tensor ...
Benchmarking Nx with input small tensor ...

##### With input large tensor #####
Name           ips        average  deviation         median         99th %
Cv          7.07 K      141.44 μs    ±50.78%         129 μs      207.88 μs
Nx          3.00 K      333.08 μs     ±8.27%      322.54 μs      401.56 μs

Comparison: 
Cv          7.07 K
Nx          3.00 K - 2.36x slower +191.65 μs

##### With input medium tensor #####
Name           ips        average  deviation         median         99th %
Cv         19.67 K       50.85 μs    ±26.37%       44.08 μs       89.92 μs
Nx          8.75 K      114.34 μs    ±47.57%      104.83 μs      207.01 μs

Comparison: 
Cv         19.67 K
Nx          8.75 K - 2.25x slower +63.49 μs

##### With input small tensor #####
Name           ips        average  deviation         median         99th %
Cv         18.80 K       53.19 μs    ±65.07%       46.79 μs       94.30 μs
Nx          9.18 K      108.88 μs    ±35.53%       98.88 μs      175.50 μs

Comparison: 
Cv         18.80 K
Nx          9.18 K - 2.05x slower +55.69 μs

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up