*Interp Porting Notes - Hair Segmentation (TflInterp)

Last updated at 2023-04-21, posted at 2023-04-20

0.Prologue

To kill time, I port DNN applications that catch my interest to *Interp and play with them.
This article is a casual note and record of that.

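After the previous "*Interp Porting Notes", I was tinkering with an image inpainting model called "DeepFillv2". Yes, I got the idea of trying image processing like the "Magic Eraser" advertised in the Google Pixel commercials. However, it turned out that "DeepFillv2" runs on TensorFlow but not on my own TflInterp: it does not work unless "ExtractImagePatches" is added as a custom operator. What a letdown.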

So, taking a small detour, I decided to give the (legacy) MediaPipe Hair-Segmentation, a model example of "embedding custom operators", a quick run on TflInterp.

1.Original Work

Hair-Segmentation is, after all, semantic segmentation of images, so as usual the model architecture joins an encoder and a decoder in the manner of U-Net. This kind of architecture is apparently called an "hourglass segmentation network architecture"... an "hourglass", I see.

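U-Net is notorious for being heavy and slow, but this model felt fast enough to clear that bad name. Various tricks seem to have gone into it (see the references below for details).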

Real-time Hair segmentation and recoloring on Mobile GPUs
https://sites.google.com/view/perception-cv4arvr/hair-segmentation
GitHub: MediaPipe Hair Segmentation
https://github.com/google/mediapipe/blob/master/docs/solutions/hair_segmentation.md
GitHub: MediaPipe Models and Model Cards (legacy version)
https://github.com/google/mediapipe/blob/master/docs/solutions/models.md#hair_segmentation

2.Preparation

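As mentioned at the beginning, MediaPipe's Hair-Segmentation task can be regarded as a model example of embedding custom operators into TensorFlow Lite. In fact, three custom operators are embedded into TensorFlow Lite in order to run Hair-Segmentation:

MaxPoolingWithArgmax2D
MaxUnpooling2D
Convolution2DTransposeBias

On top of that, the source code of each operator is public, so you can study them in detail.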
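To give a feel for what the first two operators do conceptually, here is a minimal sketch in plain C++, with no TensorFlow Lite dependency. It assumes a single-channel image, a fixed 2x2 window and stride 2. The names PoolResult, max_pool_with_argmax_2x2 and max_unpool_2x2 are made up for illustration, and this is not the MediaPipe kernel code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: pooled maxima plus where each maximum came from.
struct PoolResult {
    std::vector<float> values;         // pooled maxima, (h/2) x (w/2), row-major
    std::vector<std::size_t> indices;  // flat index of each maximum in the input
};

// 2x2 max pooling with stride 2 over a single-channel h x w image (row-major).
// Besides the maxima, it records the input position of every maximum.
PoolResult max_pool_with_argmax_2x2(const std::vector<float>& in, std::size_t h, std::size_t w)
{
    assert(h % 2 == 0 && w % 2 == 0 && in.size() == h * w);
    PoolResult out;
    out.values.reserve((h / 2) * (w / 2));
    out.indices.reserve((h / 2) * (w / 2));
    for (std::size_t y = 0; y < h; y += 2) {
        for (std::size_t x = 0; x < w; x += 2) {
            std::size_t best = y * w + x;
            for (std::size_t dy = 0; dy < 2; ++dy) {
                for (std::size_t dx = 0; dx < 2; ++dx) {
                    const std::size_t idx = (y + dy) * w + (x + dx);
                    if (in[idx] > in[best]) best = idx;
                }
            }
            out.values.push_back(in[best]);
            out.indices.push_back(best);
        }
    }
    return out;
}

// Unpooling: scatter each pooled value back to its recorded position.
// Every other position of the h x w output stays 0.
std::vector<float> max_unpool_2x2(const PoolResult& pooled, std::size_t h, std::size_t w)
{
    assert(pooled.values.size() == pooled.indices.size());
    std::vector<float> out(h * w, 0.0f);
    for (std::size_t i = 0; i < pooled.values.size(); ++i) {
        assert(pooled.indices[i] < out.size());
        out[pooled.indices[i]] = pooled.values[i];
    }
    return out;
}
```

The point of carrying the indices along is that the decoder side can put each value back exactly where the encoder found it, instead of blindly upsampling.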

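The basic procedure for embedding a custom operator is documented, but only with a very simple example and very little information. When you actually try to write your own operator that handles tensors, your hands will likely come to a sudden stop. To tell the truth, that was me about a month ago. That is why having a real-world example like this one helps a lot.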

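Now then, since the purpose of this article is not to introduce the design of custom operators, I will only show the code that casually embeds the three copied custom operators above into TflInterp.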

Note: If I feel like it, I may write a design memo on "ExtractImagePatches"... though there is not even a conceptual design yet.

custom_operations.cc

#include "tensorflow/lite/kernels/register.h"

#include "max_pool_argmax.h"
#include "max_unpooling.h"
#include "transpose_conv_bias.h"

// Function that registers each custom operator
void add_custom_operations(tflite::ops::builtin::BuiltinOpResolver& resolver)
{
    resolver.AddCustom("MaxPoolingWithArgmax2D", custom_operations::RegisterMaxPoolingWithArgmax2D());
    resolver.AddCustom("MaxUnpooling2D", custom_operations::RegisterMaxUnpooling2D());
    resolver.AddCustom("Convolution2DTransposeBias", custom_operations::RegisterConvolution2DTransposeBias());
}

tfl_interp.cc

// Initialize the TensorFlow Lite interpreter
TflInterp::TflInterp(std::string tfl_model, int thread)
{
    // load tensor flow lite model
    mModel = tflite::FlatBufferModel::BuildFromFile(tfl_model.c_str());

    tflite::ops::builtin::BuiltinOpResolver resolver;

    // Register the custom operators
    add_custom_operations(resolver);
    //

    tflite::InterpreterBuilder builder(*mModel, resolver);
    :
    :
}

As of Apr 15, 2023, tfl_interp 0.1.10 published on hex.pm includes the modification above. So if you add {:tfl_interp, "~> 0.1.10"} to the dependency list in mix.exs and build, you can play with MediaPipe Hair-Segmentation... but I do not recommend it.

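The reason is that building the tfl_interp module downloads the entire TensorFlow source tree plus extras, and then starts compiling each library at length. No prebuilt TensorFlow Lite library is provided, so silently building from source cannot be helped. Still, it easily takes 30 minutes on my worn-out PC. Spending that much time for every app that uses the tfl_interp module is unbearable.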

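So I prepared a workaround: build the tfl_interp module once on its own, and have apps refer to that tfl_interp module by path. The first build requires patience, but after that the idea is to reuse the already built module. Concretely, follow the steps below.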

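First, build the tfl_interp module on its own under a suitable directory (called "ほにゃらら" here as a placeholder). As mentioned earlier, this downloads and compiles a large number of files, so it takes a correspondingly long time.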

git clone https://github.com/shoz-f/tfl_interp.git
cd tfl_interp
mix deps.get
mix compile

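In the dependency list of the app's mix.exs, write the following so that mix compile and CMake do not do anything unnecessary. The key point is setting the OS environment variable "SKIP_MAKE_TFLINTERP" to "YES".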

mix.exs

 :
def deps do
  System.put_env("SKIP_MAKE_TFLINTERP", "YES")     # skip the CMake build of the tfl_interp executable
  [
    :
    {:tfl_interp, path: "ほにゃらら/tfl_interp"},  # path to the git-cloned tfl_interp
    :
  ]
end

As long as you do not change the version of the tfl_interp module, the built module can be reused.

3.Livebook Notebook for TflInterp

Now, back to the topic: let's port Hair-Segmentation. The code is written in a Livebook notebook.

The modules listed in the Mix.install dependency list are as follows.

File.cd!(__DIR__)
# for windows JP
System.shell("chcp 65001")
System.put_env("SKIP_MAKE_TFLINTERP", "YES")

Mix.install([
  {:tfl_interp, path: ".."},
  {:cimg, "~> 0.1.19"},
  {:nx, "~> 0.4.0"},
  {:kino, "~> 0.7.0"}
])

The tfl_interp module is assumed to be already built on its own, and the directory layout of the tfl_interp module and the Livebook notebook is assumed to be as follows.

── tfl_interp - tfl_interp module
    ├─ lib
    ├─ priv
    │   └─ tfl_interp - executable
    └─ demo_hairsegmentation
        └─ HairSegmentation.livemd - demo Livebook notebook

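The input of the Hair-Segmentation model is an image with 4 planes in total: 3 RGB planes plus 1 Mask plane. According to the paper, the Mask plane should receive the mask image obtained from the previous inference result, but passing a zero-filled image every time also seems to work. The output is two maps: the probability that each pixel in the image is background, and the probability that it is hair.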

[Model card]

inputs:
[0] f32:{1,512,512,4} - (RGB+Mask) image; each of R, G, B is normalized to the range 0.0 to 1.0; the Mask plane may be all zeros

outputs:
[0] f32:{1,512,512,2} - per-pixel probability of background in [][][][0] and of hair in [][][][1]
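The tensor layout in the model card can be made concrete with a small sketch in plain C++. It assumes interleaved (height, width, channel) buffers, as the shapes above suggest. The helper names pack_rgb_with_zero_mask and hair_mask_from_output are made up for illustration and are not part of TflInterp or CImg.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: interleaved 8-bit RGB -> float tensor with 4 channels per pixel.
// R, G, B are scaled to [0.0, 1.0]; the 4th channel (previous mask) is filled with 0.
std::vector<float> pack_rgb_with_zero_mask(const std::vector<std::uint8_t>& rgb, std::size_t pixels)
{
    assert(rgb.size() == pixels * 3);
    std::vector<float> input(pixels * 4, 0.0f);
    for (std::size_t p = 0; p < pixels; ++p) {
        for (std::size_t c = 0; c < 3; ++c) {
            input[p * 4 + c] = rgb[p * 3 + c] / 255.0f;
        }
        // input[p * 4 + 3] stays 0.0f: the zero-filled Mask plane
    }
    return input;
}

// Hypothetical helper: output tensor with 2 channels per pixel -> binary mask.
// A pixel is marked as hair (1) when its hair score exceeds its background score.
std::vector<std::uint8_t> hair_mask_from_output(const std::vector<float>& output, std::size_t pixels)
{
    assert(output.size() == pixels * 2);
    std::vector<std::uint8_t> mask(pixels);
    for (std::size_t p = 0; p < pixels; ++p) {
        mask[p] = output[p * 2 + 1] > output[p * 2 + 0] ? 1 : 0;
    }
    return mask;
}
```

For the real model, pixels would be 512 * 512, giving an input of 512 * 512 * 4 floats and an output of 512 * 512 * 2 floats.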

defmodule HairSegmentation do
  @width 512
  @height 512

  alias TflInterp, as: NNInterp

  use NNInterp,
    model: "./model/hair_segmentation.tflite",
    url: "https://storage.googleapis.com/mediapipe-assets/hair_segmentation.tflite",
    inputs: [f32: {1, @width, @height, 4}],
    outputs: [f32: {1, @width, @height, 2}]

  def apply(img) do
    # preprocess
    input0 =
      CImg.builder(img)
      |> CImg.resize({@width, @height})
      |> CImg.append(CImg.create(@width, @height, 1, 1, 0), :c)
      |> CImg.to_binary([{:range, {0.0, 1.0}}])

    # prediction
    output =
      session()
      |> NNInterp.set_input_tensor(0, input0)
      |> NNInterp.invoke()
      |> NNInterp.get_output_tensor(0)
      |> Nx.from_binary(:f32)
      |> Nx.reshape({@height, @width, :auto})

    # postprocess
    [background, hair] =
      Enum.map(0..1, fn i ->
        Nx.slice_along_axis(output, i, 1, axis: 2) |> Nx.squeeze()
      end)

    {w, h, _, _} = CImg.shape(img)

    Nx.greater(hair, background)
    |> Nx.to_binary()
    |> CImg.from_binary(@width, @height, 1, 1, dtype: "<u1")
    |> CImg.resize({w, h})  # make image
  end

  def coloring(img, color, opacity \\ 0.5) do
    mask = HairSegmentation.apply(img)
    CImg.paint_mask(img, mask, color, opacity)
  end
end

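The core inference function is just apply/1, but I also prepared a helper function coloring/3 for rendering the inference result. Pass coloring/3 the original image img, the color to repaint the hair with, and the opacity, and it infers the hair region from the input image and returns an image in which that region is repainted by alpha blending.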
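For reference, the alpha blending step can be sketched in plain C++ as follows. This uses the standard blend formula out = (1 - opacity) * pixel + opacity * color. How CImg.paint_mask computes it internally is not something I have checked, so treat blend_color_on_mask as a hypothetical stand-in rather than the actual implementation.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the blending step; not the CImg implementation.
// rgb:   interleaved 8-bit RGB pixels, modified in place
// mask:  one byte per pixel, non-zero where hair was detected
// Blends `color` into the masked pixels with the given opacity in [0.0, 1.0].
void blend_color_on_mask(std::vector<std::uint8_t>& rgb,
                         const std::vector<std::uint8_t>& mask,
                         const std::array<std::uint8_t, 3>& color,
                         float opacity)
{
    assert(rgb.size() == mask.size() * 3);
    assert(opacity >= 0.0f && opacity <= 1.0f);
    for (std::size_t p = 0; p < mask.size(); ++p) {
        if (mask[p] == 0) continue;  // leave non-hair pixels untouched
        for (std::size_t c = 0; c < 3; ++c) {
            const float blended = (1.0f - opacity) * rgb[p * 3 + c] + opacity * color[c];
            rgb[p * 3 + c] = static_cast<std::uint8_t>(std::lround(blended));
        }
    }
}
```

With opacity 0.0 the image is returned unchanged, and with 1.0 the hair region becomes a flat fill of the given color.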

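Note:
The paper proposes a more elaborate color conversion: it reuses only the luminance information of the hair region and rewrites hue and saturation wholesale. Implementing that properly looked like a hassle involving RGB-HSV conversion, masked painting and so on, so I cut corners completely.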
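As a rough idea of what "keep the luminance, replace hue and saturation" could look like without a full RGB-HSV round trip, here is a minimal sketch in plain C++. It is not the paper's method: it simply scales the target color uniformly so that its Rec. 601 luma matches that of the original pixel. The names luma and recolor_keep_luma are made up for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

using Rgb = std::array<std::uint8_t, 3>;

// Rec. 601 luma, used here as the "luminance" of a color.
float luma(const Rgb& c)
{
    return 0.299f * c[0] + 0.587f * c[1] + 0.114f * c[2];
}

// Hypothetical recoloring of one hair pixel: brightness comes from `pixel`,
// hue and saturation come from `target`. Uniform scaling of an RGB color keeps
// its hue and saturation as long as no channel is clamped.
Rgb recolor_keep_luma(const Rgb& pixel, const Rgb& target)
{
    const float target_luma = luma(target);
    if (target_luma <= 0.0f) return Rgb{0, 0, 0};  // black target: nothing to scale
    const float scale = luma(pixel) / target_luma;
    Rgb out{};
    for (std::size_t c = 0; c < 3; ++c) {
        const float v = std::min(255.0f, target[c] * scale);  // clamping may shift the color
        out[c] = static_cast<std::uint8_t>(std::lround(v));
    }
    return out;
}
```

Applied only to the pixels inside the hair mask, this keeps highlights and shadows of the hair while swapping its color, which is the visible difference from a plain alpha blend.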

4.Demonstration

Start HairSegmentation.

HairSegmentation.start_link([])

Give it an image and run HairSegmentation.

img = CImg.load("photo_girl.jpg")
colored = HairSegmentation.coloring(img, [{0, 255, 0}], 0.3)

Enum.map([img, colored], &CImg.display_kino(&1, :jpeg))
|> Kino.Layout.grid(columns: 2)

5.Epilogue

As a way of studying "custom operator embedding" in TensorFlow Lite, I ported the (legacy) MediaPipe Hair-Segmentation and played with it.

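I think I now roughly understand how to create and use custom operators. What remains is to decide the implementation spec of the "ExtractImagePatches" custom operator and fiddle with it, and then I may be able to play with "DeepFillv2"... the catch being that my motivation has dropped a little.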

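Meanwhile, although I only happened to try Hair-Segmentation, it is quite fun. With a little more work on the color table, I think an anime-like hair color filter could be made. For example, how about combining it with an emotion inference model to change the hair color?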

So why was this task dropped from the new MediaPipe? A pity.

Appendix

TflInterp notebook
https://github.com/shoz-f/tfl_interp/blob/main/demo_hairsegmentation/HairSegmentation.livemd
