TensorFlow Tutorial - Image Recognition (Translation) #Python

This is a translation of the TensorFlow tutorial "Image Recognition":
https://www.tensorflow.org/versions/master/tutorials/image_recognition
Please let me know if you find any translation errors.

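Our brains make vision seem easy. It takes no effort for humans to tell apart a lion and a jaguar, read a sign, or recognize a human's face. But these are actually hard problems to solve with a computer: they only seem easy because our brains are incredibly good at understanding images.

In the last few years, the field of machine learning has made tremendous progress on addressing these difficult problems. In particular, a kind of model called a deep convolutional neural network has been found to achieve reasonable performance on hard visual recognition tasks, matching or exceeding human performance in some domains.

Researchers have demonstrated steady progress in computer vision by validating their work against ImageNet, an academic benchmark for computer vision. Successive models continue to show improvements, each time achieving a new state-of-the-art result: QuocNet, AlexNet, Inception (GoogLeNet), BN-Inception-v2. Researchers both internal and external to Google have published papers describing all these models, but the results are still hard to reproduce. We're now taking the next step by releasing code for running image recognition on our latest model, Inception-v3.

Inception-v3 is trained for the ImageNet Large Visual Recognition Challenge using the data from 2012. This is a standard task in computer vision, where models try to classify entire images into 1000 classes, like "Zebra", "Dalmatian", and "Dishwasher". The original tutorial illustrates this with AlexNet's classifications of several sample images (figure not reproduced here).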

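To compare models, we examine how often the model fails to include the correct answer among its top 5 guesses, termed the "top-5 error rate". AlexNet achieved a top-5 error rate of 15.3% on the 2012 validation data set; BN-Inception-v2 achieved 6.66%; Inception-v3 reaches 3.46%.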
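
The top-5 error rate can be made concrete with a small sketch. The function name, the toy scores, and the labels below are made up for illustration and are not part of the tutorial's code; the sketch only assumes that each example comes with one score per class.

```python
def top5_error_rate(predictions, true_labels, k=5):
    """Fraction of examples whose true label is absent from the top-k guesses.

    predictions: list of per-example score lists (one score per class).
    true_labels: list of correct class indices, same length as predictions.
    """
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have the same length")
    if not predictions:
        raise ValueError("need at least one example")
    misses = 0
    for scores, truth in zip(predictions, true_labels):
        # Indices of the k highest-scoring classes for this example.
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        if truth not in top_k:
            misses += 1
    return misses / len(predictions)


# Toy data (hypothetical): 3 examples, 6 classes each.
toy_scores = [
    [0.50, 0.20, 0.12, 0.10, 0.05, 0.03],
    [0.03, 0.05, 0.10, 0.12, 0.20, 0.50],
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],
]
toy_truth = [0, 5, 5]  # the third example's true class has the lowest score

error = top5_error_rate(toy_scores, toy_truth)
```

Here only the third example misses, so `error` is one third. With 1000 classes, as in ImageNet, the same function applies unchanged.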

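How well do humans do on the ImageNet Challenge? There's a blog post by Andrej Karpathy, who attempted to measure his own performance. He reached a 5.1% top-5 error rate.

This tutorial will teach you how to use Inception-v3. You'll learn how to classify images into 1000 classes in Python or C++. We'll also discuss how to extract higher-level features from this model, which may be reused for other vision tasks.

We're excited to see what the community will do with this model.

Usage with the Python API

classify_image.py downloads the trained model from tensorflow.org when the program is run for the first time. You'll need about 200 MB of free space available on your hard disk.

The following instructions assume you installed TensorFlow from a PIP package and that your terminal resides in the TensorFlow root directory.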

cd tensorflow/models/image/imagenet
python classify_image.py

The above command will classify a supplied image of a panda bear.

If the model runs correctly, the script will produce output like the following:

giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.88493)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00878)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00317)
custard apple (score = 0.00149)
earthstar (score = 0.00127)

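If you wish to supply other JPEG images, you may do so by editing the --image_file argument.

If you download the model data to a different directory, you will need to point --model_dir to the directory used.

Usage with the C++ API

You can run the same Inception-v3 model in C++ for use in production environments. You can download the archive containing the GraphDef that defines the model like this (running from the root directory of the TensorFlow repository):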

wget https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015.zip -O tensorflow/examples/label_image/data/inception_dec_2015.zip

unzip tensorflow/examples/label_image/data/inception_dec_2015.zip -d tensorflow/examples/label_image/data/

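Next, we need to compile the C++ binary that includes the code to load and run the graph. If you've installed TensorFlow from source for your platform, you should be able to build the example by running this command from your shell terminal: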

bazel build tensorflow/examples/label_image/...

That should create a binary executable that you can then run like this:

bazel-bin/tensorflow/examples/label_image/label_image

This uses the default example image that ships with the framework, and should output something similar to this:

I tensorflow/examples/label_image/main.cc:200] military uniform (866): 0.647296
I tensorflow/examples/label_image/main.cc:200] suit (794): 0.0477196
I tensorflow/examples/label_image/main.cc:200] academic gown (896): 0.0232411
I tensorflow/examples/label_image/main.cc:200] bow tie (817): 0.0157356
I tensorflow/examples/label_image/main.cc:200] bolo tie (940): 0.0145024

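In this case, we're using the default image of Admiral Grace Hopper, and you can see that the network correctly identifies that she's wearing a military uniform, with a high score of about 0.65.

Next, try it out on your own images by supplying the --image= argument, e.g.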

bazel-bin/tensorflow/examples/label_image/label_image --image=my_image.png

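If you look inside the tensorflow/examples/label_image/main.cc file, you can find out how it works. We hope this code will help you integrate TensorFlow into your own applications, so we will walk step by step through the main functions:

The command line flags control where the files are loaded from, and properties of the input images. The model expects to get square 299x299 RGB images, so those are the input_width and input_height flags. We also need to scale the pixel values from integers between 0 and 255 to the floating point values that the graph operates on. We control the scaling with the input_mean and input_std flags: we first subtract input_mean from each pixel value, then divide the result by input_std.

These values probably look somewhat magical, but they are simply defined by the original model author based on what was used as input images for training. If you have a graph that you've trained yourself, you'll just need to adjust the values to match whatever you used during your training process.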
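
As a minimal sketch of the scaling described above, the following applies "subtract input_mean, then divide by input_std" to a few pixel values. The helper name is hypothetical, and the mean and standard deviation of 128 are illustrative assumptions only; use whatever values match the training of your own graph.

```python
def normalize_pixels(pixels, input_mean, input_std):
    """Map integer pixel values (0-255) to floats via (value - mean) / std."""
    if input_std == 0:
        raise ValueError("input_std must be non-zero")
    return [(float(p) - input_mean) / input_std for p in pixels]


# Assumed values for illustration: mean 128 and std 128 map 0..255
# into roughly the range [-1, 1).
scaled = normalize_pixels([0, 128, 255], input_mean=128.0, input_std=128.0)
# scaled == [-1.0, 0.0, 0.9921875]
```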

You can see how these flags are applied to an image in the ReadTensorFromImageFile() function.

// Given an image file name, read in the data, try to decode it as an image,
// resize it to the requested size, and then scale the values as desired.
Status ReadTensorFromImageFile(string file_name, const int input_height,
                               const int input_width, const float input_mean,
                               const float input_std,
                               std::vector<Tensor>* out_tensors) {
  tensorflow::GraphDefBuilder b;

We start by creating a GraphDefBuilder, which is an object we can use to specify a model to run or load.

  string input_name = "file_reader";
  string output_name = "normalized";
  tensorflow::Node* file_reader =
      tensorflow::ops::ReadFile(tensorflow::ops::Const(file_name, b.opts()),
                                b.opts().WithName(input_name));

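We then start creating nodes for the small model we want to run to load, resize, and scale the pixel values to get the result the main model expects as its input. The first node we create is just a Const op that holds a tensor with the file name of the image we want to load. That's then passed as the first input to the ReadFile op. You might notice we're passing b.opts() as the last argument to all the op creation functions. The argument ensures that the node is added to the model definition held in the GraphDefBuilder. We also name the ReadFile op by making the WithName() call on b.opts(). Naming a node isn't strictly necessary, since an automatic name will be assigned if you don't do this, but it does make debugging a bit easier.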

  // Now try to figure out what kind of file it is and decode it.
  const int wanted_channels = 3;
  tensorflow::Node* image_reader;
  if (tensorflow::StringPiece(file_name).ends_with(".png")) {
    image_reader = tensorflow::ops::DecodePng(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("png_reader"));
  } else {
    // Assume if it's not a PNG then it must be a JPEG.
    image_reader = tensorflow::ops::DecodeJpeg(
        file_reader,
        b.opts().WithAttr("channels", wanted_channels).WithName("jpeg_reader"));
  }
  // Now cast the image data to float so we can do normal math on it.
  tensorflow::Node* float_caster = tensorflow::ops::Cast(
      image_reader, tensorflow::DT_FLOAT, b.opts().WithName("float_caster"));
  // The convention for image ops in TensorFlow is that all images are expected
  // to be in batches, so that they're four-dimensional arrays with indices of
  // [batch, height, width, channel]. Because we only have a single image, we
  // have to add a batch dimension of 1 to the start with ExpandDims().
  tensorflow::Node* dims_expander = tensorflow::ops::ExpandDims(
      float_caster, tensorflow::ops::Const(0, b.opts()), b.opts());
  // Bilinearly resize the image to fit the required dimensions.
  tensorflow::Node* resized = tensorflow::ops::ResizeBilinear(
      dims_expander, tensorflow::ops::Const({input_height, input_width},
                                            b.opts().WithName("size")),
      b.opts());
  // Subtract the mean and divide by the scale.
  tensorflow::ops::Div(
      tensorflow::ops::Sub(
          resized, tensorflow::ops::Const({input_mean}, b.opts()), b.opts()),
      tensorflow::ops::Const({input_std}, b.opts()),
      b.opts().WithName(output_name));

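We then keep adding more nodes, to decode the file data as an image, to cast the integers into floating point values, to add a batch dimension, to resize the image, and then finally to run the subtraction and division operations on the pixel values.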

  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensor.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));

At the end of this we have a model definition stored in the b variable, which we turn into a full graph definition with the ToGraphDef() function.

  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  TF_RETURN_IF_ERROR(session->Run({}, {output_name}, {}, out_tensors));
  return Status::OK();

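Then we create a Session object, which is the interface to actually running the graph, and run it, specifying which node we want to get the output from, and where to put the output data.

This gives us a vector of Tensor objects, which in this case we know will contain only a single object. You can think of a Tensor as a multi-dimensional array in this context, and it holds a 299 pixel high, 299 pixel wide, 3 channel image as float values. If you have your own image-processing framework in your product already, you should be able to use that instead, as long as you apply the same transformations before you feed images into the main graph.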
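
If you prepare images with your own framework, it can help to verify the array layout before feeding the main graph. The sketch below uses plain nested lists as a stand-in for a tensor and follows the [batch, height, width, channel] convention noted in the C++ comments; the helper names are hypothetical and a tiny 2x2 image replaces the real 299x299 one.

```python
def nested_shape(data):
    """Return the shape of a rectangular nested list, e.g. [1, 299, 299, 3].

    Only the first element along each axis is inspected, so the input is
    assumed to be rectangular.
    """
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        if not data:
            break
        data = data[0]
    return shape


def make_batch(image):
    """Wrap one [height][width][channel] image in a batch dimension of 1."""
    return [image]


def has_expected_shape(batch, height, width, channels=3):
    """Check that a batch holds exactly one image with the given layout."""
    return nested_shape(batch) == [1, height, width, channels]


# A tiny 2x2 RGB image of float values (illustrative data only).
image = [[[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]],
         [[0.6, 0.7, 0.8], [0.9, 1.0, 1.1]]]
batch = make_batch(image)
ok = has_expected_shape(batch, height=2, width=2)
```

For the Inception-v3 input described here, the corresponding check would be `has_expected_shape(batch, 299, 299)`.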

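The image-loading function above is a simple example of creating a small TensorFlow graph dynamically in C++, but for the pre-trained Inception model we want to load a much larger definition from a file. You can see how we do that in the LoadGraph() function.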

// Reads a model graph definition from disk, and creates a session object you
// can use to run it.
Status LoadGraph(string graph_file_name,
                 std::unique_ptr<tensorflow::Session>* session) {
  tensorflow::GraphDef graph_def;
  Status load_graph_status =
      ReadBinaryProto(tensorflow::Env::Default(), graph_file_name, &graph_def);
  if (!load_graph_status.ok()) {
    return tensorflow::errors::NotFound("Failed to load compute graph at '",
                                        graph_file_name, "'");
  }

If you've looked through the image loading code, a lot of the terms should seem familiar. Rather than using a GraphDefBuilder to produce a GraphDef object, we load a protobuf file that directly contains the GraphDef.

  session->reset(tensorflow::NewSession(tensorflow::SessionOptions()));
  Status session_create_status = (*session)->Create(graph_def);
  if (!session_create_status.ok()) {
    return session_create_status;
  }
  return Status::OK();
}

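Then we create a Session object from that GraphDef and pass it back to the caller so that they can run it at a later time.

The GetTopLabels() function is a lot like the image loading, except that in this case we want to take the results of running the main graph and turn them into a sorted list of the highest-scoring labels. Just like the image loader, it creates a GraphDefBuilder, adds a couple of nodes to it, and then runs the short graph to get a pair of output tensors. In this case they represent the sorted scores and index positions of the highest results.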

// Analyzes the output of the Inception graph to retrieve the highest scores and
// their positions in the tensor, which correspond to categories.
Status GetTopLabels(const std::vector<Tensor>& outputs, int how_many_labels,
                    Tensor* indices, Tensor* scores) {
  tensorflow::GraphDefBuilder b;
  string output_name = "top_k";
  tensorflow::ops::TopK(tensorflow::ops::Const(outputs[0], b.opts()),
                        how_many_labels, b.opts().WithName(output_name));
  // This runs the GraphDef network definition that we've just constructed, and
  // returns the results in the output tensors.
  tensorflow::GraphDef graph;
  TF_RETURN_IF_ERROR(b.ToGraphDef(&graph));
  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph));
  // The TopK node returns two outputs, the scores and their original indices,
  // so we have to append :0 and :1 to specify them both.
  std::vector<Tensor> out_tensors;
  TF_RETURN_IF_ERROR(session->Run({}, {output_name + ":0", output_name + ":1"},
                                  {}, &out_tensors));
  *scores = out_tensors[0];
  *indices = out_tensors[1];
  return Status::OK();

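The PrintTopLabels() function takes those sorted results and prints them out in a friendly way. The CheckTopLabel() function is very similar, but just makes sure that the top label is the one we expect, for debugging purposes.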
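
The logic of these three helpers can be sketched without TensorFlow. The code below is not the tutorial's implementation: the function names, label list, and scores are hypothetical, and a plain sort stands in for the TopK op. It only illustrates the idea of returning sorted scores together with their original indices, formatting them, and checking the top entry.

```python
def top_k(scores, k):
    """Return (values, indices) of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [scores[i] for i in order], order


def format_top_labels(scores, labels, k=5):
    """Build 'label (index): score' lines for the k best classes."""
    values, indices = top_k(scores, k)
    return ["%s (%d): %g" % (labels[i], i, v) for v, i in zip(values, indices)]


def check_top_label(scores, expected_index):
    """Return True when the highest-scoring class is the expected one."""
    _, indices = top_k(scores, 1)
    return indices[0] == expected_index


# Illustrative data only: four classes with made-up scores.
labels = ["suit", "bow tie", "military uniform", "academic gown"]
scores = [0.05, 0.02, 0.65, 0.03]

lines = format_top_labels(scores, labels, k=2)
# lines == ["military uniform (2): 0.65", "suit (0): 0.05"]
matches = check_top_label(scores, expected_index=2)
```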

At the end, main() ties together all of these calls.

int main(int argc, char* argv[]) {
  // We need to call this to set up global state for TensorFlow.
  tensorflow::port::InitMain(argv[0], &argc, &argv);
  Status s = tensorflow::ParseCommandLineFlags(&argc, argv);
  if (!s.ok()) {
    LOG(ERROR) << "Error parsing command line flags: " << s.ToString();
    return -1;
  }

  // First we load and initialize the model.
  std::unique_ptr<tensorflow::Session> session;
  string graph_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_graph);
  Status load_graph_status = LoadGraph(graph_path, &session);
  if (!load_graph_status.ok()) {
    LOG(ERROR) << load_graph_status;
    return -1;
  }

We load the main graph.

  // Get the image from disk as a float array of numbers, resized and normalized
  // to the specifications the main graph expects.
  std::vector<Tensor> resized_tensors;
  string image_path = tensorflow::io::JoinPath(FLAGS_root_dir, FLAGS_image);
  Status read_tensor_status = ReadTensorFromImageFile(
      image_path, FLAGS_input_height, FLAGS_input_width, FLAGS_input_mean,
      FLAGS_input_std, &resized_tensors);
  if (!read_tensor_status.ok()) {
    LOG(ERROR) << read_tensor_status;
    return -1;
  }
  const Tensor& resized_tensor = resized_tensors[0];

We load, resize, and process the input image.

  // Actually run the image through the model.
  std::vector<Tensor> outputs;
  Status run_status = session->Run({{FLAGS_input_layer, resized_tensor}},
                                   {FLAGS_output_layer}, {}, &outputs);
  if (!run_status.ok()) {
    LOG(ERROR) << "Running model failed: " << run_status;
    return -1;
  }

Here we run the loaded graph with the image as an input.

  // This is for automated testing to make sure we get the expected result with
  // the default settings. We know that label 866 (military uniform) should be
  // the top label for the Admiral Hopper image.
  if (FLAGS_self_test) {
    bool expected_matches;
    Status check_status = CheckTopLabel(outputs, 866, &expected_matches);
    if (!check_status.ok()) {
      LOG(ERROR) << "Running check failed: " << check_status;
      return -1;
    }
    if (!expected_matches) {
      LOG(ERROR) << "Self-test failed!";
      return -1;
    }
  }

For testing purposes, we can check here to make sure we get the output we expect.

  // Do something interesting with the results we've generated.
  Status print_status = PrintTopLabels(outputs, FLAGS_labels);

Finally, we print the labels we found.

  if (!print_status.ok()) {
    LOG(ERROR) << "Running print failed: " << print_status;
    return -1;
  }

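The error handling here uses TensorFlow's Status object, which is very convenient because it lets you know whether any error has occurred with the ok() checker, and can then be printed out to give a readable error message.

In this case we are demonstrating object recognition, but you should be able to use very similar code on other models you've found or trained yourself, across all sorts of domains. We hope this small example gives you some ideas on how to use TensorFlow within your own products.

EXERCISE: Transfer learning is the idea that, if you know how to solve a task well, you should be able to transfer some of that understanding to solving related problems. One way to perform transfer learning is to remove the final classification layer of the network and extract the next-to-last layer of the CNN, in this case a 2048 dimensional vector. You can specify this by setting --output_layer=pool_3 in the C++ API example, and then changing the output tensor handling. Try extracting this feature on a set of images and see if you can predict new categories that are not in ImageNet.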
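
One possible way to approach the exercise, once feature vectors have been extracted, is a nearest-centroid classifier. This is a swapped-in, deliberately simple technique and not something the tutorial prescribes; the category names are hypothetical and the 3-dimensional vectors below stand in for the real 2048-dimensional pool_3 features.

```python
import math


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train_centroids(features_by_label):
    """Compute one mean feature vector per category."""
    return {label: centroid(vecs) for label, vecs in features_by_label.items()}


def predict(centroids, feature):
    """Return the category whose centroid is closest in Euclidean distance."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda label: distance(centroids[label], feature))


# Made-up features for two new categories (assumption: in practice these
# would be vectors extracted from the network for your own images).
training = {
    "category_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "category_b": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.1]],
}
centroids = train_centroids(training)
guess = predict(centroids, [0.8, 0.2, 0.0])
```

With these toy vectors, `guess` is "category_a". A stronger classifier, such as logistic regression or an SVM, could be trained on the same features.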

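Resources for Learning More

To learn about neural networks in general, Michael Nielsen's free online book is an excellent resource. For convolutional neural networks in particular, Chris Olah has some nice blog posts, and Michael Nielsen's book has a great chapter covering them.

To find out more about implementing convolutional neural networks, you can jump to the TensorFlow deep convolutional networks tutorial, or start a bit more gently with the MNIST starter tutorials for ML beginners or ML experts. Finally, if you want to get up to speed on research in this area, you can read the recent work in the papers referenced by this tutorial.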