
Trying the TensorFlow implementation of "Show and Tell" on a CPU

Last updated at 2018-11-18, posted at 2018-05-22

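"Show and Tell" is a method for generating a descriptive caption from an image. I had been curious about it for a while, so I tried it out in a CPU-only environment!
I stumbled over quite a few things before it ran, so here are my notes on what worked!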

<Working environment>
linux 16.04
numpy (1.14.3)
tensorflow (1.0.0)
Python (3.6.4)
Babel (2.5.3)

Reference site
Tried the TensorFlow implementation of "Show and Tell" for captioning images ⇒ success

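From here on, I assume the working environment above is already set up.
Most of the flow and the commands are quoted from the reference site.
The reference site is seriously easy to follow, and I am hugely grateful.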

Downloading the TensorFlow implementation of "Show and Tell"

Move to the directory where you want to download the code (test in this case) and run git clone.

cd test
git clone https://github.com/tensorflow/models

Note: this time I use the pre-trained model data that a volunteer has kindly distributed.

Setting up Inception v3

cd ~/test/models/research/im2txt/im2txt/data
wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
tar xf inception_v3_2016_08_28.tar.gz
rm inception_v3_2016_08_28.tar.gz

Preparing the pre-trained model

In the GitHub tensorflow/models issue "Pretrained model for img2txt? #466", search for "here are links to a pre-trained model:". There are three links below that comment.
From there, copy "im2txt_2016_10_11.2000000.tar.gz" (the fine-tuned one) and "word_counts.txt" into ~/test/models/research/im2txt.

Extract the archive as well.

tar xf im2txt_2016_10_11.2000000.tar.gz
rm im2txt_2016_10_11.2000000.tar.gz

Next, to make it work with Python 3 and so on (probably), slightly change the contents of "word_counts.txt".
Before the change:

cat ~/test/models/research/im2txt/word_counts.txt | head

Result

b'a' 969108
b'</S>' 586368
b'<S>' 586368
b'.' 440479
b'on' 213612
b'of' 202290
b'the' 196219
b'in' 182598
b'with' 152984
...

Being a gorilla-brain, I just stripped the unneeded "b'" and "'" with .replace()...

After the change:

cat ~/test/models/research/im2txt/word_counts.txt | head

Result

a 969108
</S> 586368
<S> 586368
. 440479
on 213612
of 202290
the 196219
in 182598
with 152984
...
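
The cleanup above could be sketched in Python roughly as follows. This is not the author's exact script: instead of chained .replace() calls it parses each token with ast.literal_eval, an assumption made here so that tokens containing a quote character are not mangled. The function names clean_line and rewrite_vocab are hypothetical.

```python
import ast


def clean_line(line):
    """Turn "b'a' 969108" into "a 969108"; leave other lines unchanged."""
    line = line.rstrip("\n")
    token_repr, sep, count = line.rpartition(" ")
    if not sep:
        return line  # no "<token> <count>" structure, keep as-is
    try:
        token = ast.literal_eval(token_repr)
    except (ValueError, SyntaxError):
        return line  # not a Python literal, so already a plain token
    if isinstance(token, bytes):
        return "{} {}".format(token.decode("utf-8"), count)
    return line  # some other literal (e.g. a number), keep as-is


def rewrite_vocab(src_path, dst_path):
    """Read a vocabulary file and write a cleaned copy (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src:
        cleaned = [clean_line(line) for line in src]
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(cleaned) + "\n")
```

Calling rewrite_vocab on word_counts.txt with a separate output path keeps the original file around in case the result needs to be compared.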

Next, fix the checkpoint file using the code below. The code is adapted from here.
Create the file with vim or similar and run it.

checkpoint_change.py

import os

import tensorflow as tf

# Python and TensorFlow do not expand "~", so expand it explicitly.
# Both paths are the same, so the checkpoint is overwritten in place.
OLD_CHECKPOINT_FILE = os.path.expanduser("~/test/models/research/im2txt/model.ckpt-2000000")
NEW_CHECKPOINT_FILE = os.path.expanduser("~/test/models/research/im2txt/model.ckpt-2000000")

# Old LSTM variable names stored in the checkpoint -> new names.
vars_to_rename = {
    "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/weights",
    "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/biases",
}

# Load every variable from the old checkpoint, renaming where needed.
new_checkpoint_vars = {}
reader = tf.train.NewCheckpointReader(OLD_CHECKPOINT_FILE)
for old_name in reader.get_variable_to_shape_map():
  new_name = vars_to_rename.get(old_name, old_name)
  new_checkpoint_vars[new_name] = tf.Variable(reader.get_tensor(old_name))

init = tf.global_variables_initializer()
saver = tf.train.Saver(new_checkpoint_vars)

# Initialize the variables and save them under the new names.
with tf.Session() as sess:
  sess.run(init)
  saver.save(sess, NEW_CHECKPOINT_FILE)
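
The renaming step in the script above is just a dictionary lookup with a fallback to the original name. A TensorFlow-free sketch of that logic, useful for checking which names would change before touching the checkpoint, might look like this. The helper name plan_renames and the sample name "some/other/variable" are hypothetical.

```python
# Same mapping as in checkpoint_change.py.
VARS_TO_RENAME = {
    "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/weights",
    "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/biases",
}


def plan_renames(old_names, vars_to_rename=VARS_TO_RENAME):
    """Map each old variable name to its new name (unchanged if not listed)."""
    return {old: vars_to_rename.get(old, old) for old in old_names}


if __name__ == "__main__":
    # "some/other/variable" is a made-up placeholder, not a real checkpoint entry.
    sample = ["lstm/BasicLSTMCell/Linear/Matrix", "some/other/variable"]
    for old, new in plan_renames(sample).items():
        print(old, "->", new)
```

Only the two LSTM entries are rewritten; every other variable keeps its name.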

Caption generation

Finally, time to generate captions...!!!
First, build the run_inference script.

cd ~/test/models/research/im2txt
bazel build -c opt im2txt/run_inference

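Next, copy the image file you want captioned (cat.jpg in this case) into ~/test/models/research/im2txt/im2txt, then run the run_inference binary.
I used a single input image this time, but you can pass several at once by separating them with ",".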

~/test/models/research/im2txt/bazel-bin/im2txt/run_inference \
  --checkpoint_path="${HOME}/test/models/research/im2txt/model.ckpt-2000000" \
  --vocab_file="${HOME}/test/models/research/im2txt/word_counts.txt" \
  --input_files="${HOME}/test/models/research/im2txt/im2txt/cat.jpg"
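
For several images, the comma-separated --input_files value can be assembled programmatically. The following is a minimal sketch, assuming the same flags as the command above. The helper name build_inference_command is hypothetical, and "~" is expanded explicitly because it would not be expanded inside a quoted argument.

```python
import os


def build_inference_command(binary, checkpoint_path, vocab_file, image_paths):
    """Return the argument list for run_inference with images joined by ','."""
    expand = os.path.expanduser
    input_files = ",".join(expand(path) for path in image_paths)
    return [
        expand(binary),
        "--checkpoint_path=" + expand(checkpoint_path),
        "--vocab_file=" + expand(vocab_file),
        "--input_files=" + input_files,
    ]
```

The returned list can be handed to subprocess.run as-is, which avoids shell quoting issues altogether.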

Result

INFO:tensorflow:Building model.
INFO:tensorflow:Initializing vocabulary from file: ~/test/models/research/im2txt/word_counts.txt
...

  0) a close up of a cat laying on a bench (p=0.000709)
  1) a close up of a cat laying on a park bench (p=0.000542)
  2) a close up of a cat on a bench (p=0.000462)

Yay!
Looks pretty good, doesn't it? Hehe.

Note: some of the PATH parts were corrected (2018/11/18)
