Microsoft CaptionBot

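Microsoft has announced Microsoft Cognitive Services.
Cognitive Services provides APIs for image analysis, speech recognition, recommendations, and more.
As a demo of these services, Microsoft has released CaptionBot.
CaptionBot is a service that generates a caption (a descriptive sentence) for any image you upload.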

Comparing the generated captions

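I compared the captions generated by CaptionBot with captions from the caption generation program I built in my earlier post on image caption generation with Chainer.
I had been wanting to compare my program against a captioning tool that looked likely to be accurate, so the timing was perfect.
The images are photos I took myself and images downloaded from PublicDomainPictures.net.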

A street scene in Asakusa

CaptionBot: I think it's a large bus on a city street.
My program: a city street filled with lots of traffic

Neither caption mentions the buildings. The street does not look crowded, so it is a mystery why my program says "lots of traffic".

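A Christmas tree

CaptionBot: I am not really confident, but I think it's a large building in the snow.
My program: a fire hydrant sitting on the side of a road

Looking at the words in the training data, "tree" occurs frequently, yet neither caption recognizes it. Perhaps that is because the tree is covered in snow.
I also wonder which part of the image was taken for a "fire hydrant". It would make more sense if there were at least something red in the picture.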

A clock

CaptionBot: I think it's a large clock.
My program: a clock on the side of a building

My program's caption is slightly more detailed.

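Two people riding one bicycle

CaptionBot: I think it's person riding on the back of a bicycle.
My program: a person riding a bike on a city street

Neither caption recognizes that there are two people. Perhaps they are treated as one person because they partly overlap.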

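A police officer, apparently directing traffic

CaptionBot: I think it's a man riding a skateboard down a street and he seems .
My program: a man riding a skateboard down a street

It is funny that both mistakenly see a skateboard.
CaptionBot sometimes outputs emoji as well.
For this post I picked the closest match from Qiita's emoji.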

A woman holding a tennis racket

CaptionBot: I think it's a woman holding a tennis racket and she seems .
My program: a woman holding a tennis racquet on a tennis court

Both recognize the image correctly.

A living room sofa

CaptionBot: I think it's a living room filled with furniture and a large window.
My program: a cat laying on top of a bed

CaptionBot gets it right, whereas my program has probably mistaken a cushion for a cat.

A polar bear

CaptionBot: I think it's a polar bear in a zoo.
My program: two polar bears standing next to each other

For some reason my program thinks there are two bears.
Perhaps the white rocks caused the misrecognition.

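Summary

Although the sample is small, I compared CaptionBot with my own captioning program.
My impression is that CaptionBot is more accurate and gives more detailed descriptions.
I also found it fun that it outputs emoji and flags results it is unsure about with phrases such as "I am not really confident,".
CaptionBot seems able to caption only photographs; given an anime or comic image, it outputs something like "I can't really describe the picture but I do see drawing."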
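
For anyone who wants to line up CaptionBot's sentences against another captioner's output, the phrasing patterns quoted in this post could be stripped off first. The following is only a minimal sketch: `parse_captionbot` is a hypothetical helper of my own, not part of any Microsoft API, and the prefixes it knows are just the ones that appear in the outputs above, so other phrasings will pass through unchanged.

```python
# Prefixes observed in the CaptionBot outputs quoted in this post.
# Assumption: CaptionBot may use other phrasings that are not handled here.
LOW_CONFIDENCE_PREFIX = "I am not really confident, but "
CAPTION_PREFIX = "I think it's "
NO_CAPTION_PREFIX = "I can't really describe the picture"


def parse_captionbot(text):
    """Split a CaptionBot sentence into (caption, confident).

    Returns (None, False) when CaptionBot says it cannot describe the image.
    """
    text = text.strip()
    if text.startswith(NO_CAPTION_PREFIX):
        return None, False

    confident = True
    if text.startswith(LOW_CONFIDENCE_PREFIX):
        # CaptionBot hedged its answer; remember that and drop the hedge.
        confident = False
        text = text[len(LOW_CONFIDENCE_PREFIX):]

    if text.startswith(CAPTION_PREFIX):
        text = text[len(CAPTION_PREFIX):]

    # Drop the trailing period (and any stray spaces) so the result
    # has the same shape as a bare caption.
    return text.rstrip(". "), confident


if __name__ == "__main__":
    for sentence in [
        "I think it's a large clock.",
        "I am not really confident, but I think it's a large building in the snow.",
        "I can't really describe the picture but I do see drawing.",
    ]:
        print(parse_captionbot(sentence))
```

With the wrapper text removed, the remaining caption can be compared word for word with the other program's output, and the confidence flag is kept separately instead of being lost.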

Comparing Microsoft's CaptionBot with my own captioning program