
Trying out AWS's image recognition API Rekognition (handwritten text edition)

Last updated at 2018-10-09 / Posted at 2018-10-09

Overview

Rekognition can detect text, so I tried it on a variety of images, mostly handwritten, and looked at what it returned.

Basic implementation

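Something like this. I covered putting the images in S3 and the rest of the setup in my previous article ( https://qiita.com/Otofuke/items/35f1ea884a3464bda247 ), so I'll skip that here.
Reference: https://docs.aws.amazon.com/rekognition/latest/dg/text-detecting-text-procedure.html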

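import boto3

# Region and bucket as used in this article; adjust to your own environment
rekognition = boto3.client("rekognition", region_name="us-east-1")
bucket = "fugaaaaaa"
key = "target-image.jpg"  # hypothetical placeholder: name of the target image file in the bucket

response = rekognition.detect_text(
    Image={
        "S3Object": {
            "Bucket": bucket,
            "Name": key,
        }
    }
)

text_detections = response["TextDetections"]
print(response)  # dumps the full raw response, including bounding-box geometry
for text in text_detections:
    print("Id: {}".format(text["Id"]))
    # Only WORD entries carry a ParentId (the Id of the LINE they belong to)
    if "ParentId" in text:
        print("ParentId: {}".format(text["ParentId"]))
    print("DetectedText:" + text["DetectedText"])
    print("Confidence: {:.2f}%".format(text["Confidence"]))
    print("Type:" + text["Type"])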

For each case below, I swapped in a different image file name for `key` above and recorded the output.
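The output interleaves two kinds of entries: LINE entries and WORD entries, where each WORD's ParentId is the Id of the LINE it belongs to. As a minimal sketch (pure Python, no AWS call), the helper below groups WORDs under their LINE. `group_words_by_line` and `sample` are hypothetical illustrations; the sample copies only the Id/ParentId/Type/DetectedText fields from the Case 3-2 output further down. With a real call you would pass `response["TextDetections"]` instead.

```python
from collections import defaultdict


def group_words_by_line(text_detections):
    """Return [(LINE text, [WORD texts whose ParentId is that LINE's Id]), ...]."""
    lines = {}  # LINE Id -> LINE DetectedText
    words_by_parent = defaultdict(list)  # LINE Id -> [WORD DetectedText, ...]
    for det in text_detections:
        if det["Type"] == "LINE":
            lines[det["Id"]] = det["DetectedText"]
        elif det["Type"] == "WORD" and "ParentId" in det:
            words_by_parent[det["ParentId"]].append(det["DetectedText"])
    # Sort by LINE Id so the result follows the order Rekognition assigned
    return [(lines[i], words_by_parent[i]) for i in sorted(lines)]


# Hypothetical sample shaped like the Case 3-2 output (only the fields used here)
sample = [
    {"Id": 0, "Type": "LINE", "DetectedText": "dife"},
    {"Id": 1, "Type": "LINE", "DetectedText": "is feoutifal"},
    {"Id": 2, "Type": "LINE", "DetectedText": "oheke"},
    {"Id": 3, "ParentId": 0, "Type": "WORD", "DetectedText": "dife"},
    {"Id": 4, "ParentId": 1, "Type": "WORD", "DetectedText": "is"},
    {"Id": 5, "ParentId": 1, "Type": "WORD", "DetectedText": "feoutifal"},
    {"Id": 6, "ParentId": 2, "Type": "WORD", "DetectedText": "oheke"},
]

for line, words in group_words_by_line(sample):
    print(line, "->", words)
```

This makes it easy to see, for example, that one detected line ("is feoutifal") was split into two words.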

Case 1. Japanese + English

Target image

Output

Id: 0
DetectedText:H
Confidence: 44.87%
Type:LINE
Id: 1
DetectedText:Tu.
Confidence: 83.65%
Type:LINE
Id: 2
DetectedText:O
Confidence: 54.09%
Type:LINE
Id: 3
DetectedText:Otofate
Confidence: 98.83%
Type:LINE
Id: 4
ParentId: 0
DetectedText:H
Confidence: 44.87%
Type:WORD
Id: 5
ParentId: 1
DetectedText:Tu.
Confidence: 83.65%
Type:WORD
Id: 6
ParentId: 2
DetectedText:O
Confidence: 54.09%
Type:WORD
Id: 7
ParentId: 3
DetectedText:Otofate
Confidence: 98.83%
Type:WORD

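There are four Type:LINE entries, so it did at least figure out that there are four lines of text.
Japanese isn't supported, so of course the Japanese was a total loss.
The English didn't come out right either, thanks to the quirks of my handwriting. A confidence of 98.83% for "Otofate"? Nope, that's wrong.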
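A tempting fix is to discard low-confidence detections. Here is a minimal sketch applied to the Case 1 WORD entries above, assuming a 90% cut-off (an arbitrary example value) and a hypothetical helper `filter_by_confidence`. The threshold removes H, Tu. and O, but the misread "Otofate" survives, so a high Confidence value alone does not tell you the text is correct.

```python
def filter_by_confidence(text_detections, min_confidence, det_type="WORD"):
    """Return (DetectedText, Confidence) pairs of det_type at or above min_confidence."""
    return [
        (d["DetectedText"], d["Confidence"])
        for d in text_detections
        if d["Type"] == det_type and d["Confidence"] >= min_confidence
    ]


# WORD entries copied from the Case 1 output (only the fields used here)
case1_words = [
    {"Type": "WORD", "DetectedText": "H", "Confidence": 44.87},
    {"Type": "WORD", "DetectedText": "Tu.", "Confidence": 83.65},
    {"Type": "WORD", "DetectedText": "O", "Confidence": 54.09},
    {"Type": "WORD", "DetectedText": "Otofate", "Confidence": 98.83},
]

# The obvious noise is dropped, but the misread word still passes the cut-off
print(filter_by_confidence(case1_words, 90))  # [('Otofate', 98.83)]
```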

Case 2. English

Target image

Oops. An unexpected string snuck into the top right of the image. Oh well.

Output

Id: 0
DetectedText:Otofute
Confidence: 89.34%
Type:LINE
Id: 1
DetectedText:Life
Confidence: 98.02%
Type:LINE
Id: 2
DetectedText:(s
Confidence: 77.45%
Type:LINE
Id: 3
DetectedText:beautifo
Confidence: 92.57%
Type:LINE
Id: 4
DetectedText:fofofe
Confidence: 87.27%
Type:LINE
Id: 5
ParentId: 0
DetectedText:Otofute
Confidence: 89.34%
Type:WORD
Id: 6
ParentId: 1
DetectedText:Life
Confidence: 98.02%
Type:WORD
Id: 7
ParentId: 2
DetectedText:(s
Confidence: 77.45%
Type:WORD
Id: 8
ParentId: 3
DetectedText:beautifo
Confidence: 92.57%
Type:WORD
Id: 9
ParentId: 4
DetectedText:fofofe
Confidence: 87.27%
Type:WORD

"Life" is correct!! The string in the top right came close! The rest missed.
So it doesn't read @.

Case 3-1. Cursive

Target image

What I meant to write is "Life is beautiful otofuke".

Output

Id: 0
DetectedText:ife
Confidence: 85.33%
Type:LINE
Id: 1
DetectedText:is
Confidence: 96.76%
Type:LINE
Id: 2
DetectedText:eautiful
Confidence: 99.29%
Type:LINE
Id: 3
DetectedText:ohuafe
Confidence: 78.24%
Type:LINE
Id: 4
ParentId: 0
DetectedText:ife
Confidence: 85.33%
Type:WORD
Id: 5
ParentId: 1
DetectedText:is
Confidence: 96.76%
Type:WORD
Id: 6
ParentId: 2
DetectedText:eautiful
Confidence: 99.29%
Type:WORD
Id: 7
ParentId: 3
DetectedText:ohuafe
Confidence: 78.24%
Type:WORD

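Pretty good? Is the L missing because it's cut off at the edge of the image? Why it couldn't get the b is a mystery.
Apparently my u looks like an a. Fair enough, now that you mention it. Impressive that it made out the O in the last line; even I can't read it.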
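To put a rough number on "pretty good", one option is to compare each intended word with what came back. This sketch uses Python's standard `difflib.SequenceMatcher.ratio()` as a stand-in character-level similarity score; it is not something Rekognition reports, and the `similarity` helper and the expected word list (taken from what I meant to write) are illustrative assumptions. The comparison is case-insensitive because capitalization in the image may differ.

```python
from difflib import SequenceMatcher


def similarity(expected, detected):
    """Character-level similarity ratio in [0, 1], ignoring case."""
    return SequenceMatcher(None, expected.lower(), detected.lower()).ratio()


expected = ["Life", "is", "beautiful", "otofuke"]  # what I meant to write
case3_1 = ["ife", "is", "eautiful", "ohuafe"]      # WORD texts from the Case 3-1 output

for exp, det in zip(expected, case3_1):
    print(f"{exp:>10} vs {det:<10} {similarity(exp, det):.2f}")
```

The words line up one-to-one here only because 3-1 returned exactly one detection per intended word; when words get merged into one line, as happens in 3-2, you would have to align them before comparing.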

Case 3-2. Cursive (version with the L not cut off)

Testing again to see whether the L really went unread because it was cut off.

Target image

Output

Id: 0
DetectedText:dife
Confidence: 96.99%
Type:LINE
Id: 1
DetectedText:is feoutifal
Confidence: 88.87%
Type:LINE
Id: 2
DetectedText:oheke
Confidence: 87.48%
Type:LINE
Id: 3
ParentId: 0
DetectedText:dife
Confidence: 96.99%
Type:WORD
Id: 4
ParentId: 1
DetectedText:is
Confidence: 95.39%
Type:WORD
Id: 5
ParentId: 1
DetectedText:feoutifal
Confidence: 82.34%
Type:WORD
Id: 6
ParentId: 2
DetectedText:oheke
Confidence: 87.48%
Type:WORD

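True, the L does kind of look like a d. Still wrong.
"beautiful" has been mangled beyond recognition, and it claims it sits on the same line as "is". Wrong again.
Even though I photographed the same writing, accuracy somehow dropped compared to 3-1.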

Case 4. English, printed text

Target image

Surely it'll get everything right this time.

Output

Id: 0
DetectedText:notebook
Confidence: 97.32%
Type:LINE
Id: 1
DetectedText:subect:
Confidence: 93.45%
Type:LINE
Id: 2
ParentId: 0
DetectedText:notebook
Confidence: 97.32%
Type:WORD
Id: 3
ParentId: 1
DetectedText:subect:
Confidence: 93.45%
Type:WORD

It got "subject" wrong.

Conclusion

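Handwriting recognition is not accurate enough for practical use
Japanese can't be read
@ can't be read either
Cursive might be recognized a bit more accurately if written by someone used to writing it
Line detection is fairly accurate
Even printed text gets misread when the typeface is distinctive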

That's all.
