
[Python] Recognizing text in images with OpenCV and pyocr

Posted at 2020-05-17, last updated at 2020-05-17

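Introduction

While looking for articles that use Selenium, I kept running into posts about automating Sushida (寿司打), a typing game.
They basically take one of the following approaches:
・Start the game, then keep sending every key
・Start the game, take a screenshot, and type in the string obtained by OCR
Note: Sushida draws its game screen on a Canvas element, so the text cannot be read directly from the page.

This time I tried out the OCR part, along with some simple image processing in OpenCV as preprocessing.

Preparation

Installing tesseract

tesseract is an OCR engine.
Here it is driven from Python through the pyocr module.
It can be installed with the following command: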

$ brew install tesseract

Out of the box there is no trained data for Japanese, so download it from the following URL:
https://github.com/tesseract-ocr/tessdata
↑ Download jpn.traineddata from this URL into /usr/local/share/tessdata/
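
To confirm that jpn.traineddata was picked up, one option is to ask tesseract which languages it can see. The following is a minimal sketch, assuming the tesseract command is on PATH and accepts the --list-langs option. The helper names parse_langs and list_tesseract_languages are made up for illustration.

```python
import shutil
import subprocess


def parse_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line, which ends with a colon.
        if not line or line.endswith(":"):
            continue
        langs.append(line)
    return langs


def list_tesseract_languages():
    """Return the languages tesseract reports, or [] if the command is not found."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    proc = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Assumption: some versions print the list to stderr instead of stdout.
    return parse_langs(proc.stdout or proc.stderr)


if __name__ == "__main__":
    print(list_tesseract_languages())
```

If jpn does not show up in the list, the file has probably been placed in a different tessdata directory from the one tesseract actually reads.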

Installing pyocr and OpenCV

Run the following commands in a terminal:

$ pip3 install pyocr
$ pip3 install opencv-python

Trying OCR first

Preparing the image

The test image is below

↓ Cropped

Save the cropped image as test.png

OCR with pyocr

import sys

import pyocr
from PIL import Image

image = "test.png"

# Find an installed OCR tool (tesseract is expected here)
tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
tool = tools[0]

# Run OCR on the image with the English model
res = tool.image_to_string(
    Image.open(image)
    ,lang="eng")

print(res)

Result

Nothing is recognized correctly...
Looks like preprocessing is needed after all.

Trying out OpenCV

I want to use OpenCV for preprocessing, but since this is my first time with it, I'll play around a bit first.
Let's process my own icon image.

import cv2

image = "test_1.png"
name = "test_1"

# Load the original image (BGR)
img = cv2.imread(image)

# Convert to grayscale
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imwrite(f"1_{name}_gray.png",img)

# Gaussian blur with a 5x5 kernel
img = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite(f"2_{name}_gaussian.png",img)

# Adaptive threshold (block size 11, constant C = 2)
img = cv2.adaptiveThreshold(
    img
    , 255
    , cv2.ADAPTIVE_THRESH_GAUSSIAN_C
    , cv2.THRESH_BINARY
    , 11
    , 2
)
cv2.imwrite(f"3_{name}_threshold.png",img)

The images at each processing step look like this
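
To get a feel for what the last two arguments of cv2.adaptiveThreshold (block size 11 and constant 2) do, here is a rough NumPy sketch. Note the assumptions: it uses the plain mean of each neighborhood (the idea behind cv2.ADAPTIVE_THRESH_MEAN_C) rather than the Gaussian-weighted mean used in the code above, it pads the border by repeating edge pixels, and the function name adaptive_threshold_mean is made up for illustration. Its output is not guaranteed to match OpenCV pixel for pixel.

```python
import numpy as np


def adaptive_threshold_mean(gray, block_size=11, c=2, max_value=255):
    """Binarize a 2-D uint8 image against a per-pixel threshold.

    The threshold for each pixel is the mean of its
    block_size x block_size neighborhood minus the constant c.
    """
    pad = block_size // 2
    # Repeat edge pixels so that border pixels also get a full window.
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    h, w = gray.shape

    # Sum every shifted copy of the image to get the window sum per pixel.
    local_sum = np.zeros((h, w), dtype=np.float64)
    for dy in range(block_size):
        for dx in range(block_size):
            local_sum += padded[dy:dy + h, dx:dx + w]
    local_mean = local_sum / (block_size * block_size)

    # A pixel brighter than (local mean - c) becomes max_value, otherwise 0.
    return np.where(gray > local_mean - c, max_value, 0).astype(np.uint8)
```

Because the threshold follows the local brightness, a pixel only turns black when it is clearly darker than its surroundings. A larger block size averages over a wider area, and a larger c makes it harder for a pixel to be classified as dark.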

OpenCV + OCR

Now let's preprocess the image used for OCR earlier with OpenCV and run OCR on it again.
The preprocessing below is grayscale → thresholding → color inversion.

import sys

import cv2
import pyocr
from PIL import Image

image = "test.png"
name = "test"

# Load the original image (BGR)
img = cv2.imread(image)
cv2.imwrite(f"1_{name}_original.png",img)

# Convert to grayscale
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imwrite(f"2_{name}_gray.png",img)

# Binarize with a fixed threshold
th = 140
img = cv2.threshold(
    img
    , th
    , 255
    , cv2.THRESH_BINARY
)[1]
cv2.imwrite(f"3_{name}_threshold_{th}.png",img)

# Invert black and white
img = cv2.bitwise_not(img)
cv2.imwrite(f"4_{name}_bitwise.png",img)

# Save the preprocessed image that is passed to OCR
cv2.imwrite("target.png",img)

# Find an installed OCR tool (tesseract is expected here)
tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
tool = tools[0]

# Run OCR on the preprocessed image with the English model
res = tool.image_to_string(
    Image.open("target.png")
    ,lang="eng")

print(res)

Result

Looks like it was recognized properly!
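
For reference, the three preprocessing steps (grayscale, thresholding, inversion) can be written out in plain NumPy to show what each one does to the pixel values. This is only a sketch under assumptions: the grayscale weights are the standard 0.299 R + 0.587 G + 0.114 B formula, so results may differ slightly from cv2.cvtColor because of rounding, and the function names are made up for illustration.

```python
import numpy as np


def to_gray(bgr):
    """Weighted sum of the B, G, R channels (channel order as loaded by cv2.imread)."""
    weights = np.array([0.114, 0.587, 0.299])  # B, G, R
    return np.rint(bgr.astype(np.float64) @ weights).astype(np.uint8)


def binarize(gray, thresh=140, max_value=255):
    """Pixels strictly above thresh become max_value, the rest become 0."""
    return np.where(gray > thresh, max_value, 0).astype(np.uint8)


def invert(binary):
    """Flip every bit; for uint8 this maps 0 to 255 and 255 to 0."""
    return np.invert(binary)


def preprocess(bgr, thresh=140):
    """Grayscale, then fixed threshold, then inversion."""
    return invert(binarize(to_gray(bgr), thresh))
```

With the threshold of 140 used above, anything brighter than 140 first becomes white and is then flipped to black by the inversion, so bright regions of the original image end up black on a white background.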
That's it for now.
