
Solving Sudoku by Recognizing It with Python ② (Digit Recognition)

Posted at 2020-09-12 (last updated 2020-09-12)

Introduction

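This is a follow-up to the previous article (Solving Sudoku by Recognizing It with Python ①).
The goal is a program that recognizes a sudoku puzzle and then solves it.
This part covers recognizing the digits.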

[Environment]
・Python 3.7.3
・numpy 1.16.4

Method ①

Train a CNN (convolutional neural network) on the MNIST dataset
Implemented with PyTorch
For details, please refer to my earlier article (here)

Result ①

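It did not work well.
I could not build a model that correctly classified all 81 cells, blank cells included.
With a better-designed network, this approach might still be usable.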
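Part of the difficulty is the blank cells: MNIST only has the ten classes 0-9, so a model trained on it has no "empty" answer and will always output some digit. A common workaround, not something tried in this article, is to decide blank vs. filled with a simple ink-ratio test first and send only the filled cells to the classifier. The following is a minimal sketch under stated assumptions: `is_blank_cell` is a hypothetical helper, each cell is a 2-D uint8 grayscale NumPy array with dark ink on a light background, and the default thresholds are guesses that need tuning on real cell images.

```python
import numpy as np


def is_blank_cell(cell, ink_threshold=128, min_ink_ratio=0.02, margin=0.15):
    """Return True if a grayscale cell (2-D uint8 array) looks empty.

    Hypothetical helper. Assumes dark ink on a light background
    (0 = black, 255 = white). The outer `margin` fraction of each side
    is ignored because it may still contain grid-line remnants.
    """
    h, w = cell.shape
    dy, dx = int(h * margin), int(w * margin)
    inner = cell[dy:h - dy, dx:w - dx]
    # fraction of pixels darker than the threshold, i.e. "ink"
    ink_ratio = np.mean(inner < ink_threshold)
    return bool(ink_ratio < min_ink_ratio)
```

Cells judged blank can be set to 0 directly, and only the remaining cells need to be passed to the CNN. For white-on-black cells, call it on `255 - cell`.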

Method ②

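I used Tesseract through pyocr.
Download it from the Tesseract download page:
just download the installer and follow the steps.
To recognize Japanese, the extra data must be added under Additional script data (download).
After installation, the install folder has to be added to PATH. (It had been a while since I last did this, so it took me a little time.)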
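pyocr runs Tesseract as an external command. If the PATH setting is wrong, `pyocr.get_available_tools()` typically comes back empty, and the script below exits with "No OCR tool found". A quick standard-library check shows whether the `tesseract` executable can be found. `find_tesseract` is a hypothetical helper name.

```python
import shutil


def find_tesseract():
    """Return the full path of the tesseract executable found on PATH, or None."""
    return shutil.which("tesseract")
```

If `find_tesseract()` returns `None`, add the Tesseract install folder to PATH and open a new terminal so the change is picked up.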

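import sys                      # runtime utilities (sys.exit)

import numpy as np
from PIL import Image, ImageOps
import pyocr                    # OCR wrapper library; supported engines: Tesseract, Cuneiform
import pyocr.builders           # output builders (TextBuilder, DigitBuilder, ...)

# Digit recognition
N = 9
# recognized grid; 0 marks an empty cell (np.int was removed in NumPy 1.24, so use int)
number_detection = np.zeros((N, N), dtype=int)

tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)

tool = tools[0]  # first available engine (Tesseract here)

for i in range(N * N):
    # cell images from part ①, numbered 0-80 row by row
    img_path = "split_sudoku/{}.jpg".format(i)
    img = Image.open(img_path)
    img = ImageOps.invert(img)
    txt = tool.image_to_string(
            img,
            lang="eng",
            builder=pyocr.builders.DigitBuilder(tesseract_layout=6))
    txt = txt.strip()

    # DigitBuilder can still return symbols (e.g. "-" or ".") or several
    # characters, so accept only a single digit 1-9; anything else stays 0 (blank)
    if len(txt) == 1 and txt in "123456789":
        number_detection[i // N, i % N] = int(txt)

print(number_detection)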
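The `ImageOps.invert` call in the code above suggests that the split cell images contain light digits on a dark background. Tesseract 4.x generally does better with dark text on a light background and when the text does not touch the image edge. Below is a minimal NumPy preprocessing sketch. It assumes 2-D uint8 cell arrays with light digits on dark, and `prepare_cell` and its default values are hypothetical and would need tuning.

```python
import numpy as np


def prepare_cell(cell, crop=0.1, threshold=128, pad=10):
    """Clean one cell image before OCR (hypothetical helper).

    cell: 2-D uint8 array with light digits on a dark background.
    Steps: crop the border (grid-line remnants), invert to dark-on-light,
    binarize, then pad with white so the digit does not touch the edge.
    """
    h, w = cell.shape
    dy, dx = int(h * crop), int(w * crop)
    inner = cell[dy:h - dy, dx:w - dx]
    inverted = 255 - inner                       # stays uint8, no overflow
    binary = np.where(inverted < threshold, 0, 255).astype(np.uint8)
    return np.pad(binary, pad, mode="constant", constant_values=255)
```

To try it in the loop above, a line such as `img = Image.fromarray(prepare_cell(np.array(Image.open(img_path).convert("L"))))` would replace the `Image.open` and `ImageOps.invert` lines.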

Notes (about the parameters)

lang = "~~"
builder = pyocr.builders.~~
tesseract_layout = ~~

lang	Language
"eng"	English
"jpn"	Japanese
others	omitted

I used lang="eng" this time. (English apparently works better than Japanese here because it has fewer candidate characters to choose from.)

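pyocr.builders	Usage
TextBuilder	recognizes text strings
WordBoxBuilder	recognizes text word by word + bounding boxes
LineBoxBuilder	recognizes text line by line + bounding boxes
DigitBuilder	recognizes digits / symbols
DigitLineBoxBuilder	recognizes digits / symbols + bounding boxes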

tesseract_layout	Meaning
0	Orientation and script detection (OSD) only.
1	Automatic page segmentation with OSD.
2	Automatic page segmentation, but no OSD, or OCR. (not implemented)
3	Fully automatic page segmentation, but no OSD. (Default)
4	Assume a single column of text of variable sizes.
5	Assume a single uniform block of vertically aligned text.
6	Assume a single uniform block of text.
7	Treat the image as a single text line.
8	Treat the image as a single word.
9	Treat the image as a single word in a circle.
10	Treat the image as a single character.

(OSD: orientation and script detection, i.e. detecting the page's rotation and its writing system)

I used builder=pyocr.builders.DigitBuilder(tesseract_layout=6).
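The `tesseract_layout` numbers are Tesseract's page segmentation modes. These are the same values the `tesseract` command line accepts as `--psm`. Each cell image holds at most one digit, so mode 10 (single character) is arguably the closest match. A character whitelist keeps letters out of the result, and the LSTM engine honors it from Tesseract 4.1 on. The following sketch builds such a command so that settings can be tried on a single cell without going through pyocr. `build_tesseract_cmd` and `ocr_cell` are hypothetical helper names.

```python
import subprocess


def build_tesseract_cmd(image_path, psm=10, digits_only=True):
    """Build a tesseract command line that prints the recognized text to stdout.

    psm: page segmentation mode (same numbers as pyocr's tesseract_layout)
    digits_only: restrict the output to 1-9 with a character whitelist
    """
    cmd = ["tesseract", image_path, "stdout", "--psm", str(psm)]
    if digits_only:
        cmd += ["-c", "tessedit_char_whitelist=123456789"]
    return cmd


def ocr_cell(image_path, psm=10):
    """Run tesseract on one cell image and return the stripped text.

    Requires the tesseract executable on PATH; raises CalledProcessError on failure.
    """
    result = subprocess.run(build_tesseract_cmd(image_path, psm),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, on a cell that was misread, compare `ocr_cell("split_sudoku/0.jpg", psm=6)` with `psm=10`.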

Result ②

It works fairly well.
Going by feel, the success rate was around 70-80%.
Recognizing all 81 cells perfectly is still hard.
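The 70-80% figure above is a rough impression. Two small checks can make it concrete. The first compares the recognized grid with a grid typed in by hand. The second flags digits that break the sudoku rules, because a solver cannot work on an inconsistent puzzle. The following is a minimal sketch with hypothetical helper names. It takes 9×9 lists (for example `number_detection.tolist()`) with 0 for blanks.

```python
def cell_accuracy(predicted, truth):
    """Fraction of cells where the recognized grid equals the hand-typed grid."""
    pairs = [(p, t) for p_row, t_row in zip(predicted, truth)
             for p, t in zip(p_row, t_row)]
    return sum(p == t for p, t in pairs) / len(pairs)


def find_conflicts(grid):
    """Return sorted (row, col) positions whose digit repeats in its row, column or 3x3 box.

    0 means blank and is skipped. Any conflict means at least one cell was misread.
    """
    n, box = 9, 3
    bad = set()
    for r in range(n):
        for c in range(n):
            v = grid[r][c]
            if v == 0:
                continue
            br, bc = r - r % box, c - c % box   # top-left corner of the 3x3 box
            peers = ([(r, k) for k in range(n)]
                     + [(k, c) for k in range(n)]
                     + [(i, j) for i in range(br, br + box) for j in range(bc, bc + box)])
            if any((i, j) != (r, c) and grid[i][j] == v for i, j in peers):
                bad.add((r, c))
    return sorted(bad)
```

A grid with no conflicts can still contain misreads (for example, a digit read as a blank). The conflict check only catches some errors, but it needs no hand-typed answer.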

Summary

OCR is impressive
Installing it is a bit of a hassle
Building a high-performance model with a CNN or a similar network is difficult
