
Reading text from images with OCR in Python

Posted at 2020-07-06 (last updated 2020-07-06)

Install Tesseract

$ brew install tesseract

Install pyocr, the Python library that drives Tesseract

$ pip3 install pyocr

Set up Japanese recognition (download the jpn trained data)

$ curl -L -o /usr/local/share/tessdata/jpn.traineddata 'https://github.com/tesseract-ocr/tessdata/raw/master/jpn.traineddata'
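If you would rather script this download step, the curl command can be reproduced with Python's standard library. This is a minimal sketch: the helper names (`traineddata_url`, `traineddata_path`, `download_traineddata`) are hypothetical, the URL is the one used in the curl command, and the default directory assumes an Intel Homebrew install (`/usr/local/share/tessdata`). On Apple Silicon, Homebrew's prefix is `/opt/homebrew`, so the directory would be `/opt/homebrew/share/tessdata` instead. Writing there may also require suitable permissions.

```python
import os
import urllib.request

# Base URL taken from the curl command (tessdata repository, master branch).
TESSDATA_BASE_URL = "https://github.com/tesseract-ocr/tessdata/raw/master"

# Assumption: Intel Homebrew layout, as in this article.
# On Apple Silicon Homebrew this is usually /opt/homebrew/share/tessdata.
DEFAULT_TESSDATA_DIR = "/usr/local/share/tessdata"


def traineddata_url(lang: str) -> str:
    """Return the download URL of <lang>.traineddata (hypothetical helper)."""
    return f"{TESSDATA_BASE_URL}/{lang}.traineddata"


def traineddata_path(lang: str, tessdata_dir: str = DEFAULT_TESSDATA_DIR) -> str:
    """Return the local path the model file should be saved to (hypothetical helper)."""
    return os.path.join(tessdata_dir, f"{lang}.traineddata")


def download_traineddata(lang: str, tessdata_dir: str = DEFAULT_TESSDATA_DIR) -> str:
    """Download the model, like `curl -L -o ...`.

    urllib's default opener follows HTTP redirects, which covers the `-L` flag.
    """
    dest = traineddata_path(lang, tessdata_dir)
    urllib.request.urlretrieve(traineddata_url(lang), dest)
    return dest
```

Calling `download_traineddata("jpn")` saves the file to the same location as the curl command; pass a different `tessdata_dir` if your Tesseract looks elsewhere.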

Check that jpn now appears in the list of installed languages:

$ tesseract --list-langs

List of available languages (4):
eng
jpn
osd
snum
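The same check can be done from Python by parsing the `--list-langs` output. This is a sketch with hypothetical helper names (`parse_list_langs`, `available_langs`). It assumes the header line starts with `List of available languages` (newer Tesseract versions also print the tessdata path in that line, so only the prefix is matched), and it falls back to stderr in case an older version prints the list there.

```python
from __future__ import annotations

import shutil
import subprocess

HEADER_PREFIX = "List of available languages"


def parse_list_langs(output: str) -> list[str]:
    """Return the language codes listed after the header line (hypothetical helper)."""
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(HEADER_PREFIX):
            # Every non-empty line after the header is one language code.
            return [entry.strip() for entry in lines[i + 1:] if entry.strip()]
    return []  # header not found: treat as "no languages"


def available_langs() -> list[str]:
    """Run `tesseract --list-langs` and return the installed codes (hypothetical helper)."""
    exe = shutil.which("tesseract")
    if exe is None:
        raise FileNotFoundError("tesseract not found on PATH (brew install tesseract)")
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True, check=True)
    # Assumption: some versions may write the list to stderr instead of stdout.
    return parse_list_langs(result.stdout or result.stderr)
```

With the setup above, `"jpn" in available_langs()` should be `True`.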

OCR implementation

from PIL import Image
import sys
import pyocr
import pyocr.builders

tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
# The tools are returned in the recommended order of usage
tool = tools[0]

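# Replace '{path}' with the path to the image you want to read
txt = tool.image_to_string(
    Image.open('{path}'),
    lang="jpn",  # uses the jpn.traineddata downloaded earlier
    # tesseract_layout=6: assume a single uniform block of text
    builder=pyocr.builders.TextBuilder(tesseract_layout=6)
)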
print(txt)
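The `tesseract_layout` value passed to `TextBuilder` is Tesseract's page segmentation mode, the same numbers that `tesseract --help-psm` lists. The table below is a sketch for choosing a value. `PSM_MODES` and `make_text_builder` are hypothetical names, and the descriptions are paraphrased from Tesseract's help output. For vertical Japanese text, mode 5 combined with the `jpn_vert` model from the same tessdata repository may work better than 6, but this was not tested here.

```python
# Tesseract page segmentation modes (paraphrased from `tesseract --help-psm`).
PSM_MODES = {
    0: "Orientation and script detection (OSD) only.",
    1: "Automatic page segmentation with OSD.",
    2: "Automatic page segmentation, but no OSD, or OCR.",
    3: "Fully automatic page segmentation, but no OSD (default).",
    4: "Assume a single column of text of variable sizes.",
    5: "Assume a single uniform block of vertically aligned text.",
    6: "Assume a single uniform block of text.",
    7: "Treat the image as a single text line.",
    8: "Treat the image as a single word.",
    9: "Treat the image as a single word in a circle.",
    10: "Treat the image as a single character.",
    11: "Sparse text: find as much text as possible in no particular order.",
    12: "Sparse text with OSD.",
    13: "Raw line: single text line, bypassing Tesseract-specific hacks.",
}


def make_text_builder(layout: int = 6):
    """Return a pyocr TextBuilder for a known layout (hypothetical helper).

    Note that modes 0 and 2 do not perform OCR, so they are not useful for text output.
    """
    if layout not in PSM_MODES:
        raise ValueError(f"unknown tesseract_layout: {layout}")
    # Imported here so the table above can be used without pyocr installed.
    import pyocr.builders
    return pyocr.builders.TextBuilder(tesseract_layout=layout)
```

For example, `tool.image_to_string(Image.open('{path}'), lang="jpn", builder=make_text_builder(6))` behaves like the code above.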
