More than 1 year has passed since last update.

Extracting text from images with OCR


sample

import base64

import cv2
import numpy as np
from PIL import Image
import pyocr
import pyocr.builders


class SampleClass:
    """Sample class for OCR.
    This code uses tesseract-ocr.
    """

    def scan(self, reqdata):
        """Extract text from an image.
        :param reqdata: Base64-encoded image file data received in the API request
        :return: status code and the extracted text, returned as JSON
        """
        # Path where the decoded image is saved (placeholder: replace with a real
        # path; cv2.imwrite picks the format from the extension, e.g. .jpg)
        image_file = 'ファイルパス'
        # Base64-decode the image data sent in the API request
        img_binary = base64.b64decode(reqdata)
        # Convert the byte buffer to a NumPy array
        jpg = np.frombuffer(img_binary, dtype=np.uint8)
        # Decode the image and write it to disk
        img = cv2.imdecode(jpg, cv2.IMREAD_COLOR)
        cv2.imwrite(image_file, img)

        # Convert the image to text with the OCR engine
        tesseract = pyocr.get_available_tools()[0]
        res = tesseract.image_to_string(
            Image.open(image_file),
            lang="jpn",
            builder=pyocr.builders.TextBuilder(tesseract_layout=6))

        # Return the result
        return 200, {'message': 'ok', 'code': '0', 'data': res}

Reading and cropping an image with cv2 can also be done as described here.

For converting an image to grayscale and similar operations, see here.

For more detailed usage, this Qiita article gives a good summary.

Installing the OCR engine on CentOS

# Add the repository
yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
# Add the public key
sudo rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
yum update
yum install -y tesseract
yum install -y tesseract-langpack-jpn
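
After the install, one way to check that the engine and the Japanese language data are visible is to ask the `tesseract` command for its language list. This is only a sketch using the standard library; the function name `tesseract_languages` is made up for this example.

```python
import shutil
import subprocess


def tesseract_languages():
    """Return the languages Tesseract reports, or None if it is not on PATH."""
    exe = shutil.which('tesseract')
    if exe is None:
        return None
    proc = subprocess.run([exe, '--list-langs'], capture_output=True, text=True)
    # Read both streams in case the list is printed to stderr
    lines = (proc.stdout + proc.stderr).splitlines()
    # Drop blank lines and the "List of available languages" header
    return [ln.strip() for ln in lines
            if ln.strip() and not ln.startswith('List of')]


langs = tesseract_languages()
if langs is None:
    print('tesseract was not found on PATH')
elif 'jpn' in langs:
    print('Japanese language data is installed')
else:
    print('tesseract is installed, but jpn is missing:', langs)
```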

library

python3 -m pip install pyocr
python3 -m pip install opencv-python

pyocr
opencv-python
