Trying out a way to correct the orientation of scanned or faxed images that arrive in inconsistent orientations

Last updated at 2025-05-30, posted at 2025-05-30

Introduction

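I scan all sorts of documents on a multifunction printer and store them as PNG images. The documents I receive come in arbitrary orientations (upright, upside down, or turned sideways), and running OCR on them as-is produced many recognition errors. So I looked into a way to rotate each scanned document so that it ends up the right way up.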

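Incidentally, my research suggested that this could be done with tesseract-osd (Tesseract's orientation and script detection), but its accuracy was poor and I could not get the results I wanted.
My guess is that, because Tesseract supports vertical writing, an image rotated 90 degrees clockwise and an image that is correctly upright end up producing similar results.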

What I used

Ubuntu 22.x
Python 3.x
tesserocr
OpenCV
PIL

Environment setup

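Install the libraries with the following command. The script also imports cv2, numpy, and PIL, so those packages must be available as well.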

sudo apt install -y tesseract-ocr tesseract-ocr-jpn tesseract-ocr-osd python3-tesserocr
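Before running anything, it can help to confirm that the Python modules the script imports are visible to the interpreter. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name introduced here for illustration and is not part of the original script.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Top-level modules imported by detect_and_ocr.py in this article.
    required = ["cv2", "numpy", "tesserocr", "PIL"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

This only checks that each module can be found; it does not verify that the Tesseract language data (for example `jpn`) is installed.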

Source code

Create a function that rotates an image by an arbitrary angle

def rotate_cv_image(image, angle):
    """Rotate image by angle degrees (positive = counter-clockwise) without cropping."""
    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
    # Compute the canvas size needed to hold the whole rotated image
    cos = np.abs(matrix[0, 0])
    sin = np.abs(matrix[0, 1])
    new_w = int((h * sin) + (w * cos))
    new_h = int((h * cos) + (w * sin))

    # Shift the rotation matrix so the image stays centered on the new canvas
    matrix[0, 2] += (new_w / 2) - center[0]
    matrix[1, 2] += (new_h / 2) - center[1]

    # Apply the rotation
    return cv2.warpAffine(image, matrix, (new_w, new_h), flags=cv2.INTER_LINEAR)
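The canvas-size arithmetic in the function above can be checked on its own, without OpenCV. The sketch below reproduces only that calculation with the standard `math` module; `rotated_canvas_size` is a hypothetical name, and it uses `round()` as a precaution where the function above truncates with `int()`.

```python
import math


def rotated_canvas_size(width, height, angle_deg):
    """Return (new_width, new_height) of the bounding box after rotation."""
    theta = math.radians(angle_deg)
    cos = abs(math.cos(theta))
    sin = abs(math.sin(theta))
    # Same formula as rotate_cv_image, but rounded instead of truncated.
    new_w = round(height * sin + width * cos)
    new_h = round(height * cos + width * sin)
    return new_w, new_h


print(rotated_canvas_size(200, 100, 0))    # (200, 100)
print(rotated_canvas_size(200, 100, 90))   # (100, 200)
print(rotated_canvas_size(100, 100, 45))   # (141, 141)
```

For the four angles used in this article (0, 90, 180, 270), the width and height are either kept or swapped, so no part of the page is cut off.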

Create a function that determines which orientation reads best

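def find_best_orientation_with_tesserocr(i, np_image, lang='jpn'):

    # Use tesserocr to find the angle at which the most known words are recognized
    best_angle = 0

    # List words that are likely to appear in the documents
    # (the trailing ... stands for further entries; replace it with real words)
    words = ["松江", "島根", "パソコン", ...]
    txt_lens = []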

    with PyTessBaseAPI(lang=lang) as api:
        
        for angle in [0, 90, 180, 270]:
            
            rotated = rotate_cv_image(np_image, angle)
            pil = Image.fromarray(cv2.cvtColor(rotated, cv2.COLOR_BGR2RGB))
            api.SetImage(pil)

            # Get the text recognized by tesserocr
            text = api.GetUTF8Text()
            text = text.replace("\n", "").replace(" ", "")

            # Sum the number of occurrences of the known words
            text_len = 0
            for word in words:
                text_len += text.count(word)
            txt_lens.append(text_len)

            print(f"Rotated {angle}°: {text_len} known words recognized")
    
    # Compute the average count, then encode each angle as "1" (above average) or "0"
    avg = sum(txt_lens) / 4
    ptn = ""
    for txt_len in txt_lens:
        if txt_len > avg:
            ptn += "1"
        else:
            ptn += "0"

    print(ptn)

    # Derive the rotation angle from the pattern
    if ptn == "1100":
        best_angle = 90
    elif ptn == "1001":
        best_angle = 0
    elif ptn == "0011":
        best_angle = 270
    else:
        # Every other pattern, including "0110", falls back to 180
        best_angle = 180

    print(f"Best rotation angle: {best_angle}°")
    return best_angle
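The pattern step at the end of the function can be tried in isolation, which makes it easier to see how a list of counts turns into an angle. The sketch below restates that logic as a pure function with a lookup table; `angle_from_counts` is a hypothetical name, and the counts are made-up inputs rather than real OCR results.

```python
def angle_from_counts(counts):
    """Map word counts for rotations [0, 90, 180, 270] to a rotation angle."""
    avg = sum(counts) / len(counts)
    # "1" where the count is strictly above the average, otherwise "0".
    pattern = "".join("1" if c > avg else "0" for c in counts)
    # Same mapping as the if/elif chain above; anything else falls back to 180.
    table = {"1100": 90, "1001": 0, "0011": 270}
    return table.get(pattern, 180)


print(angle_from_counts([5, 4, 0, 0]))  # 90
print(angle_from_counts([6, 0, 0, 3]))  # 0
print(angle_from_counts([0, 0, 0, 0]))  # 180
```

The last call shows that when none of the listed words is found at any angle, the pattern is "0000" and the result is the 180 fallback, so an empty result may be worth handling separately.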

Accuracy improves as you add entries to the words list in find_best_orientation_with_tesserocr, based on the OCR results you observe.
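To see why extending the word list helps, the scoring step can be run on plain strings. This is a minimal sketch, assuming the same whitespace stripping as the function above; `count_known_words` is a hypothetical helper, and the two sample strings are invented stand-ins for OCR output, not real Tesseract results.

```python
def count_known_words(text, words):
    """Count how many times any of the expected words occurs in OCR text."""
    # Strip newlines and spaces first, as the orientation function does.
    compact = text.replace("\n", "").replace(" ", "")
    return sum(compact.count(word) for word in words)


# Hypothetical OCR outputs for an upright page and a wrongly rotated page.
words = ["松江", "島根", "パソコン"]
upright = "島根県 松江市\nパソコン 教室"
garbled = "三 | 一\n口 二"

print(count_known_words(upright, words))  # 3
print(count_known_words(garbled, words))  # 0
```

The more of your documents' typical vocabulary is in the list, the larger the gap between readable and unreadable orientations is likely to be.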

Summary

detect_and_ocr.py

import cv2
import numpy as np

import tesserocr
from tesserocr import PyTessBaseAPI
from PIL import Image

import glob
import os

def rotate_cv_image(image, angle):
    """Omitted; see the definition above."""

def find_best_orientation_with_tesserocr(i, np_image, lang='jpn'):
    """Omitted; see the definition above."""

if __name__ == "__main__":

    # Get the list of image files
    files = glob.glob("images/*.png")
    #print(files)

    i = 0

    for file in files:

        # Load the image file
        print(file)
        img = cv2.imread(file)

        # Detect the rotation angle
        angle = find_best_orientation_with_tesserocr(i, img)
        print(angle)

        # Rotate and save
        img_rotated = rotate_cv_image(img, angle)
        cv2.imwrite(f"{file}_rotated.png", img_rotated)

        # Extract text from the rotated image
        pil = Image.fromarray(cv2.cvtColor(img_rotated, cv2.COLOR_BGR2RGB))
        with PyTessBaseAPI(lang='jpn') as api:
            api.SetImage(pil)
            text = api.GetUTF8Text()
        print(text)
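One thing to watch in the script: it writes its output to `f"{file}_rotated.png"` inside the same `images/` directory, so the results get names like `a.png_rotated.png` and also match `images/*.png` on the next run. The following sketch shows one possible way to name outputs and skip them on later runs using `pathlib`; both helper names are hypothetical and this is not part of the original script.

```python
from pathlib import Path


def rotated_path(src):
    """Return e.g. images/a_rotated.png for images/a.png."""
    p = Path(src)
    return p.with_name(f"{p.stem}_rotated{p.suffix}")


def is_rotated_output(src):
    """True if the file name already carries the _rotated suffix."""
    return Path(src).stem.endswith("_rotated")


files = ["images/a.png", "images/a_rotated.png", "images/b.png"]
todo = [f for f in files if not is_rotated_output(f)]
print(todo)                               # ['images/a.png', 'images/b.png']
print(rotated_path("images/a.png").name)  # a_rotated.png
```

Writing the rotated images to a separate output directory would be another way to avoid the same problem.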

Running it

python3 detect_and_ocr.py

Done!
