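Summary

This script recognizes the text on Spatial notes (sticky notes) captured in an image and transcribes it.

From an image like the one above, it generates text like the one below.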

パンと
はん

-----
100年
目の真
実!!

-----
ヤフー
ニュースに
載ってた1

The output contains occasional misrecognitions.

Tested environment

Windows 11 Home 22H2
Python 3.11.1
OpenCV 4.7.0
Tesseract OCR 5.3.1.20230401

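1. Installing Python

The program is written in Python, so install Python first.
→python.org

Then install the packages the script uses. NumPy, which the script also imports, is installed automatically as a dependency of opencv-python.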

pip install opencv-python
pip install pytesseract
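
As a quick sanity check after installation, the following sketch reports which versions of the two packages pip installed. It uses only the standard library, and the helper name `installed_version` is hypothetical, introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a pip package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names as passed to pip above
    for name in ("opencv-python", "pytesseract"):
        found = installed_version(name)
        print(f"{name}: {found if found else 'not installed'}")
```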

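2. Installing Tesseract OCR

Install Tesseract OCR, which performs the character recognition.
→Tesseract OCR

Install the Japanese language data as well.
Download the files from the URLs below and place them in the tessdata folder under the Tesseract OCR installation folder.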
→jpn_vert.traineddata
→jpn.traineddata
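
To confirm that the language files ended up in the right place, a small check like the one below may help. It is a minimal sketch: the helper `missing_traineddata` is hypothetical, and `DEFAULT_TESSDATA_DIR` is only an assumed install location on Windows, so adjust it to wherever Tesseract OCR was actually installed.

```python
from pathlib import Path

# Assumed install location on Windows; change this to match your own installation.
DEFAULT_TESSDATA_DIR = Path(r"C:\Program Files\Tesseract-OCR\tessdata")


def missing_traineddata(tessdata_dir, languages=("jpn", "jpn_vert")):
    """Return the language codes whose .traineddata file is absent from tessdata_dir."""
    tessdata_dir = Path(tessdata_dir)
    return [lang for lang in languages
            if not (tessdata_dir / f"{lang}.traineddata").is_file()]


if __name__ == "__main__":
    missing = missing_traineddata(DEFAULT_TESSDATA_DIR)
    print("missing:", missing if missing else "none")
```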

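3. Running the Python app

Run the Python app shown below.

How to run

Pass the name of the image file to transcribe as an argument. The result is written to a file with the same name and the extension changed to txt.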

> python recognize_sticky_notes.py Spatial_note_sample.jpg
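
The output file name is derived from the input name by swapping the extension. A minimal sketch of that rule, with the hypothetical helper name `output_path_for`:

```python
import os


def output_path_for(image_file):
    """Swap the image file's extension for .txt, keeping any directory part."""
    return os.path.splitext(image_file)[0] + ".txt"


print(output_path_for("Spatial_note_sample.jpg"))  # Spatial_note_sample.txt
```

Only the last extension is replaced, so a name such as `a.b.png` becomes `a.b.txt`.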

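Program

Load the image file.
Convert it to the HSV color space.
Define the range for yellow and extract the yellow regions from the image.
Create an image containing only the yellow regions.
Convert it to a grayscale image.
Binarize it to create a black-and-white binary image.
Detect the contours.
Cut out each sticky note using a perspective transform.
Recognize the text in each cut-out sticky note image with pytesseract.
Clean up the recognized text by removing the unnecessary spaces within each line.
Separate the sticky notes with "-----" and write the result to a text file.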
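
The yellow range used in the script (H 20 to 30, S and V 100 to 255) is expressed on OpenCV's 8-bit HSV scale, where hue is stored as degrees divided by 2 (0 to 179) and saturation and value run from 0 to 255. The sketch below approximates that scale with the standard library's `colorsys`, so you can check whether a given note color would pass the mask before tuning the bounds. The helper names are hypothetical, and OpenCV's own rounding may differ slightly from this approximation.

```python
import colorsys

# Bounds copied from the script, on OpenCV's 8-bit HSV scale
LOWER_YELLOW = (20, 100, 100)
UPPER_YELLOW = (30, 255, 255)


def to_opencv_hsv(r, g, b):
    """Approximate OpenCV-style 8-bit HSV (H in 0-179, S and V in 0-255) from 8-bit RGB."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 180) % 180, round(s * 255), round(v * 255)


def in_yellow_range(r, g, b):
    """Check each channel against the inclusive bounds, as cv2.inRange does per pixel."""
    hsv = to_opencv_hsv(r, g, b)
    return all(lo <= x <= hi for lo, x, hi in zip(LOWER_YELLOW, hsv, UPPER_YELLOW))


print(to_opencv_hsv(255, 255, 0))      # pure yellow: (30, 255, 255)
print(in_yellow_range(255, 255, 0))    # True
print(in_yellow_range(255, 255, 200))  # False: pale yellow fails the saturation bound
```

If the notes in your image are a paler or more orange yellow, widening the hue range or lowering the saturation bound is the likely adjustment.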

import cv2
import pytesseract
import sys
import os
import numpy as np

def order_points(pts):
    # Order the four corners as top-left, top-right, bottom-right, bottom-left
    rect = np.zeros((4, 2), dtype="float32")
    # Top-left has the smallest x + y, bottom-right the largest
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    # Top-right has the smallest y - x, bottom-left the largest
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect

def four_point_transform(image, pts):
    rect = order_points(pts)
    (tl, tr, br, bl) = rect
    # Output width: the longer of the bottom and top edges
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA), int(widthB))
    # Output height: the longer of the right and left edges
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA), int(heightB))
    # Destination corners of the upright rectangle, in the same order as rect
    dst = np.array([
        [0, 0],
        [maxWidth - 1, 0],
        [maxWidth - 1, maxHeight - 1],
        [0, maxHeight - 1]], dtype="float32")
    # Map the detected quadrilateral onto the upright rectangle
    M = cv2.getPerspectiveTransform(rect, dst)
    warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
    return warped

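def recognize_sticky_notes(image_file):
    # Load the image
    image = cv2.imread(image_file)
    if image is None:
        # cv2.imread returns None instead of raising when the file cannot be read
        raise FileNotFoundError(f'Cannot read image: {image_file}')

    # Convert to the HSV color space
    hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

    # Set the range used to extract the yellow regions
    lower_yellow = np.array([20, 100, 100])
    upper_yellow = np.array([30, 255, 255])

    # Extract the yellow regions
    mask = cv2.inRange(hsv_image, lower_yellow, upper_yellow)

    # Apply the mask
    masked_image = cv2.bitwise_and(image, image, mask=mask)

    # Convert to grayscale
    gray_image = cv2.cvtColor(masked_image, cv2.COLOR_BGR2GRAY)

    # Binarize
    _, binary_image = cv2.threshold(gray_image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Detect the contours
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # List that holds the recognition results
    recognized_texts = []

    # Skip tiny contours (noise): a zero-sized box makes warpPerspective fail.
    # The 1000-pixel threshold is an assumed value; tune it for your images.
    min_note_area = 1000

    for contour in contours:
        if cv2.contourArea(contour) < min_note_area:
            continue

        # Get the corners of the minimum-area bounding rectangle
        rect = cv2.minAreaRect(contour)
        box = cv2.boxPoints(rect)
        box = np.intp(box)

        # Perspective transform
        warped = four_point_transform(gray_image, box)

        # Character recognition
        custom_config = '--oem 3 --psm 6'
        text = pytesseract.image_to_string(warped, lang='jpn', config=custom_config)

        # Add the recognized text to the list
        recognized_texts.append(text)

    return recognized_texts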

def remove_unnecessary_spaces(text):
    # Remove all whitespace inside each line while keeping the line breaks
    lines = text.split('\n')
    cleaned_lines = []
    for line in lines:
        cleaned_line = ''.join(line.split())
        cleaned_lines.append(cleaned_line)
    return '\n'.join(cleaned_lines)

def main():
    if len(sys.argv) != 2:
        print('Usage: python recognize_sticky_notes.py <image_file>')
        return

    image_file = sys.argv[1]

    # Recognize the text on the sticky notes
    recognized_texts = recognize_sticky_notes(image_file)

    # Build the output file name
    output_file = os.path.splitext(image_file)[0] + '.txt'

    # Clean each result and drop notes where nothing was recognized
    cleaned_texts = [remove_unnecessary_spaces(text) for text in recognized_texts]
    notes = [text for text in cleaned_texts if text.strip()]

    # Write the notes separated by "-----", without a stray separator at the end
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write('-----\n'.join(note + '\n' for note in notes))

if __name__ == '__main__':
    main()
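
The corner ordering in `order_points` relies on two simple observations about image coordinates (x grows to the right, y grows downward). The following pure-Python restatement is a sketch of the same idea without NumPy, using the hypothetical name `order_corners`, so the rule can be tried on a few points by hand.

```python
def order_corners(points):
    """Order four (x, y) corners as top-left, top-right, bottom-right, bottom-left."""
    top_left = min(points, key=lambda p: p[0] + p[1])      # smallest x + y
    bottom_right = max(points, key=lambda p: p[0] + p[1])  # largest x + y
    top_right = min(points, key=lambda p: p[1] - p[0])     # smallest y - x
    bottom_left = max(points, key=lambda p: p[1] - p[0])   # largest y - x
    return [top_left, top_right, bottom_right, bottom_left]


corners = [(100, 120), (10, 10), (5, 100), (110, 20)]
print(order_corners(corners))  # [(10, 10), (110, 20), (100, 120), (5, 100)]
```

One limitation worth keeping in mind: when a note is rotated by close to 45 degrees, the sums or differences of two corners can tie, and the ordering becomes ambiguous.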

Team Hyena (チームハイエナ)

[VR space service Spatial.io] #8 Transcribing text from notes
