
Trying out Surya (an open-source OCR)

Posted at 2024-01-16

What is Surya?

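A multilingual document OCR released as open source.
Note: as of 2024/1/16, only text detection has been released; text recognition is planned for a future release.
Personally, I'm especially looking forward to table and chart detection.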

Developer: Vik Paruchuri
License: GNU General Public License v3.0

Environment

Development environment: Anaconda Prompt
Python version: 3.9
CUDA: 11.8
PyTorch

Root directory:

The following files are my own additions, separate from the GitHub repository:
pip.txt: notes on the libraries I installed
main.py: the script that runs Surya

Libraries

pip install surya_ocr

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

pip install matplotlib
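Because the `--index-url` above points at the CUDA 11.8 (cu118) wheels, it is worth confirming that the CUDA build of PyTorch was actually installed rather than a CPU-only one. The minimal sketch below only reads standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`); `check_environment` is a hypothetical helper name, and it simply reports `None` for the PyTorch fields when PyTorch is not installed.

```python
import importlib.util
import sys


def check_environment() -> dict:
    """Report the Python version and, if PyTorch is installed, its CUDA build and status."""
    info = {
        "python": "%d.%d" % sys.version_info[:2],
        "torch": None,            # PyTorch version string, or None if not installed
        "torch_cuda": None,       # CUDA version PyTorch was built with; None for CPU-only builds
        "cuda_available": False,  # True only if a usable CUDA device is visible
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

With the cu118 wheel on a CUDA-capable machine you would expect `torch_cuda: 11.8` and `cuda_available: True`; `torch_cuda: None` means a CPU-only wheel was installed.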

Full code

from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
from surya.detection import batch_inference
from surya.model.segformer import load_model, load_processor

# Draw bounding boxes onto an image
def draw_boxes(image, boxes):
    draw = ImageDraw.Draw(image)
    for box in boxes:
        # draw.rectangle expects [x0, y0, x1, y1]
        draw.rectangle(box, outline="red", width=2)
    return image

IMAGE_PATH = "path/to/image"

image = Image.open(IMAGE_PATH)
# Load the Surya text-detection model and its processor
model, processor = load_model(), load_processor()

# predictions is a list of dicts, one per image
predictions = batch_inference([image], model, processor)

print(predictions)

# Get the bounding-box coordinates for the first (and only) image
boxes = predictions[0]['bboxes']

# Draw the bounding boxes on a copy of the image
image_with_boxes = draw_boxes(image.copy(), boxes)

# Display the annotated image
plt.imshow(image_with_boxes)
plt.show()
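If you want the detected boxes in reading order (for example, to pass them to a recognizer later), a simple heuristic is to group boxes into lines by their vertical centers and then sort each line left to right. The sketch below assumes each box is `[x1, y1, x2, y2]`, the same layout `draw.rectangle` takes above; `sort_reading_order` and `line_tol` are hypothetical names, not part of Surya, and the heuristic will not handle multi-column layouts.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: [x1, y1, x2, y2]


def sort_reading_order(boxes: List[Box], line_tol: float = 0.5) -> List[Box]:
    """Sort boxes top-to-bottom, then left-to-right within each line.

    A box joins the current line when its vertical center is within
    line_tol * (height of the line's first box) of that box's center.
    """
    lines: List[List[Box]] = []
    # Visit boxes from top to bottom by vertical center
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        center = (box[1] + box[3]) / 2
        if lines:
            ref = lines[-1][0]
            ref_center = (ref[1] + ref[3]) / 2
            if abs(center - ref_center) < line_tol * (ref[3] - ref[1]):
                lines[-1].append(box)
                continue
        lines.append([box])
    # Within each line, order boxes by their left edge
    return [box for line in lines for box in sorted(line, key=lambda b: b[0])]
```

Usage with the script above: `ordered = sort_reading_order(predictions[0]['bboxes'])`.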

Detection results

GPU memory usage was about 2 GB.
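If you want to measure GPU memory from inside Python, PyTorch provides `torch.cuda.reset_peak_memory_stats()` and `torch.cuda.max_memory_allocated()`. The minimal sketch below wraps them in a hypothetical helper, `peak_gpu_memory_gib`. Keep in mind that it only counts memory allocated through PyTorch's caching allocator, so it will usually read lower than Task Manager or `nvidia-smi`, which also include the CUDA context and cached-but-free blocks. Since the reset sets the peak to the current allocation, memory already in use (such as model weights on the GPU) is included in the result.

```python
import importlib.util
from typing import Callable, Optional


def peak_gpu_memory_gib(fn: Callable[[], object]) -> Optional[float]:
    """Run fn() and return the peak GPU memory PyTorch allocated, in GiB.

    fn() is always called; the return value is None when PyTorch is not
    installed or no CUDA device is available.
    """
    if importlib.util.find_spec("torch") is None:
        fn()
        return None

    import torch

    if not torch.cuda.is_available():
        fn()
        return None

    # Reset the peak counter to the current allocation
    torch.cuda.reset_peak_memory_stats()
    fn()
    # Wait for queued GPU work to finish before reading the counter
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024**3
```

Usage with the script above: `peak = peak_gpu_memory_gib(lambda: batch_inference([image], model, processor))`.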

Summary

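・It's good that it is a single multilingual model rather than one model per language.
・It is specialized for horizontally written documents (such as news articles on websites and papers), so if you want to extract text from photos or other images, the CRAFT model may be a better choice.
・I'm looking forward to support for text recognition and table/chart detection.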
