
Proto Solution Co., Ltd. (株式会社プロトソリューション)

Trying Out Text Extraction from Images

Last updated at 2022-11-25 / Posted at 2022-11-25

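Introduction

This article shows how to extract text from an image using PaddleOCR and EasyOCR.

Detailed explanations of each library, of OCR itself, and of ~~super-resolution~~ are left out of this article.

For the libraries themselves, I referred to the following.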

Environment Setup

Google Colaboratory

1. Installing the libraries

PaddleOCR
Install it by following the PaddleOCR quick start.

!python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
!pip install "paddleocr>=2.0.1"

EasyOCR
Install EasyOCR as well.

!pip install easyocr

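Text Extraction

With the environment ready, let's move on to text extraction.
The image I used is a photo of my car's instrument cluster, taken when the car broke down.

Now, let's extract some text!

1. Text detection

OCR first detects where text appears in the image and then runs text recognition on those regions.
We start by checking how text detection behaves.

We run text detection on the image with both PaddleOCR and EasyOCR, then compare the results.

First, open the Files pane in Google Colaboratory and upload the image.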

PaddleOCR

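I copied the code from the official repository almost verbatim and changed only the image path.
Note: to compare the text detection results, this code deliberately prints only the bounding box values.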

from paddleocr import PaddleOCR,draw_ocr

ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to download and load model into memory
# Path to the image
img_path = '/content/IMG-2915.jpg'
result = ocr.ocr(img_path, cls=True)
for idx in range(len(result)):
    res = result[idx]
    for line in res:
        print(line[0])

The output is the bounding box of each detected text region.

[[2716.0, 279.0], [2824.0, 279.0], [2824.0, 330.0], [2716.0, 330.0]]
[[2760.0, 285.0], [2937.0, 300.0], [2930.0, 386.0], [2752.0, 370.0]]
[[1884.0, 531.0], [1964.0, 531.0], [1964.0, 632.0], [1884.0, 632.0]]
[[1638.0, 624.0], [1715.0, 624.0], [1715.0, 725.0], [1638.0, 725.0]]
[[2152.0, 624.0], [2217.0, 624.0], [2217.0, 717.0], [2152.0, 717.0]]
[[2302.0, 837.0], [2363.0, 837.0], [2363.0, 927.0], [2302.0, 927.0]]
[[1504.0, 857.0], [1554.0, 857.0], [1554.0, 930.0], [1504.0, 930.0]]
[[2128.0, 962.0], [2236.0, 955.0], [2242.0, 1041.0], [2134.0, 1048.0]]
[[3192.0, 1093.0], [3238.0, 1093.0], [3238.0, 1120.0], [3192.0, 1120.0]]
[[779.0, 1154.0], [1095.0, 1173.0], [1089.0, 1269.0], [773.0, 1251.0]]
[[2183.0, 1210.0], [2302.0, 1210.0], [2302.0, 1353.0], [2183.0, 1353.0]]
[[2083.0, 1344.0], [2211.0, 1354.0], [2207.0, 1413.0], [2078.0, 1402.0]]
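
Each box above is a list of four [x, y] corners, and some of them are slightly rotated. Cropping a region or comparing boxes is easier with an axis-aligned rectangle, so here is a minimal sketch of that conversion. The helper name `quad_to_rect` is my own and is not part of PaddleOCR.

```python
def quad_to_rect(quad):
    """Return (x_min, y_min, x_max, y_max) enclosing a 4-point box.

    Hypothetical helper for illustration; not a PaddleOCR function.
    """
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]
    return min(xs), min(ys), max(xs), max(ys)


# Second box from the PaddleOCR output above (slightly rotated)
quad = [[2760.0, 285.0], [2937.0, 300.0], [2930.0, 386.0], [2752.0, 370.0]]
rect = quad_to_rect(quad)
print(rect)  # (2752.0, 285.0, 2937.0, 386.0)
```

The rectangle is a little larger than the rotated box, which is usually acceptable when the goal is only to crop the region.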

Visualization code

# draw result
from PIL import Image
result = result[0]
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
im_show = draw_ocr(image, boxes)
im_show = Image.fromarray(im_show)
im_show

Visualization

EasyOCR

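I copied the code from the Google Colaboratory notebook in the official repository almost verbatim.

import easyocr
# 'ch_sim' can only be combined with 'en'; ['ch_sim','ja'] raises a ValueError
reader = easyocr.Reader(['ch_sim','en'])
bounds = reader.readtext('/content/IMG-2915.jpg')
# reader.detect("<image path>") also returns the bounding box values.
# det = reader.detect("/content/IMG-2915.jpg")
for idx in range(len(bounds)):
    # Each entry is (box, text, confidence); [0] is the box
    res = bounds[idx][0]
    print(res)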

Result

[[2709, 277], [2820, 277], [2820, 332], [2709, 332]]
[[2768, 314], [2922, 314], [2922, 379], [2768, 379]]
[[1894, 541], [1961, 541], [1961, 617], [1894, 617]]
[[1639, 632], [1715, 632], [1715, 722], [1639, 722]]
[[2149, 627], [2230, 627], [2230, 719], [2149, 719]]
[[1493, 839], [1565, 839], [1565, 929], [1493, 929]]
[[2295, 837], [2374, 837], [2374, 926], [2295, 926]]
[[2889, 887], [2919, 887], [2919, 919], [2889, 919]]
[[2130, 967], [2236, 967], [2236, 1010], [2130, 1010]]
[[2135, 1005], [2234, 1005], [2234, 1037], [2135, 1037]]
[[2769, 1040], [2938, 1040], [2938, 1088], [2769, 1088]]
[[3152, 1080], [3264, 1080], [3264, 1129], [3152, 1129]]
[[775, 1172], [1091, 1172], [1091, 1258], [775, 1258]]
[[2713, 1176], [2753, 1176], [2753, 1225], [2713, 1225]]
[[2084, 1351], [2208, 1351], [2208, 1402], [2084, 1402]]
[[3287, 1483], [3379, 1483], [3379, 1569], [3287, 1569]]

Visualization code

from PIL import Image, ImageDraw

# Draw bounding boxes
def draw_boxes(image, bounds, color='yellow', width=2):
    draw = ImageDraw.Draw(image)
    for bound in bounds:
        # bound[0] holds the four corners of the box
        p0, p1, p2, p3 = bound[0]
        draw.line([*p0, *p1, *p2, *p3, *p0], fill=color, width=width)
    return image
img_path = '/content/IMG-2915.jpg'
image = Image.open(img_path).convert('RGB')
draw_boxes(image, bounds)

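Visualization

Comparing the results

I placed the two detection result images side by side to compare them.
EasyOCR returned 16 boxes and PaddleOCR returned 12. Judging visually, I personally think EasyOCR detects text more accurately.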

PaddleOCR	EasyOCR
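
The comparison above is visual. To put a number on how closely the two libraries agree, one option is intersection over union (IoU) between their boxes. The following is a minimal sketch using the first two boxes from each output; the helper names and the 0.5 threshold are my own assumptions, not something either library provides.

```python
def quad_to_rect(quad):
    """Axis-aligned (x_min, y_min, x_max, y_max) around a 4-point box."""
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]
    return min(xs), min(ys), max(xs), max(ys)


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) rectangles."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


# First two boxes from each detection output shown earlier
paddle_boxes = [
    [[2716.0, 279.0], [2824.0, 279.0], [2824.0, 330.0], [2716.0, 330.0]],
    [[2760.0, 285.0], [2937.0, 300.0], [2930.0, 386.0], [2752.0, 370.0]],
]
easy_boxes = [
    [[2709, 277], [2820, 277], [2820, 332], [2709, 332]],
    [[2768, 314], [2922, 314], [2922, 379], [2768, 379]],
]

paddle_rects = [quad_to_rect(q) for q in paddle_boxes]
easy_rects = [quad_to_rect(q) for q in easy_boxes]

# For each PaddleOCR box, take the best-overlapping EasyOCR box
best = [max(iou(p, e) for e in easy_rects) for p in paddle_rects]
matched = sum(score >= 0.5 for score in best)  # 0.5 is an arbitrary threshold
print([round(score, 3) for score in best], matched)
```

Boxes found by one library with no counterpart above the threshold in the other are the ones worth checking by eye.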

2. Text recognition/extraction

With text detection done, the next step is text recognition.
The code for PaddleOCR and EasyOCR is shown below.
PaddleOCR

img_path = '/content/IMG-2915.jpg'
result = ocr.ocr(img_path, cls=True)
for idx in range(len(result)):
    res = result[idx]
    for line in res:
        # Same as the detection code, but without the [0]
        print(line)

[[[2716.0, 279.0], [2824.0, 279.0], [2824.0, 330.0], [2716.0, 330.0]], ('TRIP', 0.9965751767158508)]
[[[2760.0, 285.0], [2937.0, 300.0], [2930.0, 386.0], [2752.0, 370.0]], ('10.06g', 0.6891794204711914)]
[[[1884.0, 531.0], [1964.0, 531.0], [1964.0, 632.0], [1884.0, 632.0]], ('4', 0.9998847246170044)]
[[[1638.0, 624.0], [1715.0, 624.0], [1715.0, 725.0], [1638.0, 725.0]], ('3', 0.9989321827888489)]
[[[2152.0, 624.0], [2217.0, 624.0], [2217.0, 717.0], [2152.0, 717.0]], ('5', 0.9994152784347534)]
[[[2302.0, 837.0], [2363.0, 837.0], [2363.0, 927.0], [2302.0, 927.0]], ('6', 0.9945863485336304)]
[[[1504.0, 857.0], [1554.0, 857.0], [1554.0, 930.0], [1504.0, 930.0]], ('2', 0.993301510810852)]
[[[2128.0, 962.0], [2236.0, 955.0], [2242.0, 1041.0], [2134.0, 1048.0]], ('xioon', 0.6005752086639404)]
[[[3192.0, 1093.0], [3238.0, 1093.0], [3238.0, 1120.0], [3192.0, 1120.0]], ('nv', 0.6508363485336304)]
[[[779.0, 1154.0], [1095.0, 1173.0], [1089.0, 1269.0], [773.0, 1251.0]], ('79264km', 0.9745035767555237)]
[[[2183.0, 1210.0], [2302.0, 1210.0], [2302.0, 1353.0], [2183.0, 1353.0]], ('0', 0.9729198217391968)]
[[[2083.0, 1344.0], [2211.0, 1354.0], [2207.0, 1413.0], [2078.0, 1402.0]], ('km/h', 0.9934272170066833)]

EasyOCR

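# 'ch_sim' can only be combined with 'en'; ['ch_sim','ja'] raises a ValueError
reader = easyocr.Reader(['ch_sim','en'])
bounds = reader.readtext('/content/IMG-2915.jpg')
for idx in range(len(bounds)):
    # Same as the detection code, but without the [0]
    res = bounds[idx]
    print(res)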

([[2709, 277], [2820, 277], [2820, 332], [2709, 332]], 'TRIP', 0.9966885447502136)
([[2768, 314], [2922, 314], [2922, 379], [2768, 379]], '0.065', 0.5249832969030725)
([[1894, 541], [1961, 541], [1961, 617], [1894, 617]], '4', 0.9998302531757872)
([[1639, 632], [1715, 632], [1715, 722], [1639, 722]], '3', 0.999896767419628)
([[2149, 627], [2230, 627], [2230, 719], [2149, 719]], '5', 0.9999892711927174)
([[1493, 839], [1565, 839], [1565, 929], [1493, 929]], '2', 0.9997789981889724)
([[2295, 837], [2374, 837], [2374, 926], [2295, 926]], '6', 0.9956494307905963)
([[2889, 887], [2919, 887], [2919, 919], [2889, 919]], '', 0.0)
([[2130, 967], [2236, 967], [2236, 1010], [2130, 1010]], 'x1000', 0.5686412231606381)
([[2135, 1005], [2234, 1005], [2234, 1037], [2135, 1037]], 'Tmin', 0.9345958232879639)
([[2769, 1040], [2938, 1040], [2938, 1088], [2769, 1088]], 'OP', 1.4110881702323698e-05)
([[3152, 1080], [3264, 1080], [3264, 1129], [3152, 1129]], 'KmL', 0.5518514306622359)
([[775, 1172], [1091, 1172], [1091, 1258], [775, 1258]], "!325'Skm", 0.19547144242444303)
([[2713, 1176], [2753, 1176], [2753, 1225], [2713, 1225]], 'E', 0.4962320153936304)
([[2084, 1351], [2208, 1351], [2208, 1402], [2084, 1402]], 'kmrh', 0.40424832701683044)
([[3287, 1483], [3379, 1483], [3379, 1569], [3287, 1569]], '@', 0.43320930567682936)

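Comparing the recognition/extraction results

As an example, here is how some of the extracted text compares with the actual text:

Actual text	PaddleOCR	EasyOCR
'TRIP'	'TRIP'	'TRIP'
'79264km'	'79264km'	"!325'Skm"
'km/h'	'km/h'	'kmrh'

As shown above, PaddleOCR produced more results with a high recognition confidence than EasyOCR did.

So for text recognition, I concluded that PaddleOCR is the more accurate of the two.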
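
To back up that impression with a number, one simple measure is the share of results whose confidence is at least some threshold. The sketch below uses the (text, confidence) pairs from the two outputs above, rounded to four decimals; the function name `share_confident` and the 0.9 threshold are my own choices.

```python
# (text, confidence) pairs from the outputs above, rounded to 4 decimals
paddle = [
    ('TRIP', 0.9966), ('10.06g', 0.6892), ('4', 0.9999), ('3', 0.9989),
    ('5', 0.9994), ('6', 0.9946), ('2', 0.9933), ('xioon', 0.6006),
    ('nv', 0.6508), ('79264km', 0.9745), ('0', 0.9729), ('km/h', 0.9934),
]
easy = [
    ('TRIP', 0.9967), ('0.065', 0.5250), ('4', 0.9998), ('3', 0.9999),
    ('5', 1.0000), ('2', 0.9998), ('6', 0.9956), ('', 0.0),
    ('x1000', 0.5686), ('Tmin', 0.9346), ('OP', 0.0000), ('KmL', 0.5519),
    ("!325'Skm", 0.1955), ('E', 0.4962), ('kmrh', 0.4042), ('@', 0.4332),
]


def share_confident(results, threshold=0.9):
    """Fraction of results whose confidence is at least `threshold`."""
    return sum(conf >= threshold for _, conf in results) / len(results)


print(share_confident(paddle))  # 0.75   (9 of 12)
print(share_confident(easy))    # 0.4375 (7 of 16)
```

Confidence is not the same thing as correctness, so this only supports the comparison; it does not replace checking the text against the image.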

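👾 (Fusing PaddleOCR and EasyOCR)

I was curious what would happen if I took the best of both PaddleOCR and EasyOCR, so I tried it.

Specifically, I combined EasyOCR's text detection with PaddleOCR's text recognition and checked the extraction results.

The code is below.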

import cv2
import easyocr
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang='en')
reader = easyocr.Reader(['ch_sim','en'])

img_path = '/content/IMG-2915.jpg'
im1 = cv2.imread(img_path)
im_np_detection = im1.copy()
im_height, im_width, _ = im_np_detection.shape

# Get only the bounding boxes (EasyOCR text detection)
bounds2 = reader.detect(img_path)

# bounds2[0][0] holds the axis-aligned boxes as [x_min, x_max, y_min, y_max]
for j in range(len(bounds2[0][0])):
  left, right, top, bottom = bounds2[0][0][j]

  # Clamp the box to the image bounds
  x1 = min(max(0, int(left)), im_width)
  x2 = min(max(0, int(right)), im_width)
  y1 = min(max(0, int(top)), im_height)
  y2 = min(max(0, int(bottom)), im_height)
  # Crop the part of the image enclosed by the bounding box
  crop = im_np_detection[y1:y2, x1:x2]
  # Use only the text recognition step (PaddleOCR)
  test = ocr.ocr(crop, det=False, cls=False)
  print(test)

Result

[[('TRIP', 0.9968887567520142)]]
[[('0.0', 0.9977871775627136)]]
[[('3', 0.9992127418518066)]]
[[('5', 0.9996463060379028)]]
[[('6', 0.301425576210022)]]
[[('BC', 0.5820157527923584)]]
[[('x1000', 0.9536186456680298)]]
[[('r/min', 0.8767522573471069)]]
[[('an', 0.22734510898590088)]]
[[('Ta', 0.40654659271240234)]]
[[('n/L', 0.9540947079658508)]]
[[('79264km', 0.9865780472755432)]]
[[('E', 0.994417667388916)]]
[[('km/h', 0.9904864430427551)]]
[[('O', 0.46661376953125)]]

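Comparison with the pre-fusion results

I compared the results of EasyOCR on its own with the results after fusion.

Actual text	EasyOCR	Fused	Confidence (EasyOCR)	Confidence (fused)
'TRIP'	'TRIP'	'TRIP'	0.9966885447502136	0.9968887567520142
'79264km'	"!325'Skm"	'79264km'	0.19547144242444303	0.9865780472755432
'km/h'	'kmrh'	'km/h'	0.40424832701683044	0.9904864430427551
'x1000'	'x1000'	'x1000'	0.5686412231606381	0.9536186456680298

For every example in this table, the fused result matches or improves on the EasyOCR text and has a higher confidence. This does not hold for every detected region, though: the fused output contains a '6' with a confidence of only 0.301, whereas EasyOCR alone reported '6' at 0.996.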

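Conclusion

Thank you very much for reading this far!
I hope this gave you a rough idea of what OCR can do.

The image used here was taken under good conditions, so everything from text detection through text recognition/extraction went smoothly.
For images shot at an angle or in a dark place, I expect that preprocessing such as angle detection or a tone curve adjustment would improve recognition accuracy.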
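
As an illustration of the tone curve idea, here is a minimal sketch of a gamma curve built as a 256-entry lookup table. I did not run it as part of the experiments in this article; the function name `gamma_lut` and the value gamma = 0.5 are assumptions chosen only for the example.

```python
def gamma_lut(gamma):
    """Build a 256-entry tone curve; gamma < 1 brightens dark tones."""
    return [round(255 * (value / 255) ** gamma) for value in range(256)]


lut = gamma_lut(0.5)

# Apply the curve to a few sample 8-bit pixel values
dark_pixels = [0, 16, 64, 255]
brightened = [lut[value] for value in dark_pixels]
print(brightened)  # [0, 64, 128, 255]

# With OpenCV, the same table could be applied to a whole uint8 image, e.g.
# cv2.LUT(image, np.array(lut, dtype=np.uint8)), assuming numpy is imported as np.
```

Dark values are lifted strongly while black and white stay fixed, which is the usual reason to try a curve like this on underexposed photos before OCR.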

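I actually wanted to apply super-resolution to the detected text regions and then run text recognition/extraction on them, but I'll stop here for now...

See you in the next post (^^)/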
