
Cropping images with Python

Posted at 2020-02-24

A record of my various struggles trying to see what I could do with Python and OCR.
(Tesseract-OCR is used as the OCR engine.)

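First, I had it read an entire target image in one go. The results:
  it did its best to read even things like stamps (hanko) and photos, so the output made no sense, and
  even where it could read the text, I couldn't tell where one piece of data ended and the next began.
So it was useless as is.
I therefore decided to crop out only the parts I need and have the OCR read those.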

With the following code, you can crop any region of an image file
and save the cropped image under a different name.

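   from PIL import Image

   # Open the image with PIL
   img = Image.open('original.png')
   # Crop box as (left, upper, right, lower) in pixels; example values only
   x1, y1, x2, y2 = 100, 50, 400, 200
   # Crop the specified region and save it under a different name
   img.crop((x1, y1, x2, y2)).save('cropped.png')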
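Pillow's crop box is a 4-tuple (left, upper, right, lower) counted in pixels from the top-left corner, with the right and lower edges excluded, so a crop's size is (right - left, lower - upper). Since the goal is to have OCR read only specific parts, a minimal sketch of cropping several regions at once might look like this; the helper `crop_regions` and the field names and boxes in `FIELD_BOXES` are hypothetical placeholders, not values from this article.

```python
from PIL import Image


def crop_regions(img, boxes):
    """Crop each named (left, upper, right, lower) box out of img."""
    return {name: img.crop(box) for name, box in boxes.items()}


# A blank white 640x480 image stands in for the scanned document.
page = Image.new("RGB", (640, 480), "white")

# Hypothetical field positions; use the coordinates of your own image.
FIELD_BOXES = {
    "name": (100, 50, 400, 90),
    "date": (450, 50, 620, 90),
}

parts = crop_regions(page, FIELD_BOXES)
for name, part in parts.items():
    # Size is (right - left, lower - upper): name (300, 40), date (170, 40)
    print(name, part.size)
    # part.save(f"{name}.png") would save each region under its own name
```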

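As it stands, you have to adjust the coordinates many times to crop exactly the part you want.
So, to make specifying the coordinates more efficient, I'll build an app that gets them with the mouse.
There are many ways to build a GUI, such as C# or VB, but this time I'll try Kivy, a framework for building GUIs in Python.
Installation should have been just a matter of installing Kivy with pip... (details omitted)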

UI definition (main.kv):

   #:import hex_color kivy.utils.get_color_from_hex
   <ImageWidget>:
       canvas.before:
           # White background for the whole widget
           Color:
               rgb: 1, 1, 1
           Rectangle:
               pos: self.pos
               size: self.size

       BoxLayout:
           orientation: 'horizontal'
           size: root.size

           Image:
               id: img
               allow_stretch: True
               source: root.image_src

           BoxLayout:
               orientation: 'vertical'
               # Fixed-width side panel (width only takes effect with size_hint_x: None)
               size_hint_x: None
               width: 200

               Label:
                   id: lbl_file_name
                   color: 0, 0, 0, 1
                   font_size: 20
                   # Note: Label has no built-in background colour, so this has no visible effect
                   background_color: hex_color('#000000')
               Label:
                   id: lbl_result
                   color: 0, 0, 0, 1
                   font_size: 20

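The syntax reads like a simplified form of HTML.
Kivy loads main.kv automatically because the App subclass is named MainApp (the kv file name is the class name without "App", in lowercase).
Next, the main program (main.py):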

   from kivy.app import App
   from kivy.core.text import LabelBase, DEFAULT_FONT  # added: Japanese font
   from kivy.config import Config
   from kivy.resources import resource_add_path  # added: Japanese font
   from kivy.properties import StringProperty
   from kivy.uix.widget import Widget
   from kivy.graphics import Color, Line
   from PIL import Image
   import math
   import os
   import pyocr
   import pyocr.builders

   # Use MS Gothic (Windows) as the default font so Japanese text can be shown
   resource_add_path('c:/Windows/Fonts')  # added
   LabelBase.register(DEFAULT_FONT, 'msgothic.ttc')  # added

   # Window size in pixels
   Config.set('graphics', 'width', '1224')
   Config.set('graphics', 'height', '768')


   class ImageWidget(Widget):
       image_src = StringProperty('')

       def __init__(self, **kwargs):
           super().__init__(**kwargs)
           self.image_src = 'read_img/0112-3.png'
           self.ids.lbl_file_name.text = "File name:\n{}".format(self.image_src)
           self.lines = []  # canvas instructions of the current selection frame
           self.x1 = self.y1 = None
           self.x2 = self.y2 = None

       def on_touch_down(self, touch):
           # Remember where the drag starts
           self.x1 = touch.x
           self.y1 = touch.y
           self.x2 = None
           self.y2 = None

       def on_touch_move(self, touch):
           img = self.ids.img
           # Keep the end point inside the Image widget
           self.x2 = min(touch.x, img.width)
           self.y2 = min(touch.y, img.height)

           # Remove the previously drawn frame (colours and lines)
           for instruction in self.lines:
               self.canvas.remove(instruction)
           self.lines = []

           points = [self.x1, self.y1, self.x2, self.y1,
                     self.x2, self.y2, self.x1, self.y2]
           with self.canvas:
               # Solid red frame (Kivy colour components range from 0 to 1)
               self.lines.append(Color(1, 0, 0))
               self.lines.append(Line(points=points, close=True))
               # White dashed frame on top, giving a red/white dashed look
               self.lines.append(Color(1, 1, 1))
               self.lines.append(Line(points=points, dash_offset=5,
                                      dash_length=3, close=True))

       def on_touch_up(self, touch):
           # Do nothing if no touch_move event occurred (a plain click)
           if self.x2 is None:
               return

           img = self.ids.img
           # Size at which the image is actually displayed (after scaling)
           vs = img.norm_image_size
           # Open the original image with PIL and get its real size
           img_trim = Image.open(self.image_src)
           rs = img_trim.size

           # Scale factor from displayed pixels to original pixels
           ratio = rs[0] / vs[0]

           # Padding around the displayed image.
           # MEMO: assumes the image is centred: (widget size - displayed size) / 2
           px = 0
           py = 0
           if img.width > vs[0]:
               px = (img.width - vs[0]) / 2
           if img.height > vs[1]:
               py = (img.height - vs[1]) / 2

           # Remove the padding, flip y (Kivy's y axis points up, PIL's points
           # down) and scale to original-image pixels
           x1 = (self.x1 - px) * ratio
           x2 = (self.x2 - px) * ratio
           y1 = (img.height - self.y1 - py) * ratio
           y2 = (img.height - self.y2 - py) * ratio

           # Order the corners small -> large (the drag may go in any
           # direction), round outward, and clamp to the image
           real_x1 = max(0, math.floor(min(x1, x2)))
           real_x2 = min(rs[0], math.ceil(max(x1, x2)))
           real_y1 = max(0, math.floor(min(y1, y2)))
           real_y2 = min(rs[1], math.ceil(max(y1, y2)))
           # Nothing to crop if the selection lies entirely outside the picture
           if real_x2 <= real_x1 or real_y2 <= real_y1:
               return

           # Crop the selected region and save it
           os.makedirs('write_img', exist_ok=True)
           img_trim.crop((real_x1, real_y1, real_x2, real_y2)).save('write_img/test.png')

           # Read text from the cropped image
           self.read_image_to_string()

       def read_image_to_string(self):
           try:
               # 1. Add the installed Tesseract to PATH
               path_tesseract = r"C:\Program Files\Tesseract-OCR"
               if path_tesseract not in os.environ["PATH"].split(os.pathsep):
                   os.environ["PATH"] += os.pathsep + path_tesseract

               # 2. Get an OCR engine
               tools = pyocr.get_available_tools()
               tool = tools[0]

               # 3. Load the cropped image
               img = Image.open("write_img/test.png")

               # 4. Run OCR (tesseract_layout=6: a single uniform block of text)
               builder = pyocr.builders.TextBuilder(tesseract_layout=6)
               result = tool.image_to_string(img, lang="jpn", builder=builder)

               self.ids.lbl_result.text = f"Result:\n{result}"
               print(result)
           except Exception as ex:
               print(ex)
               self.ids.lbl_result.text = "Result:\nfailed"


   class MainApp(App):
       def __init__(self, **kwargs):
           super().__init__(**kwargs)
           self.title = 'test'

       def build(self):
           return ImageWidget()


   if __name__ == '__main__':
       app = MainApp()
       app.run()

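Program flow:
  When the mouse button is pressed, get the clicked point.
  While dragging, keep redrawing a rectangular frame.
  When the button is released, get the end coordinates,
  crop the image, and have the OCR read the cropped image.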
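The flow above can be sketched without Kivy as a small state object, which makes the press/drag/release logic easy to test on its own. `DragSelection` and its `press`, `drag`, and `release` methods are hypothetical names; in main.py the same roles are played by `on_touch_down`, `on_touch_move`, and `on_touch_up`.

```python
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]


class DragSelection:
    """GUI-independent sketch of the press -> drag -> release flow."""

    def __init__(self) -> None:
        self.start: Optional[Tuple[float, float]] = None
        self.end: Optional[Tuple[float, float]] = None

    def press(self, x: float, y: float) -> None:
        # Mouse down: remember the start point and drop any old end point.
        self.start = (x, y)
        self.end = None

    def drag(self, x: float, y: float) -> Rect:
        # Mouse move: update the moving corner and return the frame to draw.
        if self.start is None:
            raise RuntimeError("drag() called before press()")
        self.end = (x, y)
        return (*self.start, *self.end)

    def release(self) -> Optional[Rect]:
        # Mouse up: a click with no drag gives no selection, matching the
        # `if self.x2 is None: return` check in main.py.
        if self.start is None or self.end is None:
            return None
        return (*self.start, *self.end)


sel = DragSelection()
sel.press(10, 20)
print(sel.release())  # None (plain click, nothing to crop)
sel.press(10, 20)
sel.drag(30, 40)
sel.drag(50, 60)
print(sel.release())  # (10, 20, 50, 60)
```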

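Key points:
  Coordinates clicked on the screen can't be used as-is on the original image;
    you have to compute the real coordinates from the scale ratio between the displayed and the original image.
  The Image object in the GUI can have padding around the picture, which has to be taken into account.
  The drag may go toward the upper left, so the order of the corners has to be handled too.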
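These points can be gathered into one pure function. A sketch, assuming (as main.py does) that the Image widget sits at (0, 0), the picture is centred with its aspect ratio kept, and Kivy's y axis points up while Pillow's points down; `screen_to_image_box` is a hypothetical helper name, and clamping the box to the image bounds is a small safety addition.

```python
import math
from typing import Tuple

Size = Tuple[float, float]


def screen_to_image_box(p1: Tuple[float, float], p2: Tuple[float, float],
                        widget_size: Size, shown_size: Size,
                        image_size: Tuple[int, int]) -> Tuple[int, int, int, int]:
    """Map two drag corners (widget coordinates) to a Pillow crop box.

    Assumes the widget is at (0, 0), the picture is centred with its aspect
    ratio kept, and y points up in the widget but down in the image.
    """
    widget_w, widget_h = widget_size
    shown_w, shown_h = shown_size
    img_w, img_h = image_size

    ratio = img_w / shown_w                     # image pixels per screen pixel
    pad_x = max(0.0, (widget_w - shown_w) / 2)  # centred => equal padding
    pad_y = max(0.0, (widget_h - shown_h) / 2)

    def to_image(x: float, y: float) -> Tuple[float, float]:
        # Remove the padding, flip the y axis, then scale.
        return (x - pad_x) * ratio, (widget_h - y - pad_y) * ratio

    def clamp(v: int, hi: int) -> int:
        return max(0, min(v, hi))

    ix1, iy1 = to_image(*p1)
    ix2, iy2 = to_image(*p2)

    # min/max handle a drag in any direction; floor/ceil round outward.
    return (clamp(math.floor(min(ix1, ix2)), img_w),
            clamp(math.floor(min(iy1, iy2)), img_h),
            clamp(math.ceil(max(ix1, ix2)), img_w),
            clamp(math.ceil(max(iy1, iy2)), img_h))


# 1600x900 image shown at 800x450 inside an 800x600 widget: ratio 2, 75 px
# padding above and below. Drag from lower-right (500, 225) to upper-left (100, 375).
print(screen_to_image_box((500, 225), (100, 375), (800, 600), (800, 450), (1600, 900)))
# (200, 300, 1000, 600)
```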

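As for the OCR results...
Running it without any tuning does not give very good accuracy.
It looks like there is plenty left to do, such as adjusting parameters.
It may also simply be beyond what this OCR engine can do.
Next, I'd like to try comparing and choosing among OCR engines.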
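One commonly tried tuning step, not something tested in this article, is to clean up the crop before OCR: convert it to grayscale, enlarge it, and binarize it with a fixed threshold. A minimal Pillow sketch; `preprocess_for_ocr` is a hypothetical helper, and the defaults `scale=2` and `threshold=128` are assumed starting values to experiment with, not tuned numbers.

```python
from PIL import Image


def preprocess_for_ocr(img, scale=2, threshold=128):
    """Grayscale, enlarge, and binarize an image before OCR.

    scale and threshold are assumed starting values, not tuned ones.
    """
    gray = img.convert("L")  # 8-bit grayscale
    # Enlarge so small characters cover more pixels.
    big = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    # Pixels brighter than the threshold become white (255), the rest black (0).
    return big.point(lambda v: 255 if v > threshold else 0)


sample = Image.new("RGB", (4, 2), (200, 200, 200))
out = preprocess_for_ocr(sample)
print(out.mode, out.size)  # L (8, 4)
```

In main.py the result could be passed in place of the loaded crop, e.g. `tool.image_to_string(preprocess_for_ocr(img), lang="jpn", builder=builder)`; whether it helps depends on the images.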
