Streamlining Work with the Google Vision API!

Posted at 2023-01-05, last updated at 2023-03-05

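Hello. In this article I show how to use the Google Vision API to read text in images and count quantities.
In my day-to-day work, office supplies arrive from vendors in bulk, and during delivery inspection I was checking each product name, size, and model number one by one, which took a lot of time. I wondered whether AI-OCR could replace this task, so I tried it out.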

How to use the Google Vision API

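The following article explains the setup very thoroughly, so please refer to it.
Follow it through to the point where you obtain the JSON file containing the private key.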
https://self-development.info/python%E3%81%A7google-cloud-vision-api%E3%82%92%E5%88%A9%E7%94%A8%E3%81%99%E3%82%8B%E6%96%B9%E6%B3%95/
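
Before pointing GOOGLE_APPLICATION_CREDENTIALS at the downloaded key, it can help to confirm that the file really looks like a service account key. The following is only a minimal sketch and is not part of the linked article: `check_service_account_key` is a hypothetical helper name, and the list of required fields is an assumption based on the usual layout of Google Cloud service account key files.

```python
import json
from pathlib import Path

# Fields assumed to be present in a service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found in the key file (empty if none)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"file not found: {key_path}"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append("field 'type' is not 'service_account'")
    return problems
```

An empty list means the file passed these basic checks; it does not prove that the key is still valid or that the Vision API is enabled for the project.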

Reading text in an image with AI-OCR

I prepared an image like this one.

# Mount Google Drive in Google Colab
from google.colab import drive
drive.mount('/content/drive')

import io
import os
import glob
# The runtime must be restarted after running the following install
!pip install --upgrade google-cloud-vision | tail -n -1
# Imports the Google Cloud client library
from google.cloud import vision

# Specify the path to the JSON key file
cred_path = 'Enter the path to your JSON file here'
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = cred_path
# Initialize the client
client = vision.ImageAnnotatorClient()

# Get the list of image files in the directory
path = 'directory containing the photos'
src_files = glob.glob(os.path.join(path, '*.jpg'))
src_files.sort()

for file_name in src_files:
    # Load the target image
    with io.open(file_name, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)

    # Send the image to the API.
    # {'language_hints': ['ja']} tells the API that the text is expected to be Japanese
    response = client.document_text_detection(
        image=image,
        image_context={'language_hints': ['ja']}
    )
    print(response.text_annotations[0].description)

Output

It reads quite cleanly! Next, let's shape this into a DataFrame and count the quantities.

# Write the OCR text to a txt file
with open('output.txt', 'w', encoding='UTF-8') as f:
    f.write(response.text_annotations[0].description)

# Read the text file back one line at a time
with open('output.txt', encoding="utf-8") as f:
    lst = []
    for line in f:
        line = line.rstrip()  # each line ends with a newline character, so strip it
        lst.append(line)

# Convert to a DataFrame
import pandas as pd
df = pd.DataFrame(lst)

# Count occurrences in the OCR result.
# "製品名" means "product name"; the search strings must match the text as it appears in the image.
product_num = df[0].str.contains("製品名").sum()
size_num = df[0].str.contains("50 x 100").sum() + df[0].str.contains("50×100 ").sum()
type_num = df[0].str.contains("AAA-0000").sum()

print("Product name:", product_num)
print("Size 50 x 100:", size_num)
print("Model number AAA-0000:", type_num)

Output

Product name: 9
Size 50 x 100: 9
Model number AAA-0000: 9
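
OCR output does not always spell a size the same way: the spacing, the multiplication sign, and full-width characters can all vary. As a sketch only (this is not the method used in this article), the lines could be normalized with NFKC and matched against a single regular expression. `count_size_lines` is a hypothetical helper, and the pattern is an assumption about which variants actually appear in the OCR text.

```python
import re
import unicodedata

# Matches "50 x 100", "50×100", "50X100" and similar, but not "150 x 1000".
SIZE_PATTERN = re.compile(r"(?<!\d)50\s*[xX×]\s*100(?!\d)")


def count_size_lines(lines, pattern=SIZE_PATTERN):
    """Count lines that contain the size after NFKC normalization."""
    count = 0
    for line in lines:
        # NFKC folds full-width digits and letters into their ASCII forms.
        normalized = unicodedata.normalize("NFKC", line)
        if pattern.search(normalized):
            count += 1
    return count


sample = ["サイズ 50 x 100", "サイズ 50×100 ", "サイズ ５０ｘ１００", "サイズ 60 x 100"]
print(count_size_lines(sample))
```

Because iterating over a pandas Series yields its values, the same function could be called as `count_size_lines(df[0])`. Each line is counted at most once, even if it happens to contain more than one spelling of the size.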

Finally, let's check whether the delivered quantity matches the ordered quantity.

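order_num = 9  # Enter the ordered quantity here

if order_num == product_num and order_num == size_num and order_num == type_num:
    print("The delivered quantity matches the ordered quantity")
else:
    print("The delivered quantity differs from the ordered quantity")

Output

The delivered quantity matches the ordered quantity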
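
When the check fails, it is useful to know which item was off. The following is a minimal sketch under the assumption that the three counts are collected into a dictionary; `find_mismatches` and the dictionary keys are hypothetical names introduced for illustration, and the sample counts are made up.

```python
def find_mismatches(order_num, counts):
    """Return {label: counted} for every count that differs from order_num."""
    return {label: n for label, n in counts.items() if n != order_num}


# Made-up counts for illustration: the model number was read one time too few.
counts = {"product name": 9, "size": 9, "model number": 8}
mismatches = find_mismatches(9, counts)
if mismatches:
    for label, n in mismatches.items():
        print(f"{label}: counted {n}, ordered 9")
else:
    print("The delivered quantity matches the ordered quantity")
```

A mismatch does not necessarily mean the delivery is wrong; it can also mean the OCR missed or misread a label, so the flagged item is the one to recheck by eye.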

I did it! Even as a beginner, I was able to put Python to use for streamlining my work. AI-OCR surely has its limits, but it seems good enough for simple quantity checks.

Bonus: visualizing where AI-OCR is reading

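Following the official Google Vision API documentation, I visualized where in the image text is being read by drawing bounding boxes. It would be nice if all the text were always as clearly visible as in this image, but in practice items overlap or are hard to see. In such cases, I think this can be used to check what AI-OCR has trouble reading.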

from enum import Enum

import cv2  # OpenCV, used to load the image and draw the boxes

class FeatureType(Enum):
    PAGE = 1
    BLOCK = 2
    PARA = 3
    WORD = 4
    SYMBOL = 5

def draw_boxes(input_file, bounds):
    # Draw each bounding box on the image as four green line segments
    img = cv2.imread(input_file, cv2.IMREAD_COLOR)
    for bound in bounds:
        p1 = (bound.vertices[0].x, bound.vertices[0].y)  # top left
        p2 = (bound.vertices[1].x, bound.vertices[1].y)  # top right
        p3 = (bound.vertices[2].x, bound.vertices[2].y)  # bottom right
        p4 = (bound.vertices[3].x, bound.vertices[3].y)  # bottom left
        cv2.line(img, p1, p2, (0, 255, 0), thickness=1, lineType=cv2.LINE_AA)
        cv2.line(img, p2, p3, (0, 255, 0), thickness=1, lineType=cv2.LINE_AA)
        cv2.line(img, p3, p4, (0, 255, 0), thickness=1, lineType=cv2.LINE_AA)
        cv2.line(img, p4, p1, (0, 255, 0), thickness=1, lineType=cv2.LINE_AA)
    return img

def get_document_bounds(response, feature):
    document = response.full_text_annotation
    bounds = []
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        if (feature == FeatureType.SYMBOL):
                          bounds.append(symbol.bounding_box)
                    if (feature == FeatureType.WORD):
                        bounds.append(word.bounding_box)
                if (feature == FeatureType.PARA):
                    bounds.append(paragraph.bounding_box)
            if (feature == FeatureType.BLOCK):
                bounds.append(block.bounding_box)
    return bounds

import matplotlib.pyplot as plt

# The image to draw on: the last file processed in the OCR loop,
# which is the one `response` corresponds to
input_file = file_name

bounds = get_document_bounds(response, FeatureType.BLOCK)
img_block = draw_boxes(input_file, bounds)

bounds = get_document_bounds(response, FeatureType.PARA)
img_para = draw_boxes(input_file, bounds)

bounds = get_document_bounds(response, FeatureType.WORD)
img_word = draw_boxes(input_file, bounds)

bounds = get_document_bounds(response, FeatureType.SYMBOL)
img_symbol = draw_boxes(input_file, bounds)

# OpenCV images are BGR, so reverse the channel axis to get RGB for matplotlib.
# Uncomment the other lines to show the block, paragraph, or symbol boxes as well.
plt.figure(figsize=[30,30])
# plt.subplot(141);plt.imshow(img_block[:,:,::-1]);plt.title("img_block")
# plt.subplot(142);plt.imshow(img_para[:,:,::-1]);plt.title("img_para")
plt.subplot(143);plt.imshow(img_word[:,:,::-1]);plt.title("img_word")
# plt.subplot(144);plt.imshow(img_symbol[:,:,::-1]);plt.title("img_symbol")
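
Besides drawing boxes, the words the OCR is least sure about can also be listed directly. The sketch below walks the same page, block, paragraph, word hierarchy as get_document_bounds and collects the words whose `confidence` falls below a threshold. `low_confidence_words` and the 0.8 threshold are assumptions introduced for illustration, and the function assumes that each word exposes `symbols` (each with `text`) and `confidence`, as in the Vision API's full text annotation.

```python
def low_confidence_words(document, threshold=0.8):
    """Return (text, confidence) pairs for words below the threshold."""
    results = []
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    if word.confidence < threshold:
                        # A word's text is the concatenation of its symbols.
                        text = "".join(symbol.text for symbol in word.symbols)
                        results.append((text, word.confidence))
    return results
```

With the response obtained earlier, it could be called as `low_confidence_words(response.full_text_annotation)`. A suitable threshold probably depends on the images, so the value is best tuned against a few photos that were checked by hand.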

The Google Vision API is wonderful.
