
How to Use GCP's Cloud Vision API

Last updated at 2020-05-20 / Posted at 2020-05-20

The official documentation was a bit hard to follow, so here is a summary.

API parameters

type
maxResults
model
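
These three parameters make up one entry in the request's "features" list. The sketch below is a minimal illustration, assuming the REST JSON field names type, maxResults, and model; the helper make_feature is hypothetical and not part of any Google library. As I read the Feature reference, maxResults does not apply to the text detection types, and model accepts values such as "builtin/stable" (the default) or "builtin/latest", so both are only added when you pass them explicitly.

```python
from typing import Optional


# Hypothetical helper: builds one Feature entry for the "features" list.
# Field names are assumed to follow the REST JSON of images:annotate.
def make_feature(feature_type: str,
                 max_results: Optional[int] = None,
                 model: Optional[str] = None) -> dict:
    feature = {"type": feature_type}
    # maxResults is reportedly ignored by the text detection types,
    # so it is only included when explicitly given.
    if max_results is not None:
        feature["maxResults"] = max_results
    # model is optional; leaving it out uses the service's default model.
    if model is not None:
        feature["model"] = model
    return feature


print(make_feature("DOCUMENT_TEXT_DETECTION", model="builtin/latest"))
# {'type': 'DOCUMENT_TEXT_DETECTION', 'model': 'builtin/latest'}
```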

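For text recognition (OCR), the Cloud Vision API offers two types:

Text detection, "TEXT_DETECTION" (optimized for sparse regions of text within a larger image)
Document text detection, "DOCUMENT_TEXT_DETECTION" (suited to dense text such as documents)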

Both types return OCR output with the same hierarchy:

TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol
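
In the JSON response this hierarchy appears as nested lists: fullTextAnnotation -> pages -> blocks -> paragraphs -> words -> symbols, where each symbol carries one character in its "text" field. The sketch below walks that structure to rebuild individual words. It assumes a single parsed element of "responses" (what text_dict["responses"][0] holds in the code further down); the function name extract_words and the sample data are illustrative only.

```python
# Rebuilds each word from its symbols by walking
# fullTextAnnotation -> pages -> blocks -> paragraphs -> words -> symbols.
def extract_words(response: dict) -> list:
    words = []
    annotation = response.get("fullTextAnnotation", {})
    for page in annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # Each symbol holds a single character in "text".
                    text = "".join(s.get("text", "") for s in word.get("symbols", []))
                    words.append(text)
    return words


# Tiny hand-made sample shaped like one element of "responses".
sample = {
    "fullTextAnnotation": {
        "text": "Hi GCP",
        "pages": [{"blocks": [{"paragraphs": [{"words": [
            {"symbols": [{"text": "H"}, {"text": "i"}]},
            {"symbols": [{"text": "G"}, {"text": "C"}, {"text": "P"}]},
        ]}]}]}],
    }
}
print(extract_words(sample))  # ['Hi', 'GCP']
```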

Import the required modules

.py

import base64
import json
from requests import Request, Session
from io import BytesIO
from PIL import Image
import numpy as np

Getting an API key

See: How to recognize text from Python using the OCR of Google Cloud Vision API
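
The code below hard-codes the key in str_api_key. As an alternative, here is a minimal sketch that reads the key from an environment variable and builds the request URL. The variable name VISION_API_KEY and the helper build_annotate_url are assumptions for illustration; only the endpoint URL comes from the article.

```python
import os
from urllib.parse import urlencode

ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


# Hypothetical helper: reads the API key from an environment variable
# (the name VISION_API_KEY is an assumption) instead of hard-coding it.
def build_annotate_url(env_var: str = "VISION_API_KEY") -> str:
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"environment variable {env_var} is not set")
    # urlencode escapes the key and attaches it as the "key" query parameter.
    return ENDPOINT + "?" + urlencode({"key": api_key})
```

Keeping the key out of the source file makes it less likely to be committed to a repository by accident.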

Using the API

.py

def recognize_image1(input_image):  # input_image is converted to a base64 string (str_encode_file) before sending

    # Convert an image file path to a base64 string
    def pil_image_to_base64(img_path):
        pil_image = Image.open(img_path)
        buffered = BytesIO()
        pil_image.save(buffered, format="PNG")
        str_encode_file = base64.b64encode(buffered.getvalue()).decode("utf-8")
        return str_encode_file

    # Convert an image array (e.g. a NumPy array) to a base64 string
    def array_to_base64(img_array):
        pil_image = Image.fromarray(np.uint8(img_array))
        buffered = BytesIO()
        pil_image.save(buffered, format="PNG")
        str_encode_file = base64.b64encode(buffered.getvalue()).decode("utf-8")
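        return str_encode_file

    # Extract the recognized text from the JSON response; return None if it is missing
    def get_fullTextAnnotation(json_data):
        text_dict = json.loads(json_data)
        try:
            text = text_dict["responses"][0]["fullTextAnnotation"]["text"]
            return text
        except (KeyError, IndexError):
            # No text was found, or the response has an unexpected shape
            return None
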
    str_encode_file = pil_image_to_base64(input_image)  # use this when input_image is an image file path
    # str_encode_file = array_to_base64(input_image)  # use this when input_image is an array
    str_url = "https://vision.googleapis.com/v1/images:annotate?key="
    str_api_key = ""  # put your API key here
    str_headers = {'Content-Type': 'application/json'}
    str_json_data = {
        'requests': [
            {
                'image': {
                    'content': str_encode_file
                },
                'features': [
                    {
                        'type': "DOCUMENT_TEXT_DETECTION",  # choose the type here
                        'maxResults': 1  # documented as not applying to the text detection types
                    }
                ]
            }
        ]
    }

    obj_session = Session()
    obj_request = Request("POST",
                            str_url + str_api_key,
                            data=json.dumps(str_json_data),
                            headers=str_headers
                            )
    obj_prepped = obj_session.prepare_request(obj_request)
    obj_response = obj_session.send(obj_prepped,
                                    verify=True,
                                    timeout=60
                                    )

    if obj_response.status_code == 200:
        text = get_fullTextAnnotation(obj_response.text)
        
        return text
    else:
        # Non-200 status: the request itself failed (e.g. an invalid API key)
        return "error"
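
Even with a 200 status, an individual element of "responses" may carry an "error" object instead of a fullTextAnnotation, and get_fullTextAnnotation above silently turns that into None. The following is a minimal sketch of a hypothetical variant, parse_annotate_response, that returns the error message alongside the text. It assumes the error object exposes a "message" field, as the Status type in the API reference does.

```python
import json
from typing import Optional, Tuple


# Hypothetical variant of get_fullTextAnnotation that also surfaces
# a per-image "error" object, returning (text, error_message).
def parse_annotate_response(body: str) -> Tuple[Optional[str], Optional[str]]:
    data = json.loads(body)
    # Fall back to an empty response if "responses" is missing or empty.
    responses = data.get("responses") or [{}]
    first = responses[0]
    if "error" in first:
        # The error is a Status object; "message" describes what went wrong.
        return None, first["error"].get("message", "unknown error")
    text = first.get("fullTextAnnotation", {}).get("text")
    return text, None
```

Returning a tuple keeps "no text found" (None, None) distinguishable from "the image could not be processed" (None, message), which the single "error" string in the original cannot express.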

References

Release notes
Feature
Recognizing vertical text with Google Cloud Vision
