Trying the Google Cloud Vision API for the first time

Posted at 2020-02-07, last updated at 2020-02-08

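Even though I work in image processing, I had never actually used the Vision API.
I kept thinking it looked impressive, but somehow never got around to trying it.

So I decided to give it a quick try and wrote a small script in Python!
This article is a memo of what I did.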

For the sign-up and setup steps, I followed this page:

Vision API client libraries

Features used

This is the feature I used this time.
The description is taken from the official documentation.

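Automatic object detection
The Cloud Vision API can detect and extract multiple objects in an image with object localization.
Object localization identifies the objects in an image and provides a LocalizedObjectAnnotation for each one. Each LocalizedObjectAnnotation identifies information about the object, the position of the object, and the bounds of the region of the image that contains the object.
Object localization identifies both prominent and less prominent objects in an image.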
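To make the description above more concrete, here is a minimal sketch of what one LocalizedObjectAnnotation looks like in the JSON response and how it could be read. The field names (name, score, boundingPoly, normalizedVertices) follow the REST response format, but sample_response and every value in it are made up for illustration, and summarize_annotations is a hypothetical helper, not part of any library.

```python
# Illustrative response fragment. The values are invented for this example;
# a real response comes from the images:annotate endpoint.
sample_response = {
    "responses": [{
        "localizedObjectAnnotations": [{
            "mid": "/m/placeholder",  # placeholder, not a real entity ID
            "name": "Shoe",
            "score": 0.91,
            "boundingPoly": {
                "normalizedVertices": [
                    {"x": 0.10, "y": 0.20},
                    {"x": 0.60, "y": 0.20},
                    {"x": 0.60, "y": 0.80},
                    {"x": 0.10, "y": 0.80},
                ]
            },
        }]
    }]
}


def summarize_annotations(response):
    """Return (name, score, vertex_count) for every detected object."""
    summaries = []
    for item in response.get("responses", []):
        for annotation in item.get("localizedObjectAnnotations", []):
            vertices = annotation.get("boundingPoly", {}).get("normalizedVertices", [])
            summaries.append((annotation.get("name"), annotation.get("score"), len(vertices)))
    return summaries


for name, score, vertex_count in summarize_annotations(sample_response):
    print(f"{name}: score={score}, vertices={vertex_count}")
```

The vertices are normalized to the range 0 to 1, so they have to be multiplied by the image width and height to get pixel positions.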

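Source code

It's rough, so please bear with me...
I also wanted the start and end coordinates of the detected object, so I pull them out by brute force.
I'm not sure this is the right way to check for JSON keys, though.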


import base64
import json

import cv2
import requests

ENDPOINT_URL = 'https://vision.googleapis.com/v1/images:annotate'
API_KEY = 'YOUR_API_KEY'  # replace with your own API key

# JSON keys used in the response
RESPONSES_KEY = 'responses'
LOCALIZED_KEY = 'localizedObjectAnnotations'
BOUNDING_KEY = 'boundingPoly'
NORMALIZED_KEY = 'normalizedVertices'
NAME_KEY = 'name'
X_KEY = 'x'
Y_KEY = 'y'


def get_gcp_info(image):
    """Send an image to the Vision API and return the first detected object.

    Returns (found, name, start_point, end_point).
    """
    image_height, image_width, _ = image.shape
    # Halve the image before uploading to keep the request small.
    # The original code called the author's own helper, image_proc.exc_resize,
    # which is not shown in this article; cv2.resize is assumed to be equivalent.
    min_image = cv2.resize(image, (image_width // 2, image_height // 2))

    # Encode as PNG, then as base64 text for the JSON request body.
    _, enc_image = cv2.imencode(".png", min_image)
    image_byte = base64.b64encode(enc_image.tobytes()).decode("utf-8")

    img_requests = [{
        'image': {'content': image_byte},
        'features': [{
            'type': 'OBJECT_LOCALIZATION',
            'maxResults': 5
        }]
    }]

    response = requests.post(ENDPOINT_URL,
                             data=json.dumps({"requests": img_requests}).encode(),
                             params={'key': API_KEY},
                             headers={'Content-Type': 'application/json'})
    result = response.json()  # parse the body once and reuse it

    # Walk down the nested JSON, checking that each key exists.
    if RESPONSES_KEY in result:
        first_response = result[RESPONSES_KEY][0]
        if LOCALIZED_KEY in first_response:
            # Only the first detected object is used.
            first_object = first_response[LOCALIZED_KEY][0]
            if BOUNDING_KEY in first_object and NORMALIZED_KEY in first_object[BOUNDING_KEY]:

                name = first_object[NAME_KEY]

                # The vertices are normalized, so they can be scaled by the
                # original image size even though a half-size image was sent.
                start_point, end_point = check_recognition_point(
                    first_object[BOUNDING_KEY][NORMALIZED_KEY],
                    image_height,
                    image_width
                )

                print(name, start_point, end_point)

                return True, name, start_point, end_point

    print("non", [0, 0], [0, 0])
    # Not enough information in the response
    return False, "non", [0, 0], [0, 0]

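def check_recognition_point(point_list_json, image_height, image_width):
    """Convert normalized vertices (ratios from 0 to 1) into pixel coordinates."""
    # Vertex 0 is used as the start corner and vertex 2 as the opposite (end) corner.
    # .get() with a 0.0 default is used because a coordinate whose value is 0
    # may be missing from the JSON response.
    # X start of the detected region (ratio)
    x_start_rate = point_list_json[0].get(X_KEY, 0.0)
    # Y start of the detected region (ratio)
    y_start_rate = point_list_json[0].get(Y_KEY, 0.0)
    # X end of the detected region (ratio)
    x_end_rate = point_list_json[2].get(X_KEY, 0.0)
    # Y end of the detected region (ratio)
    y_end_rate = point_list_json[2].get(Y_KEY, 0.0)

    x_start_point = int(image_width * x_start_rate)
    y_start_point = int(image_height * y_start_rate)
    x_end_point = int(image_width * x_end_rate)
    y_end_point = int(image_height * y_end_rate)

    return [x_start_point, y_start_point], [x_end_point, y_end_point]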

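The function returns a success flag, followed by name, start_point and end_point. name holds the name of the recognized object, and start_point and end_point hold the pixel coordinates of two opposite corners of its bounding region.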
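As for the question of how to check the JSON keys, one possible alternative is to use dict.get with defaults instead of nested if statements, and to return every detected object rather than only the first. The sketch below is an assumption-laden illustration, not the official way to parse the response: extract_objects and sample_result are hypothetical names, the sample values are made up, and the min/max over all vertices is a substitute for picking vertices 0 and 2. It also assumes that a coordinate may be absent from the JSON when its value is 0.

```python
def extract_objects(result, image_width, image_height):
    """Convert every localized object in a parsed response into a pixel box."""
    objects = []
    # Fall back to a single empty response if the key is missing or the list is empty.
    responses = result.get("responses") or [{}]
    for annotation in responses[0].get("localizedObjectAnnotations", []):
        vertices = annotation.get("boundingPoly", {}).get("normalizedVertices", [])
        if not vertices:
            continue
        # Assumption: a missing coordinate means 0, so default to 0.0.
        xs = [v.get("x", 0.0) for v in vertices]
        ys = [v.get("y", 0.0) for v in vertices]
        # Take the extremes of all vertices instead of relying on vertex order.
        start_point = [int(image_width * min(xs)), int(image_height * min(ys))]
        end_point = [int(image_width * max(xs)), int(image_height * max(ys))]
        objects.append((annotation.get("name", "unknown"), start_point, end_point))
    return objects


# Made-up parsed response; the x coordinate is left out where it would be 0.
sample_result = {
    "responses": [{
        "localizedObjectAnnotations": [{
            "name": "Shoe",
            "boundingPoly": {
                "normalizedVertices": [
                    {"y": 0.25},
                    {"x": 0.5, "y": 0.25},
                    {"x": 0.5, "y": 0.75},
                    {"y": 0.75},
                ]
            },
        }]
    }]
}

print(extract_objects(sample_result, 640, 480))
```

Because missing keys simply produce an empty list, the caller can loop over the result without a separate success flag.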

Wrapping up

I ran some clothes and shoes through it, and they were recognized properly! (The names were fairly coarse, though.)
It would be fun to build my own model with AutoML.
