
Object recognition using Visual Recognition in the IBM Watson Developer Cloud

Bluemix

Last updated at 2015-04-20, posted at 2015-04-16

Object recognition is the task of detecting particular objects in images.

In this article, I introduce how to perform object recognition with Visual Recognition, one of the services of the IBM Watson Developer Cloud.

Note: You must register with IBM Bluemix in advance.

How to get your user name and password for APIs

You need a user name and a password to use Visual Recognition.

Go to the management page of IBM Bluemix and create an application first. Then add the Visual Recognition service to that application.

When you click "display of credentials" for the service, your user name and password are listed.

How to obtain labels

Object recognition returns pairs of labels and scores, so it helps to know in advance which labels the service can return.

The following script obtains them.


#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Obtain the labels provided by Visual Recognition service of IBM Watson Developer Cloud.
"""
import sys
import json
import requests
from pit import Pit

setting = Pit.get('iwdcat',
                  {'require': {'username': '',
                               'password': '',
                               }})

auth_token = setting['username'], setting['password']
url = 'https://gateway.watsonplatform.net/visual-recognition-beta/api/v1/tag/labels'
res = requests.get(url, auth=auth_token, headers={'content-type': 'application/json'})
if res.status_code == requests.codes.ok:
    labels = json.loads(res.text)
    print('label groups({}): {}'.format(len(labels['label_groups']), labels['label_groups']))
    print()
    print('labels({}): {}'.format(len(labels['labels']), labels['labels']))
else:  # error
    print('status_code: {} (reason: {})'.format(res.status_code, res.reason))
    sys.exit(1)

The results are returned in JSON format. "label_groups" is the list of label groups, and "labels" is the list of labels.

Visual image analysis

Image files must be sent to the Visual Recognition API as a multipart upload.
The accepted image formats appear to be png and jpg, and a zip-compressed file also seems to be accepted. The following example sends a single png image.
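Since the service seems to accept zip-compressed files, here is a minimal sketch of packing several images into one archive in memory with the standard library. It only covers building the archive; how the API treats the archive (field name, size limits) is not verified here, and the file names and placeholder bytes are assumptions made for illustration.

```python
import io
import zipfile


def zip_images(named_images):
    """Pack (filename, bytes) pairs into an in-memory zip archive and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as archive:
        for name, content in named_images:
            archive.writestr(name, content)
    return buf.getvalue()


# Placeholder bytes stand in for real image data in this sketch.
payload = zip_images([('a.png', b'fake-png-1'), ('b.png', b'fake-png-2')])
names = zipfile.ZipFile(io.BytesIO(payload)).namelist()
print(names)
```

The resulting bytes could then be wrapped in io.BytesIO and passed in the files argument of requests.post in place of an open file object.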


#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Analyze the image
"""
import os
import sys
import json
import requests
from pit import Pit

setting = Pit.get('iwdcat',
                  {'require': {'username': '',
                               'password': '',
                               }})

auth_token = setting['username'], setting['password']
url = 'https://gateway.watsonplatform.net/visual-recognition-beta/api/v1/tag/recognize'

filepath = 'var/images/first/2015-04-12-11.47.01.png'  # path to image file
filename = os.path.basename(filepath)

with open(filepath, 'rb') as fp:  # close the file once the upload finishes
    res = requests.post(
        url, auth=auth_token,
        files={
            'imgFile': (filename, fp),
            },
        )
if res.status_code == requests.codes.ok:
    data = json.loads(res.text)
    for img in data['images']:
        print('{} - {}'.format(img['image_id'], img['image_name']))
        for label in img['labels']:
            print('    {:30}: {}'.format(label['label_name'], label['label_score']))

else:  # error
    print('status_code: {} (reason: {})'.format(res.status_code, res.reason))
    sys.exit(1)

Analyzing the image file produced the following result.


$ python analyze_image.py
0 - 2015-04-12-11.47.01.png
    Outdoors                      : 0.714211
    Nature Scene                  : 0.671271
    Winter Scene                  : 0.669832
    Vertebrate                    : 0.635903
    Boat                          : 0.61398
    Animal                        : 0.610709
    Water Vehicle                 : 0.607173
    Placental Mammal              : 0.580503
    Snow Scene                    : 0.571422
    Fabric                        : 0.563129
    Gray                          : 0.56078
    Water Sport                   : 0.555034
    Person                        : 0.533461
    Mammal                        : 0.515725
    Surface Water Sport           : 0.511447

The data actually returned is shown below.


{'images': [{'image_id': '0', 'labels': [{'label_score': '0.714211', 'label_name': 'Outdoors'}, {'label_score': '0.671271', 'label_name': 'Nature Scene'}, {'label_score': '0.669832', 'label_name': 'Winter Scene'}, {'label_score': '0.635903', 'label_name': 'Vertebrate'}, {'label_score': '0.61398', 'label_name': 'Boat'}, {'label_score': '0.610709', 'label_name': 'Animal'}, {'label_score': '0.607173', 'label_name': 'Water Vehicle'}, {'label_score': '0.580503', 'label_name': 'Placental Mammal'}, {'label_score': '0.571422', 'label_name': 'Snow Scene'}, {'label_score': '0.563129', 'label_name': 'Fabric'}, {'label_score': '0.56078', 'label_name': 'Gray'}, {'label_score': '0.555034', 'label_name': 'Water Sport'}, {'label_score': '0.533461', 'label_name': 'Person'}, {'label_score': '0.515725', 'label_name': 'Mammal'}, {'label_score': '0.511447', 'label_name': 'Surface Water Sport'}], 'image_name': '2015-04-12-11.47.01.png'}]}
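Note that the scores in the returned data are strings, not numbers. As a small sketch of working with them, the following converts each score to a float and keeps only the labels at or above a threshold. The function name labels_above and the threshold 0.67 are choices made for this example, and the data is a trimmed copy of the response above.

```python
# Trimmed copy of the response shown above (first three labels only).
data = {'images': [{'image_id': '0',
                    'image_name': '2015-04-12-11.47.01.png',
                    'labels': [
                        {'label_name': 'Outdoors', 'label_score': '0.714211'},
                        {'label_name': 'Nature Scene', 'label_score': '0.671271'},
                        {'label_name': 'Winter Scene', 'label_score': '0.669832'},
                    ]}]}


def labels_above(image, threshold):
    """Return (name, score) pairs whose score is at least `threshold`."""
    pairs = [(label['label_name'], float(label['label_score']))
             for label in image['labels']]
    return [(name, score) for name, score in pairs if score >= threshold]


strong = labels_above(data['images'][0], 0.67)
print(strong)
```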

Bulk analysis

It is also possible to analyze multiple files in a single request by sending them together as a multipart upload.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Bulk analysis
"""
import os
import sys
import json
import requests
from pit import Pit

setting = Pit.get('iwdcat',
                  {'require': {'username': '',
                               'password': '',
                               }})

auth_token = setting['username'], setting['password']
url = 'https://gateway.watsonplatform.net/visual-recognition-beta/api/v1/tag/recognize'

filepaths = [
    'var/images/first/2015-04-12-11.47.01.png',
    'var/images/first/2015-04-12-11.44.42.png',
    'var/images/first/2015-04-12-11.46.11.png',
    ]
files = dict((os.path.basename(filepath), (os.path.basename(filepath), open(filepath, 'rb'))) for filepath in filepaths)

res = requests.post(
    url, auth=auth_token,
    files=files,
    )

for key, (filename, fp) in files.items():
    fp.close()

if res.status_code == requests.codes.ok:
    data = json.loads(res.text)
    for img in data['images']:
        print('{} - {}'.format(img['image_id'], img['image_name']))
        for label in img['labels']:
            print('    {:30}: {}'.format(label['label_name'], label['label_score']))

else:  # error
    print('status_code: {} (reason: {})'.format(res.status_code, res.reason))
    sys.exit(1)

The value of the "images" key in the returned data is a list, with one analysis result per image in order. The execution results are as follows.


$ python analyze_image_multi.py
0 - 2015-04-12-11.44.42.png
    Gray                          : 0.735805
    Winter Scene                  : 0.7123
    Nature Scene                  : 0.674336
    Water Scene                   : 0.668881
    Outdoors                      : 0.658805
    Natural Activity              : 0.643865
    Vertebrate                    : 0.603751
    Climbing                      : 0.566247
    Animal                        : 0.537788
    Mammal                        : 0.518001
1 - 2015-04-12-11.46.11.png
    Gray                          : 0.719819
    Vertebrate                    : 0.692607
    Animal                        : 0.690942
    Winter Scene                  : 0.683918
    Mammal                        : 0.669149
    Snow Scene                    : 0.664266
    Placental Mammal              : 0.663866
    Outdoors                      : 0.66335
    Nature Scene                  : 0.656991
    Climbing                      : 0.645557
    Person                        : 0.557965
    Person View                   : 0.528335
2 - 2015-04-12-11.47.01.png
    Outdoors                      : 0.714211
    Nature Scene                  : 0.671271
    Winter Scene                  : 0.669832
    Vertebrate                    : 0.635903
    Boat                          : 0.61398
    Animal                        : 0.610709
    Water Vehicle                 : 0.607173
    Placental Mammal              : 0.580503
    Snow Scene                    : 0.571422
    Fabric                        : 0.563129
    Gray                          : 0.56078
    Water Sport                   : 0.555034
    Person                        : 0.533461
    Mammal                        : 0.515725
    Surface Water Sport           : 0.511447
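In the output above, the images come back in a different order from the filepaths list (11.47.01 was listed first but is returned last), so looking results up by image_name is probably safer than relying on position. A minimal sketch, with index_by_name as a name made up for this example and a trimmed stand-in for the response:

```python
def index_by_name(data):
    """Map each image_name to a {label_name: score} dict."""
    return {
        img['image_name']: {
            label['label_name']: float(label['label_score'])
            for label in img['labels']
        }
        for img in data['images']
    }


# Trimmed stand-in for a bulk response, using values from the output above.
sample = {'images': [
    {'image_id': '0', 'image_name': '2015-04-12-11.44.42.png',
     'labels': [{'label_name': 'Gray', 'label_score': '0.735805'}]},
    {'image_id': '1', 'image_name': '2015-04-12-11.46.11.png',
     'labels': [{'label_name': 'Gray', 'label_score': '0.719819'}]},
]}
by_name = index_by_name(sample)
print(by_name['2015-04-12-11.46.11.png']['Gray'])
```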

I was able to analyze 30 files in one request.
I wonder if it can handle more?
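Until the real limit is known, one cautious approach is to split a long file list into batches and send one request per batch. The sketch below only does the splitting. The batch size of 30 is simply the number that worked here, not a documented limit, and the file names are hypothetical.

```python
def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical file names; 30 is the batch size that worked in the text above,
# not a documented limit.
paths = ['img{:03}.png'.format(i) for i in range(70)]
batches = list(chunks(paths, 30))
print([len(batch) for batch in batches])
```

Each batch could then be turned into a files dict and posted in the same way as in the bulk analysis script.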

Sample scripts

Cut the image

Recognize objects and save results in JSON format

Convert csv to json format results

Note

In the examples above, I used "pit", a third-party Python package, to read the user name and password from its configuration file.
As of 2015/04/12, however, "pit" is not compatible with Python 3, so running "pip install pit" under Python 3 causes an error.
If you need it on Python 3, use the version of "pit" that I modified to be compatible with Python 3:

https://github.com/TakesxiSximada/pit/archive/fix/sximada/py3k.zip
https://github.com/TakesxiSximada/pit/tree/fix/sximada/py3k
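If installing "pit" is inconvenient, a different approach (not the one used in the scripts above) is to read the credentials from environment variables. This is only a sketch: VR_USERNAME and VR_PASSWORD are variable names invented for this example, and load_credentials is a hypothetical helper.

```python
import os


def load_credentials(env=os.environ):
    """Return a (username, password) tuple read from environment variables.

    VR_USERNAME and VR_PASSWORD are names chosen for this sketch.
    """
    try:
        return env['VR_USERNAME'], env['VR_PASSWORD']
    except KeyError as err:
        raise SystemExit('missing environment variable: {}'.format(err))


# Demonstrated with a plain dict so the sketch runs without real credentials.
auth_token = load_credentials({'VR_USERNAME': 'user', 'VR_PASSWORD': 'secret'})
print(auth_token)
```

The returned tuple has the same shape as auth_token in the scripts above, so it can be passed directly as the auth argument of requests.get or requests.post.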

Thx. :)
