Object recognition is the task of detecting particular objects in images.
In this post, I introduce how to perform object recognition with Visual Recognition, one of the services of the IBM Watson Developer Cloud.
※ You must already be registered with IBM Bluemix.
How to get your user name and password for the API
You need a user name and a password to use Visual Recognition.
Go to the management page of IBM Bluemix and create an application first. Then add the Visual Recognition service to that application.
After that, click "display of credentials" on the service, and your user name and password are listed.
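The scripts in this post hand the user name and password to the requests library as a (user name, password) tuple, which requests sends as HTTP Basic authentication. As a minimal sketch of what that means on the wire, the helper below builds such a header by hand using only the standard library. The function name basic_auth_header and the sample credentials are made up for illustration.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    # Basic auth is "user:password", base64-encoded, prefixed with "Basic ".
    raw = '{}:{}'.format(username, password).encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode('ascii')


# Placeholder credentials, not real ones.
header = basic_auth_header('user', 'pass')
print(header)
```

Because the value is only encoded and not encrypted, the credentials should be sent over https only, as the URLs in this post do.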
How to obtain labels
Visual Recognition returns pairs of a label and a score as the result of object recognition, so first obtain the labels that the service uses.
The script is as follows.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Obtain the labels provided by the Visual Recognition service of IBM Watson Developer Cloud.
"""
import sys
import json

import requests
from pit import Pit

# Read the user name and password managed by pit.
setting = Pit.get('iwdcat',
                  {'require': {'username': '',
                               'password': '',
                               }})
auth_token = setting['username'], setting['password']  # (user, password) tuple

url = 'https://gateway.watsonplatform.net/visual-recognition-beta/api/v1/tag/labels'
res = requests.get(url, auth=auth_token, headers={'content-type': 'application/json'})
if res.status_code == requests.codes.ok:
    labels = json.loads(res.text)
    print('label groups({}): {}'.format(len(labels['label_groups']), labels['label_groups']))
    print()
    print('labels({}): {}'.format(len(labels['labels']), labels['labels']))
else:  # error
    print('status_code: {} (reason: {})'.format(res.status_code, res.reason))
    sys.exit(1)
The results are returned in JSON format: "label_groups" is the list of label groups and "labels" is the list of labels.
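As a small illustration of working with that JSON, the sketch below parses a response body and builds a set for quick membership checks. The sample body is hand-written: the two group names are placeholders, the label names are taken from the analysis output later in this post, and I am assuming that both lists contain plain strings. The helper name load_label_set is also made up.

```python
import json

# Hand-written sample body; the group names are placeholders, not real service output.
sample_body = '''
{"label_groups": ["placeholder group A", "placeholder group B"],
 "labels": ["Outdoors", "Boat", "Person"]}
'''


def load_label_set(body):
    """Parse the response body and return the labels as a set."""
    data = json.loads(body)
    return set(data['labels'])


known = load_label_set(sample_body)
print('Boat' in known)
print('Unicorn' in known)
```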
Visual image analysis
You have to send image files to the Visual Recognition API as multipart form data.
The image can apparently be a png or jpg file, or even a zip-compressed file.
The following example sends a single png image.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Analyze the image
"""
import os
import sys
import json

import requests
from pit import Pit

# Read the user name and password managed by pit.
setting = Pit.get('iwdcat',
                  {'require': {'username': '',
                               'password': '',
                               }})
auth_token = setting['username'], setting['password']

url = 'https://gateway.watsonplatform.net/visual-recognition-beta/api/v1/tag/recognize'
filepath = 'var/images/first/2015-04-12-11.47.01.png'  # path to image file
filename = os.path.basename(filepath)

# Send the image as multipart form data; the with block closes the file afterwards.
with open(filepath, 'rb') as fp:
    res = requests.post(
        url, auth=auth_token,
        files={
            'imgFile': (filename, fp),
        }
    )

if res.status_code == requests.codes.ok:
    data = json.loads(res.text)
    for img in data['images']:
        print('{} - {}'.format(img['image_id'], img['image_name']))
        for label in img['labels']:
            print(' {:30}: {}'.format(label['label_name'], label['label_score']))
else:  # error
    print('status_code: {} (reason: {})'.format(res.status_code, res.reason))
    sys.exit(1)
Analyzing the image file produced the following result.
$ python analyze_image.py
0 - 2015-04-12-11.47.01.png
Outdoors : 0.714211
Nature Scene : 0.671271
Winter Scene : 0.669832
Vertebrate : 0.635903
Boat : 0.61398
Animal : 0.610709
Water Vehicle : 0.607173
Placental Mammal : 0.580503
Snow Scene : 0.571422
Fabric : 0.563129
Gray : 0.56078
Water Sport : 0.555034
Person : 0.533461
Mammal : 0.515725
Surface Water Sport : 0.511447
The actual data returned is shown below.
{'images': [{'image_id': '0', 'labels': [{'label_score': '0.714211', 'label_name': 'Outdoors'}, {'label_score': '0.671271', 'label_name': 'Nature Scene'}, {'label_score': '0.669832', 'label_name': 'Winter Scene'}, {'label_score': '0.635903', 'label_name': 'Vertebrate'}, {'label_score': '0.61398', 'label_name': 'Boat'}, {'label_score': '0.610709', 'label_name': 'Animal'}, {'label_score': '0.607173', 'label_name': 'Water Vehicle'}, {'label_score': '0.580503', 'label_name': 'Placental Mammal'}, {'label_score': '0.571422', 'label_name': 'Snow Scene'}, {'label_score': '0.563129', 'label_name': 'Fabric'}, {'label_score': '0.56078', 'label_name': 'Gray'}, {'label_score': '0.555034', 'label_name': 'Water Sport'}, {'label_score': '0.533461', 'label_name': 'Person'}, {'label_score': '0.515725', 'label_name': 'Mammal'}, {'label_score': '0.511447', 'label_name': 'Surface Water Sport'}], 'image_name': '2015-04-12-11.47.01.png'}]}
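Note that label_score arrives as a string in this data, so it has to be converted before it can be compared numerically. The following is a minimal sketch of keeping only the labels at or above a threshold; the helper name labels_above and the threshold value are my own choices for illustration, and the sample is a trimmed copy of the data above.

```python
def labels_above(image, threshold):
    """Return (label_name, score) pairs whose score is at least `threshold`."""
    # label_score is a string in the response, so convert it to float first.
    pairs = [(label['label_name'], float(label['label_score']))
             for label in image['labels']]
    return [(name, score) for name, score in pairs if score >= threshold]


# Trimmed copy of the response shown above.
sample = {'images': [{
    'image_id': '0',
    'image_name': '2015-04-12-11.47.01.png',
    'labels': [
        {'label_score': '0.714211', 'label_name': 'Outdoors'},
        {'label_score': '0.671271', 'label_name': 'Nature Scene'},
        {'label_score': '0.511447', 'label_name': 'Surface Water Sport'},
    ],
}]}

for name, score in labels_above(sample['images'][0], 0.6):
    print('{:30}: {}'.format(name, score))
```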
Bulk analysis
It is also possible to analyze multiple files in a single request by sending each of them as a separate part of the multipart data.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Bulk analysis
"""
import os
import sys
import json

import requests
from pit import Pit

# Read the user name and password managed by pit.
setting = Pit.get('iwdcat',
                  {'require': {'username': '',
                               'password': '',
                               }})
auth_token = setting['username'], setting['password']

url = 'https://gateway.watsonplatform.net/visual-recognition-beta/api/v1/tag/recognize'
filepaths = [
    'var/images/first/2015-04-12-11.47.01.png',
    'var/images/first/2015-04-12-11.44.42.png',
    'var/images/first/2015-04-12-11.46.11.png',
]

# One multipart part per file, keyed by the file's base name.
files = dict((os.path.basename(filepath), (os.path.basename(filepath), open(filepath, 'rb')))
             for filepath in filepaths)
try:
    res = requests.post(
        url, auth=auth_token,
        files=files,
    )
finally:
    # Close every file even if the request raises.
    for filename, fp in files.values():
        fp.close()

if res.status_code == requests.codes.ok:
    data = json.loads(res.text)
    for img in data['images']:
        print('{} - {}'.format(img['image_id'], img['image_name']))
        for label in img['labels']:
            print(' {:30}: {}'.format(label['label_name'], label['label_score']))
else:  # error
    print('status_code: {} (reason: {})'.format(res.status_code, res.reason))
    sys.exit(1)
In the returned data, the value of the "images" key is a list, and the analysis result of each image appears in it in order. The execution results are as follows.
$ python analyze_image_multi.py
0 - 2015-04-12-11.44.42.png
Gray : 0.735805
Winter Scene : 0.7123
Nature Scene : 0.674336
Water Scene : 0.668881
Outdoors : 0.658805
Natural Activity : 0.643865
Vertebrate : 0.603751
Climbing : 0.566247
Animal : 0.537788
Mammal : 0.518001
1 - 2015-04-12-11.46.11.png
Gray : 0.719819
Vertebrate : 0.692607
Animal : 0.690942
Winter Scene : 0.683918
Mammal : 0.669149
Snow Scene : 0.664266
Placental Mammal : 0.663866
Outdoors : 0.66335
Nature Scene : 0.656991
Climbing : 0.645557
Person : 0.557965
Person View : 0.528335
2 - 2015-04-12-11.47.01.png
Outdoors : 0.714211
Nature Scene : 0.671271
Winter Scene : 0.669832
Vertebrate : 0.635903
Boat : 0.61398
Animal : 0.610709
Water Vehicle : 0.607173
Placental Mammal : 0.580503
Snow Scene : 0.571422
Fabric : 0.563129
Gray : 0.56078
Water Sport : 0.555034
Person : 0.533461
Mammal : 0.515725
Surface Water Sport : 0.511447
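In the run above the images come back ordered 11.44.42, 11.46.11, 11.47.01, which is not the order of the filepaths list, so it seems safer to look results up by image_name than by position. Here is a minimal sketch of that, with a trimmed copy of the output above as sample data and a hypothetical helper name index_by_name.

```python
def index_by_name(data):
    """Map each image_name to its list of label dicts."""
    return {img['image_name']: img['labels'] for img in data['images']}


# Trimmed sample in the shape of the response shown earlier in this post.
sample = {'images': [
    {'image_id': '0', 'image_name': '2015-04-12-11.44.42.png',
     'labels': [{'label_name': 'Gray', 'label_score': '0.735805'}]},
    {'image_id': '1', 'image_name': '2015-04-12-11.46.11.png',
     'labels': [{'label_name': 'Gray', 'label_score': '0.719819'}]},
]}

by_name = index_by_name(sample)
top = by_name['2015-04-12-11.46.11.png'][0]
print(top['label_name'], top['label_score'])
```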
I was able to analyze 30 files in one request.
I wonder if it can handle more?
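Until the real limit is known, one cautious approach is to split a long list of files into batches and send one request per batch. The sketch below only does the splitting. The batch size of 30 is simply the number that worked for me, not a documented limit, and the helper name batches and the file names are made up.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical file names, only to show the splitting.
filepaths = ['img-{:03}.png'.format(i) for i in range(65)]
sizes = [len(batch) for batch in batches(filepaths, 30)]
print(sizes)
```

Each batch could then be posted in the same way as the list of files in the bulk analysis script.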
Sample scripts
Cut the image
Recognize objects and save the results in JSON format
Convert csv to json format results
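For reference, here is a rough idea of how results in the JSON shape shown in this post could be flattened into CSV rows, one row per image and label. This is not the linked sample script, only a sketch under my own assumptions: the column names and the helper name results_to_csv are made up, and the sample is a trimmed copy of the data shown earlier.

```python
import csv
import io


def results_to_csv(data):
    """Flatten the "images" list into CSV text, one row per (image, label)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(['image_id', 'image_name', 'label_name', 'label_score'])
    for img in data['images']:
        for label in img['labels']:
            writer.writerow([img['image_id'], img['image_name'],
                             label['label_name'], label['label_score']])
    return buf.getvalue()


# Trimmed sample in the shape of the response shown earlier in this post.
sample = {'images': [{
    'image_id': '0',
    'image_name': '2015-04-12-11.47.01.png',
    'labels': [
        {'label_score': '0.714211', 'label_name': 'Outdoors'},
        {'label_score': '0.671271', 'label_name': 'Nature Scene'},
    ],
}]}

csv_text = results_to_csv(sample)
print(csv_text)
```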
Note
In the examples above I used "pit", a third-party Python package, to read the user name and password from a configuration file.
As of 2015/04/12, however, "pit" is not compatible with Python 3, so running "pip install pit" under Python 3 fails with an error.
I have customized "pit" to work with Python 3; you can get that version from the following links.
https://github.com/TakesxiSximada/pit/archive/fix/sximada/py3k.zip
https://github.com/TakesxiSximada/pit/tree/fix/sximada/py3k
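If installing pit is a problem, one alternative (not what the scripts above do) is to read the credentials from environment variables. The variable names VR_USERNAME and VR_PASSWORD and the helper name load_credentials below are made up for this sketch.

```python
import os


def load_credentials(env=None):
    """Return (username, password) read from environment variables."""
    env = os.environ if env is None else env
    try:
        return env['VR_USERNAME'], env['VR_PASSWORD']
    except KeyError as err:
        raise RuntimeError('missing environment variable: {}'.format(err))


# Demonstrated with a plain dict so the sketch runs without real credentials.
auth_token = load_credentials({'VR_USERNAME': 'user', 'VR_PASSWORD': 'pass'})
print(auth_token)
```

The resulting tuple can be passed to requests as auth=auth_token, in place of the tuple built from pit.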
Thx. :)