
Collecting image data for deep learning with the Bing Web Search API

Last updated at 2018-06-20, posted at 2018-06-17

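Introduction

I wrote a program that collects images from the web in order to build an image classifier with TensorFlow. Because web scraping has been getting harder in recent years, I used the Bing Web Search API instead.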

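Background

At first I tried to implement this with beautifulsoup on Python 3, but I gave up. I think the hurdles for scraping images are still quite high.
The APIs offered by the major US IT vendors these days are very easy to use and very capable, so there is no reason not to use them.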

Creating an Azure account

This is somewhat tedious, so I will skip it here.

The code

The config file and the code are as follows.

config.ini


[keys]
key1 : API key 1
key2 : API key 2

[search]
url  : https://api.cognitive.microsoft.com/bing/v7.0/images/search
term : search query term

[params]
count  : 100
offset : 100
itr    :   5

[dir]
name : directory name

[file]
name : base name for the numbered files
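
As a quick check of how `configparser` reads a file in this format, the following sketch parses an equivalent string in memory. The sample values (such as `dummy-key-1`) are placeholders chosen for illustration, not real keys. The point to notice is that every value comes back as a string, which is why the script wraps the numeric settings in `int()`.

```python
import configparser

# Placeholder configuration mirroring part of config.ini (values are illustrative).
SAMPLE = """
[keys]
key1 : dummy-key-1

[params]
count  : 100
offset : 100
itr    :   5
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# configparser accepts ':' as well as '=' as the delimiter and strips
# the whitespace around keys and values.
count = config['params']['count']       # plain lookup returns a string
itr = config['params'].getint('itr')    # getint() converts to int

print(repr(count))  # '100'
print(itr)          # 5
```

Using `getint()` is an alternative to calling `int()` on each value; either works for this file.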

image_collection.py

import os
from io import BytesIO
import configparser

import requests
from PIL import Image

# Read the configuration (the file name must match config.ini exactly
# on case-sensitive file systems)
config = configparser.ConfigParser()
config.read('config.ini')

# Set the subscription key in the request headers
headers = {"Ocp-Apim-Subscription-Key": config['keys']['key1']}

# Search: request `itr` pages of results and collect the thumbnail URLs
results = []
for i in range(int(config['params']['itr'])):
    params = {
        "q": config['search']['term'],
        "count": config['params']['count'],
        "offset": i * int(config['params']['offset']),
    }
    response = requests.get(config['search']['url'], headers=headers, params=params)
    response.raise_for_status()
    search_results = response.json()
    thumbnail_urls = [img["thumbnailUrl"] for img in search_results["value"]]
    print(len(thumbnail_urls))
    results.extend(thumbnail_urls)

print(len(results))

# Download each thumbnail and save it as a numbered JPEG file
os.makedirs(config['dir']['name'], exist_ok=True)  # create the output directory if missing
for i, url in enumerate(results, start=1):
    image_data = requests.get(url)
    image_data.raise_for_status()
    image = Image.open(BytesIO(image_data.content))
    filename = os.path.join(config['dir']['name'], config['file']['name'] + str(i) + ".jpg")
    # JPEG cannot store an alpha channel or a palette, so convert to RGB first
    image.convert("RGB").save(filename)
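
The paging done in the search loop can be isolated into a small helper, which makes the offsets easy to inspect without calling the API. This is only a minimal sketch: `build_page_params` is a hypothetical name introduced here, not part of the original script or of any library, and the query `"cat"` is an arbitrary example.

```python
def build_page_params(term, count, step, pages):
    """Return one query-parameter dict per request, as the search loop builds them."""
    return [
        {"q": term, "count": count, "offset": page * step}
        for page in range(pages)
    ]


# Same numbers as config.ini: count = 100, offset = 100, itr = 5.
page_params = build_page_params("cat", 100, 100, 5)
offsets = [p["offset"] for p in page_params]
print(offsets)  # [0, 100, 200, 300, 400]
```

With these settings the script asks for at most 500 thumbnail URLs in total. The actual number can be lower if the API returns fewer results per page than requested, which is why the script prints the length of each page.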
