
Scrape Google Images right away!

Last updated at 2020-02-08 / Posted at 2019-12-08

Introduction

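Saving images one by one from an image search is tedious, so I looked for a better way and found a handy tool. This is a quick summary, written mostly for my own benefit.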

Setup

Install google_images_download
Project page ⇒

$ pip install google_images_download
$ pip install chromedriver

Install these beforehand. os and glob are part of the Python standard library, so they do not need to be installed with pip.
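To confirm which modules are actually available before running the script, a minimal sketch using importlib.util.find_spec from the standard library can help. The helper name missing_modules is hypothetical; it simply reports top-level module names that Python cannot find.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# os and glob ship with Python; google_images_download has to be pip-installed
print(missing_modules(["os", "glob", "google_images_download"]))
```

An empty list means everything needed for the script below is importable.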
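If the chromedriver folder does not contain the "chromedriver" executable, download it from
https://chromedriver.chromium.org/downloads
and save it in the chromedriver folder. Choose the ChromeDriver version that matches the version of Chrome installed on your machine.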
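Locating the executable can be automated with a small check. This is a minimal sketch, assuming the executable is named chromedriver.exe on Windows and chromedriver elsewhere; the function find_chromedriver is hypothetical. It looks in the given folder first and then falls back to shutil.which, which searches the PATH.

```python
import os
import shutil


def find_chromedriver(folder):
    """Return a path to the chromedriver executable, or None if it is not found.

    Checks `folder` first, then falls back to searching the PATH.
    """
    exe_name = "chromedriver.exe" if os.name == "nt" else "chromedriver"
    candidate = os.path.join(folder, exe_name)
    if os.path.isfile(candidate):
        return candidate
    # shutil.which also honors PATHEXT on Windows, so "chromedriver" matches chromedriver.exe
    return shutil.which("chromedriver")
```

If this returns None, the executable still needs to be downloaded from the page above.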

That completes most of the setup.

Environment

・PyCharm
・Python 3.7.4
・Windows 10

Source code

Import statements

from google_images_download import google_images_download
import glob
import os

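Main code
In this example, the search keywords are "ONE OK ROCK LIVE", and the images are saved in a folder named "ONE OK ROCK".
Setting limit = 100 downloads up to 100 images.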

config = {
    "Records": [
        {
            "keywords": "ONE OK ROCK LIVE",
            "no_numbering": True,
            "limit": 100,
            "output_directory": "images",
            "image_directory": "ONE OK ROCK",
            # Raw string (r"...") so the Windows backslashes are not read as escape sequences
            "chromedriver": r"C:\[path to chromedriver]\chromedriver\chromedriver.exe",
        }
    ]
}

Make sure the full path to chromedriver is written correctly.
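When collecting images for several search terms, writing each Records entry by hand gets repetitive. The same config can be built with a small helper; make_record is a hypothetical name, and it only fills in the same options used in the config above.

```python
def make_record(keywords, image_directory, chromedriver, limit=100):
    """Build one entry for config["Records"] with the options used in this article."""
    return {
        "keywords": keywords,
        "no_numbering": True,                # do not prefix file names with a download-order number
        "limit": limit,                      # maximum number of images to download
        "output_directory": "images",        # parent folder for all downloads
        "image_directory": image_directory,  # subfolder for this search
        "chromedriver": chromedriver,
    }


config = {
    "Records": [
        make_record("ONE OK ROCK LIVE", "ONE OK ROCK",
                    r"C:\[path to chromedriver]\chromedriver\chromedriver.exe"),
    ]
}
```

More searches can be added by appending further make_record(...) calls to the list.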

response = google_images_download.googleimagesdownload()
for rc in config["Records"]:
    response.download(rc)
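With several Records, an exception on one search stops the loop above before the remaining searches run. A minimal sketch of a more forgiving loop, assuming failures surface as ordinary exceptions (the helper download_all is hypothetical and takes the download function as a parameter so it can be tried without network access):

```python
def download_all(records, download):
    """Call download(record) for each record, collecting failures instead of stopping.

    Returns a list of (keywords, exception) pairs for the records that raised.
    """
    failures = []
    for record in records:
        try:
            download(record)
        except Exception as exc:  # keep going with the remaining searches
            failures.append((record.get("keywords"), exc))
    return failures


# Usage with the objects defined above:
# failures = download_all(config["Records"], response.download)
```

Note that SystemExit is not a subclass of Exception, so anything that exits the interpreter still stops the loop.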

GIF images are removed, just in case (dealing with them is a hassle).

# images/<image_directory>/*.gif
gif_imgs = glob.glob(os.path.join("images", "*", "*.gif"))
print(f"removing gif files: {len(gif_imgs)} files")
for f in gif_imgs:
    os.remove(f)

Running the steps above in order downloads the images.
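The GIF filter can also be written so that it handles several extensions and ignores letter case (on Linux, glob's "*.gif" does not match "a.GIF"). This is a minimal sketch using pathlib; the function remove_by_extension is hypothetical and, like the original, only looks one folder level below images.

```python
from pathlib import Path


def remove_by_extension(root, extensions):
    """Delete files in each subfolder of `root` whose suffix is in `extensions`.

    Matching is case-insensitive, so "a.GIF" is removed as well as "a.gif".
    Returns the list of removed paths.
    """
    wanted = {ext.lower() for ext in extensions}
    removed = []
    for path in Path(root).glob("*/*"):
        if path.is_file() and path.suffix.lower() in wanted:
            path.unlink()
            removed.append(path)
    return removed


removed = remove_by_extension("images", [".gif"])
print(f"removing gif files: {len(removed)} files")
```

Passing [".gif", ".webp"], for example, would drop both formats in one pass.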

Results

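Closing thoughts

This kind of scraping works well for collecting information and is very convenient.
Downloading images from an ordinary web page works quite differently, so I plan to cover that in a separate article.
This article only covers the basics, so please look up the details yourself.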
