I tried writing an image collection program in Python: Day 1

Posted at 2019-07-28, last updated at 2019-07-31

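Goal

The end goal is a program that takes a Google image search URL as input, then automatically downloads the images and adjusts their size.
On day 1, I write a program that saves every image on the page at a given URL.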

PC specs

Processor: 1.6 GHz Intel Core i5
Memory: 8 GB 2133 MHz LPDDR3

First, learning how to use BeautifulSoup

I heard that BeautifulSoup is a good choice for scraping in Python, so here is a brief explanation of how to use it.

Creating a bs4.BeautifulSoup instance

from bs4 import BeautifulSoup
import requests

url="https://sample_url"
# Fetch the HTML from the URL
html=requests.get(url).text

# Create a bs4.BeautifulSoup instance from the HTML
soup=BeautifulSoup(html,"html.parser")

I used the requests module to fetch the HTML from the URL.
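
As a hedged sketch of the fetch step, the request can be given a timeout and checked for HTTP errors before the HTML is parsed. The helper names `parse_html` and `fetch_soup` and the 10-second timeout are assumptions made for illustration; they are not part of the original program.

```python
import requests
from bs4 import BeautifulSoup


def parse_html(html):
    """Build a BeautifulSoup tree from an HTML string (hypothetical helper)."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url, timeout=10):
    """Fetch a page and parse it (hypothetical helper; the timeout is an assumption)."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    return parse_html(response.text)


# Parsing works the same way on a string that is already in memory
soup = parse_html("<html><head><title>sample</title></head><body></body></html>")
print(soup.title.string)  # sample
```

Splitting parsing from fetching makes it possible to try out the BeautifulSoup calls on a small inline string without any network access.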

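Using the BeautifulSoup class

This time I mainly use two features: finding tags on a page and reading their attributes. There are three ways to search for tags:

# Get the first a tag
a_soup=soup.find("a")
# Shorthand for the same lookup
a_soup=soup.a

# Get all a tags
a_soup_all=soup.find_all("a")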
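
To make the difference between the three lookups concrete, here is a minimal sketch that runs them against a small inline document. The HTML string is made up for this example.

```python
from bs4 import BeautifulSoup

# Small inline document so the example runs without network access
html = """
<html><body>
  <a href="/first">first</a>
  <a href="/second">second</a>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

first_by_find = soup.find("a")   # first matching tag, or None if there is none
first_by_attr = soup.a           # shorthand for the same lookup
all_links = soup.find_all("a")   # list-like result holding every matching tag

print(first_by_find.get_text())        # first
print(first_by_find == first_by_attr)  # True
print(len(all_links))                  # 2
print(soup.find("img"))                # None
```

Because `find` returns `None` when nothing matches, it is worth checking the result before calling methods on it.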

To read an attribute, call the get method on a tag obtained as above.

# Get the href attribute of an a tag
href=soup.find("a").get("href")
href=soup.a.get("href")
href=soup.find_all("a")[0].get("href")
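
One detail that matters when scraping real pages is that `get` returns `None` when the attribute is missing. The sketch below shows this on a made-up HTML fragment.

```python
from bs4 import BeautifulSoup

# Made-up fragment: the second a tag has no href attribute
html = '<a href="https://example.com/page">link</a><a id="anchor">no href</a>'
soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a")

print(links[0].get("href"))      # https://example.com/page
print(links[1].get("href"))      # None, because the attribute is missing
print(links[1].get("href", ""))  # an explicit default can be passed instead

# Keep only the values that are actually present
hrefs = [a.get("href") for a in links if a.get("href") is not None]
print(hrefs)  # ['https://example.com/page']
```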

Completed code

"""
Input: URL, number of images, path of the directory to save into, file name
Output: images saved to files
"""
import requests
from bs4 import BeautifulSoup as bs
from shutil import move
from os import getcwd,path,mkdir

def image_download(url,size,save_path,filename):
    # Create the directory to save into
    if not path.isdir(path.join(save_path,filename)):
        mkdir(path.join(save_path,filename))
    # Images are first written to the current working directory
    in_path=getcwd()

    # Fetch the HTML from the URL
    res=requests.get(url).text
    # Convert the HTML into a BeautifulSoup object
    soup=bs(res,"html.parser")
    # Get every image tag on the page
    image_soup=soup.find_all("img")

    for i in range(min(len(image_soup),size)):
        name=filename+str(i+1)+".jpg"
        # Get the URL of the image file
        img_url=image_soup[i].get("src")
        image_page=requests.get(img_url)
        # Get the image data
        image=image_page.content

        # Write the image to a file, then move it into the target directory
        if image_page.status_code==200:
            with open(name,"wb") as f:
                f.write(image)
            move(path.join(in_path,name),path.join(save_path,filename))
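
The loop above passes each `src` value straight to `requests.get`, which only works when the value is an absolute URL. Many pages use relative paths, and some `img` tags have no `src` at all. The following is a minimal sketch of one way to handle both cases with `urllib.parse.urljoin` from the standard library. The helper name `collect_image_urls` and the sample HTML are assumptions made for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def collect_image_urls(html, page_url, limit):
    """Return up to `limit` absolute image URLs found in `html` (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img"):
        if len(urls) >= limit:
            break
        src = img.get("src")
        if not src:
            continue  # skip img tags without a usable src
        # urljoin keeps absolute URLs as they are and resolves relative ones against page_url
        urls.append(urljoin(page_url, src))
    return urls


# Made-up fragment mixing root-relative, missing, absolute, and relative src values
html = """
<img src="/static/a.jpg">
<img alt="no src here">
<img src="https://cdn.example.com/b.png">
<img src="c.gif">
"""
for image_url in collect_image_urls(html, "https://example.com/gallery/index.html", 3):
    print(image_url)
```

The returned list could then be passed to `requests.get` one URL at a time, in place of the raw `src` values.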

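Thoughts

The processing is a little hard to follow, so I would like to make it simpler. Next, I want to work on formatting the images, or on collecting images from a search term instead of a URL.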
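
One possible simplification, offered only as a sketch, is to write each image directly into the destination directory with `pathlib`, so the separate move step is no longer needed. The helper name `save_image` and its parameters are assumptions for this example. The `.jpg` extension follows the naming used in the completed code and does not check the real image format.

```python
import tempfile
from pathlib import Path


def save_image(data, save_dir, filename, index):
    """Write image bytes straight into save_dir/filename/ (hypothetical helper)."""
    target_dir = Path(save_dir) / filename
    # parents=True and exist_ok=True replace the isdir check followed by mkdir
    target_dir.mkdir(parents=True, exist_ok=True)
    # Assumes JPEG naming; the actual image format is not verified
    target = target_dir / f"{filename}{index}.jpg"
    target.write_bytes(data)
    return target


# Usage with placeholder bytes and a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(b"placeholder image bytes", tmp, "cat", 1)
    print(saved.name)  # cat1.jpg
```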
