Downloading images with requests

Python

Last updated at 2016-09-12 Posted at 2016-09-08

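These are strictly personal notes.
This post writes a short Python program that downloads images using the requests library.
In Python 3, urllib.request looks convenient, but it does not appear to be available in Python 2 (I have not looked into this thoroughly), so I used requests instead.
requests supports many settings such as cookies, but the program here simply accesses each URL and downloads the result.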

Official site: python-requests

Installation

$ pip install requests

Trying it out

$ python
>>> import requests
>>> url = "http://docs.python-requests.org/en/master/#"
>>> res = requests.get(url)
>>> res.status_code
200
>>> res.headers["content-type"]
'text/html'
>>> res.content
'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html xmlns="http://www.w3.org/1999/xhtml">\n  <head>\n...
>>> res.text  
u'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html xmlns="http://www.w3.org/1999/xhtml">\n  <head>\n ...

Usage (excerpt)

See The User Guide for details.

1. Sending requests

To set URL parameters, pass them as a dictionary via the params argument.

res = requests.get('http://httpbin.org/get', params={'key':'value'})

print(res.url)  #=> http://httpbin.org/get?key=value
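
As a rough offline illustration of what params does, the sketch below builds the request without sending it and inspects the resulting URL. It relies on requests.Request and its prepare() method; the variable names and the list-valued parameter are only assumptions chosen for illustration.

```python
import requests

# Build (but do not send) a GET request so the final URL can be inspected offline.
req = requests.Request('GET', 'http://httpbin.org/get', params={'key': 'value'})
prepared = req.prepare()
print(prepared.url)  # http://httpbin.org/get?key=value

# A list value is expanded into repeated keys in the query string.
multi = requests.Request(
    'GET', 'http://httpbin.org/get', params={'key': ['a', 'b']}
).prepare()
print(multi.url)  # http://httpbin.org/get?key=a&key=b
```

Because nothing is sent, this is handy for checking how a dictionary will be encoded before pointing the code at a real server.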

With post and put, form data can be sent via the data argument.

res = requests.post('http://httpbin.org/post', data = {'key':'value'})
res = requests.put('http://httpbin.org/put', data = {'key':'value'})
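
To see how a dictionary passed as data is turned into a form body, the following sketch prepares a POST request without sending it. It assumes the same requests.Request / prepare() approach as a way to inspect the request offline; it is not how the calls above have to be written.

```python
import requests

# Prepare (but do not send) a POST request to see how `data` is encoded.
prepared = requests.Request(
    'POST', 'http://httpbin.org/post', data={'key': 'value'}
).prepare()

print(prepared.method)                   # the HTTP method
print(prepared.headers['Content-Type'])  # set automatically for dictionary data
print(prepared.body)                     # the URL-encoded form body
```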

A method is provided for each type of request.

res = requests.get('http://httpbin.org/get')
res = requests.post('http://httpbin.org/post', data = {'key':'value'})
res = requests.put('http://httpbin.org/put', data = {'key':'value'})
res = requests.delete('http://httpbin.org/delete')
res = requests.head('http://httpbin.org/get')
res = requests.options('http://httpbin.org/get')

2. Handling the response

Refer to the following attributes of the response object.

res = requests.get('http://httpbin.org/get')

# HTTP status code
print(res.status_code)

# Check the Content-Type response header
print(res.headers["content-type"])

# Retrieved data (raw bytes)
print(res.content)

# Retrieved data (decoded text) and its encoding
print(res.text)
print(res.encoding)
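
The difference between content and text can be sketched without a network connection: content is the raw bytes, and text is roughly those bytes decoded with encoding. The byte string and encoding below are made-up sample values, not output from a real request, and requests itself does more than this (for example, it guesses an encoding when the server declares none).

```python
# Made-up sample values standing in for res.content and res.encoding.
content = '<title>画像</title>'.encode('utf-8')  # raw bytes, like res.content
encoding = 'utf-8'                               # like res.encoding

# Roughly what res.text provides: the bytes decoded into a Unicode string.
text = content.decode(encoding, 'replace')

print(len(content))  # number of bytes: 21
print(len(text))     # number of characters: 17
```

For image data only content matters, since the bytes are written to disk unchanged.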

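Actually downloading images

The input is a text file, input.txt, listing one URL per line; the images are written to the output directory images/ as 0.jpg, 1.jpg, 2.jpg, and so on. The images/ directory must exist beforehand, since the program does not create it.
Please forgive the odd bits of code here and there.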

import requests
import os
import sys

# Download an image
def download_image(url, timeout=10):
    response = requests.get(url, allow_redirects=False, timeout=timeout)
    if response.status_code != 200:
        # status_code is an int, so convert it before concatenating
        e = Exception("HTTP status: " + str(response.status_code))
        raise e

    content_type = response.headers["content-type"]
    if 'image' not in content_type:
        e = Exception("Content-Type: " + content_type)
        raise e

    return response.content

# Decide the image file name
def make_filename(base_dir, number, url):
    ext = os.path.splitext(url)[1] # get the extension
    filename = str(number) + ext   # number is an int; append the extension to form the file name

    fullpath = os.path.join(base_dir, filename)
    return fullpath

# Save the image
def save_image(filename, image):
    with open(filename, "wb") as fout:
        fout.write(image)

# Main
if __name__ == "__main__":
    urls_txt = "input.txt"
    images_dir = "images"
    idx = 0

    with open(urls_txt, "r") as fin:
        for line in fin:
            url = line.strip()
            filename = make_filename(images_dir, idx, url)

            print(url)
            try:
                image = download_image(url)
                save_image(filename, image)
                idx += 1
            except KeyboardInterrupt:
                break
            except Exception as err:
                print(err)
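
One of the odd spots in the program above is that make_filename applies os.path.splitext to the whole URL, so a query string ends up inside the extension. A possible refinement, sketched here for Python 3, is to look only at the path part of the URL. The helper name extension_from_url, the ".jpg" fallback, and the example.com URLs are assumptions for illustration and are not part of the original program.

```python
import os
from urllib.parse import urlparse  # Python 3; Python 2 has this in the urlparse module


def extension_from_url(url, default=".jpg"):
    """Return the extension of the URL's path, ignoring query and fragment."""
    path = urlparse(url).path
    ext = os.path.splitext(path)[1]
    # Fall back to a default (hypothetical choice) when the path has no extension.
    return ext if ext else default


print(extension_from_url("http://example.com/a/photo.png?size=large"))  # .png
print(extension_from_url("http://example.com/image"))                   # .jpg
```

For comparison, os.path.splitext("http://example.com/a/photo.png?size=large")[1] returns ".png?size=large", which would produce an awkward file name.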
