[Memo] Python requests module

Environment

Request

Encode strings before passing them to the data parameter

import requests

# OK
requests.post("https://httpbin.org/post", data="a")

# Fails
requests.post("https://httpbin.org/post", data="あ")
# UnicodeEncodeError: 'latin-1' codec can't encode character '\u3042' in position 0: Body ('あ') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

# OK
requests.post("https://httpbin.org/post", data="あ".encode("utf-8"))

data – (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the Request.
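
The error message above indicates that a str body ends up being encoded as Latin-1. As a minimal sketch using only the standard library, the following checks whether a string would survive that encoding and produces UTF-8 bytes explicitly. The helper names `is_latin1_safe` and `to_utf8_body` are hypothetical and are not part of requests.

```python
def is_latin1_safe(text: str) -> bool:
    """Return True if text can be encoded as Latin-1 without error."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


def to_utf8_body(text: str) -> bytes:
    """Encode text as UTF-8 bytes, suitable for passing as data=."""
    return text.encode("utf-8")


print(is_latin1_safe("a"))   # True
print(is_latin1_safe("あ"))  # False
print(to_utf8_body("あ"))    # b'\xe3\x81\x82'
```

Passing the result of `to_utf8_body(...)` as `data` corresponds to the last OK case above.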

Response

Miscellaneous

Use a Session object when sending multiple requests to the same host

So if you’re making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).

Comparing 10 requests sent to the same host, the run that used a Session object did take less time.

IPython
In [52]: %time for i in range(10): requests.get("https://httpbin.org/get")
CPU times: user 187 ms, sys: 6.27 ms, total: 193 ms
Wall time: 6.94 s

In [53]: s = requests.Session()
In [54]: %time for i in range(10): s.get("https://httpbin.org/get")
CPU times: user 58.5 ms, sys: 67 µs, total: 58.6 ms
Wall time: 2.27 s
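
Outside IPython, the same comparison can be approximated with a small timing helper. This is a minimal sketch using only the standard library; `time_calls` is a hypothetical name, and the measured numbers will depend on the network.

```python
import time


def time_calls(call, n=10):
    """Invoke call() n times and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(n):
        call()
    return time.perf_counter() - start
```

For example, `time_calls(lambda: requests.get("https://httpbin.org/get"))` can be compared with `time_calls(lambda: s.get("https://httpbin.org/get"))`, where `s = requests.Session()`.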

logging

Reference: http://docs.python-requests.org/en/master/api/#api-changes

Enable the urllib3 logger

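import logging

# basicConfig() installs a handler on the root logger so that DEBUG records are actually printed
logging.basicConfig()
# Set the level on the appropriate logger
requests_log = logging.getLogger("urllib3")
requests_log.setLevel(logging.DEBUG)
IPython
In [37]: r = requests.post("https://httpbin.org/post", data="a", params={"x":1})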
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): httpbin.org:443
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "POST /post?x=1 HTTP/1.1" 200 236
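
The log lines above come from the child logger `urllib3.connectionpool`, which inherits its level from the `urllib3` logger. As a minimal sketch using only the standard library, the following attaches a dedicated handler to the `urllib3` logger and captures the output in a buffer instead of printing it. The function name `capture_urllib3_debug` is hypothetical, and the debug message is simulated rather than produced by a real request.

```python
import io
import logging


def capture_urllib3_debug(stream):
    """Attach a handler writing to stream to the "urllib3" logger at DEBUG level."""
    logger = logging.getLogger("urllib3")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return handler


buffer = io.StringIO()
capture_urllib3_debug(buffer)
# Simulate a record from the child logger that appears in the output above.
logging.getLogger("urllib3.connectionpool").debug("Starting new HTTPS connection")
print(buffer.getvalue(), end="")
```

Because the handler sits on the `urllib3` logger, records from any `urllib3.*` child logger reach it through propagation.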

Enable http.client debugging

import http.client
http.client.HTTPConnection.debuglevel = 1
In [37]: r = requests.post("https://httpbin.org/post", data="a", params={"x":1})
send: b'POST /post?x=1 HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-requests/2.21.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\nContent-Length: 1\r\n\r\n'
send: b'a'
reply: 'HTTP/1.1 200 OK\r\n'
header: Access-Control-Allow-Credentials header: Access-Control-Allow-Origin header: Content-Encoding header: Content-Type header: Date header: Server header: Content-Length header: Connection 
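
Since `debuglevel` is a class attribute, setting it affects the whole process. A minimal sketch of limiting the debug output to one block of code, using only the standard library; the context manager name `http_debug` is hypothetical.

```python
import contextlib
import http.client


@contextlib.contextmanager
def http_debug(level=1):
    """Temporarily set http.client's debug level, restoring the old value afterwards."""
    previous = http.client.HTTPConnection.debuglevel
    http.client.HTTPConnection.debuglevel = level
    try:
        yield
    finally:
        http.client.HTTPConnection.debuglevel = previous


with http_debug():
    # Requests issued inside this block would print the send/reply/header lines.
    inside = http.client.HTTPConnection.debuglevel
after = http.client.HTTPConnection.debuglevel
print(inside, after)
```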

How to retry

The requests module does not retry failed requests by default, so it is convenient to combine it with a retry module,
for example the backoff module (https://pypi.org/project/backoff/).

# Retry when the HTTP status code is 429 or 5XX, for up to 5 minutes.

import backoff
import requests

def fatal_code(e):
    """Return True to give up: retry on Too Many Requests (429), but not on other 4XX codes."""
    if e.response is None:
        # No response was received (e.g. a connection error): give up
        return True
    code = e.response.status_code
    return 400 <= code < 500 and code != 429


@backoff.on_exception(backoff.expo, requests.exceptions.RequestException,
                      jitter=backoff.full_jitter,
                      max_time=300,
                      giveup=fatal_code)
def get_response_text(url):
    r = requests.get(url)
    r.raise_for_status()
    return r.text
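
To make the roles of `giveup` and `max_time` concrete, here is a hand-rolled retry loop using only the standard library. It is a rough sketch under simplifying assumptions, not the backoff library's actual implementation; `retry_call` and its parameters are hypothetical names.

```python
import time


def retry_call(func, delays, giveup, max_time, sleep=time.sleep, clock=time.monotonic):
    """Call func() until it succeeds, giveup(exc) is true, or max_time seconds elapse.

    delays is an iterable of wait times (seconds) to use between attempts.
    """
    start = clock()
    for delay in delays:
        try:
            return func()
        except Exception as exc:
            elapsed = clock() - start
            # Re-raise on a fatal error or once the time budget is used up.
            if giveup(exc) or elapsed >= max_time:
                raise
            # Never sleep past the remaining time budget.
            sleep(min(delay, max_time - elapsed))
    return func()  # final attempt once the delays are exhausted
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.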

To share the backoff configuration across functions, create a decorator that wraps the backoff decorator.

import functools

import backoff
import requests

def my_backoff(function):
    @functools.wraps(function)
    def wrapped(*args, **kwargs):
        def fatal_code(e):
            """Return True to give up: retry on Too Many Requests (429), but not on other 4XX codes."""
            if e.response is None:
                return True
            code = e.response.status_code
            return 400 <= code < 500 and code != 429

        return backoff.on_exception(backoff.expo, requests.exceptions.RequestException,
                                    jitter=backoff.full_jitter,
                                    max_time=300,
                                    giveup=fatal_code)(function)(*args, **kwargs)

    return wrapped

@my_backoff
def get_response_text(url):
    pass

For details on the backoff algorithm, see:
https://codezine.jp/article/detail/10739
https://aws.typepad.com/sajp/2015/03/backoff.html
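
As a rough illustration of exponential backoff with full jitter, each wait is drawn uniformly between 0 and an upper bound that doubles on every attempt, up to a cap. This is a minimal standard-library sketch; the function name `full_jitter_delays` and the default values are assumptions chosen for the example.

```python
import itertools
import random


def full_jitter_delays(base=1.0, factor=2.0, cap=60.0, rng=random):
    """Yield waits drawn uniformly from [0, upper], where upper grows by factor up to cap."""
    upper = min(cap, base)
    while True:
        yield rng.uniform(0, upper)
        upper = min(cap, upper * factor)


# Upper bounds for the first six waits with cap=8: 1, 2, 4, 8, 8, 8
for delay in itertools.islice(full_jitter_delays(cap=8.0), 6):
    print(f"{delay:.2f}")
```

The randomization spreads out retries from many clients so that they do not all hit the server at the same moment.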

How to verify behavior

The following site is a good one to send test requests to.
