
Fetching the source of an infinitely loading page with Python

Posted at 2020-02-09 (last updated at 2020-02-09)

Overview

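In a separate article, I created a page that never finishes loading. The usual way of calling curl or requests cannot retrieve the source of such a page, because both wait for a response body that never ends, so a slightly unusual approach is needed.
This post shows code that fetches the content of that page.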
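The server-side code lives in the linked article and is not reproduced here. As a rough illustration of the problem, the sketch below starts a hypothetical stand-in page with the standard library and reads only a bounded prefix of its endless body. `EndlessHandler`, `read_prefix`, the port selection, and the 0.05-second interval are all assumptions made for this demo, and it uses `urllib` instead of requests so that it runs without extra packages.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EndlessHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the endless page (not the linked article's code)."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()  # no Content-Length, so the body has no defined end
        i = 0
        try:
            while True:
                self.wfile.write(f"<p>Hello World ! {i}</p>".encode())
                self.wfile.flush()
                i += 1
                time.sleep(0.05)  # assumed interval, chosen only for a quick demo
        except ConnectionError:
            pass  # the client hung up

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def read_prefix(n_bytes=31):
    """Start the stand-in server, read n_bytes of the body, then shut down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EndlessHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}"
        with urllib.request.urlopen(url, timeout=5) as resp:
            # Reading a fixed amount returns; reading "everything" never would.
            return resp.read(n_bytes)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(read_prefix())
```

Calling `resp.read()` with no argument against this handler would block until the socket timeout, which is the same reason a plain `requests.get(url)` is not enough for such a page.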

Environment

* Python 3.8.1

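Code

The code below fetches the source of the page generated by the code in that article. When either the time limit or the received-byte limit is reached, it prints whatever has been received up to that point.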

get_inf_page.py

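import requests
import timeout_decorator

# Bytes received so far. Kept at module level so that the partial result
# survives when load_bytes() is interrupted by the timeout.
r_bytes = b""


def main():
    url = "http://localhost:8000"

    # stream=True returns as soon as the headers arrive instead of waiting
    # for the body, which never ends. timeout=20 bounds the connection
    # attempt and each individual socket read, not the total download time.
    r = requests.get(url, stream=True, timeout=20)

    byte_limit = 30
    @timeout_decorator.timeout(100)
    def load_bytes(r):
        global r_bytes
        # iter_content() yields one byte at a time by default.
        for chunk in r.iter_content():
            r_bytes += chunk
            if len(r_bytes) % 500 == 0:
                print(f"loaded:{len(r_bytes)}/{byte_limit}")
            if len(r_bytes) > byte_limit:
                r.close()
                print("reached size limit")
                break

    try:
        load_bytes(r)
    except timeout_decorator.timeout_decorator.TimeoutError:
        print("timeout")
        r.close()

    print(r_bytes)


if __name__ == "__main__":
    main()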
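As a possible alternative to the script above, the two limits can also be enforced inside the read loop with a monotonic-clock deadline, which avoids both the global variable and the decorator. This is a swapped-in technique, not what the script does: `read_limited` and `fake_stream` are hypothetical names, and `fake_stream` only imitates the one-byte chunks of `iter_content()` so the sketch runs without a server.

```python
import time


def read_limited(chunks, byte_limit, time_limit):
    """Collect chunks until a byte limit or a time limit (seconds) is hit.

    Returns (data, reason), where reason is "size", "time", or "eof".
    """
    deadline = time.monotonic() + time_limit
    data = bytearray()
    for chunk in chunks:
        data += chunk
        if len(data) > byte_limit:
            return bytes(data), "size"
        # The clock is only checked when a chunk arrives.
        if time.monotonic() >= deadline:
            return bytes(data), "time"
    return bytes(data), "eof"


def fake_stream(delay=0.0):
    """Hypothetical endless source yielding one byte at a time."""
    i = 0
    while True:
        for b in f"<p>Hello World ! {i}</p>".encode():
            time.sleep(delay)
            yield bytes([b])
        i += 1


if __name__ == "__main__":
    print(read_limited(fake_stream(), byte_limit=30, time_limit=100))
    print(read_limited(fake_stream(delay=0.01), byte_limit=1000, time_limit=0.1))
```

With a real response this would be called as `read_limited(r.iter_content(), 30, 100)`. Because the deadline is checked only between chunks, a server that stops sending is not caught by it; in that case the read timeout passed to `requests.get` is what bounds the wait.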

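Verification (stopping when the loaded byte count exceeds the limit)

Start the code from that article in another terminal, and run the code above while it is still running. The output looks like this.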

reached size limit
b'<p>Hello World ! 0</p><p>Hello '
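The output above is 31 bytes long, one more than `byte_limit`. That follows from two details of the script: `iter_content()` yields one byte at a time by default, and the loop stops only once the length is strictly greater than the limit. A quick check of the arithmetic, with the output copied from above:

```python
byte_limit = 30
observed = b'<p>Hello World ! 0</p><p>Hello '

# With 1-byte chunks, the first length that satisfies
# len(r_bytes) > byte_limit is byte_limit + 1.
print(len(observed))  # 31
```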

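Verification (stopping when the load time exceeds the limit)

Change the `byte_limit` line and the `@timeout_decorator.timeout` line as follows, then run the same check as above.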

    byte_limit = 1000
    @timeout_decorator.timeout(5)

Only the content received within 5 seconds of startup is displayed.

timeout
b'<p>Hello World ! 0</p><p>Hello World ! 1</p><p>Hello World ! 2</p>'

That's all.
