
Python: Fetch HTML source from the internet and extract only the contents of <body> (for URLs whose response is HTML)

Last updated at 2022-02-09. Posted at 2022-02-09.

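import urllib.request
from typing import Any, Dict

from bs4 import BeautifulSoup

# Browser-like User-Agent; some sites reject urllib's default one.
USER_AGENT_DUMMY = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"
HEADERS = {"User-Agent": USER_AGENT_DUMMY}


def request(url: str, headers: Dict[str, Any] = HEADERS) -> str:
    """Fetch url and return the inner HTML of its <body> element."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as res:
        html = res.read()
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("body")
    if body is None:
        raise ValueError(f"no <body> element found in response from {url}")
    return body.decode_contents(formatter="html")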
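To try the extraction step without touching the network, the parsing part can be pulled into its own function and run on an inline string. The following is a minimal sketch, not part of the original snippet: the helper name `extract_body` and the `sample` string are hypothetical, and returning an empty string when there is no `<body>` is an assumption made for illustration.

```python
from bs4 import BeautifulSoup


def extract_body(html: str) -> str:
    """Return the inner HTML of <body>, or "" if the document has none.

    Hypothetical helper: it mirrors the parsing step of request() above,
    but takes an already-fetched HTML string instead of a URL.
    """
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("body")
    if body is None:
        # html.parser does not insert a missing <body>, so fragments land here.
        return ""
    return body.decode_contents(formatter="html")


# Made-up sample document for illustration.
sample = "<html><head><title>t</title></head><body><p>Hello</p></body></html>"
print(extract_body(sample))  # prints: <p>Hello</p>
```

Keeping the fetch and the parse separate like this also makes it easier to test the parsing logic against saved HTML files.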
