import urllib.request
from typing import Any, Dict

from bs4 import BeautifulSoup

# Dummy browser User-Agent; some sites reject urllib's default one.
USER_AGENT_DUMMY = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"
HEADERS = {"User-Agent": USER_AGENT_DUMMY}


def request(url: str, headers: Dict[str, Any] = HEADERS) -> str:
    """Fetch the page at `url` and return the inner HTML of its <body>."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as res:
        html = res.read()
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("body").decode_contents(formatter="html")
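The body-extraction step in `request` can be tried without any network access. The following is a minimal sketch, not part of the original post: the helper name `extract_body`, the `SAMPLE_HTML` string, and the choice to return an empty string when no `<body>` tag is found are all assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def extract_body(html: str) -> str:
    """Return the inner HTML of <body>, or "" if there is no <body> tag.

    Hypothetical helper: the empty-string fallback is an assumption,
    not behavior of the original `request` function.
    """
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("body")
    if body is None:
        # html.parser does not insert a missing <body>, so find() can return None.
        return ""
    return body.decode_contents(formatter="html")


# Hypothetical sample document used instead of a real HTTP response.
SAMPLE_HTML = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
inner = extract_body(SAMPLE_HTML)
print(inner)
```

Splitting the parsing out this way keeps the HTTP call and the HTML handling separate, so the parsing can be checked against fixed strings. Whether an empty string is the right fallback depends on the caller; raising an exception would be an equally reasonable choice.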
Python: Fetch HTML source from the internet and extract only the contents of <body> (for URLs whose response is HTML)
Posted at 2022-02-09