Python: Fetch HTML source from the internet and extract only the contents of <body> (for URLs whose response is HTML)

Posted at 2022-02-09
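import urllib.request
from typing import Any, Dict

from bs4 import BeautifulSoup

# Browser-like User-Agent sent in place of urllib's default one.
USER_AGENT_DUMMY = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"
HEADERS = {"User-Agent": USER_AGENT_DUMMY}


def request(url: str, headers: Dict[str, Any] = HEADERS) -> str:
    """Fetch `url` and return the inner HTML of its <body> element."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as res:
        html = res.read()  # raw bytes; BeautifulSoup detects the encoding
        soup = BeautifulSoup(html, "html.parser")
        # decode_contents() renders the children of <body> without the <body> tag itself.
        return soup.find("body").decode_contents(formatter="html")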
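
The title restricts this to URLs whose response is HTML. As a rough sketch of how that condition could be checked before parsing, the helpers below use only the standard library. `is_html_response` and `decode_body` are hypothetical names that do not appear in the original post, and the UTF-8 fallback is an assumption.

```python
from email.message import Message


def is_html_response(headers: Message) -> bool:
    """Return True when the Content-Type header declares an HTML document."""
    # get_content_type() lowercases the value and drops parameters
    # such as "; charset=utf-8".
    return headers.get_content_type() in ("text/html", "application/xhtml+xml")


def decode_body(raw: bytes, headers: Message, fallback: str = "utf-8") -> str:
    """Decode raw bytes with the charset from Content-Type.

    Falls back to `fallback` (an assumed default) when no charset is declared.
    """
    charset = headers.get_content_charset() or fallback
    return raw.decode(charset, errors="replace")
```

With `urllib`, `res.headers` is an `http.client.HTTPMessage`, which subclasses `email.message.Message`, so `is_html_response(res.headers)` could be called inside the `with` block before the body is handed to BeautifulSoup.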