[Memo] How to use BeautifulSoup4 (3): Displaying article headlines with class_

Posted at 2020-10-10

Last time I used find_all to display headlines; this time I use class_ to display them. I also decided to scrape Yahoo! JAPAN this time.

In[1] Import BeautifulSoup and Requests

In[1]

from bs4 import BeautifulSoup
import requests

In[2] Fetch the Yahoo! JAPAN page by its URL with Requests and display the response text

In[2]

toget_url = requests.get("https://www.yahoo.co.jp/")
toget_url.text

In[3] Parse the HTML with BeautifulSoup and html.parser

In[3]

soup = BeautifulSoup(toget_url.text, "html.parser")

Up to this point, everything is the same as last time except for the variable name and the URL.

In[4] Search with find_all using class_=

In[4]

heading = soup.find_all(class_="TRuzXRRZHRqbqgLUCCco9")

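Inspecting the Yahoo! JAPAN headlines with the browser's developer tools showed that the class "TRuzXRRZHRqbqgLUCCco9" is used for them. When searching by class, do not forget the trailing _ (underscore) in class_: class is a reserved word in Python, so it cannot be used as a keyword argument name.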
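To try the class_ search without depending on the live page, here is a minimal sketch that runs against a small inline HTML snippet. The HTML and the class names ("headline", "featured", "ad") are made up for illustration and are not taken from Yahoo! JAPAN. It also shows two other ways of writing the same lookup that, as far as I know, return the same elements.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a real page; the class names are made up.
html = """
<ul>
  <li class="headline">First story</li>
  <li class="headline featured">Second story</li>
  <li class="ad">Sponsored</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Search by CSS class with the class_ keyword (note the trailing underscore).
by_keyword = soup.find_all(class_="headline")

# The same lookup written with an attrs dictionary instead of the keyword.
by_attrs = soup.find_all(attrs={"class": "headline"})

# The same lookup written as a CSS selector.
by_selector = soup.select(".headline")

# An element with several classes matches if any one of them is "headline".
print(len(by_keyword), len(by_attrs), len(by_selector))
```

All three searches should find the first two list items and skip the one with class "ad".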

In[5] Loop over the results with a for statement and display them

In[5]

for heading_name in heading:
    print(heading_name)

With this, the headlines are displayed.
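The loop in In[5] prints each matching tag together with its markup. If only the headline text is wanted, get_text() can be called on each element. The following is a minimal sketch on inline HTML; the snippet and the class name "headline" are assumptions for illustration, not the actual Yahoo! JAPAN markup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML; "headline" is a made-up class name for illustration.
html = (
    '<h1 class="headline"><span> Top story </span></h1>'
    '<h1 class="headline">Weather</h1>'
)

soup = BeautifulSoup(html, "html.parser")
heading = soup.find_all(class_="headline")

# get_text(strip=True) returns the text inside the tag with
# leading and trailing whitespace removed from each piece of text.
heading_texts = [heading_name.get_text(strip=True) for heading_name in heading]

for text in heading_texts:
    print(text)
```

Collecting the strings into a list first makes it easy to reuse them later, for example to save them to a file.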