How to get only [specific child elements] under a [specific parent element] with BeautifulSoup

Posted at 2022-05-14

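What I wanted to do

(This post doubles as a personal memo, so please bear with me.)

From the target HTML, pick out the one parent element (tag) that matches a specific "id" among several repeated parent elements.
Then, within that parent, get the child elements (tags) that have a specific "class".
Note: "id" and "class" are just what I happened to need this time; other attribute combinations work the same way.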

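What I did

I was able to handle this by chaining BeautifulSoup's find methods.

Simply put, it comes down to
　find(parent element).find(child element)
(or find_all on the parent when you want every matching child).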
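As a minimal, self-contained sketch of the chaining idea, the snippet below runs against an inline HTML string instead of a live page. The HTML, the ids "first" and "second", and the class "price" are made up for illustration and do not come from the original post. Since find returns None when nothing matches, the sketch checks the parent before chaining.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: two repeated parent <div> elements with different ids.
html = """
<div id="first"><span class="price">100</span></div>
<div id="second"><span class="price">200</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Step 1: narrow down to the parent with the wanted id.
parent = soup.find("div", id="second")

# Step 2: search only inside that parent.
# find() returns None on no match, so guard before chaining.
child = parent.find("span", class_="price") if parent is not None else None
child_text = child.get_text() if child is not None else None

print(child_text)
```

Without the guard, a missing parent would raise an AttributeError on the second find call, which is worth keeping in mind when the page structure can change.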

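from selenium import webdriver
from bs4 import BeautifulSoup
import re

# Launch Chrome through a local chromedriver (replace [username] with your own).
# Note: newer Selenium 4 releases expect the driver path to be passed via a Service object instead.
browser = webdriver.Chrome(r'C:\Users\[username]\chromedriver.exe')
browser.get([target URL])

# Hand the rendered page source to BeautifulSoup.
html = browser.page_source
soup = BeautifulSoup(html, 'html.parser')

# I wanted a partial match on the child's class name, so re.compile is used here.
elements = soup.find([target parent tag], id=[target id]).find_all([target child tag], class_=re.compile([target class name]))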
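To try the partial class match without a browser, here is a small sketch on an inline HTML string. The markup, the id "menu", and the "item-" class prefix are hypothetical. It also shows a CSS selector via select that should give the same result for this input, as one possible alternative to chaining.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML: two lists that share the same child classes.
html = """
<ul id="menu">
  <li class="item-food">Curry</li>
  <li class="item-drink">Tea</li>
  <li class="note">Closed on Mondays</li>
</ul>
<ul id="other">
  <li class="item-food">Ramen</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Chained version: parent by id, then children whose class contains "item".
parent = soup.find("ul", id="menu")
elements = parent.find_all("li", class_=re.compile("item")) if parent is not None else []
texts = [e.get_text() for e in elements]

# Alternative: one CSS selector with a substring match on the class attribute.
selected = soup.select('ul#menu li[class*="item"]')
selected_texts = [e.get_text() for e in selected]

print(texts)
print(selected_texts)
```

The chained form keeps the two steps explicit, while the selector form is shorter; which one reads better is mostly a matter of taste.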

Closing

"beautiful" is a really hard word to spell.
I mistype it most of the time...
