
More than 5 years have passed since last update.

Removing or Replacing Specific Elements in HTML Source [BeautifulSoup]


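How to remove or replace elements that match a given condition when scraping HTML.

(For example, you may want to skip every link, or leave out figures and tables.)

With Python's BeautifulSoup, use the .extract() and .replace_with() methods.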

from bs4 import BeautifulSoup

txt = """<p>I have a dog.  His name is <span class="secret">Ken</span>.</p>"""
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(txt, "html.parser")

# get_text() on its own keeps the "unwanted" information
soup.get_text()
#: 'I have a dog.  His name is Ken.'


# Remove an element matched by tag name and class
soup.find("span", {"class": "secret"}).extract()
soup.get_text()
#: 'I have a dog.  His name is .'


# Or replace the element with something else
soup = BeautifulSoup(txt, "html.parser")
soup.find("span", {"class": "secret"}).replace_with("confidential")
soup.get_text()
#: 'I have a dog.  His name is confidential.'
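
The use cases mentioned at the top (skipping every link, leaving out tables) need every match removed, not only the first one that find() returns. The following is a minimal sketch of one way to do that with find_all(). The helper name strip_tags and the sample HTML are made up for illustration and are not part of the original article.

```python
from bs4 import BeautifulSoup

# Sample input (hypothetical): one link and one table that we want to drop
html = (
    '<p>See <a href="https://example.com">this page</a> for details.</p>'
    "<table><tr><td>1</td></tr></table>"
    "<p>Done.</p>"
)


def strip_tags(source, tag_names):
    """Return a soup of `source` with every element named in `tag_names` removed.

    strip_tags is a hypothetical helper written for this example.
    """
    soup = BeautifulSoup(source, "html.parser")
    # find_all() accepts a list of tag names and returns all matches up front,
    # so extracting elements inside the loop does not disturb the iteration.
    for element in soup.find_all(tag_names):
        element.extract()
    return soup


soup = strip_tags(html, ["a", "table"])
text = soup.get_text()

# find() returns None when nothing matches, so guard before calling extract()
missing = soup.find("span", class_="secret")
if missing is not None:
    missing.extract()
```

After this runs, text no longer contains the link text or the table cell, while the surrounding paragraph text is kept. If the removed elements are never needed again, decompose() can be used in place of extract(); extract() returns the removed element in case you want to keep it.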