BeautifulSoup Notes

Python

Last updated at 2014-06-15, posted at 2014-06-15

Documentation
http://www.crummy.com/software/BeautifulSoup/bs4/doc/

BeautifulSoup

from bs4 import BeautifulSoup

soup = BeautifulSoup(raw, 'html.parser')  # raw is the HTML data read from a web page

# find_all (findAll is the older alias): get every matching tag object as a list
# The line below gets all ul tags whose class is image-items
ul_items = soup.find_all('ul', class_='image-items')

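# find: get the first matching tag object (None if nothing matches)
item = ul_items[0]  # one of the ul tags found above
a = item.find('a')
# Looking up by id works like this
sample = soup.find(id='template-embed-sample')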

# Getting an attribute value
# Get the link target of the a tag
link = a.attrs['href']
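
The calls above can be tried end to end on a small page. This is a minimal sketch, not a definitive recipe: the HTML string and its URLs are made up for illustration, and the `html.parser` backend is an assumption (any installed parser should behave the same here).

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, made up for illustration
raw = """
<div id="template-embed-sample">sample</div>
<ul class="image-items">
  <li><a href="/images/1">first</a></li>
  <li><a href="/images/2">second</a></li>
</ul>
<ul class="other"><li><a href="/other">other</a></li></ul>
"""

soup = BeautifulSoup(raw, 'html.parser')

# Collect the href of every a tag inside ul tags with class image-items
links = []
for item in soup.find_all('ul', class_='image-items'):
    for a in item.find_all('a'):
        links.append(a.attrs['href'])

# Look up a single element by id
sample = soup.find(id='template-embed-sample')

# find returns None when nothing matches, so check before using the result
missing = soup.find('a', class_='no-such-class')

# Tag.get returns None for an absent attribute instead of raising KeyError
no_title = soup.find('a').get('title')

print(links)
print(sample.text)
```

Using `a.get('href')` instead of `a.attrs['href']` may be the safer choice when some links lack the attribute.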

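The object returned by the find method (a Tag object, not a BeautifulSoup object) also holds the information of the children it contains,
so you can search inside it as shown below.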

<div><span>hogehoge</span></div>

To get hogehoge:

div = soup.find('div')
span = div.find('span')  # search for the span inside the div
print(span.text)
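
A self-contained version of this nested lookup might look like the sketch below. It assumes a well-formed copy of the div/span markup, plus one extra span outside the div (added here purely for illustration) to show that searching from `div` only sees its own descendants.

```python
from bs4 import BeautifulSoup

# Well-formed markup; the second span is a hypothetical addition for contrast
raw = '<div><span>hogehoge</span></div><span>outside</span>'
soup = BeautifulSoup(raw, 'html.parser')

div = soup.find('div')          # a bs4 Tag object
span = div.find('span')         # searches only inside the div

inner_spans = div.find_all('span')   # spans under the div
all_spans = soup.find_all('span')    # spans in the whole document

print(type(div).__name__)
print(span.text)
```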
