
Getting HTML elements by XPath with Python's beautifulsoup4

Last updated at 2024-10-16, posted at 2023-07-22

Conclusion

Use the lxml module. If you convert the data parsed with beautifulsoup4 into an lxml.html.HtmlElement once, you can retrieve HTML elements by XPath.
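As a minimal sketch of that conversion, the snippet below runs without network access. The HTML string and the helper name `soup_to_lxml` are hypothetical and exist only for illustration; the conversion itself is the same `html.fromstring(str(soup))` call used in the sample code.

```python
from bs4 import BeautifulSoup
from lxml import html

# Hypothetical HTML used only for illustration
SAMPLE_HTML = """
<html><body>
  <ul id="fruits">
    <li class="item">apple</li>
    <li class="item">banana</li>
  </ul>
</body></html>
"""


def soup_to_lxml(soup):
    """Serialize a BeautifulSoup tree and re-parse it with lxml (hypothetical helper)."""
    return html.fromstring(str(soup))


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tree = soup_to_lxml(soup)

# Once the tree is an lxml element, xpath() is available
items = [li.text for li in tree.xpath("//ul[@id='fruits']/li")]
print(type(tree).__name__)
print(items)
```

Note that this approach serializes the document and parses it a second time, so it is probably best suited to cases where the BeautifulSoup object already exists and only a few XPath lookups are needed.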

Prerequisites

In addition to beautifulsoup4, lxml must also be installed. Install it with pip or a similar tool.

pip install lxml

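Sample code

To create the sample code, I used the practice page whose URL is requested in the code below.

The code retrieves the overall rating of each tourist spot listed on that page.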

.py

from lxml import html
import requests
from bs4 import BeautifulSoup

def main():
    response = requests.get("https://scraping-for-beginner.herokuapp.com/ranking/")
    soup = BeautifulSoup(response.content, "html.parser")
    # Convert to 'lxml.html.HtmlElement' once
    lxml_data = html.fromstring(str(soup))
    # Now an XPath expression can be used
    u_ranks = lxml_data.xpath("//div[contains(@class, 'u_rankBox')]/span[contains(@class, 'evaluateNumber')]")

    for u_rank in u_ranks:
        print(u_rank.text)

if __name__ == '__main__':
    main()

Result

4.7
4.7
4.6
4.5
4.5
4.4
4.3
4.3
4.2
4.1
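Because the sample above depends on a live page, an offline sketch of the same XPath may be easier to experiment with. The markup below is hypothetical: it only borrows the class names `u_rankBox` and `evaluateNumber` from the XPath above and does not reproduce the real page. The function name `extract_ratings` and the conversion to `float` are also assumptions added for illustration.

```python
from bs4 import BeautifulSoup
from lxml import html

# Hypothetical markup; only the two class names come from the XPath in the article
SAMPLE_HTML = """
<html><body>
  <div class="u_rankBox"><span class="evaluateNumber">4.7</span></div>
  <div class="u_rankBox highlighted"><span class="evaluateNumber">4.5</span></div>
  <div class="other"><span class="evaluateNumber">1.0</span></div>
</body></html>
"""

RATING_XPATH = (
    "//div[contains(@class, 'u_rankBox')]"
    "/span[contains(@class, 'evaluateNumber')]"
)


def extract_ratings(markup):
    """Parse with BeautifulSoup, convert to lxml, and collect ratings as floats."""
    soup = BeautifulSoup(markup, "html.parser")
    tree = html.fromstring(str(soup))
    spans = tree.xpath(RATING_XPATH)
    # .text is None for an element with no leading text, so skip empty spans
    return [float(s.text) for s in spans if s.text and s.text.strip()]


ratings = extract_ratings(SAMPLE_HTML)
print(ratings)
```

Here `contains(@class, ...)` is a substring match, so the second `div` with two classes is still selected while the `div` with class `other` is not. A substring match would also select a class such as `u_rankBoxLarge`, which may or may not be what you want.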

That's all.
