Getting the CSS path of an HTML element

Last updated at 2023-04-27, posted at 2023-04-25

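When scraping pages whose HTML structure varies from one page to the next, it is sometimes useful to derive a CSS path from an element you have already located.
The code below builds such a path by walking from the element up through its ancestors with BeautifulSoup.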

from bs4 import BeautifulSoup

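def get_css_path(element):
    """Build a CSS selector path from the document root down to `element`."""
    path = []
    # Stop at the BeautifulSoup root object: its name is '[document]' and its
    # parent is None, so it is not a real element and must stay out of the path.
    while element is not None and element.parent is not None:
        # Count preceding siblings with the same tag name (1-based position).
        index = len(element.find_previous_siblings(element.name)) + 1
        if index > 1:
            path.insert(0, f'{element.name}:nth-of-type({index})')
        else:
            # The first element of its type is written without :nth-of-type.
            path.insert(0, element.name)
        element = element.parent
    return ' > '.join(path)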

html_doc = '''
<!DOCTYPE html>
<html>
<head>
    <title>Page Title</title>
</head>
<body>
    <div class="container">
        <h1>My Heading</h1>
        <p>My first paragraph.</p>
        <p>My second paragraph.</p>
    </div>
</body>
</html>
'''

soup = BeautifulSoup(html_doc, 'html.parser')
element = soup.select_one('p')  # You can replace this with your desired element
css_path = get_css_path(element)
print(css_path)
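
A path built this way writes the first element of each type without `:nth-of-type`, so a final step such as `p` also matches any later `<p>` siblings when the selector is passed to `select()`. If you need a selector that matches exactly one element, one option is to always emit the index. The following is a minimal sketch of that idea, not part of the original code: the function name `get_unique_css_path` is hypothetical, and the sample HTML is a shortened fragment of the document above.

```python
from bs4 import BeautifulSoup


def get_unique_css_path(element):
    """Hypothetical variant: always emit :nth-of-type so the path matches one element."""
    path = []
    # Walk upward; the BeautifulSoup root object has no parent and is skipped.
    while element is not None and element.parent is not None:
        # 1-based position among siblings that share the same tag name.
        index = len(element.find_previous_siblings(element.name)) + 1
        path.insert(0, f'{element.name}:nth-of-type({index})')
        element = element.parent
    return ' > '.join(path)


html_doc = '''
<div class="container">
    <h1>My Heading</h1>
    <p>My first paragraph.</p>
    <p>My second paragraph.</p>
</div>
'''

soup = BeautifulSoup(html_doc, 'html.parser')
first_p = soup.select_one('p')
css_path = get_unique_css_path(first_p)
print(css_path)

# Round-trip check: the generated selector should select only the original element.
matches = soup.select(css_path)
print(len(matches) == 1 and matches[0] is first_p)
```

The trade-off is a longer selector in exchange for an unambiguous one. If you only ever call `select_one()`, the shorter form is usually enough, since the first match in document order is returned.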
