Getting the CSS path of an HTML element

Posted at 2023-04-25

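When scraping, the HTML can come in many different variations, and you sometimes want to obtain the CSS path of a given HTML element.
You can build it from the element as shown below. Starting at the element, walk up through its parents. At each level, record the tag name, and add `:nth-of-type(n)` when the element is not the first sibling of its type.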

```python
from bs4 import BeautifulSoup, Tag

def get_css_path(element):
    path = []
    # Walk up the tree, stopping at the BeautifulSoup object itself
    # (its name is "[document]", which is not a valid CSS selector)
    while isinstance(element, Tag) and not isinstance(element, BeautifulSoup):
        # 1-based position among preceding siblings with the same tag name
        index = len(element.find_previous_siblings(element.name)) + 1
        if index > 1:
            path.insert(0, f'{element.name}:nth-of-type({index})')
        else:
            path.insert(0, element.name)
        element = element.parent
    return ' > '.join(path)

html_doc = '''
<!DOCTYPE html>
<html>
<head>
    <title>Page Title</title>
</head>
<body>
    <div class="container">
        <h1>My Heading</h1>
        <p>My first paragraph.</p>
        <p>My second paragraph.</p>
    </div>
</body>
</html>
'''

soup = BeautifulSoup(html_doc, 'html.parser')
element = soup.select_one('p')  # You can replace this with your desired element
css_path = get_css_path(element)
print(css_path)  # html > body > div > p
```
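To check that a generated path actually leads back to its element, you can feed it to `select_one` for every tag in the document. The sketch below assumes the same sample HTML. It repeats `get_css_path` so the block runs on its own, and that copy stops at the `BeautifulSoup` object, whose name `[document]` is not a valid CSS selector. The sketch also shows a side effect of omitting `:nth-of-type(1)`.

```python
from bs4 import BeautifulSoup, Tag


def get_css_path(element):
    """Root-to-element CSS path; ':nth-of-type(n)' is added only when n > 1."""
    path = []
    # Stop at the BeautifulSoup object, which is not a real HTML element
    while isinstance(element, Tag) and not isinstance(element, BeautifulSoup):
        index = len(element.find_previous_siblings(element.name)) + 1
        path.insert(0, f'{element.name}:nth-of-type({index})' if index > 1 else element.name)
        element = element.parent
    return ' > '.join(path)


html_doc = '''
<!DOCTYPE html>
<html>
<head>
    <title>Page Title</title>
</head>
<body>
    <div class="container">
        <h1>My Heading</h1>
        <p>My first paragraph.</p>
        <p>My second paragraph.</p>
    </div>
</body>
</html>
'''

soup = BeautifulSoup(html_doc, 'html.parser')

# Round trip: each element's path should make select_one return that same element
for tag in soup.find_all(True):
    path = get_css_path(tag)
    assert soup.select_one(path) is tag, path

first_p, second_p = soup.find_all('p')
print(get_css_path(first_p))   # html > body > div > p
print(get_css_path(second_p))  # html > body > div > p:nth-of-type(2)

# Because ':nth-of-type(1)' is omitted, select() can match more than one element
print(len(soup.select(get_css_path(first_p))))  # 2
```

`select_one` returns the first match in document order, so in this example the shorter paths still resolve to the intended element. If you also need `select()` to return exactly one element, one option (not part of the method above) is to emit `:nth-of-type(n)` whenever an element has any same-named siblings.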