
Solving the things I got stuck on with Python Selenium

Last updated at 2022-01-26. Posted at 2019-03-05.

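Overview

I'm writing up the things I got stuck on while using Selenium, in the hope that they are useful to someone.
When web scraping, take care not to infringe copyright or break other laws (for details, see the article written by @nezuq).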

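Selenium API reference

Recommended Selenium article

An excellent article covers installation and a simple tutorial. Readers with good intuition will probably manage with that alone, but there were points where I could not tell concretely what to do, so this article is meant to fill those gaps.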

Saving scraped images

Use requests. The response body is written to a file in binary mode, as raw bytes.

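import requests
from selenium.webdriver.common.by import By

# Read the image URL from the element's src attribute
src = driver.find_element(By.ID, "A").get_attribute("src")
# Use the last path segment of the URL as the file name
imagename = src.split("/")[-1]
response = requests.get(src)
# Write the raw bytes in binary mode (the img/ directory must already exist)
with open("img/" + imagename, "wb") as f:
    f.write(response.content)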
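
The snippet above takes the file name from the last path segment of `src`. As a hedged sketch, the helper below strips any query string before taking the last segment, so a URL ending in `photo.png?size=large` still yields `photo.png`. The name `filename_from_url` and the fallback name `"image"` are illustrative assumptions, not part of the original article.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url, default="image"):
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlparse(url).path  # drops "?query" and "#fragment"
    name = posixpath.basename(unquote(path))
    return name or default  # fall back when the path ends with "/"


print(filename_from_url("https://example.com/img/photo.png?size=large"))
```

With Selenium, the result would replace `src.split("/")[-1]` when building the path passed to `open`.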

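Getting the cells of a table one by one

Loop over the cells with a for statement. The code below also uses enumerate to get the index, because I wanted to know each cell's position.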

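from selenium.webdriver.common.by import By

explain = ""  # accumulates the generated HTML
scroll = driver.find_element(By.ID, "A")
table = scroll.find_element(By.TAG_NAME, "table")
# Header cells of the first row in <thead>
thead = table.find_element(By.TAG_NAME, "thead").find_element(By.TAG_NAME, "tr").find_elements(By.TAG_NAME, "th")
for i, th in enumerate(thead):  # i is the column index
    explain += "<th>" + th.text + "</th>"
# Cells of the first row in <tbody> (this code looks for <th>; many tables use <td> here)
tbody = table.find_element(By.TAG_NAME, "tbody").find_element(By.TAG_NAME, "tr").find_elements(By.TAG_NAME, "th")
for i, th in enumerate(tbody):
    explain += "<th>" + th.text + "</th>"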
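
The loops above concatenate cell text into an HTML string. A minimal sketch of the same idea as a reusable function follows. `build_row` is a hypothetical helper, and two things in it are added assumptions rather than the original behavior: it escapes the text with `html.escape` so that a `<` inside a cell does not break the generated markup, and it wraps the cells in a `<tr>`.

```python
from html import escape


def build_row(cell_texts, tag="th"):
    """Wrap each cell text in <tag>...</tag> and join them into one <tr> row."""
    cells = "".join(f"<{tag}>{escape(text)}</{tag}>" for text in cell_texts)
    return f"<tr>{cells}</tr>"


# With Selenium, cell_texts would be e.g. [th.text for th in thead]
print(build_row(["Name", "Price <JPY>"]))
```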

Getting only the HTML under a given ID

Read the element's innerHTML attribute.

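from selenium.webdriver.common.by import By
kore = driver.find_element(By.ID, "kore")
# innerHTML is the markup inside the element, excluding its own tag
element_html = kore.get_attribute("innerHTML")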

Getting the href of tag C inside tag B under ID A

You can write this directly with XPath, but here is another way to write it, for reference.

from selenium.webdriver.common.by import By
href = driver.find_element(By.ID, "A").find_element(By.TAG_NAME, "B").find_element(By.TAG_NAME, "C").get_attribute("href")
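
For comparison, the XPath form mentioned above could look like the sketch below. `descendant_xpath` is a hypothetical helper that only builds the expression string. Note that the result is not strictly identical to the chained calls: the chain commits to the first matching B before looking for C, while the XPath matches a C under any B below the element.

```python
def descendant_xpath(element_id, *tags):
    """Build an XPath selecting nested descendant tags under an element id."""
    steps = "".join(f"//{tag}" for tag in tags)
    return f'//*[@id="{element_id}"]{steps}'


xpath = descendant_xpath("A", "B", "C")
print(xpath)
# Usage with a live driver (assumed to exist):
# from selenium.webdriver.common.by import By
# href = driver.find_element(By.XPATH, xpath).get_attribute("href")
```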

Checking whether a given ID exists

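Fetch the matches as a list with find_elements, check its length, and run the processing only when something was found.

from selenium.webdriver.common.by import By
if len(driver.find_elements(By.ID, "A")) > 0:
    a = hogehoge(driver)  # hogehoge stands in for your own processing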
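
The same check can be wrapped in a small function. This is only a sketch: `element_exists` is a hypothetical name, and it relies on `find_elements` returning an empty list, rather than raising, when nothing matches. The locator strategy is passed as the plain string `"id"`, which is the value of `By.ID` in Selenium.

```python
def element_exists(driver, element_id):
    """Return True if at least one element with the given id is on the page."""
    # find_elements returns [] when nothing matches, so no try/except is needed.
    return len(driver.find_elements("id", element_id)) > 0
```

A call such as `if element_exists(driver, "A"):` then reads the same as the inline check.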

Extracting the whole page source

Pass the driver's page source to BeautifulSoup.

from bs4 import BeautifulSoup
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
ab3 = soup.find(id="AB3")

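Getting the contents of an iframe

There was an excellent article on this.
Some pages need to be scrolled before the content appears. In that case:

Run JavaScript to move to the bottom of the page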

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
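
On pages that load more content as you scroll, a single jump to the bottom may not be enough. The sketch below repeats the scroll until the page height stops growing. The function name, the pause length, and the round limit are illustrative assumptions to tune per site.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down repeatedly until document.body.scrollHeight stops changing."""
    last_height = driver.execute_script("return document.body.scrollHeight;")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to appear
        new_height = driver.execute_script("return document.body.scrollHeight;")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return last_height
```

The round limit keeps the loop from running forever on pages with endless feeds.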

Doing it with JavaScript

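from bs4 import BeautifulSoup
# `data` must resolve to the iframe element in the page's JavaScript scope
javascript_ret = driver.execute_script("return data.contentDocument.body.innerHTML;")
soup = BeautifulSoup(javascript_ret, "html.parser")
# process soup here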

Doing it with Selenium itself

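from selenium.webdriver.common.by import By
dd = driver.find_element(By.ID, "detailFrame")
# switch_to_frame() is deprecated; use switch_to.frame()
driver.switch_to.frame(dd)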
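
After switching into a frame, later lookups stay inside it until you switch back. A hedged sketch of a context manager that always returns to the top-level document follows. `inside_frame` is a hypothetical helper name, while `switch_to.frame` and `switch_to.default_content` are Selenium's own calls.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_element):
    """Switch into an iframe for the duration of a with-block."""
    driver.switch_to.frame(frame_element)
    try:
        yield driver
    finally:
        # Always return to the top-level document, even on errors.
        driver.switch_to.default_content()
```

Assuming `driver` and the frame element `dd` from the snippet above, element lookups placed under `with inside_frame(driver, dd):` run inside the iframe, and the driver is back on the main page once the block ends.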

Operating on a select box

I found a helpful link, so please refer to it:
https://yuki.world/selenium-select/
