Scraping with Selenium 4

Last updated at Posted at 2023-02-11

Working code, for a start

It just prints the HTML source of Yahoo! JAPAN.

yum -y install libX11 GConf2 fontconfig
yum -y install ipa-gothic-fonts ipa-mincho-fonts ipa-pgothic-fonts ipa-pmincho-fonts
fc-cache -fv
yum -y install google-chrome-stable libOSMesa
python3 -m pip install selenium
python3 -m pip install webdriver-manager

This time we use webdriver-manager, so there is no need to download the driver by hand.

chrome

hoge.py
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

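# Run Chrome headless; --no-sandbox is required in this environment (see the error further down)
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--no-sandbox")

# webdriver-manager downloads a matching chromedriver and returns its path
serv = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=serv, options=options)
# Explicit wait with a 10-second timeout (not used in this minimal script)
wait = WebDriverWait(driver, 10)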

# Open the URL
driver.get("https://yahoo.co.jp")

# Get the HTML and print it
html = driver.page_source
print(html)

# Quit the browser
driver.quit()
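
Printing the whole source is rarely the end goal of scraping, so here is a minimal sketch of pulling links out of the HTML string with the standard library's html.parser. The names LinkCollector and extract_links and the sample markup are made up for illustration; in the script above you would pass driver.page_source instead of sample_html.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Stand-in for driver.page_source (made-up sample, not real Yahoo! JAPAN markup)
sample_html = (
    '<ul><li><a href="/news">News</a></li>'
    '<li><a href="/weather"> Weather </a></li></ul>'
)
print(extract_links(sample_html))
```

For anything beyond simple cases, a dedicated HTML parsing library or Selenium's own element lookups may be a better fit; this only shows that the string returned by page_source can be processed like any other HTML.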

Without --no-sandbox, Chrome fails to start with the following error:

selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
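
This error is commonly seen when Chrome is launched as root, because Chrome refuses to start its sandbox for the root user. Whether that is the cause here is an assumption (the yum commands above are run without sudo, which hints at a root shell). Under that assumption, a sketch that disables the sandbox only when it is actually needed could look like this; chrome_arguments and running_as_root are hypothetical helper names.

```python
import os


def chrome_arguments(is_root):
    """Return Chrome command-line flags for a headless run."""
    args = ["--headless"]
    if is_root:
        # Assumption: the sandbox only has to be disabled because we run as root
        args.append("--no-sandbox")
    return args


def running_as_root():
    """True when the effective user is root (Unix-like systems only)."""
    # os.geteuid does not exist on Windows; treat that case as non-root
    return hasattr(os, "geteuid") and os.geteuid() == 0


print(chrome_arguments(running_as_root()))
```

Each returned flag can then be passed to options.add_argument(...) in a loop, in place of the two hard-coded calls in hoge.py. Running the script as a non-root user is the other way to avoid the flag altogether.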

firefox

Install Firefox.

On AWS Lightsail (Amazon Linux 2), it can be installed with:

sudo amazon-linux-extras install firefox

Next, the code.

hoge.py
from selenium import webdriver
# Use the Firefox (geckodriver) Service class, not the Chrome one
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.firefox import GeckoDriverManager


options = webdriver.FirefoxOptions()
options.add_argument("--headless")
options.add_argument("--no-sandbox")

serv = Service(GeckoDriverManager().install())
driver = webdriver.Firefox(service=serv, options=options)
wait = WebDriverWait(driver, 10)


# Open Kakunin-kun (確認くん), a page that displays your connection details
driver.get("https://www.ugtop.com/spill.shtml")

# Save a screenshot
driver.get_screenshot_as_file('/var/www/html/screenshot3.png')

# Quit the browser
driver.quit()
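
Both scripts call driver.quit() on the last line, so if an earlier step raises an exception the browser process may be left running. A minimal sketch of one way to guard against that is a small context manager that always calls quit(). The helper name quitting and the FakeDriver class are made up for illustration; FakeDriver only stands in for a real WebDriver so the example runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver and always call driver.quit() afterwards."""
    try:
        yield driver
    finally:
        driver.quit()


class FakeDriver:
    """Stand-in for a real WebDriver (illustration only)."""

    def __init__(self):
        self.quit_called = False

    def get(self, url):
        # Simulate a failure while loading the page
        raise RuntimeError(f"could not load {url}")

    def quit(self):
        self.quit_called = True


fake = FakeDriver()
try:
    with quitting(fake) as d:
        d.get("https://www.ugtop.com/spill.shtml")
except RuntimeError as exc:
    print("error:", exc)

# quit() ran even though get() raised
print("quit called:", fake.quit_called)
```

With a real driver the usage would be `with quitting(webdriver.Firefox(service=serv, options=options)) as driver:` followed by the get and screenshot calls inside the block.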


