Working code, for now.
It just prints the HTML source of the Yahoo! JAPAN top page. First, install the packages:
yum -y install libX11 GConf2 fontconfig
yum -y install ipa-gothic-fonts ipa-mincho-fonts ipa-pgothic-fonts ipa-pmincho-fonts
fc-cache -fv
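The IPA font packages are Japanese fonts, presumably installed so Japanese text renders in headless screenshots instead of as empty boxes. After fc-cache, fc-list can confirm that fontconfig sees them. The sketch below assumes the packages register the family names IPAGothic, IPAMincho, IPAPGothic and IPAPMincho. The helper names are hypothetical.

```python
import shutil
import subprocess

# Assumption: family names registered by the ipa-*-fonts packages
REQUIRED_FAMILIES = ("IPAGothic", "IPAMincho", "IPAPGothic", "IPAPMincho")


def font_families(fc_list_output):
    """Parse `fc-list : family` output (one font per line, names comma-separated)."""
    families = set()
    for line in fc_list_output.splitlines():
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return families


def missing_families(fc_list_output, required=REQUIRED_FAMILIES):
    """Return the required family names absent from the listing, in order."""
    found = font_families(fc_list_output)
    return [name for name in required if name not in found]


def installed_font_listing():
    """Run `fc-list : family`; return an empty string if fontconfig is not installed."""
    if shutil.which("fc-list") is None:
        return ""
    # stdout=PIPE and universal_newlines keep this compatible with Python 3.6
    result = subprocess.run(
        ["fc-list", ":", "family"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

Run print(missing_families(installed_font_listing())) after fc-cache. An empty list means all four families were found.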
yum -y install google-chrome-stable libOSMesa
python3 -m pip install selenium
python3 -m pip install webdriver-manager
Because this uses webdriver-manager, you don't need to download the driver yourself; webdriver-manager fetches it automatically.
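For reference, Selenium 4.6.0 and later bundle Selenium Manager, which resolves a driver on its own, so webdriver-manager becomes optional on those versions. The sketch below falls back to webdriver-manager only on older Selenium 4.x releases. The helper names are hypothetical. Imports happen inside the function so the file loads even where Selenium is not installed.

```python
def selenium_manager_available(version):
    """Selenium Manager is bundled with Selenium 4.6.0 and later."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (4, 6)


def make_headless_chrome():
    """Hypothetical helper: start headless Chrome, using webdriver-manager only if needed."""
    import selenium
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")

    if selenium_manager_available(selenium.__version__):
        # No service argument: Selenium Manager locates chromedriver itself
        return webdriver.Chrome(options=options)

    # Selenium 4.0-4.5: fall back to webdriver-manager as in this article.
    # (Selenium 3 has no service= keyword, so this path assumes Selenium 4.x.)
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```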
Chrome
hoge.py
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
options = webdriver.ChromeOptions()
options.add_argument("--headless")
# Without --no-sandbox, Chrome fails to start (see the error below)
options.add_argument("--no-sandbox")
serv = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=serv, options=options)
wait = WebDriverWait(driver, 10)
# Open the URL
driver.get("https://yahoo.co.jp")
# Get the HTML and print it
html = driver.page_source
print(html)
# Quit the browser
driver.quit()
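If driver.get() raises, for example on a timeout or a DNS failure, the script above never reaches driver.quit(). Chrome and chromedriver processes are then left running. The sketch below uses a hypothetical quitting() helper that always calls quit().

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Hypothetical helper: yield the driver and always call driver.quit() on exit."""
    try:
        yield driver
    finally:
        driver.quit()


# Usage with the objects from hoge.py (not executed here):
#
# with quitting(webdriver.Chrome(service=serv, options=options)) as driver:
#     driver.get("https://yahoo.co.jp")
#     print(driver.page_source)
```

contextlib.closing() is not a substitute here. It calls close(), and in Selenium close() only closes the current window instead of ending the session.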
Without --no-sandbox, Chrome fails to start with this error:
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
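This error is commonly reported when Chrome runs as root, because Chrome refuses to start its sandbox as root. That is likely here if the script runs as the same root user that ran the yum commands. Assuming that is the cause, one option is to add --no-sandbox only for root. The sketch below uses a hypothetical headless_chrome_args() helper. If the sandbox still fails for a regular user on your system, keep --no-sandbox unconditionally as hoge.py does.

```python
import os


def headless_chrome_args(euid=None):
    """Hypothetical helper: Chrome flags for headless use, --no-sandbox only for root."""
    if euid is None:
        euid = os.geteuid()  # POSIX only, which matches the Linux setup here
    args = ["--headless"]
    if euid == 0:
        # Assumption: running as root is what triggers the DevToolsActivePort error
        args.append("--no-sandbox")
    return args


# Usage in hoge.py, replacing the two add_argument() calls (not executed here):
#
# for arg in headless_chrome_args():
#     options.add_argument(arg)
```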
Firefox
Install Firefox first. On AWS Lightsail (Amazon Linux 2), run:
sudo amazon-linux-extras install firefox
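webdriver-manager only downloads geckodriver, not the browser itself. Before running the script, it is worth confirming that the firefox binary is on PATH. The sketch below does this with shutil.which(). The missing_commands() name is hypothetical.

```python
import shutil


def missing_commands(names):
    """Hypothetical helper: return the entries of `names` not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Usage (not executed here):
#
# missing = missing_commands(["firefox"])
# if missing:
#     raise SystemExit("not found on PATH: " + ", ".join(missing))
```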
Next, the code:
hoge.py
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
# Firefox needs the Firefox Service class, not the Chrome one
from selenium.webdriver.firefox.service import Service
from webdriver_manager.firefox import GeckoDriverManager
options = webdriver.FirefoxOptions()
options.add_argument("--headless")
# --no-sandbox is a Chrome flag, so it is not needed for Firefox
serv = Service(GeckoDriverManager().install())
driver = webdriver.Firefox(service=serv, options=options)
wait = WebDriverWait(driver, 10)
# Open 確認くん (ugtop.com's page that shows your IP address and browser info)
driver.get("https://www.ugtop.com/spill.shtml")
# Save a screenshot (this user needs write permission on /var/www/html)
driver.get_screenshot_as_file('/var/www/html/screenshot3.png')
# Quit the browser
driver.quit()
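get_screenshot_as_file() does not raise when it cannot write the file. Selenium catches the OSError and returns False, so a permission problem on /var/www/html can go unnoticed. The sketch below uses a hypothetical save_screenshot() helper that checks both the directory and the return value.

```python
import os


def save_screenshot(driver, path):
    """Hypothetical helper: save a PNG via get_screenshot_as_file() and fail loudly."""
    directory = os.path.dirname(path) or "."
    if not os.access(directory, os.W_OK):
        raise PermissionError("cannot write to " + directory)
    # get_screenshot_as_file() returns False instead of raising on write errors
    if not driver.get_screenshot_as_file(path):
        raise OSError("failed to save screenshot to " + path)
    return path


# Usage in hoge.py, replacing the get_screenshot_as_file() line (not executed here):
#
# save_screenshot(driver, "/var/www/html/screenshot3.png")
```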