I want to scrape websites in a Docker environment
When I tried to run the file below, I got an error.
main.py
from selenium import webdriver
import chromedriver_binary
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome import service as fs
URL = 'https://google.com/'
service = fs.Service('~/opt/chrome/chromedriver')
options = Options()
options.add_argument(f'service={service}')
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(options=options)
driver.get(URL)
html = driver.page_source
print(html)
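In the script above, `service` is only formatted into a string and added as a Chrome command-line argument (`options.add_argument(f'service={service}')`), so it is handed to Chrome rather than to `webdriver.Chrome`. Also, `~` in `'~/opt/chrome/chromedriver'` is not expanded, while the Dockerfile unzips the driver into `/opt/chrome/`, and `import chromedriver_binary` separately adds the pip-installed driver to `PATH`. A minimal sketch of how Selenium 4 normally takes a `Service`, assuming Selenium 4.x and the driver at `/opt/chrome/chromedriver` (`make_driver`, `CHROMEDRIVER_PATH` and `CHROME_ARGS` are hypothetical names):

```python
# Minimal sketch, assuming Selenium 4.x and ChromeDriver unpacked to /opt/chrome/ (as in the Dockerfile).
CHROMEDRIVER_PATH = "/opt/chrome/chromedriver"  # absolute path: "~" would not be expanded
CHROME_ARGS = ["--headless", "--no-sandbox", "--disable-dev-shm-usage"]


def make_driver(driver_path: str = CHROMEDRIVER_PATH):
    """Start headless Chrome through the ChromeDriver at driver_path (hypothetical helper)."""
    # Imported inside the function so this module loads even where Selenium is absent
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    for arg in CHROME_ARGS:
        options.add_argument(arg)
    # The Service goes to webdriver.Chrome itself, not into the Chrome command line
    service = Service(executable_path=driver_path)
    return webdriver.Chrome(service=service, options=options)
```

Usage, under the same assumptions: `driver = make_driver()`, then `driver.get(URL)`, and `driver.quit()` when done. This only changes how ChromeDriver is located; the traceback says Chrome itself stopped running, so it may not remove the crash on its own.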
Error message
root@********:/app# python main.py
Traceback (most recent call last):
File "/app/main.py", line 16, in <module>
driver = webdriver.Chrome(options=options)
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/chrome/webdriver.py", line 70, in __init__
super(WebDriver, self).__init__(DesiredCapabilities.CHROME['browserName'], "goog",
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/chromium/webdriver.py", line 92, in __init__
RemoteWebDriver.__init__(
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 275, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 365, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 430, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: crashed.
(chrome not reachable)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x004000664c33 <unknown>
#1 0x0040003ae158 <unknown>
#2 0x0040003d16ea <unknown>
#3 0x0040003ccdea <unknown>
#4 0x004000407dfa <unknown>
#5 0x004000401f23 <unknown>
#6 0x0040003d78aa <unknown>
#7 0x0040003d8a05 <unknown>
#8 0x0040006a91ed <unknown>
#9 0x0040006ad24e <unknown>
#10 0x00400069348e <unknown>
#11 0x0040006adf88 <unknown>
#12 0x004000688630 <unknown>
#13 0x0040006ca308 <unknown>
#14 0x0040006ca488 <unknown>
#15 0x0040006e483d <unknown>
#16 0x004002413ea7 <unknown>
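The message says the process started from `/usr/bin/google-chrome` is no longer running, so one way to separate a Chrome problem from a Selenium/ChromeDriver problem is to launch Chrome directly in the container with the same flags and read its own stderr. A sketch, assuming headless Chrome's `--dump-dom` switch; `build_chrome_cmd` and `run_chrome_directly` are hypothetical helpers:

```python
import shutil
import subprocess

CHROME_BIN = "/usr/bin/google-chrome"  # path taken from the error message


def build_chrome_cmd(url: str, chrome_bin: str = CHROME_BIN) -> list:
    """Command line that loads url once in headless Chrome and prints the DOM."""
    return [chrome_bin, "--headless", "--no-sandbox", "--disable-dev-shm-usage",
            "--dump-dom", url]


def run_chrome_directly(url: str, chrome_bin: str = CHROME_BIN, timeout: int = 60):
    """Run Chrome without ChromeDriver.

    Returns (returncode, stderr), or None if chrome_bin is not an executable file.
    Raises subprocess.TimeoutExpired if Chrome does not exit within timeout seconds.
    """
    if shutil.which(chrome_bin) is None:
        return None
    result = subprocess.run(build_chrome_cmd(url, chrome_bin),
                            capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stderr
```

For example, `print(run_chrome_directly("https://google.com/"))` inside the container; a non-zero return code together with the stderr text would show why Chrome exits, independent of Selenium.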
What I tried
I made sure the Google Chrome and ChromeDriver versions match.
Dockerfile
FROM python:3.9
USER root
RUN apt update && apt install -y \
sudo \
wget \
curl \
htop \
vim \
git \
gnupg \
unzip \
tzdata \
locales && \
locale-gen ja_JP.UTF-8
RUN dpkg --add-architecture amd64 \
&& dpkg --print-foreign-architectures
# google-chrome
RUN apt update \
&& wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
&& apt install -y ./google-chrome-stable_current_amd64.deb \
&& apt clean \
&& rm -rf /var/lib/apt/lists/ \
&& rm google-chrome-stable_current_amd64.deb
# ChromeDriver
ADD https://chromedriver.storage.googleapis.com/102.0.5005.27/chromedriver_linux64.zip /opt/chrome/
RUN cd /opt/chrome/ && \
unzip chromedriver_linux64.zip && \
rm chromedriver_linux64.zip
# python package
RUN pip install --upgrade pip
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
WORKDIR /app/
COPY main.py .
ENV PATH /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/chrome
ENV TZ Asia/Tokyo
ENV LANG ja_JP.UTF-8
ENV LANGUAGE ja_JP:ja
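The Dockerfile runs `dpkg --add-architecture amd64` and installs the amd64 `.deb`, which suggests the image may be built on a non-amd64 host (an assumption). Comparing the architecture of the Chrome binary with the machine the container reports is a cheap check. A sketch that reads the `e_machine` field of the ELF header; the path `/opt/google/chrome/chrome` (where the Google Chrome package normally puts the real binary behind the `/usr/bin/google-chrome` wrapper) and the helper names are assumptions:

```python
import platform
import struct

# A few e_machine values from the ELF specification
ELF_MACHINES = {62: "x86_64", 183: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the architecture recorded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian
    fmt = "<H" if header[5] == 1 else ">H"
    (machine,) = struct.unpack_from(fmt, header, 18)  # e_machine sits at offset 18
    return ELF_MACHINES.get(machine, f"unknown({machine})")


def check_chrome_arch(chrome_path: str = "/opt/google/chrome/chrome"):
    """Return (Chrome binary architecture, architecture the container reports)."""
    with open(chrome_path, "rb") as f:
        binary_arch = elf_machine(f.read(20))
    return binary_arch, platform.machine()
```

If `check_chrome_arch()` returns `('x86_64', 'aarch64')`, Chrome is an amd64 binary in an ARM container, presumably running through emulation, which is worth investigating further. If the whole container is emulated (e.g. built with `--platform linux/amd64`), `platform.machine()` may report `x86_64` anyway, so a match is not conclusive.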
requirements.txt
chromedriver-binary==102.0.5005.27
Installed versions:
ChromeDriver 102.0.5005.27 (df4a85108ffad4dca2c409c52f24df7ec0204b91-refs/branch-heads/5005_22@{#4})
Google Chrome 102.0.5005.115
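To double-check the version match programmatically, the two `--version` strings can be compared on their MAJOR.MINOR.BUILD part, which is, as far as I know, what ChromeDriver's version-selection guide uses to pick a driver. A sketch with hypothetical helpers `build_prefix` and `versions_compatible`:

```python
import re

# Chrome/ChromeDriver versions have four parts: MAJOR.MINOR.BUILD.PATCH
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)")


def build_prefix(version_output: str) -> tuple:
    """Return (MAJOR, MINOR, BUILD) from a `--version` output line."""
    match = VERSION_RE.search(version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    major, minor, build, _patch = (int(g) for g in match.groups())
    return major, minor, build


def versions_compatible(driver_output: str, chrome_output: str) -> bool:
    """True if ChromeDriver and Chrome share the same MAJOR.MINOR.BUILD."""
    return build_prefix(driver_output) == build_prefix(chrome_output)
```

The strings can come from `subprocess.run(["chromedriver", "--version"], capture_output=True, text=True).stdout` and the same call for `google-chrome --version`. With the two versions listed here, both give `(102, 0, 5005)`, so these versions look compatible.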
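Since the error message said Chrome had crashed, I searched and ended up at the link below, but trying it gave the same error.
I tried various other things as well, but nothing worked.
I can't think of anything else that could be wrong, so I'd appreciate any pointers on where I should look...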