
More than 3 years have passed since last update.

Scraping with chromedriver on AWS Cloud9


Environment

ec2-user: ~ $ cat /etc/system-release
Amazon Linux AMI release 2018.03
ec2-user:~ $ python -V
Python 3.6.8
ec2-user:~ $ yum info google-chrome-stable
Loaded plugins: priorities, update-motd, upgrade-helper
google-chrome/primary                             | 1.7 kB     00:00     
google-chrome                                                        3/3
1065 packages excluded due to repository priority protections
Installed Packages
Name        : google-chrome-stable
Arch        : x86_64
Version     : 78.0.3904.108
Release     : 1
Size        : 214 M
Repo        : installed
Summary     : Google Chrome
URL         : https://chrome.google.com/
License     : Multiple, see https://chrome.google.com/
Description : The web browser from Google
            : 
            : Google Chrome is a browser that combines a minimal design
            : with sophisticated technology to make the web faster,
            : safer, and easier.

ec2-user:~ $ chromedriver --version
ChromeDriver 78.0.3904.70 (edb9c9f3de0247fd912a77b7f6cae7447f6d3ad5-refs/branch-heads/3904@{#800})

Steps

I wanted to run chromedriver on AWS, so I tried the following steps:
1. Install selenium with pip.
2. Put chromedriver in /usr/local/bin.
3. Put google-chrome-stable in /usr/bin/.
4. Check whether chromedriver works, using a script named chromedriver.py.

Install selenium with pip.

ec2-user:~ $ pip install selenium
Requirement already satisfied: selenium in ./.local/lib/python3.6/site-packages (3.141.0)
Requirement already satisfied: urllib3 in ./.local/lib/python3.6/site-packages (from selenium) (1.25.7)
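Because selenium is installed with pip but the test script is later run with python3, it may be worth confirming that the interpreter that actually runs the script can import selenium. This is a minimal sketch using only the standard library; module_available and describe are hypothetical helper names, not part of Selenium.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the interpreter running this code."""
    return importlib.util.find_spec(name) is not None


def describe(name):
    """One-line summary: which interpreter ran the check and what it found."""
    status = "found" if module_available(name) else "NOT found"
    return "%s: %s %s" % (sys.executable, name, status)
```

Running python3 -c "..." with describe("selenium") would show whether the python3 on PATH is the same interpreter that pip installed into.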

Put chromedriver in /usr/local/bin.

ec2-user:/usr/local/bin $ sudo wget https://chromedriver.storage.googleapis.com/78.0.3904.70/chromedriver_linux64.zip
Resolving chromedriver.storage.googleapis.com (chromedriver.storage.googleapis.com)... 172.217.26.16, 2404:6800:4004:809::2010
Connecting to chromedriver.storage.googleapis.com (chromedriver.storage.googleapis.com)|172.217.26.16|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5527671 (5.3M) [application/zip]
Saving to: ‘chromedriver_linux64.zip’

chromedriver_linux64.zip   100%[=======================================>]   5.27M  --.-KB/s    in 0.08s  
(62.9 MB/s) - ‘chromedriver_linux64.zip’ saved [5527671/5527671]
ec2-user:/usr/local/bin $ sudo unzip chromedriver_linux64.zip                                              
Archive:  chromedriver_linux64.zip
  inflating: chromedriver
ec2-user:/usr/local/bin $ sudo rm -rf chromedriver_linux64.zip
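The wget / unzip / rm sequence above can also be written in Python. This sketch reuses the download URL pattern from the wget command; chromedriver_url, extract_chromedriver and install_chromedriver are hypothetical names. One difference from unzip: Python's zipfile does not restore Unix permissions, so the sketch sets the executable bits explicitly. Writing into /usr/local/bin still needs root.

```python
import os
import stat
import urllib.request
import zipfile

# URL pattern taken from the wget command above.
BASE_URL = "https://chromedriver.storage.googleapis.com"


def chromedriver_url(version, platform="linux64"):
    """Build the download URL for a given ChromeDriver version."""
    return "%s/%s/chromedriver_%s.zip" % (BASE_URL, version, platform)


def extract_chromedriver(zip_path, dest_dir):
    """Extract the chromedriver binary into dest_dir and make it executable."""
    with zipfile.ZipFile(zip_path) as zf:
        target = zf.extract("chromedriver", dest_dir)
    # zipfile.extract() ignores the archived permissions, so add +x by hand.
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target


def install_chromedriver(version, dest_dir="/usr/local/bin"):
    """Download the zip to a temporary file, extract it, then delete the zip."""
    zip_path, _headers = urllib.request.urlretrieve(chromedriver_url(version))
    try:
        return extract_chromedriver(zip_path, dest_dir)
    finally:
        os.remove(zip_path)
```

For this environment the call would be install_chromedriver("78.0.3904.70").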

Check the chromedriver version.

ec2-user:~ $ chromedriver --version
ChromeDriver 78.0.3904.70 (edb9c9f3de0247fd912a77b7f6cae7447f6d3ad5-refs/branch-heads/3904@{#800})
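ChromeDriver is expected to share its major version with the installed Chrome (here 78 for both: ChromeDriver 78.0.3904.70 and google-chrome-stable 78.0.3904.108). A small check of that rule could look like this; the function names are hypothetical, and the subprocess call avoids capture_output so it also runs on Python 3.6.

```python
import re
import subprocess


def major_version(version_output):
    """Return the major version from text such as
    'ChromeDriver 78.0.3904.70 (...)' or 'Google Chrome 78.0.3904.108'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError("no version number in %r" % version_output)
    return int(match.group(1))


def versions_compatible(driver_output, chrome_output):
    """True when ChromeDriver and Chrome have the same major version."""
    return major_version(driver_output) == major_version(chrome_output)


def command_version(command):
    """Run `<command> --version` and return its stdout."""
    result = subprocess.run(
        [command, "--version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

Usage: versions_compatible(command_version("chromedriver"), command_version("google-chrome-stable")).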

Put google-chrome-stable in /usr/bin/.

ec2-user:~/environment $ curl https://intoli.com/install-google-chrome.sh | bash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  9526  100  9526    0     0  11877      0 --:--:-- --:--:-- --:--:-- 11863
Working in /tmp/google-chrome-installation
/tmp/google-chrome-installation /home/ec2-user/environment
Configuring the Google Chrome repo in /etc/yum.repos.d/google-chrome.repo
Loaded plugins: priorities, update-motd, upgrade-helper
google-chrome                                                                     | 1.3 kB  00:00:00     
1065 packages excluded due to repository priority protections
Package wget-1.18-5.30.amzn1.x86_64 already installed and latest version
Nothing to do
https://dl.google.com/linux/linux_signing_key.pub
Resolving dl.google.com (dl.google.com)... 216.58.197.174, 2404:6800:4004:801::200e
Connecting to dl.google.com (dl.google.com)|216.58.197.174|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10218 (10.0K) [application/octet-stream]
Saving to: ‘linux_signing_key.pub’

linux_signing_key.pub      100%[=====================================>]   9.98K  --.-KB/s    in 0s      

(80.2 MB/s) - ‘linux_signing_key.pub’ saved [10218/10218]

Attempting a direction installation with yum.
Loaded plugins: priorities, update-motd, upgrade-helper
1065 packages excluded due to repository priority protections
Package google-chrome-stable-78.0.3904.108-1.x86_64 already installed and latest version
Nothing to do
Successfully installed Google Chrome!
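After the install script reports success, it can help to see which Chrome executable names are actually on PATH, since the traceback further down shows ChromeDriver launching /usr/bin/google-chrome. This sketch assumes the two candidate names below; find_chrome is a hypothetical helper built on shutil.which.

```python
import shutil

# Executable names to look for; this list is an assumption.
CHROME_CANDIDATES = ("google-chrome", "google-chrome-stable")


def find_chrome(path=None):
    """Return the first Chrome executable found on PATH (or on `path`), else None."""
    for name in CHROME_CANDIDATES:
        found = shutil.which(name, path=path)
        if found:
            return found
    return None
```

If only google-chrome-stable is found, ChromeDriver's default lookup of google-chrome will not see it unless a copy or link exists.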

Checking the google-chrome-stable package information:

ec2-user:~ $ yum info google-chrome-stable
Loaded plugins: priorities, update-motd, upgrade-helper
google-chrome/primary                             | 1.7 kB     00:00     
google-chrome                                                        3/3
1065 packages excluded due to repository priority protections
Installed Packages
Name        : google-chrome-stable
Arch        : x86_64
Version     : 78.0.3904.108
Release     : 1
Size        : 214 M
Repo        : installed
Summary     : Google Chrome
URL         : https://chrome.google.com/
License     : Multiple, see https://chrome.google.com/
Description : The web browser from Google
            : 
            : Google Chrome is a browser that combines a minimal design
            : with sophisticated technology to make the web faster,
            : safer, and easier.

Check whether chromedriver works, using a script named chromedriver.py.

chromedriver.py
from time import sleep
from selenium import webdriver

browser = webdriver.Chrome()

browser.get("https://www.google.com")

browser.save_screenshot("screen.png")

sleep(5)

browser.quit()  # quit() also ends the ChromeDriver session; close() only closes the window

Running it in the terminal gives:

ec2-user:~/environment/ch2 $ python3 chromedriver.py
Traceback (most recent call last):
  File "chromedriver.py", line 4, in <module>
    browser = webdriver.Chrome()
  File "/home/ec2-user/.local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
    desired_capabilities=desired_capabilities)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

I suspect this error occurs because google-chrome, which I created with $ cp google-chrome-stable google-chrome, is not working properly. (Sorry if I'm wrong 🙇)
Here is the current state of google-chrome-stable:

ec2-user:/usr/bin $ google-chrome-stable
[4784:4784:1122/091149.992493:ERROR:browser_dm_token_storage_linux.cc(100)] Error: /etc/machine-id contains 0 characters (32 were expected).

(google-chrome-stable:4784): Gtk-WARNING **: 09:11:50.081: cannot open display:

For the first error (Error: /etc/machine-id contains 0 characters (32 were expected).), I checked the machine ID stored under /var/lib/dbus:

ec2-user:/var/lib/dbus $ cat machine-id
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

I copied this machine ID and created /etc/machine-id:

ec2-user: $ cd /etc
ec2-user:/etc $ sudo touch machine-id
ec2-user:/etc $ vi machine-id

In vi, I pasted aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa and saved with :wq.
A permission problem still remained, so I entered:

ec2-user:/etc $ sudo chmod 766 machine-id

After that, the first error disappeared:

ec2-user:/var/lib/dbus $ google-chrome-stable

(google-chrome-stable:7870): Gtk-WARNING **: 04:39:41.689: cannot open display: 
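The manual machine-id fix above (copying the ID from /var/lib/dbus/machine-id into /etc/machine-id) could also be scripted. This is a minimal sketch assuming that a valid machine ID is 32 lowercase hexadecimal characters and that the D-Bus copy is valid; ensure_machine_id and the other names are hypothetical. It uses mode 644 rather than 766, on the assumption that Chrome only needs to read the file. Writing to /etc requires root (for example, running it with sudo python3).

```python
import os
import re
import shutil

# A machine ID is 32 lowercase hexadecimal characters (plus a trailing newline).
MACHINE_ID_RE = re.compile(r"[0-9a-f]{32}")


def is_valid_machine_id(text):
    return MACHINE_ID_RE.fullmatch(text.strip()) is not None


def read_text(path):
    """Return the file contents, or '' if the file does not exist."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return ""


def ensure_machine_id(dst="/etc/machine-id", src="/var/lib/dbus/machine-id"):
    """Copy src to dst if dst is missing or invalid. Returns True if dst was written."""
    if is_valid_machine_id(read_text(dst)):
        return False
    if not is_valid_machine_id(read_text(src)):
        raise ValueError("%s does not contain a valid machine ID" % src)
    shutil.copyfile(src, dst)
    os.chmod(dst, 0o644)  # readable by everyone, writable only by the owner
    return True
```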

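I took the remaining warning, cannot open display, to mean something like "on AWS Cloud9, run google-chrome from the command line (CUI) rather than through a GUI," so I am looking for ways to run google-chrome without a GUI.
The approaches I have come up with so far: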

  • Run google-chrome in headless mode.
    -> Not working yet.

  • Run google-chrome on Xvfb, an X server.

There is an X server called Xvfb which provides a valid DISPLAY and sends the output to a file instead of to graphics hardware.
Using GTK without DISPLAY

  • Migrate to Amazon Linux 2 so that $ sudo amazon-linux-extras becomes available.

I want to install a graphical user interface (GUI) on my Amazon EC2 instance running Amazon Linux 2. How can I do that?
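The first approach in the list above, headless mode, might look roughly like the following. This is a sketch, not a verified fix for this environment: it assumes Selenium 3.141's webdriver.Chrome(options=...) keyword, Chrome at /usr/bin/google-chrome-stable (so the copied google-chrome is not needed), and that the commonly used server flags --headless, --no-sandbox and --disable-dev-shm-usage are sufficient on Cloud9. chrome_arguments and take_screenshot are hypothetical helper names.

```python
def chrome_arguments(headless=True):
    """Chrome flags commonly used on servers without a display (an assumption)."""
    args = ["--no-sandbox", "--disable-dev-shm-usage", "--window-size=1280,1024"]
    if headless:
        args.insert(0, "--headless")
    return args


def take_screenshot(url, path, chrome_binary="/usr/bin/google-chrome-stable"):
    """Open `url` in Chrome and save a screenshot to `path`."""
    # Imported here so chrome_arguments() can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.binary_location = chrome_binary
    for arg in chrome_arguments():
        options.add_argument(arg)

    browser = webdriver.Chrome(options=options)
    try:
        browser.get(url)
        browser.save_screenshot(path)
    finally:
        browser.quit()  # always shut down Chrome and ChromeDriver
```

Calling take_screenshot("https://www.google.com", "screen.png") would then do what chromedriver.py does, without needing an X display.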

Problem still unresolved

The error from google-chrome-stable

ec2-user:/usr/bin $ google-chrome-stable
(google-chrome-stable:6140): Gtk-WARNING **: 03:42:32.602: cannot open display:
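For this remaining cannot open display error, the Xvfb approach listed above could be sketched as follows. It assumes the Xvfb binary is installed and on PATH, and uses display :99 and a 1280x1024x24 screen as arbitrary choices; needs_virtual_display, xvfb_command and start_xvfb are hypothetical names. Setting os.environ["DISPLAY"] before creating webdriver.Chrome() means the ChromeDriver and Chrome processes started afterwards inherit it.

```python
import os
import subprocess
import time


def needs_virtual_display(env=None):
    """True when DISPLAY is unset or empty, one common cause of
    GTK's 'cannot open display' warning."""
    env = os.environ if env is None else env
    return not env.get("DISPLAY")


def xvfb_command(display=":99", width=1280, height=1024, depth=24):
    """Command line for Xvfb: Xvfb <display> -screen 0 <W>x<H>x<D>."""
    return ["Xvfb", display, "-screen", "0", "%dx%dx%d" % (width, height, depth)]


def start_xvfb(display=":99", wait_seconds=1.0):
    """Start Xvfb in the background and point DISPLAY at it."""
    proc = subprocess.Popen(xvfb_command(display))
    time.sleep(wait_seconds)  # crude wait for the server to come up
    os.environ["DISPLAY"] = display
    return proc
```

In chromedriver.py this would mean calling start_xvfb() when needs_virtual_display() is true, running the existing Selenium code, and calling terminate() on the returned process at the end.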

Request

If you know how to solve this, I would be grateful for any advice, however small!
