Using Selenium on Google Cloud Functions

Last updated at 2022-01-30 / Posted at 2020-03-11

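Introduction

I needed to use Selenium on Google Cloud Functions and based my setup on an existing article (see References).
However, perhaps because the Google Cloud Functions specifications have changed since then, one extra step was required, so I wrote it up in this article.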

Preparing headless Chrome

First, get chromedriver and headless-chromium.

https://github.com/ryfeus/gcf-packs
Download this repository and extract it.

The files are in the selenium_chrome folder → source folder.
Note: to get headless-chromium, extract headless-chromium.zip, which is in the source folder.
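
The extraction step can also be scripted. The following is a minimal sketch, not part of the original setup: the helper name `extract_archive` and any paths passed to it are hypothetical and depend on where the repository was unpacked.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path: Path, dest_dir: Path) -> list:
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

For example, `extract_archive(Path("source/headless-chromium.zip"), Path("deploy"))` would unpack the archive into a `deploy` folder (both paths are illustrative). Python's `zipfile` does not restore the executable bit on extracted files, which does not matter here because the permission is added at runtime anyway.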

Preparing the folder to deploy to Google Cloud Functions

Put chromedriver, headless-chromium, main.py, and requirements.txt in a single folder.
Note: if you use a language other than Python, adjust the files accordingly.
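
Before deploying, it can help to confirm that nothing is missing from the folder. This is a small sketch under the assumption that the four files listed above sit directly in the deploy folder; `REQUIRED_FILES` and `missing_files` are illustrative names, not part of the original article.

```python
from pathlib import Path

# The four files this article places in the deploy folder.
REQUIRED_FILES = ("chromedriver", "headless-chromium", "main.py", "requirements.txt")


def missing_files(deploy_dir: Path) -> list:
    """Return the required file names that are not present in deploy_dir."""
    return [name for name in REQUIRED_FILES if not (deploy_dir / name).is_file()]
```

An empty list from `missing_files(Path("deploy"))` means the folder contains everything the steps above expect.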

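Using Selenium on Google Cloud Functions

Copy chromedriver and headless-chromium to another folder

This is the most important point of this article.
After deployment to Google Cloud Functions, files such as chromedriver are located in the current directory, "/user_code". However, the files under this directory cannot be accessed freely (the code below only reads them in order to copy them). The "/tmp" folder, by contrast, is freely accessible to the user.

In other words, chromedriver and headless-chromium have to be copied to the "/tmp" folder.
Their file permissions also need to be changed. For the permission change I referred to the Python Tips article listed under References.

Function that changes file permissions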

import stat
from pathlib import Path


def add_execute_permission(path: Path, target: str = "u"):
    """Add `x` (`execute`) permission to specified targets."""
    mode_map = {
        "u": stat.S_IXUSR,
        "g": stat.S_IXGRP,
        "o": stat.S_IXOTH,
        "a": stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH,
    }

    mode = path.stat().st_mode
    for t in target:
        mode |= mode_map[t]

    path.chmod(mode)
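
To make the effect of the permission change visible, here is a minimal sketch that applies the same OR-ing of `stat.S_IXUSR` and `stat.S_IXGRP` to a temporary file and reports the permission string before and after. It assumes a POSIX system; `show_execute_bits` is an illustrative name.

```python
import os
import stat
import tempfile


def show_execute_bits() -> tuple:
    """Return a temp file's permission string before and after adding u/g execute bits."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        os.chmod(path, 0o644)  # start from rw-r--r--
        before = stat.filemode(os.stat(path).st_mode)
        # Same bit operation that add_execute_permission(path, "ug") performs.
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR | stat.S_IXGRP)
        after = stat.filemode(os.stat(path).st_mode)
    finally:
        os.remove(path)
    return before, after


print(show_execute_bits())  # ('-rw-r--r--', '-rwxr-xr--') on a POSIX system
```

Only the execute bits for user and group are added. The existing read and write bits are left untouched because the new mode is OR-ed with the current one.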

Copy from the current directory to "/tmp" and change permissions

    driverPath = "/tmp" + "/chromedriver"
    headlessPath = "/tmp" + "/headless-chromium"

    # copy and change permission
    print("copy headless-chromium")
    shutil.copyfile(os.getcwd() + "/headless-chromium", headlessPath)
    add_execute_permission(Path(headlessPath), "ug")

    print("copy chromedriver")
    shutil.copyfile(os.getcwd() + "/chromedriver", driverPath)
    add_execute_permission(Path(driverPath), "ug")
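
The copy and the permission change are repeated for each binary, so they could be folded into one helper. The sketch below is one possible variation, not the article's code: `stage_executable` is a hypothetical name, and skipping the copy when the file already exists assumes that files in "/tmp" can survive between invocations on the same instance, which should be checked against the current Cloud Functions documentation.

```python
import shutil
import stat
from pathlib import Path


def stage_executable(src: Path, dest_dir: Path = Path("/tmp")) -> Path:
    """Copy src into dest_dir (unless already there) and make the copy executable."""
    dest = dest_dir / src.name
    if not dest.exists():
        # copyfile copies the contents only, not the permission bits.
        shutil.copyfile(src, dest)
    # Add execute permission for user and group, as in the snippet above.
    dest.chmod(dest.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP)
    return dest
```

With this helper, the two copy blocks above would reduce to `stage_executable(Path.cwd() / "headless-chromium")` and `stage_executable(Path.cwd() / "chromedriver")`.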

Chrome options

https://github.com/ryfeus/gcf-packs
I based the settings on main.py in this repository, but "user-agent=" + UserAgent().random does not seem to be necessary. I also set disable-dev-shm-usage.
Note: I have not looked into this in much detail.
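
As a rough illustration of how such options can be kept in one place, the sketch below holds a few of the flags in a list and applies them in a loop. `CHROME_FLAGS` and `build_chrome_options` are hypothetical names, and the list is only a subset of the flags used in main.py. As far as I know, `--disable-dev-shm-usage` makes Chrome use a temporary directory instead of /dev/shm for shared memory, which helps in environments where /dev/shm is small.

```python
# A subset of the flags used in this article's main.py.
CHROME_FLAGS = [
    "--headless",
    "--disable-gpu",
    "--no-sandbox",
    "--single-process",
    "--disable-dev-shm-usage",
]


def build_chrome_options(binary_location: str, flags=CHROME_FLAGS):
    """Build ChromeOptions with the given flags and Chrome binary path."""
    # Imported lazily so the flag list can be inspected without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for flag in flags:
        options.add_argument(flag)
    options.binary_location = binary_location
    return options
```

Keeping the flags in a list makes it easy to add or drop one while experimenting, for example to confirm that the user-agent option is not needed.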

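Program source and configuration file

Here is the program source that reflects the points above.

Note (updated 2022/1/30): hideman pointed out that the sample code no longer worked.
It seems the settings had changed since the original post, so I replaced the code with a fixed version. Thank you, hideman!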

main.py

import os
import shutil
import stat
from pathlib import Path

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Shared WebDriver instance; (re)created by settingDriver() on each invocation.
driver = None


def add_execute_permission(path: Path, target: str = "u"):
    """Add `x` (`execute`) permission to specified targets."""
    mode_map = {
        "u": stat.S_IXUSR,
        "g": stat.S_IXGRP,
        "o": stat.S_IXOTH,
        "a": stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH,
    }

    mode = path.stat().st_mode
    for t in target:
        mode |= mode_map[t]

    path.chmod(mode)


def settingDriver():
    print("driver setting")
    global driver

    driverPath = "/tmp" + "/chromedriver"
    headlessPath = "/tmp" + "/headless-chromium"

    # Copy the binaries to /tmp and make them executable there.
    print("copy headless-chromium")
    shutil.copyfile(os.getcwd() + "/headless-chromium", headlessPath)
    add_execute_permission(Path(headlessPath), "ug")

    print("copy chromedriver")
    shutil.copyfile(os.getcwd() + "/chromedriver", driverPath)
    add_execute_permission(Path(driverPath), "ug")

    chrome_options = webdriver.ChromeOptions()

    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--window-size=1280x1696")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--hide-scrollbars")
    chrome_options.add_argument("--enable-logging")
    chrome_options.add_argument("--log-level=0")
    chrome_options.add_argument("--v=99")
    chrome_options.add_argument("--single-process")
    chrome_options.add_argument("--ignore-certificate-errors")
    chrome_options.add_argument("--disable-dev-shm-usage")

    # Point Selenium at the headless Chromium binary copied to /tmp.
    chrome_options.binary_location = headlessPath

    print("get driver")
    # Selenium 4 takes the driver path through a Service object
    # (the executable_path argument was removed in later 4.x releases).
    driver = webdriver.Chrome(service=Service(driverPath), options=chrome_options)


def seleniumSample(request):

    settingDriver()

    global driver
    try:
        print("URL get")
        # "[hoge]" and "[HOGE]" are placeholders for the target URL and XPath.
        driver.get("http://[hoge]")
        targetPath = '[HOGE]'
        # find_element_by_xpath was removed in Selenium 4; use find_element with By.XPATH.
        ret = driver.find_element(By.XPATH, targetPath)
        ret = ret.get_attribute("alt")
        print(ret)

    finally:
        print("driver quit")
        driver.quit()

    return ret

The packages to install at deploy time are listed in requirements.txt.

requirements.txt

google-cloud-error-reporting==0.30.0
selenium
setuptools

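Summary

By copying headless Chrome and chromedriver to a folder the user can access, I was able to run Selenium.
For light scraping, running it on Cloud Functions can be one viable option.

I hope this is useful in some way. See you next time.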

References

Trying scraping with Python + selenium on Google Cloud Functions
[GoogleCloudFunctions(GCF) Python] Notes on GCF's OS and behavior
Python Tips: Adding permissions to a file in Python
