
[python selenium] Scrape Google search results and export the titles and URLs to CSV

Posted at 2020-05-31

Environment

macOS Catalina 10.15.3
Python 3.6.5

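Overview

Run a Google search for any keyword and collect the list of search results, up to any number of pages.
Write the titles and URLs to a CSV file.

Method (copy-paste OK)

Note: copy-pasting is fine, but change the directory and the search terms in the code to suit your needs.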

#! python3
# Get the titles and URLs of Google search results and write them to a CSV file

import csv
import os
import time

import chromedriver_binary  # importing this adds chromedriver to PATH
from selenium import webdriver
from selenium.webdriver.common.by import By  # By locators work on Selenium 3 and 4

output_path = "/path/to/csv/output/directory"  # replace with your own output directory
os.chdir(output_path)

driver = webdriver.Chrome()                  # start Chrome

# Open the page
driver.get("https://www.google.com/")        # open Google
search = driver.find_element(By.NAME, "q")   # locate the search box (name="q")
search.send_keys("xxx yyy zzz")              # type the search terms
search.submit()                              # run the search
time.sleep(3)                                # wait 3 seconds

def ranking(driver):
    i = 1        # current page number; always starts at 1
    i_max = 10   # last results page to scrape
    title_list = []
    link_list = []

    # Loop until the current page exceeds the maximum page to analyze (i_max)
    while i <= i_max:
        # Each title and link sits inside an element with class="r"
        class_group = driver.find_elements(By.CLASS_NAME, "r")
        # Extract the title and link from each class="r" element and append them to the lists
        for elem in class_group:
            title_list.append(elem.find_element(By.CLASS_NAME, "LC20lb").text)           # title (class="LC20lb")
            link_list.append(elem.find_element(By.TAG_NAME, "a").get_attribute("href"))  # link (href of the a tag)

        # There is only one "Next" button, but find_elements is used on purpose:
        # an empty list means this is the last page.
        next_buttons = driver.find_elements(By.ID, "pnnext")
        if not next_buttons:
            break           # no next page, so end the loop early
        # The URL of the next page is the href attribute of id="pnnext"
        next_page = next_buttons[0].get_attribute("href")
        driver.get(next_page)
        i = i + 1           # advance to the next page
        time.sleep(3)       # pause 3 seconds; repeat until the maximum page is reached
    return title_list, link_list

# Run the ranking function defined above to get the title and URL lists
title, link = ranking(driver)

# Build a list such as [[a, 1], [b, 2]] for the CSV output
result = [list(row) for row in zip(title, link)]

# Write result to a CSV file
with open("result.csv", mode="w", encoding="utf-8") as f:
    writer = csv.writer(f, lineterminator="\n")
    writer.writerows(result)

# Close the browser
driver.quit()
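
The pairing and CSV step at the end of the script can also be tried on its own, without a browser. The following is only a minimal sketch: the helper name `write_results`, its `header` parameter, and the header row itself are assumptions added for illustration and are not part of the original script, which writes no header.

```python
import csv


def write_results(titles, links, path, header=("title", "url")):
    """Pair titles with links and write them to a UTF-8 CSV file.

    Hypothetical helper, not part of the original script.
    Returns the number of data rows written.
    """
    # Same pairing as the script: [[title1, link1], [title2, link2], ...]
    rows = [list(row) for row in zip(titles, links)]
    # newline="" leaves line endings to the csv module
    with open(path, mode="w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(header)   # header row (an addition; the original writes none)
        writer.writerows(rows)
    return len(rows)
```

With the lists returned by ranking(driver), the call would look like write_results(title, link, "result.csv"). Titles that contain commas are quoted automatically by csv.writer, so they still load as a single column.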
