
Scraping Google search results with Python

Posted at 2021-08-23, last updated at 2021-08-23

These are my notes on scraping Google search results with Python.

Reference page
https://stackoverflow.com/questions/33587097/soup-select-r-a-in-https-www-google-com-q-vigilantemic-gives-empty-lis

[Environment]
Windows 10
Spyder 4.2.5 (Anaconda)

[Program overview]
Get the link addresses in the Google search results for a specific keyword.

import requests  # the 'lxml' parser used below needs the lxml package installed, but no import
from bs4 import BeautifulSoup

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {'q': 'cyber security'}  # search keyword, in this case "cyber security"

html = requests.get('https://www.google.com/search', headers=headers, params=params).text
soup = BeautifulSoup(html, 'lxml')

# each '.tF2Cxc' element is the container for one search result
for result in soup.select('.tF2Cxc'):
    anchor = result.select_one('.yuRUbf a')
    if anchor is not None:  # skip containers that have no link
        print(anchor['href'])

Running the program gives the output below: the link to each page in the search results for the keyword.

https://ja.wikipedia.org/wiki/%E3%82%B5%E3%82%A4%E3%83%90%E3%83%BC%E3%82%BB%E3%82%AD%E3%83%A5%E3%83%AA%E3%83%86%E3%82%A3
https://cybersecurity-jp.com/
https://www.cscloud.co.jp/
https://eow.alc.co.jp/search?q=cybersecurity
https://www.nisc.go.jp/
https://www.events.great.gov.uk/ehome/innovation-jpn/cyber-security/
https://ejje.weblio.jp/content/cybersecurity
https://ejje.weblio.jp/content/cyber+security
https://jpn.nec.com/ncsp/index.html
https://argus-sec.com/ja/

(Explanation)
The line params = {'q': 'cyber security'} sets the search keyword.
In this example it is "cyber security".
To search for something else, change this value.
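
To make the role of the 'q' parameter concrete, here is a minimal sketch of how a keyword ends up in the search URL. requests builds the query string from params; this sketch uses only the standard library to illustrate the resulting URL. The helper name build_search_url is hypothetical and not part of the original program.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/search"


def build_search_url(keyword):
    """Return the search URL for a keyword (hypothetical helper for illustration)."""
    # urlencode escapes spaces and non-ASCII characters in the keyword
    return BASE_URL + "?" + urlencode({"q": keyword})


url = build_search_url("cyber security")
print(url)
```

Changing the argument changes only the q=... part of the URL, which is the same effect as editing the params dictionary in the program above.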

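select('.tF2Cxc')
select_one('.yuRUbf a')['href']
In these calls, ".tF2Cxc" matches the container for each search result, and ".yuRUbf a" matches the link inside that container. These class names come from Google's page markup at the time of writing.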
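
The two selectors can be tried without sending a request to Google. The sketch below runs them against a small hand-written HTML string that only imitates the structure described above; the sample markup, the example.com URLs, and the function name extract_links are assumptions made for illustration. It uses Python's built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML imitating the result structure; not real Google output.
SAMPLE_HTML = """
<div class="tF2Cxc">
  <div class="yuRUbf"><a href="https://example.com/first">First</a></div>
</div>
<div class="tF2Cxc">
  <div class="yuRUbf"><a href="https://example.com/second">Second</a></div>
</div>
<div class="tF2Cxc"><span>no link here</span></div>
"""


def extract_links(html):
    """Collect the href of the first '.yuRUbf a' link in each '.tF2Cxc' container."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for result in soup.select(".tF2Cxc"):
        anchor = result.select_one(".yuRUbf a")
        # select_one returns None when nothing matches, so guard before indexing
        if anchor is not None and anchor.has_attr("href"):
            links.append(anchor["href"])
    return links


links = extract_links(SAMPLE_HTML)
print(links)
```

The third container has no link, so it is skipped instead of raising an error.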

Reference link: https://practicaldatascience.co.uk/data-science/how-to-scrape-google-search-results-using-python
