
[Python] How to get the total number of Google search results

Posted at 2022-04-11

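This is how to get the total number of results for a given search, as displayed in Chrome.
I couldn't find a method quickly by searching, so I'm summarizing it here.
As an example, the code below gets the total number of search results for a site URL.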

Execution environment

Python 3.9
pip3 22.0.4

from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup


def count_index():
    """Get the number of indexed pages for a site URL.

    Searches Google for site:<domain> and prints the approximate
    result count shown on the results page.
    """

    # Target URL
    target_url = 'https://example.com'
    target_domain = urlparse(target_url).netloc

    # Search URL and browser (Chrome) User-Agent
    search_url = f'https://www.google.com/search?q=site%3A{target_domain}'
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36"}

    # Request
    response = requests.get(search_url, headers=headers)
    response.encoding = response.apparent_encoding
    site_info = BeautifulSoup(response.text, 'html.parser')

    # Extract: the count is in the direct text node of the result-stats div
    result_stats = site_info.find("div", {"id": "result-stats"})
    if result_stats is None:
        # The element is missing, e.g. when the User-Agent header was not accepted
        print("result-stats element not found")
        return
    total_results_text = result_stats.find(string=True, recursive=False) or ''
    results_num = ''.join([num for num in total_results_text if num.isdigit()])
    print(results_num)  # e.g. 632


count_index()
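
The digit-extraction step at the end of count_index() can be tried offline. The sketch below is a hypothetical helper (parse_result_count is not part of the article's code), and the sample strings are assumptions about what the result-stats text may look like. Unlike the article's code, which joins every digit in the text node, it takes only the first number and assumes "," is the thousands separator, so other locales may need a different pattern.

```python
import re
from typing import Optional


def parse_result_count(stats_text: str) -> Optional[int]:
    """Return the first integer found in a result-stats style string.

    Hypothetical helper for illustration. Assumes the count is the first
    number in the text and that "," is the thousands separator.
    """
    match = re.search(r"\d[\d,]*", stats_text)
    if match is None:
        # No digits at all, e.g. the page showed no count
        return None
    return int(match.group().replace(",", ""))


# Sample strings are assumed formats, not captured Google output
print(parse_result_count("About 1,230 results"))
print(parse_result_count("約 632 件"))
print(parse_result_count("No results"))
```

Returning an int (or None) instead of a digit string makes it easier to compare counts or detect a page without a count.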

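The total number of search results is in the element with id="result-stats".
Note that this element cannot be extracted unless the request includes the header.
Also, the count may differ from what you see in your browser unless the header is set to the User-Agent of your own environment.
In that case, in headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36"},
replace the "Mozilla/5.0 ~ Safari/537.36" part with the value for your own environment, and the same number should be returned.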
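
Since the User-Agent is the part that changes per environment, one option is to pass it in as a parameter rather than hard-coding it. The following is a minimal sketch under that assumption: build_search_request and MY_USER_AGENT are hypothetical names, and it only builds the URL and header dict without sending anything. urlencode is used here in place of the hand-written %3A and produces the same query string.

```python
from urllib.parse import urlencode, urlparse


def build_search_request(target_url: str, user_agent: str):
    """Return (search_url, headers) for a site:<domain> Google query.

    Hypothetical helper for illustration; it builds the values only and
    performs no network access.
    """
    domain = urlparse(target_url).netloc
    # urlencode percent-encodes the ":" in "site:<domain>" as %3A
    query = urlencode({"q": f"site:{domain}"})
    search_url = f"https://www.google.com/search?{query}"
    headers = {"User-Agent": user_agent}
    return search_url, headers


# Replace this with the User-Agent of your own browser
MY_USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36"
)

search_url, headers = build_search_request("https://example.com", MY_USER_AGENT)
print(search_url)
print(headers["User-Agent"])
```

The returned pair can be passed straight to requests.get(search_url, headers=headers), as in the article's code.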

You can check your environment by opening the URL below.
(The User-Agent is the value shown on that page as the browser's user agent, HTTP_USER_AGENT.)
https://testpage.jp/tool/ip_user_agent.php


If you got stuck on this, I hope it helps.
