
Scraping returns "Checking if the site connection is secure" instead of the HTML I want

Posted at 2023-05-02

Introduction

I got stuck on this while building an app that relies on scraping, so I'm writing it up here.
This post covers the fix when you are using Python.

Problem

When I scraped the site with BeautifulSoup, all I got back was HTML saying

Checking if the site connection is secure

and the page I actually wanted never came through.
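Before parsing, it can help to check whether you received the challenge page instead of the real content. Below is a minimal sketch using only the standard library: the first marker string is the message observed above, while the other two (the "Just a moment..." title and the `/cdn-cgi/challenge-platform/` script path) are assumptions based on commonly seen Cloudflare challenge pages and may not match every site. `looks_like_cloudflare_challenge` is a hypothetical helper name.

```python
# Strings that suggest a Cloudflare challenge page rather than the real content.
# The first one is the message seen in this article; the other two are
# assumptions based on commonly seen Cloudflare challenge pages.
CHALLENGE_MARKERS = (
    "Checking if the site connection is secure",
    "Just a moment...",
    "/cdn-cgi/challenge-platform/",
)


def looks_like_cloudflare_challenge(html: str) -> bool:
    """Return True if the HTML contains any of the challenge-page markers."""
    return any(marker in html for marker in CHALLENGE_MARKERS)
```

You could call it on `response.text` right after fetching and only hand the HTML to BeautifulSoup when it returns False. A plain substring check is crude, but it needs no parser and is enough to tell "got the real page" from "got the challenge".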

Solution

The website uses Cloudflare, which appears to run a security check to block automated scraping and bot access.

I got around it by using cloudscraper, a requests-based Python library designed to get past Cloudflare's anti-bot page.

$ pip install cloudscraper

main.py

import cloudscraper

scraper = cloudscraper.create_scraper()  # Create a cloudscraper instance
url = "https://example.com"  # Replace with the URL you want to scrape

response = scraper.get(url)

if response.status_code == 200:
    html_content = response.text
    print(html_content)
else:
    print(f"Error: {response.status_code}")
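Since the original goal was to parse the page with BeautifulSoup, the HTML that cloudscraper returns can be passed straight to it. A minimal sketch, assuming `beautifulsoup4` is installed alongside cloudscraper; `fetch_html` and `extract_title` are hypothetical helper names, and the 10-second timeout is an arbitrary choice. The scraper object is a `requests.Session` subclass, so `timeout` and `raise_for_status()` behave as they do in requests.

```python
from typing import Optional

from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page through cloudscraper and return its HTML (raises on HTTP errors)."""
    # Imported here so extract_title can be used even without cloudscraper installed.
    import cloudscraper

    scraper = cloudscraper.create_scraper()
    response = scraper.get(url, timeout=timeout)
    response.raise_for_status()  # raise instead of silently returning an error page
    return response.text


def extract_title(html: str) -> Optional[str]:
    """Parse HTML with BeautifulSoup and return the <title> text, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None or soup.title.string is None:
        return None
    return soup.title.string.strip()


if __name__ == "__main__":
    html = fetch_html("https://example.com")  # replace with the URL you want to scrape
    print(extract_title(html))
```

From here, any other BeautifulSoup lookups (`find`, `select`, and so on) work on `soup` exactly as they would for a page fetched with plain requests.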

Closing thoughts

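I solved this by asking ChatGPT, but I wanted to jot it down as a quick memo.
Lately I've started to wonder whether it's even worth writing things like this on Qiita.
ChatGPT really is amazing...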
