Scraping every hololive card game (Holoca) image with Python

Posted at 2024-11-26

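A friend and I have a weekly call to study programming together.

My friend's goal is to build a solo playtesting tool for the hololive Official Card Game.

To help out, I wrote a script that scrapes the image files the tool needs, and I am writing it up here.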

First, install requests (the HTTP client) and Beautiful Soup (the HTML parser).

pip install requests beautifulsoup4

Then copy the script below into a file and run it.
It creates a folder named "hololive_card_images" and saves all 205 cards available at the time of writing into it.

import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Create the folder where the images are saved
output_folder = "hololive_card_images"
os.makedirs(output_folder, exist_ok=True)

# Base URLs: the card list page and the site root (used to resolve image paths)
base_url = "https://hololive-official-cardgame.com/cardlist/"
img_url = "https://hololive-official-cardgame.com"

# Range of card IDs
start_id = 1
end_id = 205

# User-Agent header
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
}

for card_id in range(start_id, end_id + 1):
    try:
        # Fetch the page for this card ID
        url = f"{base_url}?id={card_id}&view=text"
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()  # Raise on an HTTP error status

        # Parse the HTML
        soup = BeautifulSoup(response.text, "html.parser")

        # Find the card image tag by its src path
        image_tag = soup.find("img", src=lambda x: x and "/wp-content/images/cardlist/" in x)
        if not image_tag:
            print(f"Card ID {card_id} - Image tag not found.")
            continue

        # urljoin handles both relative and absolute src values
        image_url = urljoin(img_url, image_tag["src"])

        # Download the image with the same headers and timeout as the page request
        image_response = requests.get(image_url, headers=headers, stream=True, timeout=30)
        image_response.raise_for_status()

        # Write the image to disk in 1 KiB chunks
        file_name = os.path.join(output_folder, f"card_{card_id}.jpg")
        with open(file_name, "wb") as file:
            for chunk in image_response.iter_content(1024):
                file.write(chunk)

        print(f"Card ID {card_id} - Downloaded: {file_name}")

    except Exception as e:
        # Log the failure and move on to the next card
        print(f"Card ID {card_id} - Failed: {e}")
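
The script above saves every image with a ".jpg" extension, whatever the file type on the server actually is. If that matters for the tool, one option is to take the extension from the image URL. The sketch below is only an illustration using the standard library: the helper names resolve_image_url and build_file_name are hypothetical, and the sample path in the usage lines is made up, not a real card URL.

```python
import os
from urllib.parse import urljoin, urlparse

# Site root taken from the script above
SITE_ROOT = "https://hololive-official-cardgame.com"


def resolve_image_url(src: str, site_root: str = SITE_ROOT) -> str:
    """Turn a relative or absolute img src into an absolute URL."""
    return urljoin(site_root + "/", src)


def build_file_name(output_folder: str, card_id: int, image_url: str) -> str:
    """Build a local path that keeps the extension found in the URL.

    Falls back to ".jpg" (the original script's choice) when the URL
    path has no extension.
    """
    path = urlparse(image_url).path  # drops any query string
    ext = os.path.splitext(path)[1].lower() or ".jpg"
    return os.path.join(output_folder, f"card_{card_id}{ext}")


if __name__ == "__main__":
    # Hypothetical src value, for illustration only
    url = resolve_image_url("/wp-content/images/cardlist/sample/card.png")
    print(url)
    print(build_file_name("hololive_card_images", 1, url))
```

In the main loop, these would replace the image_url and file_name lines. The URL extension is only a hint, so checking the Content-Type response header would be a stricter alternative.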
