Saving scraped images to Google Drive

Last updated at 2023-04-08. Posted at 2023-04-07.

Goal

Save image data obtained by scraping directly to Google Drive, without writing it to a local file first.

Before you start

This article enables the Google Drive API and uses the PyDrive library. Detailed setup guides already exist elsewhere, so please refer to those; a few are listed for reference.

Saving to Google Drive

As a test, let's fetch poster images from eiga.com (映画ドットコム) and save them to Google Drive.

from pydrive.auth import GoogleAuth
import bs4
import io
import json
import requests

# Authenticate in the browser (OAuth 2.0) via PyDrive
gauth = GoogleAuth()
gauth.LocalWebserverAuth()

# Fetch the top page and collect the poster thumbnail elements
url = 'https://eiga.com/'
response = requests.get(url)
soup = bs4.BeautifulSoup(response.text, 'html.parser')
img_list = soup.select("[class='img-thumb h214']")

folder_id = "xxxxxxxxxxxxxxxxxxxx"  # ID of the destination Drive folder
for i, img in enumerate(img_list):
    img_tag = img.find("img")
    if img_tag is None or not img_tag.get("src"):
        continue  # skip thumbnails that have no image URL
    src = img_tag.get("src")
    img_binary = requests.get(src)  # download the image itself

    filename = '映画ポスター' + str(i)  # "movie poster" + index
    metadata = {
        "name": filename,
        "parents": [folder_id]
    }
    # Multipart body: a JSON metadata part followed by the image bytes
    files = {
        'data': ('metadata', json.dumps(metadata), 'application/json'),
        'file': io.BytesIO(img_binary.content)
    }
    r = requests.post(
        "https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart",
        headers={"Authorization": "Bearer " + gauth.credentials.access_token},
        files=files
    )

The key part is the following.

metadata = {
    "name": filename,
    "parents": [folder_id]
}
files = {
    'data': ('metadata', json.dumps(metadata), 'application/json'),
    'file': io.BytesIO(img_binary.content)
}
r = requests.post(
    "https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart",
    headers={"Authorization": "Bearer " + gauth.credentials.access_token},
    files=files
)

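The downloaded image bytes are wrapped in an in-memory binary stream (io.BytesIO) and uploaded, together with the JSON metadata, by sending a POST request to the multipart upload endpoint. The folder_id given in metadata can be copied from the folder's URL: "…/drive/u/1/folders/{folder ID}".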
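The two steps described above, reading the folder ID out of a URL and assembling the two-part upload body, can be sketched as small helpers. This is only an illustration: the function names `extract_folder_id` and `build_multipart_files`, the placeholder URL, and the assumption that a folder ID consists of letters, digits, hyphens and underscores are not from the original article.

```python
import io
import json
import re
from typing import Optional


def extract_folder_id(folder_url: str) -> Optional[str]:
    """Return the segment that follows '/folders/' in a folder URL, or None.

    Assumption: the ID contains only letters, digits, '-' and '_'.
    """
    match = re.search(r"/folders/([A-Za-z0-9_-]+)", folder_url)
    return match.group(1) if match else None


def build_multipart_files(filename: str, folder_id: str, image_bytes: bytes) -> dict:
    """Build a dict usable as the `files` argument of requests.post."""
    metadata = {"name": filename, "parents": [folder_id]}
    return {
        # Part 1: JSON metadata describing the file to create
        "data": ("metadata", json.dumps(metadata), "application/json"),
        # Part 2: the image bytes wrapped in a file-like object
        "file": io.BytesIO(image_bytes),
    }


# Placeholder URL; only the part after '/folders/' matters here.
folder_id = extract_folder_id("https://example.com/drive/u/1/folders/EXAMPLE_FOLDER_ID")
files = build_multipart_files("poster0", folder_id, b"raw image bytes")
```

The resulting `files` dict has the same shape as the one built inline in the script, so it could be passed to `requests.post(..., files=files)` unchanged.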

Running the script saves the images to Google Drive.
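The script does not inspect the response `r`, so a failed upload (for example, an expired access token) would pass silently. A minimal sketch of a check follows, assuming the response body has the Drive API v3 JSON shape: an `id` field on success and an `error.message` field on failure. The helper name `uploaded_file_id` is hypothetical.

```python
import json


def uploaded_file_id(status_code: int, body_text: str) -> str:
    """Return the new file's ID, or raise RuntimeError describing the failure."""
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        body = {}
    if not isinstance(body, dict):
        body = {}

    # Success: HTTP 200 and a JSON body that carries the created file's ID
    if status_code == 200 and "id" in body:
        return body["id"]

    # Failure: surface the API's error message when one is present
    error = body.get("error")
    message = error.get("message", "unknown error") if isinstance(error, dict) else "unknown error"
    raise RuntimeError(f"upload failed ({status_code}): {message}")
```

Inside the loop it could be called as `file_id = uploaded_file_id(r.status_code, r.text)`, which stops the script at the first failed upload.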

References
