Collecting video information with the YouTube Data API

Last updated at 2024-07-29 / Posted at 2024-06-08

Introduction

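I wanted to pull the view counts, like counts and similar statistics of every YouTube video that matches a given keyword in one go. It turns out the YouTube Data API can do this, so I tried it out.
I called the API from Python.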

Preparation

Issuing an API key

To call the API, you first need an API key.
You can get one as follows:

Log in to Google Cloud Platform
Log in to GCP at
https://console.cloud.google.com/welcome/new
Create a new project

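Open "Select a project" at the top of the page
Click "New Project"
Enter any project name and click "Create"
Wait a few seconds. When the notification says the project has been created, click "Select project".
Once the project has switched, choose "APIs & Services" under Quick access
(or open the "☰" menu at the top left, choose "View all products", and click "APIs & Services")
Choose "Library" from the menu
Type "youtube" into the library search box
Several results appear; select "YouTube Data API v3"
Click "Manage" (the button reads "Enable" if the API is not yet enabled for the project)
Click "Credentials" in the left menu
Click "Create credentials", then "API key"
An API key is issued
Finally, restrict what the key can be used for:
select the API key you created, choose "Restrict key", select "YouTube Data API v3", and save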

Install the required libraries

Run the following in a terminal (pandas is used later for the tables):

pip install google-api-python-client pandas

Calling the API from Python

# Import modules (googleapiclient is the current package name; apiclient is a legacy alias)
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
import pandas as pd

# API情報
DEVELOPER_KEY = 'xxxxxxxxxxxxx'   # <- enter the API key you created in the Preparation step
YOUTUBE_API_SERVICE_NAME = 'youtube'
YOUTUBE_API_VERSION = 'v3'

youtube = build(
    YOUTUBE_API_SERVICE_NAME,
    YOUTUBE_API_VERSION,
    developerKey=DEVELOPER_KEY
    )


# Run a search for a keyword
search_response = youtube.search().list(
  q='QGIS',
  part='id,snippet',
  order='date',
  type='video',
  maxResults=50,
).execute()

search_response
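The code above hard-codes the key in DEVELOPER_KEY. If you share the notebook, one option is to read the key from an environment variable instead. This is a minimal sketch: the variable name YOUTUBE_API_KEY and the helper load_api_key are hypothetical choices for this example and not part of the API.

```python
import os


def load_api_key(var_name: str = "YOUTUBE_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    YOUTUBE_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return key


# Usage, assuming `export YOUTUBE_API_KEY=...` was run in the shell beforehand:
# DEVELOPER_KEY = load_api_key()
```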

youtube.search() runs the video search.
The parameters mean the following:

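q: the search term of your choice
part: which parts of the video information to return (here id and snippet)
order: the sort order of the results, one of:
- date - newest first, by creation date
- rating - highest rated first
- relevance - most relevant to the search term first. This is the default.
- title - alphabetical by title
- videoCount - channels sorted by number of uploaded videos, most first
- viewCount - most viewed first
type: the type of resource to return: video, channel or playlist
maxResults: the number of results to return. The default is 5 and the maximum is 50.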
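To show how these parameters fit together, the hypothetical helper below checks order, type and maxResults against the values listed above and returns a dict that can be unpacked into youtube.search().list(**params). It is only a convenience sketch; the API itself remains the authority on which values it accepts.

```python
ALLOWED_ORDERS = {"date", "rating", "relevance", "title", "videoCount", "viewCount"}
ALLOWED_TYPES = {"video", "channel", "playlist"}


def build_search_params(q, order="relevance", types=("video",), max_results=5):
    """Hypothetical helper: validate search options and return keyword arguments."""
    if order not in ALLOWED_ORDERS:
        raise ValueError(f"unsupported order: {order}")
    unknown = set(types) - ALLOWED_TYPES
    if unknown:
        raise ValueError(f"unsupported type(s): {sorted(unknown)}")
    if not 0 <= max_results <= 50:
        raise ValueError("maxResults must be between 0 and 50")
    return {
        "q": q,
        "part": "id,snippet",
        "order": order,
        "type": ",".join(types),  # several types are passed as one comma-separated string
        "maxResults": max_results,
    }


params = build_search_params("QGIS", order="date", max_results=50)
# search_response = youtube.search().list(**params).execute()
```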

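The following parameters can also be specified:

channelId: search only within the channel with this ID
location: restrict results to videos uploaded around a point, given as latitude,longitude (e.g. 37.42307,-122.08427)
locationRadius: the radius around the location point (e.g. 10km); used together with location
publishedAfter: return only videos published at or after the given date-time (RFC 3339 format, e.g. 2017-04-01T00:00:00Z)
publishedBefore: return only videos published before the given date-time (same format)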
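publishedAfter and publishedBefore take RFC 3339 date-time strings, so a Python datetime has to be formatted first. A minimal sketch with a hypothetical helper to_rfc3339; treating naive datetimes as UTC and using Japan Standard Time (UTC+9) for the example dates are assumptions of this sketch, and the 2017 to 2018 range is arbitrary.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(dt: datetime) -> str:
    """Format a datetime as an RFC 3339 UTC string, e.g. 2017-04-01T00:00:00Z."""
    if dt.tzinfo is None:
        # Assumption for this sketch: naive datetimes are already in UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


JST = timezone(timedelta(hours=9))
published_after = to_rfc3339(datetime(2017, 4, 1, tzinfo=JST))   # midnight JST
published_before = to_rfc3339(datetime(2018, 4, 1, tzinfo=JST))
# search_response = youtube.search().list(
#     q='QGIS', part='id,snippet', type='video', maxResults=50,
#     publishedAfter=published_after, publishedBefore=published_before,
# ).execute()
```

Converting to UTC and appending Z keeps the string unambiguous regardless of the local time zone the notebook runs in.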

For the other parameters, see the official documentation.

Result
The response comes back as a Python dict that mirrors the JSON returned by the API:

{'kind': 'youtube#searchListResponse',
 'etag': 'IuHR81Mo6AwtQhc4GQu_adxKAes',
 'nextPageToken': 'CAoQAA',
 'regionCode': 'JP',
 'pageInfo': {'totalResults': 679060, 'resultsPerPage': 10},
 'items': [{'kind': 'youtube#searchResult',
   'etag': '85EniYlwOerue-GVw9QxAaMRkdc',
   'id': {'kind': 'youtube#video', 'videoId': 'NHolzMgaqwE'},
   'snippet': {'publishedAt': '2020-07-16T21:53:01Z',
    'channelId': 'UChpH97EqaN9HmdMnQFl7-Kw',
    'title': 'An Absolute Beginner&#39;s Guide to QGIS 3',
    'description': "This tutorial is an absolute beginner's guide to QGIS 3. If you are just diving into QGIS and interested in picking up QGIS through ...",
    'thumbnails': {'default': {'url': 'https://i.ytimg.com/vi/NHolzMgaqwE/default.jpg',
      'width': 120,
      'height': 90},
     'medium': {'url': 'https://i.ytimg.com/vi/NHolzMgaqwE/mqdefault.jpg',
      'width': 320,
      'height': 180},
     'high': {'url': 'https://i.ytimg.com/vi/NHolzMgaqwE/hqdefault.jpg',
      'width': 480,
      'height': 360}},
    'channelTitle': 'GeoDelta Labs',
    'liveBroadcastContent': 'none',
    'publishTime': '2020-07-16T21:53:01Z'}},
  {'kind': 'youtube#searchResult',
   'etag': '5-vpVxzBrNledULTt2isHrP0nzA',
   ...
   ...
   ...

items holds various information about each video. For example:

items.id.videoId: the video's unique ID (putting it in a URL takes you to that video's page,
e.g. https://www.youtube.com/watch?v=NHolzMgaqwE)
items.snippet
- channelTitle: channel name
- publishedAt: publication date
- title: title
- description: description
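A minimal sketch that pulls these fields out of one search result and builds the watch URL. The helper summarize_item is hypothetical and assumes type='video' results (channel or playlist results have no videoId). It also unescapes HTML entities, since the sample title above contains &#39;.

```python
import html


def summarize_item(item: dict) -> dict:
    """Flatten one search result (type='video') into the fields described above."""
    video_id = item["id"]["videoId"]  # only present for video results
    snippet = item["snippet"]
    return {
        "videoId": video_id,
        "url": f"https://www.youtube.com/watch?v={video_id}",
        "channelTitle": snippet["channelTitle"],
        "publishedAt": snippet["publishedAt"],
        # Titles may contain HTML entities such as &#39; (see the sample output)
        "title": html.unescape(snippet["title"]),
        "description": snippet["description"],
    }


# The first item from the sample output above (thumbnails omitted)
sample_item = {
    "kind": "youtube#searchResult",
    "id": {"kind": "youtube#video", "videoId": "NHolzMgaqwE"},
    "snippet": {
        "publishedAt": "2020-07-16T21:53:01Z",
        "channelId": "UChpH97EqaN9HmdMnQFl7-Kw",
        "title": "An Absolute Beginner&#39;s Guide to QGIS 3",
        "description": "This tutorial is an absolute beginner's guide to QGIS 3. If you are just diving into QGIS and interested in picking up QGIS through ...",
        "channelTitle": "GeoDelta Labs",
    },
}
print(summarize_item(sample_item)["title"])  # An Absolute Beginner's Guide to QGIS 3
```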

Shaping the data into a table with pandas

df = pd.DataFrame(search_response["items"])
# Get the videoId of each video
df1 = pd.DataFrame(list(df['id']))['videoId']
# Get the snippet fields of each video
df2 = pd.DataFrame(list(df['snippet']))[['channelTitle','publishedAt','channelId','title','description']]
# Join df1 and df2 side by side
df3 = pd.concat([df1, df2], axis=1)

df3

Result
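An alternative to building two DataFrames and concatenating them is pandas.json_normalize, which flattens nested dicts into dotted column names such as id.videoId and snippet.title. A minimal sketch with a hypothetical helper items_to_frame that produces the same columns as df3; it assumes type='video' results, so every item has id.videoId.

```python
import pandas as pd

# json_normalize joins nested keys with "."; map them to the df3 column names
COLUMNS = {
    "id.videoId": "videoId",
    "snippet.channelTitle": "channelTitle",
    "snippet.publishedAt": "publishedAt",
    "snippet.channelId": "channelId",
    "snippet.title": "title",
    "snippet.description": "description",
}


def items_to_frame(items: list) -> pd.DataFrame:
    """Flatten search_response['items'] into the same columns as df3."""
    flat = pd.json_normalize(items)
    return flat[list(COLUMNS)].rename(columns=COLUMNS)


# One item with values from the sample output shown earlier
sample_items = [
    {
        "id": {"kind": "youtube#video", "videoId": "NHolzMgaqwE"},
        "snippet": {
            "publishedAt": "2020-07-16T21:53:01Z",
            "channelId": "UChpH97EqaN9HmdMnQFl7-Kw",
            "title": "An Absolute Beginner&#39;s Guide to QGIS 3",
            "description": "This tutorial is an absolute beginner's guide to QGIS 3. If you are just diving into QGIS and interested in picking up QGIS through ...",
            "channelTitle": "GeoDelta Labs",
        },
    }
]
df_sample = items_to_frame(sample_items)
# df3 = items_to_frame(search_response["items"])
```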

Retrieving more than 50 videos

The maxResults parameter sets how many results are returned, but the maximum is 50. To get more, you can repeat the request page by page. For example:

# Fetch search results page by page and convert them to a pandas DataFrame
def get_video_info(part, q, order, type, num):
    dic_list = []
    # First request; list_next() reuses its parameters (including maxResults) for later pages
    request = youtube.search().list(
        part=part,
        q=q,
        order=order,
        type=type,
        maxResults=50
        )

    # Fetch up to num pages
    for _ in range(num):
        if request is None:  # list_next() returns None when there are no more pages
            break
        output = request.execute()
        dic_list = dic_list + output['items']
        request = youtube.search().list_next(request, output)

    df = pd.DataFrame(dic_list)
    # Get the unique videoId of each video
    df1 = pd.DataFrame(list(df['id']))['videoId']
    # Keep only the snippet fields we need
    df2 = pd.DataFrame(list(df['snippet']))[['channelTitle','publishedAt','channelId','title','description']]
    df3 = pd.concat([df1, df2], axis=1)

    return df3

# Fetch up to num pages (50 results per page)
df_out = get_video_info(part='snippet', q='QGIS', order='date', type='video', num=20)

The API call is wrapped in the get_video_info function, which requests the next page repeatedly, up to num times.
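list_next() hides the paging mechanism: each response carries a nextPageToken, passing it back as the pageToken parameter returns the following page, and the last page has no token. A minimal sketch of that loop with a hypothetical paginate helper; it takes the request function as an argument, which also lets it be tried without calling the API.

```python
from typing import Callable, Iterator, Optional


def paginate(fetch_page: Callable[[Optional[str]], dict], max_pages: int) -> Iterator[dict]:
    """Yield items from successive pages by following nextPageToken.

    fetch_page(token) runs one request; token is None for the first page.
    """
    token = None
    for _ in range(max_pages):
        response = fetch_page(token)
        yield from response.get("items", [])
        token = response.get("nextPageToken")
        if token is None:  # the last page has no nextPageToken
            break


# Hypothetical fetch function for the search used in this article:
# def fetch_search_page(token):
#     kwargs = dict(q='QGIS', part='snippet', order='date', type='video', maxResults=50)
#     if token is not None:
#         kwargs['pageToken'] = token
#     return youtube.search().list(**kwargs).execute()
#
# items = list(paginate(fetch_search_page, max_pages=20))
```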

Retrieving videos from a specific channel

For example, the MIERUNE channel has published 27 QGIS-related videos.

However, among the QGIS videos retrieved by the code above (going back as far as April 2017), only 6 came from the MIERUNE channel.

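Perhaps the others simply didn't match the keyword "QGIS"...
To retrieve videos from a specific channel, add channelId=<channel ID> to the youtube.search() parameters; only videos from that channel are returned.
For example, the MIERUNE channel's ID is UCSjlE653fhsrXxfR_JoQweg, so the call looks like this: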

search_response = youtube.search().list(
  q='QGIS',
  part='id,snippet',
  order='viewCount',
  type='video',
  maxResults=50,
  channelId='UCSjlE653fhsrXxfR_JoQweg'
).execute()

search_response

Result
All of the channel's QGIS videos were retrieved.
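To check how many videos in a result DataFrame come from one channel (as in the MIERUNE count above), a boolean mask on channelId is enough. A minimal sketch; count_by_channel is a hypothetical helper and assumes a DataFrame with a channelId column, such as df_out.

```python
import pandas as pd


def count_by_channel(df: pd.DataFrame, channel_id: str) -> int:
    """Number of rows in a result DataFrame (e.g. df_out) that belong to one channel."""
    return int((df["channelId"] == channel_id).sum())


# Usage with the DataFrame built earlier:
# count_by_channel(df_out, "UCSjlE653fhsrXxfR_JoQweg")   # MIERUNE
# df_out["channelTitle"].value_counts().head(10)          # channels with the most hits
```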

Getting each video's view count and like count

youtube.search() does not return like counts or view counts.
To get them, use

youtube.videos().list(
    part='statistics,contentDetails',
    id='NHolzMgaqwE'   # a videoId taken from the youtube.search() response
    ).execute()

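This returns the information for an individual video. The videoId is stored in the youtube.search() response, so use that.
In other words, first collect the videoIds of the videos that match the search with youtube.search(),
then use each videoId to get that video's like count, view count and so on.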
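Calling videos().list() once per video works, but the id parameter of videos.list also accepts a comma-separated list of video IDs (up to 50 per request, the size of one search page), so the statistics can be fetched in batches with fewer requests. A minimal sketch; the helper chunked_ids is hypothetical.

```python
from typing import Iterable, Iterator, List


def chunked_ids(video_ids: Iterable[str], size: int = 50) -> Iterator[str]:
    """Yield comma-separated strings of at most `size` video IDs."""
    batch: List[str] = []
    for vid in video_ids:
        batch.append(vid)
        if len(batch) == size:
            yield ",".join(batch)
            batch = []
    if batch:  # remaining IDs that did not fill a whole batch
        yield ",".join(batch)


# Usage with the DataFrame built earlier:
# for ids in chunked_ids(df_out["videoId"]):
#     resp = youtube.videos().list(part="statistics,contentDetails", id=ids).execute()
#     for item in resp["items"]:
#         print(item["id"], item["statistics"].get("viewCount"))
```

Each returned item carries its own id, so results can be matched back to df_out by videoId rather than by position.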

The result looks like this:

items.contentDetails holds the video length (duration),
and items.statistics holds the view count (viewCount), like count (likeCount) and so on.
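contentDetails.duration is an ISO 8601 duration string such as PT15M33S rather than a number, so it needs converting before it can be compared with view counts. A minimal sketch with a hypothetical helper duration_to_seconds, assuming only the day, hour, minute and second designators appear; values with weeks, months or years are rejected rather than guessed.

```python
import re

_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_seconds(duration: str) -> int:
    """Convert a contentDetails.duration value such as 'PT1H2M3S' to seconds."""
    match = _DURATION_RE.match(duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration}")
    parts = {name: int(value) if value else 0 for name, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


print(duration_to_seconds("PT15M33S"))  # 933
```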

Doing the whole process in one go looks like this:

# Fetch search results page by page and convert them to a pandas DataFrame
def get_video_info(part, q, order, type, num):
    dic_list = []
    # First request; list_next() reuses its parameters (including maxResults) for later pages
    request = youtube.search().list(
        part=part,
        q=q,
        order=order,
        type=type,
        maxResults=50
        )

    # Fetch up to num pages
    for _ in range(num):
        if request is None:  # list_next() returns None when there are no more pages
            break
        output = request.execute()
        dic_list = dic_list + output['items']
        request = youtube.search().list_next(request, output)

    df = pd.DataFrame(dic_list)
    # Get the unique videoId of each video
    df1 = pd.DataFrame(list(df['id']))['videoId']
    # Keep only the snippet fields we need
    df2 = pd.DataFrame(list(df['snippet']))[['channelTitle','publishedAt','channelId','title','description']]
    df3 = pd.concat([df1, df2], axis=1)

    return df3

# Fetch up to num pages (50 results per page)
df_out = get_video_info(part='snippet', q='QGIS', order='date', type='video', num=20)


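# Get the statistics (view count, like count, etc.) and the duration of one video
def get_statistics(video_id):
    item = youtube.videos().list(part='statistics,contentDetails', id=video_id).execute()['items'][0]
    statistics = dict(item['statistics'])
    statistics['duration'] = item['contentDetails']['duration']  # ISO 8601, e.g. PT15M33S
    return statistics

# Repeat for every video to collect view counts, durations, etc.
df_static = pd.DataFrame(list(df_out['videoId'].apply(get_statistics)))

# Join the tables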
df_output = pd.concat([df_out,df_static], axis = 1)
df_output

That's it: we now have the videos matching the keyword together with their like counts, view counts and more.

Using the collected data

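It might be interesting to analyze the collected video information and look for what kind of videos go viral.
For example, you could look into how video length relates to view count...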
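A minimal sketch of such a check, assuming you have added a column of video lengths in seconds (the column name duration_sec is hypothetical) next to viewCount. The API returns viewCount as a string, so the values are converted to numbers first. Because view counts can be dominated by a few very popular videos, this sketch uses a rank (Spearman) correlation, computed as the Pearson correlation of the ranks so that SciPy is not needed.

```python
import pandas as pd


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    # viewCount comes back as a string, so coerce both columns to numbers first
    pair = pd.DataFrame({
        "x": pd.to_numeric(x, errors="coerce"),
        "y": pd.to_numeric(y, errors="coerce"),
    }).dropna()  # drop videos where either value is missing
    return pair["x"].rank().corr(pair["y"].rank())


# Usage, assuming df_output has a numeric 'duration_sec' column (hypothetical name):
# spearman(df_output["duration_sec"], df_output["viewCount"])
```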

I may write up the results of digging into the video data someday, but that's all for now.
