
Reading every comment on a YouTube video, replies included

Last updated at 2020-11-24, posted at 2020-10-11

I tried fetching every comment on a YouTube video, replies included, with a Python script.

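The problem: YouTube comments that can't all be read

Some YouTube videos collect an unusually large number of comments. If you try hard to read all of them in the browser or the app,

it stops.

Sooner or later loading simply freezes. Thinking "don't underestimate a PC", I powered mine on for the first time in a while, and the browser froze there as well.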

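Reading them with the YouTube API and a script

For situations like this there is a handy thing called the YouTube API, and many scripts that use it have already been published. However, I could not find one that also retrieves every reply to each comment, so I wrote one that fetches all comments, replies included.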

Script

getComments.py


import datetime
import json
import requests

URL = 'https://www.googleapis.com/youtube/v3/'
API_KEY = 'xxx'
VIDEO_ID = 'yyy'

def get_video_comment(video_id, n):
    # Fetch one page of comment threads (top-level comments) for the video.
    params = {
        'key': API_KEY,
        'part': 'snippet,replies',
        'videoId': video_id,
        'order': 'time',
        'textFormat': 'plainText',
        'maxResults': n,
    }
    try:
        # next_page_token exists as a global only while more pages remain.
        params['pageToken'] = next_page_token
    except NameError:
        pass
    response = requests.get(URL + 'commentThreads', params=params)
    response.raise_for_status()  # raise on HTTP errors (e.g. bad key, quota exceeded)
    resource = response.json()
    return resource

def print_video_comment_replies(match):
    for comment_info in match['items']:
        author = comment_info['snippet']['authorDisplayName']
        pubdate = comment_info['snippet']['publishedAt']
        text = comment_info['snippet']['textDisplay']
        pubdate = datetime.datetime.strptime(pubdate, '%Y-%m-%dT%H:%M:%SZ')
        pubdate = pubdate.strftime("%Y/%m/%d %H:%M:%S")
        print('\t___________________________________________\n')
        print("\n\tReply :\n\t{}\n\n\tby: {} date: {}".format(text, author, pubdate), "\n")

def print_video_comment(match):
    global parentId
    for comment_info in match['items']:
        parentId = comment_info['id']
        author = comment_info['snippet']['topLevelComment']['snippet']['authorDisplayName']
        pubdate = comment_info['snippet']['topLevelComment']['snippet']['publishedAt']
        text = comment_info['snippet']['topLevelComment']['snippet']['textDisplay']
        like_cnt = comment_info['snippet']['topLevelComment']['snippet']['likeCount']
        reply_cnt = comment_info['snippet']['totalReplyCount']
        pubdate = datetime.datetime.strptime(pubdate, '%Y-%m-%dT%H:%M:%SZ')
        pubdate = pubdate.strftime("%Y/%m/%d %H:%M:%S")
        print('---------------------------------------------------\n')
        print('{}\n\n by: {} date: {} good: {} reply: {}\n'.format(text, author, pubdate, like_cnt, reply_cnt))
        if reply_cnt > 0:
            global reply_next_page_token
            # Fetch and print reply pages until no nextPageToken is returned.
            while True:
                replyMatch = treat_reply(match)
                print_video_comment_replies(replyMatch)
                try:
                    reply_next_page_token = replyMatch["nextPageToken"]
                except KeyError:
                    break
            # Remove the token so the next thread starts from its first page.
            reply_next_page_token = None
            del reply_next_page_token

def treat_reply(match):
    # Fetch one page of replies for the thread whose ID is in the global parentId.
    # match stays in the signature for the callers but is not inspected here.
    params = {
        'key': API_KEY,
        'part': 'id,snippet',
        'parentId': parentId,
        'textFormat': 'plainText',
        'maxResults': 100,
    }
    try:
        # reply_next_page_token exists as a global only while more reply pages remain.
        params['pageToken'] = reply_next_page_token
    except NameError:
        pass
    response = requests.get(URL + 'comments', params=params)
    response.raise_for_status()  # raise on HTTP errors (e.g. bad key, quota exceeded)
    resource = response.json()
    return resource


# Main

key = None

while 'key' in locals():
    match = get_video_comment(VIDEO_ID, 100)
    print_video_comment(match)
    try:
        next_page_token = match["nextPageToken"]
    except KeyError:
        next_page_token = None
        del next_page_token
        del key

Set the variables API_KEY and VIDEO_ID in the script to your own values. VIDEO_ID is the ID that appears in the video URL (the value of the v= parameter).
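If you would rather paste a whole URL than pick the ID out by hand, something like the following might do. This is only a sketch: the helper name extract_video_id is made up for illustration, and it assumes the common watch?v= and youtu.be link shapes, so other URL forms simply return None.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a YouTube watch or youtu.be URL, or None."""
    parsed = urlparse(url)
    host = parsed.hostname or ''
    if host == 'youtu.be':
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip('/') or None
    if host == 'youtube.com' or host.endswith('.youtube.com'):
        # Watch pages carry the ID in the v= query parameter.
        values = parse_qs(parsed.query).get('v')
        return values[0] if values else None
    return None


# 'yyy' is the same placeholder used for VIDEO_ID in the script.
print(extract_video_id('https://www.youtube.com/watch?v=yyy&t=10s'))
```

The result can then be assigned to VIDEO_ID.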

Key points

Comments are fetched with CommentThreads
Replies to comments are fetched with Comments

Those two calls are the key points. That said, this is exactly what the manual says:

To retrieve all of the replies for the top-level comment, you need to call the comments.list method and use the parentId request parameter to identify the comment for which you want to retrieve replies.
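To make the difference between the two calls concrete, here is a rough sketch of the query parameters each one takes. The function names thread_params and reply_params are hypothetical helpers, not part of the script above; the parameter names themselves are the ones the script already sends. The only real difference is that the first call is keyed by videoId and the second by parentId.

```python
def thread_params(api_key, video_id, page_token=None, max_results=100):
    """Query parameters for commentThreads (top-level comments of a video)."""
    params = {
        'key': api_key,
        'part': 'snippet,replies',
        'videoId': video_id,
        'order': 'time',
        'textFormat': 'plainText',
        'maxResults': max_results,
    }
    if page_token:
        # Only sent when a previous page returned a nextPageToken.
        params['pageToken'] = page_token
    return params


def reply_params(api_key, parent_id, page_token=None, max_results=100):
    """Query parameters for comments (replies to one top-level comment)."""
    params = {
        'key': api_key,
        'part': 'id,snippet',
        'parentId': parent_id,
        'textFormat': 'plainText',
        'maxResults': max_results,
    }
    if page_token:
        params['pageToken'] = page_token
    return params


# Placeholder values only; nothing is sent over the network here.
print(thread_params('xxx', 'yyy'))
print(reply_params('xxx', 'hypothetical-thread-id', page_token='hypothetical-token'))
```

Each dict would be passed as params= to requests.get() against the commentThreads or comments endpoint respectively.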

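I would have liked to switch between comments and replies inside the main loop, but the only way I could think of was to set the CommentThreads parameter maxResults to 1. That would mean one API call per comment, which is absurd (and, depending on the case, more than enough to get you in trouble). So instead print_video_comment() itself calls the function that fetches replies, which is a structure I am not entirely happy with.
The variable reply_next_page_token is shared between functions, so it needs a global declaration. (I did not notice this at first and got stuck.)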
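One possible way around both the nested call and the global is to pass the page token and the parent ID as arguments and let a generator walk threads and replies in order. The following is only a sketch under that assumption, not the structure the script uses: iter_comments, fetch_threads and fetch_replies are hypothetical names, and the two fetch functions are replaced by canned pages so the example runs without an API key. The response fields it reads are the same ones the script reads.

```python
def iter_comments(fetch_threads, fetch_replies):
    """Yield ('comment', text) and ('reply', text) tuples in reading order.

    fetch_threads(page_token) and fetch_replies(parent_id, page_token) are
    hypothetical callables that each return one decoded API page (a dict).
    """
    thread_token = None
    while True:
        page = fetch_threads(thread_token)
        for thread in page.get('items', []):
            top = thread['snippet']['topLevelComment']['snippet']
            yield 'comment', top['textDisplay']
            if thread['snippet']['totalReplyCount'] > 0:
                reply_token = None
                while True:
                    replies = fetch_replies(thread['id'], reply_token)
                    for reply in replies.get('items', []):
                        yield 'reply', reply['snippet']['textDisplay']
                    reply_token = replies.get('nextPageToken')
                    if reply_token is None:
                        break
        thread_token = page.get('nextPageToken')
        if thread_token is None:
            break


def _thread(thread_id, text, reply_count):
    # Build a minimal fake commentThreads item for the offline demo.
    return {'id': thread_id,
            'snippet': {'totalReplyCount': reply_count,
                        'topLevelComment': {'snippet': {'textDisplay': text}}}}


def fake_threads(page_token):
    # Two canned thread pages standing in for the API.
    pages = {
        None: {'items': [_thread('t1', 'first', 2)], 'nextPageToken': 'p2'},
        'p2': {'items': [_thread('t2', 'second', 0)]},
    }
    return pages[page_token]


def fake_replies(parent_id, page_token):
    # Two canned reply pages, one reply each.
    pages = {
        None: {'items': [{'snippet': {'textDisplay': 'reply 1'}}],
               'nextPageToken': 'r2'},
        'r2': {'items': [{'snippet': {'textDisplay': 'reply 2'}}]},
    }
    return pages[page_token]


for kind, text in iter_comments(fake_threads, fake_replies):
    print(kind, text)
```

With real fetch functions, thread pages could still be requested 100 at a time, so the number of API calls stays at one per page of threads plus one per page of replies.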

Other notes

The script does not check whether a variable exists in a consistent way, but please don't mind that...

References

API Reference - CommentThreads
API Reference - Comments
Youtube のコメントを取得する in Python (Japanese article: "Getting YouTube comments in Python")
