Trying sentiment analysis on YouTube comments with the Google Cloud Natural Language API

Last updated at 2024-08-22, posted at 2024-08-22

Introduction

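I used the YouTube API (YouTube Data API v3) and the Google Cloud Natural Language API to run sentiment analysis on the comments of YouTube videos and see whether an online backlash (炎上) can be detected. The result: the numbers seem to indicate a backlash reasonably well. Analysis takes a very long time when a video has many comments, so a realistic use is to analyze only the videos you care about and feed the results into your next piece of content.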

Libraries

pip install google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client google-cloud-language

API setup

Enable both the YouTube API and the Google Cloud Natural Language API from the API settings in the Google Cloud Console.
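Before running anything, it may help to confirm that the credentials are visible to Python. The following is a minimal sketch, assuming the two environment variables that the script's comments mention (`YOUTUBE_API_KEY` and `GOOGLE_APPLICATION_CREDENTIALS`); `missing_settings` is a hypothetical helper name and not part of either API.

```python
import os

# Environment variables the script in this article relies on (taken from its comments):
# YOUTUBE_API_KEY for the YouTube API, and
# GOOGLE_APPLICATION_CREDENTIALS for the Natural Language API client.
REQUIRED_VARS = ("YOUTUBE_API_KEY", "GOOGLE_APPLICATION_CREDENTIALS")


def missing_settings(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

This only checks that the variables exist; it does not verify that the key is valid or that the APIs are enabled for the project.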

Script

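The script labels each comment Positive, Neutral, or Negative by comparing its sentiment score (which ranges from -1.0 to 1.0) against thresholds of ±0.25. Tuning these thresholds to match your own impression of the comments might improve the accuracy a little.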

from googleapiclient.discovery import build
from google.cloud import language_v1
import os

# YouTube API key
# Must be set in the shell beforehand: export YOUTUBE_API_KEY="YOUR_YOUTUBE_API_KEY"
YOUTUBE_API_KEY = os.getenv('YOUTUBE_API_KEY')

# YouTube video ID
VIDEO_ID = '[VIDEO_ID]'  # Take this from the video URL: the VIDEO_ID part of v=VIDEO_ID

# Create the YouTube API client
youtube = build('youtube', 'v3', developerKey=YOUTUBE_API_KEY)

# Create the Google Cloud Natural Language API client
# Must be set beforehand: export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"
client = language_v1.LanguageServiceClient()

def get_comments(video_id):
    comments = []
    request = youtube.commentThreads().list(
        part="snippet",
        videoId=video_id,
        textFormat="plainText",
        maxResults=100
    )
    response = request.execute()

    while request is not None:
        for item in response['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textDisplay']
            comments.append(comment)

        if 'nextPageToken' in response:
            request = youtube.commentThreads().list(
                part="snippet",
                videoId=video_id,
                textFormat="plainText",
                maxResults=100,
                pageToken=response['nextPageToken']
            )
            response = request.execute()
        else:
            request = None

    return comments

def analyze_sentiment(text):
    document = language_v1.Document(content=text, type_=language_v1.Document.Type.PLAIN_TEXT)
    sentiment = client.analyze_sentiment(request={'document': document}).document_sentiment
    return sentiment.score

def main():
    comments = get_comments(VIDEO_ID)
    positive, neutral, negative = 0, 0, 0

    for comment in comments:
        score = analyze_sentiment(comment)

        if score > 0.25:
            positive += 1
        elif score < -0.25:
            negative += 1
        else:
            neutral += 1

    total_comments = positive + neutral + negative
    if total_comments == 0:
        # Avoid division by zero when no comments were fetched
        print("No comments found.")
        return
    print(f"Total Comments Num: {total_comments}")
    print(f"Positive[{positive}]: {positive/total_comments:.2%}")
    print(f"Neutral[{neutral}]: {neutral/total_comments:.2%}")
    print(f"Negative[{negative}]: {negative/total_comments:.2%}")

if __name__ == '__main__':
    main()
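
Because the ±0.25 threshold is hard-coded in `main()`, trying a different value means calling the API again for every comment. One way around that, sketched below, is to keep the raw scores and re-bucket them offline. `classify` and `label_ratios` are hypothetical helper names, and the sample scores are made up for illustration; real scores would come from `analyze_sentiment()`.

```python
from collections import Counter

LABELS = ("Positive", "Neutral", "Negative")


def classify(score, threshold=0.25):
    """Bucket a sentiment score using a symmetric threshold (same rule as the script)."""
    if score > threshold:
        return "Positive"
    if score < -threshold:
        return "Negative"
    return "Neutral"


def label_ratios(scores, threshold=0.25):
    """Return the share of each label among the scores for a given threshold."""
    if not scores:
        return {label: 0.0 for label in LABELS}
    counts = Counter(classify(s, threshold) for s in scores)
    return {label: counts[label] / len(scores) for label in LABELS}


if __name__ == "__main__":
    # Made-up scores for illustration only.
    sample_scores = [0.8, 0.3, 0.1, 0.0, -0.2, -0.4, -0.9]
    for threshold in (0.1, 0.25, 0.5):
        print(threshold, label_ratios(sample_scores, threshold))
```

Comparing the ratios across a few thresholds against a hand-labeled sample of comments would be one way to pick a value that matches your impression.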

Example 1: Severe backlash

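青汁王子 (Aojiru Ouji) recently came under fire over a stock-related controversy, and the scores show a large share of Negative comments. That matches my impression.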

Positive: 22.54%
Neutral: 37.49%
Negative: 39.97%

Example 2: Mild backlash

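A video from キングコング (King Kong), who drew a minor backlash over remarks about their motivation for appearing on TV. The share of Negative comments is less than half that of Example 1 (16.67% vs. 39.97%), which fits a minor backlash reasonably well.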

Positive: 49.73%
Neutral: 33.61%
Negative: 16.67%

Example 3: An ordinary video

Positive: 63.70%
Neutral: 27.64%
Negative: 8.66%

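A recent video from ヒカル (Hikaru) with no particular signs of backlash. The Negative share is in the single digits, so it is probably just an ordinary video. Positive comments are fairly common, which may mean viewers found it entertaining.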

Summary

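Based on these results, with the current threshold settings:

For high-attention videos, a Negative share above 15% starts to feel like a backlash.
A Negative share of around 40% indicates a considerable backlash.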
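
The rule of thumb above can be written down as a small function. This is only a sketch based on the three videos in this article: `backlash_level` is a hypothetical helper, the 15% cutoff comes from the summary, and the 35% cutoff for "severe" is my own assumption, chosen so that "around 40%" (including Example 1 at 39.97%) falls above it.

```python
def backlash_level(negative_ratio, mild=0.15, severe=0.35):
    """Map the share of Negative comments (0.0 to 1.0) to a rough backlash label.

    The `severe` default is an assumed cutoff, not a value from the article.
    """
    if not 0.0 <= negative_ratio <= 1.0:
        raise ValueError("negative_ratio must be between 0 and 1")
    if negative_ratio >= severe:
        return "severe"
    if negative_ratio > mild:
        return "mild"
    return "none"


if __name__ == "__main__":
    # Negative ratios reported for the three example videos.
    examples = [("Example 1", 0.3997), ("Example 2", 0.1667), ("Example 3", 0.0866)]
    for name, ratio in examples:
        print(f"{name}: {backlash_level(ratio)}")
```

With only three samples, and only for high-attention videos, these cutoffs are a starting point rather than a validated classifier.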
