More than 3 years have passed since last update.

MISIAの国歌独唱に感動したのでその時のツイートをWordCloudでまとめてみた

Last updated at 2021-08-07Posted at 2021-07-31

　この間のオリンピック開会式のMISIAの国歌独唱が最高だったので、Twitter上でどんな反応があったのかWordCloudでまとめてみました。

　今回使用する主なライブラリはtweepy, janome, WordCloudになります。

　tweepyを使用するのにはAPIキー、トークンが必要になるので、以下の記事を参考にして取得してください。

　2021年度版 Twitter API利用申請の例文からAPIキーの取得まで詳しく解説

ライブラリのインポート

import re
import pandas as pd
import tweepy
from janome.tokenizer import Tokenizer
from wordcloud import WordCloud

tweepyの認証設定

api_key = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
api_key_secret = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
access_token= "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
access_token_secret = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

auth = tweepy.OAuthHandler(api_key, api_key_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

tweetの取得
tweepy.Cursorでは一度で最大100件のツイートを取得可能で、Twitter APIでは15分間に180回までの制限があるので、取得できるツイートは最大でも15分で1万8千件になります。
今回の場合、開会式でのMISIAの歌唱の時間が2021/7/23 20:18～20:20ぐらいだったので、その時間を指定しています。
(※今回は指定時刻の2分間で約1万5千件のツイートがあります。さすがオリンピック開会式・・・)

q = "MISIA -filter:retweets" #検索ワード:MISIA リツイートは含まない
since = "2021-07-23_20:18:00_JST" #指定開始時刻(日本時間)
until = "2021-07-23_20:20:00_JST" #指定終了時刻(日本時間)

tweet_data = []

for tweet in tweepy.Cursor(api.search, 
                           q=q, 
                           result_type='recent', 
                           lang='ja', 
                           tweet_mode='compat',
                           since=since,
                           until=until,
                           count=100).items(17900):
    tweet_data.append([tweet.text.replace('\n','')])

tweetの整形

t = Tokenizer()

words = []
for text in tweet_data:
    text = str(text)
    # ユーザー名削除
    text = re.sub(r'@[0-9a-zA-Z_:]*', "", text)
    # ハッシュタグ削除
    text = re.sub(r'#.*', "", text)
    # URL削除
    text = re.sub(r'(https?)(:\/\/[-_.!~*\'()a-zA-Z0-9;\/?:\@&=+\$,%#]+)', "", text)
    #記号の削除
    text=re.sub(r'[!-~]', "", text)
    text=re.sub(r'[︰-＠]', "", text)
    #改行の削除
    text=re.sub('\n', " ", text)

    tokens = t.tokenize(text)
    for token in tokens:
        pos = token.part_of_speech.split(',')[0]
        if pos == '名詞':
            words.append(token.surface)
        elif pos == "動詞":
            words.append(token.base_form)
        elif pos == "形容詞":
            words.append(token.base_form)

    text = ' '.join(words)

wordcloudで可視化(※文字の色はMISIAの衣装に合わせてレインボー)
出力された画像や前のブロックのwordsをsort_valuesして表示しない単語を設定するといいと思います。

# 表示しない単語の指定
stop_words = ["MISIA", "さん", "てる", "くる", "する", "なる", "そう", "れる", 
              "ない", "ある", "ーー", "ここ","しまう", "える", "よう", "いう", 
              "っぽい", "ちゃん", "ぇな"]

wordcloud = WordCloud(  background_color="white",  #背景：白
                        colormap = 'rainbow',      #文字のカラーマップ：レインボー
                        collocations = False,  #単語を連結するか
                        font_path='C:/Windows/Fonts/YuGothM.ttc',  #使用するフォント
                        width=600, height=400,  #画像サイズ
                        min_font_size=8, #一番小さいフォントサイズ
                        font_step=2,  #フォントサイズの間隔
                        stopwords=set(stop_words)
                        ).generate(text)

wordcloud.to_file('./wordcloud.png')

圧倒的な「かき氷」の多さ・・・

参考記事
TwitterAPIの上限までTweetを取得する｜つぶやきをWordcloudで可視化①

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up