ルマンドの twitter 上での評判を調べてみた #自然言語処理

背景

ルマンドbot がルマンドを推してくるので，その評判が気になった．
ポジティブ/ネガティブなツイートを抽出し，よく使われるポジネガ表現を抜き出したい．

事前準備

以下ページから twitter の developer アカウントの用意
https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens
以下ページで Twitter Apps の作成
https://apps.twitter.com/

credential (Key と Access Tokens) を，それぞれ consumerkey, consumersecret, accesstoken, accesstokensecret という変数で保存しておきます．保存の形式は char 型です．

環境

Twitter からツイートの取得

Twitter オブジェクトの作成

twitter 関数を使用して，key と token を使って twitter API に接続します．

c = twitter(consumerkey,consumersecret,accesstoken,accesstokensecret);
c.StatusCode  % 接続チェック

「ルマンド」を含む直近の100ツイートを取得

tweetquery = 'ルマンド';
s = search(c,tweetquery,'count',100);
statuses = s.Body.Data.statuses;
pause(2)

sRefresh = search(c,tweetquery,'count',100, ...
    'since_id',s.Body.Data.search_metadata.max_id_str);
statuses = [statuses;sRefresh.Body.Data.statuses];
statuses{1}

% 一番最近の tweet 
statuses{1}.text
save statuses

可能な限りツイートを取得

while isfield(s.Body.Data.search_metadata,'next_results')
    % 結果を string に変換
    nextresults = string(s.Body.Data.search_metadata.next_results); 
    % Extract maximum Tweet identifier  
    max_id = extractBetween(nextresults,"max_id=","&");             
    % Convert maximum Tweet identifier to a character vector
    cmax_id = char(max_id);         
    % Search for Tweets                                
    s = search(c,tweetquery,'count',100,'max_id',cmax_id);        
    % Retrieve Tweet text for each Tweet
    statuses = [statuses;s.Body.Data.statuses];                     
end

if iscell(statuses)
    % Unstructured data
    numTweets = length(statuses);             % Determine total number of Tweets
    tweetTimes = cell(numTweets,1);           % Allocate space for Tweet times and Tweet text
    tweetTexts = tweetTimes; 
    for i = 1:numTweets
      tweetTimes{i} = statuses{i}.created_at; % Retrieve the time each Tweet was created
      tweetTexts{i} = statuses{i}.text;       % Retrieve the text of each Tweet
    end
else
    % Structured data
    tweetTimes = {statuses.created_at}'; 
    tweetTexts = {statuses.text}'; 
end

% 日付情報の修正
tweets = timetable(tweetTexts,'RowTimes', ...
    datetime(tweetTimes, 'Format', 'eee MMM dd HH:mm:ss +SSSS yyyy', 'Locale', 'en_US'));
save tweets

ツイートの感情分析を実施

ツイートの数を取得

numTweets = height(tweets);

ポジティブ，ネガティブの単語の定義

ポジティブ，ネガティブの単語リストの読み込み - wordlist_posneg.xlsx は [日本語評価極性辞書（用言編、名詞編）¹ を Excel ファイルとして保存

filename = 'wordlist_posneg.xlsx';
posneg = import_posneg(fullfile('dict', filename));
categ_list = string( unique(posneg.category) );

% ポジティブな単語の抽出
possubidx = contains(categ_list, "ポジ");
postag = categ_list(possubidx);
posidx = contains( string(posneg.category), postag);
poskeywords = posneg(posidx, 2);
rmidx = find( poskeywords.word == "" );
poskeywords(rmidx,:) = [];

% ネガティブな単語の抽出
negtag = categ_list(~possubidx);
negidx = contains( string(posneg.category), negtag);
negkeywords = posneg(negidx, 2);
rmidx = find( negkeywords.word == "" );
negkeywords(rmidx,:) = [];

ポジティブ，ネガティブな単語リスト内の単語を含む tweet の検索

% ポジティブなツイートの抽出
for i=1:height(poskeywords)
    % 全てのツイートと一つのpos単語を比較
    temp = contains(tweets.tweetTexts,poskeywords.word(i),'IgnoreCase',true);
    dJobs_pos(:,i) = temp;
end
numPosTweets = sum(sum(dJobs_pos,2) > 0);
poswordCount = sum(dJobs_pos,1);
posidx = (poswordCount > 0);
poswordlist = poskeywords.word(posidx);
poscountlist = poswordCount(posidx);

% ネガティブなツイートの抽出
for i=1:height(negkeywords)
    % 全てのツイートと一つのneg単語を比較
    temp = contains(tweets.tweetTexts,negkeywords.word(i),'IgnoreCase',true);
    dJobs_neg(:,i) = temp;
end
numNegTweets = sum(sum(dJobs_neg,2) > 0);
negwordCount = sum(dJobs_neg,1);
negidx = (negwordCount > 0);
negwordlist = negkeywords.word(negidx);
negcountlist = negwordCount(negidx);

ポジネガのカウントと，使用されているそれぞれの単語の wordcloud 表示

tweetTable = table(numTweets,numPosTweets,numNegTweets, ...
    'VariableNames',{'Number_of_Tweets','Positive_Tweets','Negative_Tweets'});

tweetTable

ポジティブなツイートが圧倒的に多い．みんな好きよね．

figure
subplot(1,2,1)
wordcloud(poswordlist, poscountlist);
title('tweetの中でよく使われるポジティブな単語')
subplot(1,2,2)
wordcloud(negwordlist, negcountlist);
title('tweetの中でよく使われるネガティブな単語')

ルマンドはおいしくて人を嬉しくさせるようでときどき可愛いこともあるかもしれませんが，がっかりしてしまったり，刺さって痛かったりするようです．太るし，辛いし，人を困らせたりして罪深いことも．．

Inspired by:
Conduct Sentiment Analysis Using Historical Tweets

Reference:

日本語評価極性辞書（用言編、名詞編）小林のぞみ，乾健太郎，松本裕治，立石健二，福島俊一. 意見抽出のための評価表現の収集. 自然言語処理，Vol.12, No.3, pp.203-222, 2005. ↩