[Memo] ① Fetching and saving tweets ~I want to identify news tweets that get shared widely~

Last updated at 2019-12-17, posted at 2019-12-15

Development environment

Windows 10
Anaconda3 (Jupyter Notebook)

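Overview and purpose

These are notes from a university student's undergraduate thesis.
The thesis topic is building a classifier that distinguishes news tweets that spread from those that do not.
This post covers the tweet-collection step.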

Prerequisites

・Twitter Developer account approved
・tweepy installed

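Fetching tweets

The account to collect from is @livedoornews.
I chose it because it stood out both in follower count and in how responsive those followers are (how often they retweet).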

get_newstweet.ipynb

# Import the required libraries
import tweepy
import pandas as pd

get_newstweet.ipynb


# Consumer key and access token settings for using the Twitter API
Consumer_key = "API key"
Consumer_secret = "API secret Key"
Access_token = "Access token"
Access_secret = "Access token secret"

# Authenticate
auth = tweepy.OAuthHandler(Consumer_key, Consumer_secret)
auth.set_access_token(Access_token, Access_secret)
api = tweepy.API(auth)
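
Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. As one possible alternative, the sketch below reads the four values from environment variables. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the `load_credentials` helper are assumptions made for this illustration; neither tweepy nor the original notebook defines them.

```python
import os

# Hypothetical environment variable names; choose whatever suits your setup.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise KeyError if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in ENV_NAMES.items()}
```

The returned values could then be passed to `tweepy.OAuthHandler` and `set_access_token` in place of the string literals.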

get_newstweet.ipynb

# Specify the account name
acount = "@livedoornews"

def get_tweets(acount):
    """
    Fields collected: tweet ID, timestamp, tweet text, like count, RT count
    """
    tweet_data = []  # empty list to hold the collected rows
    for tweet in tweepy.Cursor(api.user_timeline, screen_name=acount, exclude_replies=True).items():
        tweet_data.append([tweet.id, tweet.created_at, tweet.text.replace('\n', ''), tweet.favorite_count, tweet.retweet_count])
    # Build the pandas DataFrame once, after the loop, so it also exists when no tweets come back
    df = pd.DataFrame(tweet_data, columns=['tweet_no', 'time', 'text', 'favorite_count', 'RT_count'])
    return df

df = get_tweets(acount)
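
The row-building step in `get_tweets` can be checked without calling the API. The sketch below pulls that step into a small helper and feeds it a stand-in object. The `tweet_to_row` name, the `COLUMNS` constant, and the sample values are made up for illustration; only the attribute names (`id`, `created_at`, `text`, `favorite_count`, `retweet_count`) come from the code above.

```python
from types import SimpleNamespace

# Column names matching the DataFrame built in get_tweets
COLUMNS = ["tweet_no", "time", "text", "favorite_count", "RT_count"]


def tweet_to_row(tweet):
    """Turn one tweet-like object into a row, removing newlines from the text."""
    return [
        tweet.id,
        tweet.created_at,
        tweet.text.replace("\n", ""),
        tweet.favorite_count,
        tweet.retweet_count,
    ]


# A stand-in for a tweepy status object, with invented values
fake_tweet = SimpleNamespace(
    id=1,
    created_at="2019-12-15 09:00:00",
    text="line one\nline two",
    favorite_count=5,
    retweet_count=12,
)
row = tweet_to_row(fake_tweet)
```

A list of such rows can be passed to `pd.DataFrame(rows, columns=COLUMNS)` in a single call after the loop finishes.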

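Saving the fetched tweets (CSV)

If you want to keep collecting tweets with the function above, each later run has to be added to the data already saved. For that reason I wrote two ways of saving: one for a new file and one for adding to an existing file.

First: saving to a new file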

get_newstweet.ipynb

# Save to a new file
file_name = "../data/tweet_{}.csv".format(acount)
df.to_csv(file_name, index=False)  # the index is not needed in most cases

Second: adding to an existing file (overwrite save)

get_newstweet.ipynb

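# Overwrite save: merge with the existing file and write it back
file_name = "../data/tweet_{}.csv".format(acount)
pre_df = pd.read_csv(file_name)  # read the previous CSV
df = pd.concat([df, pre_df])
df = df.drop_duplicates(subset=['tweet_no'])  # drop rows with a duplicate tweet ID (the new data comes first, so it is the copy that is kept)
df.to_csv(file_name, index=False)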
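
The two save paths differ only in whether the file already exists, so they could be folded into one helper. The following is a minimal sketch under that assumption. `save_tweets` is a hypothetical name, and creating the parent directory is an extra step that the original notebook does not perform.

```python
import os

import pandas as pd


def save_tweets(df, file_name):
    """Write df to file_name, merging with rows already saved there if any."""
    parent = os.path.dirname(file_name)
    if parent:
        # to_csv does not create missing directories, so do it here
        os.makedirs(parent, exist_ok=True)

    if os.path.exists(file_name):
        pre_df = pd.read_csv(file_name)  # previously saved rows
        # New rows go first so that keep="first" keeps the fresh counts
        df = pd.concat([df, pre_df], ignore_index=True)
        df = df.drop_duplicates(subset=["tweet_no"], keep="first")

    df.to_csv(file_name, index=False)
    return df
```

Calling `save_tweets(df, file_name)` after every collection run would then cover both the first run and later runs.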

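Summary and what comes next

That covers fetching tweets and saving them.
There are probably better ways to handle the new-file save and the overwrite save.
Next time I plan to remove retweets and strip URLs.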
