
Extracting tweets with snscrape in Python

Last updated at 2022-08-17, posted at 2022-08-05

snscrape lets you scrape social networking services without needing a personal API key.
For details, see: https://github.com/JustAnotherArchivist/snscrape

In this article, I use snscrape to scrape Twitter, specifying the time period, the keyword, the like count, and the number of tweets to collect.
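The time period, keyword, and language are passed to Twitter as search operators inside a single query string. As a minimal sketch of how such a string can be assembled, the helper below (the name `build_query` and its parameters are hypothetical, not part of snscrape) joins the conditions with spaces. The operators themselves are the ones used in the implementation in this article.

```python
def build_query(keyword, since, until, lang="ja",
                exclude_links=True, exclude_replies=True):
    """Assemble a Twitter search query string from individual conditions.

    Hypothetical helper for illustration; snscrape itself only needs the
    final string.
    """
    parts = [keyword, f"since:{since}", f"until:{until}", f"lang:{lang}"]
    if exclude_links:
        # Skip tweets that contain links
        parts.append("-filter:links")
    if exclude_replies:
        # Skip tweets that are replies
        parts.append("-filter:replies")
    return " ".join(parts)


query = build_query("新年", "2022-01-01", "2022-01-02")
print(query)
```

The like count and the number of tweets are not part of this query; in this article they are checked in Python while iterating over the results.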

Development environment

M1 Mac
Python 3.8
pandas

Installing snscrape

pip3 install snscrape

Implementation

# Import the required libraries
import snscrape.modules.twitter as sntwitter
import pandas as pd

# Maximum number of tweets to collect
maxTweets = 1000
# Keyword to search for
keyword = '新年'

df = []
# Write the header row to the CSV file first
cols = pd.DataFrame([['id', 'date', 'tweet', 'likeCount']])
cols.to_csv('tweet.csv', index=False, header=False)

# Tweets containing "新年" (New Year) posted on 2022-01-01
query = keyword + ' since:2022-01-01 until:2022-01-02 lang:ja -filter:links -filter:replies'
for tweet in sntwitter.TwitterSearchScraper(query).get_items():
    # Keep only tweets with at least 20 likes and at least 20 characters
    if tweet.likeCount >= 20 and len(tweet.content) >= 20:
        # Remove line breaks
        text = tweet.content.replace('\n', '')

        df.append([tweet.id, tweet.date, text, tweet.likeCount])
        df1 = pd.DataFrame([[tweet.id, tweet.date, text, tweet.likeCount]])
        # Append the row to the CSV file
        df1.to_csv('tweet.csv', index=False, mode='a', header=False)
        # Stop once maxTweets matching tweets have been collected
        if len(df) >= maxTweets:
            break
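The filtering step can also be separated from the scraping so that it can be tried without network access. The following is a minimal sketch under that assumption: `select_tweets` is a hypothetical helper that accepts any iterable of objects with `id`, `date`, `content`, and `likeCount` attributes (the same attributes used in this article) and stops as soon as the requested number of matching tweets has been found. The `SimpleNamespace` objects are made-up stand-ins, not real tweets.

```python
from types import SimpleNamespace


def select_tweets(items, max_tweets, min_likes=20, min_chars=20):
    """Yield [id, date, text, likeCount] rows for at most max_tweets matches."""
    if max_tweets <= 0:
        return
    found = 0
    for tweet in items:
        # Same conditions as in the article: enough likes and enough characters
        if tweet.likeCount >= min_likes and len(tweet.content) >= min_chars:
            # Remove line breaks before emitting the row
            text = tweet.content.replace("\n", "")
            yield [tweet.id, tweet.date, text, tweet.likeCount]
            found += 1
            # Stop right after the last wanted tweet has been emitted
            if found >= max_tweets:
                break


# Made-up stand-in objects for illustration only
fake_tweets = [
    SimpleNamespace(id=1, date="2022-01-01", content="a" * 25, likeCount=30),
    SimpleNamespace(id=2, date="2022-01-01", content="short", likeCount=99),
    SimpleNamespace(id=3, date="2022-01-01",
                    content="b" * 10 + "\n" + "c" * 15, likeCount=20),
    SimpleNamespace(id=4, date="2022-01-01", content="d" * 40, likeCount=50),
]

rows = list(select_tweets(fake_tweets, max_tweets=2))
for row in rows:
    print(row)
```

With a real scraper, the iterable returned by `get_items()` could presumably be passed in place of `fake_tweets`.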

Result

import pandas as pd
tweet = pd.read_csv('tweet.csv')
tweet
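When inspecting the result, it can help to parse the `date` column as datetimes and sort by like count. The sketch below assumes a CSV with the same four columns as `tweet.csv`; `load_tweets` is a hypothetical helper, and the two sample rows are made up so that the example runs without the real file.

```python
import io

import pandas as pd


def load_tweets(source):
    """Read the CSV, parse the date column, and sort by likes (highest first)."""
    df = pd.read_csv(source, parse_dates=["date"])
    return df.sort_values("likeCount", ascending=False).reset_index(drop=True)


# Made-up sample data with the same columns as tweet.csv
sample = io.StringIO(
    "id,date,tweet,likeCount\n"
    "1,2022-01-01 00:10:00+00:00,first sample tweet,25\n"
    "2,2022-01-01 09:30:00+00:00,second sample tweet,120\n"
)

tweets = load_tweets(sample)
print(tweets[["id", "likeCount"]])
```

For the actual output file, the call would be `load_tweets('tweet.csv')`.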

