
Fetching tweets containing a keyword with Python and Tweepy

Posted at 2021-01-18 (last updated 2021-01-18)

While building the website for 週末ハッカソン (Weekend Hackathon), I wanted to fetch tweets containing the keyword #週末ハッカソン.
I fetch them with Python's tweepy module.

Getting the Consumer API keys and the Access token & access token secret

Register an application on Twitter Developers to obtain the Consumer API keys and the Access token & access token secret.

Installing tweepy

tweepy makes working with the Twitter API easy, so I recommend it.

$ pip install tweepy

OAuth authentication

The Twitter API requires OAuth authentication.
Authenticate with the API keys you obtained on the Developer site.

import tweepy


# Enter the API keys and tokens you obtained
API_KEY = ""
API_SECRET_KEY = ""
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""


def twitter_api() -> tweepy.API:
    auth = tweepy.OAuthHandler(API_KEY, API_SECRET_KEY)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    return tweepy.API(auth)
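
Hard-coding the keys in the script makes them easy to leak, for example by committing the file to a repository. One alternative is to read the four values from environment variables. Below is a minimal sketch of that idea; the environment variable names (TWITTER_API_KEY and so on), the TwitterCredentials container and the load_credentials helper are all hypothetical choices of mine, not something tweepy defines.

```python
import os
from typing import Mapping, NamedTuple


class TwitterCredentials(NamedTuple):
    """The four values twitter_api() needs (hypothetical container)."""

    api_key: str
    api_secret_key: str
    access_token: str
    access_token_secret: str


# Hypothetical environment variable names; pick whatever suits your setup.
ENV_NAMES = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET_KEY",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ: Mapping[str, str] = os.environ) -> TwitterCredentials:
    """Read the credentials, failing loudly if any variable is missing or empty."""
    missing = [name for name in ENV_NAMES if not environ.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    # Fields are filled in the same order as ENV_NAMES.
    return TwitterCredentials(*(environ[name] for name in ENV_NAMES))
```

The returned fields can then be passed to tweepy.OAuthHandler(creds.api_key, creds.api_secret_key) and auth.set_access_token(creds.access_token, creds.access_token_secret) in place of the module-level constants.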

Keyword search

I wrote a get_search function so that tweets can be searched by keyword.

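Arguments

Argument	Description
api	tweepy.API object authenticated with OAuth
q	Keyword to search for
start_date	Start of the search period (YYYY-MM-DD)
end_date	End of the search period (YYYY-MM-DD; only tweets created before this date are returned)
count	Number of tweets to fetch (the standard search API returns at most 100 per request)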
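
Inside get_search, the keyword and the two dates are folded into one query string using Twitter's since: and until: search operators, and -filter:retweets drops retweets. The sketch below isolates that step in a hypothetical helper, build_query (not part of tweepy), so the resulting string can be checked without calling the API. Keep in mind that the standard search API only covers roughly the past 7 days, so an older start_date will not bring back older tweets.

```python
from datetime import date


def build_query(keyword: str, start_date: str, end_date: str) -> str:
    """Build the same query string that get_search sends (hypothetical helper)."""
    # Parse the dates to reject anything that is not YYYY-MM-DD.
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError("start_date must not be after end_date")
    # until: matches tweets created before the given date, so end_date itself is excluded.
    return f"{keyword} since:{start.isoformat()} until:{end.isoformat()} -filter:retweets"


query = build_query("#週末ハッカソン", "2021-01-15", "2021-01-18")
# query == "#週末ハッカソン since:2021-01-15 until:2021-01-18 -filter:retweets"
```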

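I put the results into a pandas DataFrame because that makes the data easy to process.
tweet_created_at is returned in UTC (not US time), so it is converted to Japan Standard Time (UTC+9).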
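
The UTC-to-JST conversion can be done with standard-library timezone objects. A fixed offset is enough because Japan does not observe daylight saving time. Below is a minimal sketch with a hypothetical helper, to_jst: naive values are treated as UTC (tweepy 3.x returns naive UTC datetimes, while 4.x returns timezone-aware ones), and the result carries a +09:00 offset so it cannot be mistaken for UTC later on.

```python
from datetime import datetime, timedelta, timezone

# Japan Standard Time is a fixed UTC+9 offset (no daylight saving time).
JST = timezone(timedelta(hours=9), "JST")


def to_jst(created_at: datetime) -> datetime:
    """Convert a tweet timestamp to an aware JST datetime (hypothetical helper).

    Naive values are assumed to be UTC; aware values are converted as-is.
    """
    if created_at.tzinfo is None:
        created_at = created_at.replace(tzinfo=timezone.utc)
    return created_at.astimezone(JST)


utc_time = datetime(2021, 1, 17, 20, 30, tzinfo=timezone.utc)
print(to_jst(utc_time).isoformat())  # → 2021-01-18T05:30:00+09:00
```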

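from datetime import timedelta, timezone

import pandas as pd
import tweepy

# Enter the API keys and tokens you obtained
API_KEY = ""
API_SECRET_KEY = ""
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""

# Japan Standard Time: fixed UTC+9 offset
JST = timezone(timedelta(hours=9))


def twitter_api() -> tweepy.API:
    auth = tweepy.OAuthHandler(API_KEY, API_SECRET_KEY)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    return tweepy.API(auth)


def get_search(
    api: tweepy.API, q: str, start_date: str, end_date: str, count: int = 100
) -> pd.DataFrame:
    # Add the search period to the keyword and exclude retweets
    q = f"{q} since:{start_date} until:{end_date} -filter:retweets"

    # search_tweets is the tweepy >= 4.0 name (on tweepy 3.x it is api.search).
    # The standard search API returns at most 100 tweets per request.
    tweets = api.search_tweets(
        q=q,
        count=count,
        tweet_mode="extended",  # return full_text instead of truncated text
        locale="ja",
        lang="ja",
        include_entities=False,
    )

    # Collect rows in a list first; DataFrame.append was removed in pandas 2.0
    rows = []
    for tweet in tweets:
        rows.append(
            {
                "user_id": tweet.user.id,
                "user_name": tweet.user.name,
                "user_screen_name": tweet.user.screen_name,
                # Drop "_normal" to get the original-size image, not the 48x48 thumbnail
                "user_profile_image_url": tweet.user.profile_image_url.replace(
                    "_normal", ""
                ),
                "tweet_id": tweet.id,
                "tweet_full_text": tweet.full_text,
                "tweet_favorite_count": tweet.favorite_count,
                # created_at is UTC (naive in tweepy 3.x, aware in 4.x); convert to JST
                "tweet_created_at": tweet.created_at.replace(
                    tzinfo=timezone.utc
                ).astimezone(JST),
            }
        )

    # Passing columns keeps the column order and yields them even when no tweets match
    return pd.DataFrame(
        rows,
        columns=[
            "user_id",
            "user_name",
            "user_screen_name",
            "user_profile_image_url",
            "tweet_id",
            "tweet_full_text",
            "tweet_favorite_count",
            "tweet_created_at",
        ],
    )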
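
get_search strips "_normal" from profile_image_url because that suffix selects Twitter's small 48x48 profile thumbnail; without it, the URL points at the original-size upload. A plain str.replace would also remove "_normal" if it ever appeared elsewhere in the URL, so here is a slightly stricter sketch that only removes the suffix directly before the file extension. The helper name original_image_url is hypothetical.

```python
import re

# "_normal" immediately before an optional file extension at the very end of the URL.
_NORMAL_SUFFIX = re.compile(r"_normal(?=(?:\.[A-Za-z0-9]+)?$)")


def original_image_url(url: str) -> str:
    """Turn a 48x48 "_normal" profile image URL into the original-size URL (hypothetical helper)."""
    return _NORMAL_SUFFIX.sub("", url)
```

For example, a URL ending in abc_normal.jpg becomes one ending in abc.jpg, while a URL without the suffix is returned unchanged.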

Usage

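Simply authenticate with OAuth and run the search. The result is a DataFrame, so you can process it however you need; the example below keeps only the most recent tweet from each user.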

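api = twitter_api()
search = get_search(api, "#週末ハッカソン", "2021-01-15", "2021-01-18")

# Keep only each user's most recent tweet.
# idxmax returns index labels, so select the rows with .loc rather than .iloc
idxmax = search.groupby("user_id")["tweet_created_at"].idxmax()
tweets = search.loc[idxmax]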
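
groupby(...).idxmax() returns, for each user_id, the index label of the row with the latest tweet_created_at, and selecting those labels keeps one tweet per user. The self-contained sketch below runs the same two steps on a small hand-made DataFrame (the rows are made up for illustration) so the behavior can be checked without calling the API.

```python
import pandas as pd

# Made-up rows standing in for get_search() output.
search = pd.DataFrame(
    {
        "user_id": [1, 2, 1, 2],
        "tweet_id": [10, 20, 11, 21],
        "tweet_created_at": pd.to_datetime(
            ["2021-01-15 10:00", "2021-01-16 09:00", "2021-01-17 12:00", "2021-01-15 08:00"]
        ),
    }
)

# For each user, the index label of their most recent tweet.
idxmax = search.groupby("user_id")["tweet_created_at"].idxmax()
# .loc selects by label, which is what idxmax returns.
latest = search.loc[idxmax]
print(latest["tweet_id"].tolist())  # → [11, 20]
```

With the default 0..n-1 index that get_search produces, .iloc happens to pick the same rows, but once the DataFrame has been filtered or re-sorted only .loc stays correct.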
