Python: Collecting Tweets Without the Twitter API (Latest for 2023)

Posted 2023-03-20, last updated 2023-03-20

Tweet-collection libraries

Most of them have stopped working, so it is better to collect tweets yourself.

Preparation

We will collect tweets on Google Colaboratory.
Install Chrome and Selenium on Google Colaboratory as described in the article "Downloading Twitter Images with Google Colaboratory".

Collecting tweets

Open the account whose tweets you want to collect, then scroll to the bottom of the page.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

# Run Chrome headless; the sandbox and /dev/shm flags suit container environments such as Colab
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# Selenium 4.10+ no longer accepts the driver path as a positional argument;
# chromedriver is found on PATH instead
driver = webdriver.Chrome(options=options)
driver.implicitly_wait(10)

url = "https://twitter.com/idolpicture2/"
driver.get(url)

# Wait up to 15 seconds for an article element to become visible
WebDriverWait(driver, 15).until(EC.visibility_of_element_located((By.TAG_NAME, 'article')))
# Scroll to the bottom of the page
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

To collect more than one page, repeat the scroll step above in a loop.
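
The repeated scroll can be sketched as follows. This is a minimal sketch, not the author's code: `scroll_timeline`, `on_page`, `max_scrolls`, and `pause_sec` are hypothetical names, and the stop condition (the page height no longer growing) is an assumption about how the timeline behaves. Because the page may drop off-screen tweets from the DOM while scrolling, the sketch calls a parsing callback after every scroll rather than parsing once at the end.

```python
import time


def scroll_timeline(driver, on_page, max_scrolls=10, pause_sec=2.0):
    """Scroll to the bottom repeatedly, calling on_page(driver) after each load.

    Stops when the page height stops growing or max_scrolls is reached.
    Returns the number of scrolls that loaded new content.
    (Hypothetical helper; the names and stop condition are assumptions.)
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    on_page(driver)  # parse what is visible before the first scroll
    scrolls = 0
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_sec)  # give the next batch of tweets time to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so assume the end was reached
        last_height = new_height
        scrolls += 1
        on_page(driver)  # parse the newly loaded tweets
    return scrolls
```

With an existing Selenium `driver`, a call might look like `scroll_timeline(driver, parse_articles, max_scrolls=20)`, where `parse_articles` is your own function that reads the `article` elements. Keying the collected results by tweet URL avoids duplicates when the same tweet is seen on consecutive passes.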
Each tweet is contained in an article element.

# Find the article element of every tweet
elems_articles = driver.find_elements(By.TAG_NAME, 'article')

# Parse each tweet
for elem_article in elems_articles:
    try:
        # Tweet timestamp, read from the time element's datetime attribute
        time_elem = elem_article.find_element(By.TAG_NAME, 'time')
        time_str = time_elem.get_attribute('datetime')
        print(time_str)

        # Tweet URL: the parent of the time element is the link to the tweet
        tweet_url = time_elem.find_element(By.XPATH, "..").get_attribute('href')
        print(tweet_url)

        # Attached images (every img in the article, which can include non-media images)
        imgs = elem_article.find_elements(By.TAG_NAME, 'img')
        for img in imgs:
            image_src = img.get_attribute('src')
            print(image_src)

        # Tweet text
        html = elem_article.get_attribute('innerHTML')
        soup = BeautifulSoup(html, features='lxml')
        results = soup.find_all("div", {"data-testid": "tweetText"})
        # A tweet may have no text element, so avoid indexing an empty list
        tweet_text = results[0].get_text() if results else ""
        print(tweet_text)

    except Exception as e:
        # Report the error and continue with the next article
        print(e)
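
The `datetime` attribute read above is typically an ISO 8601 string in UTC with a trailing `Z`. As a small sketch, it can be converted into a timezone-aware `datetime` for sorting or display. The helper name `parse_tweet_time`, the sample string, and the choice of Japan Standard Time as the target zone are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed display timezone (JST, UTC+9); change as needed
JST = timezone(timedelta(hours=9), "JST")


def parse_tweet_time(time_str):
    """Convert an ISO 8601 UTC string such as '2023-03-20T03:15:00.000Z' to JST.

    Hypothetical helper. Before Python 3.11, datetime.fromisoformat() rejects
    a trailing 'Z', so it is rewritten as an explicit '+00:00' offset first.
    """
    dt_utc = datetime.fromisoformat(time_str.replace("Z", "+00:00"))
    return dt_utc.astimezone(JST)


# Sample value for illustration only, not taken from a real tweet
print(parse_tweet_time("2023-03-20T03:15:00.000Z").isoformat())
# prints 2023-03-20T12:15:00+09:00
```

Storing the converted value alongside the tweet URL makes it easy to sort collected tweets chronologically.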

On Google Colaboratory, the repeated scrolling sometimes does not work.
If that happens, run the code on your local machine instead.

Downloading videos

To download a video from a tweet, it is easier to POST the tweet URL obtained above to this tool:
https://lab.syncer.jp/Tool/Twitter-Video-URL-Converter/

You can drive the tool with Selenium to download the video.
