Looking Back at Qiita's Popular Tags of 2023 with an Animation

Last updated at 2024-01-09, posted at 2024-01-09

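The result first

Introduction

Just last week, a library called AnimatedWordCloud was released. It animates how a word cloud changes over time.

Also worth reading: [Python OSS library development] Released a library that creates "animated" word clouds to visualize how topics change over time

As the king of bandwagon-jumpers, the Bandwagon King, I am going to try it out right away.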

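Implementation

If you only want the code, you can run it on Google Colab from this repository.

Fetching the data

First, fetch the data from the Qiita API.

Issue an access token.
(Reference: How to issue a Qiita API access token)

When you retrieve a list of articles with a search query, the results are split into pages, so you have to request them 100 items at a time.

Data fetching (page fetching part).py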

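import requests

url = "https://qiita.com/api/v2/items"

# Placeholder: replace with the access token you issued
token = "YOUR_QIITA_ACCESS_TOKEN"

# Authenticate with the token
header = {
  'Authorization': f'Bearer {token}'
}

# Fetch one page
def fetch_page(month, page):
    tags = []

    # Build the query: articles with more than 10 stocks, created in the given month of 2023
    if month < 12:
      query = f"stocks:>10 created:>=2023-{str(month).zfill(2)} created:<2023-{str(month+1).zfill(2)}"
    else:
      query = "stocks:>10 created:>=2023-12 created:<2024-01"
    params = {
      "page": page,
      "per_page": 100,
      "query": query
    }

    # Send the GET request and fail fast on HTTP errors (e.g. a bad token)
    response = requests.get(url, headers=header, params=params)
    response.raise_for_status()
    articles = response.json()

    # Return every tag found as a single flat list
    for article in articles:
        for _tag in article["tags"]:
          tags.append(_tag["name"])

    return tags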
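
The month boundaries in the query above are hard-coded for 2023, with December handled as a special case. As a rough sketch, the same query string could be built by a small helper that rolls over to the next year automatically. `build_query` and its `min_stocks` parameter are hypothetical names introduced here for illustration; they are not part of the original code.

```python
def build_query(year: int, month: int, min_stocks: int = 10) -> str:
    """Build a Qiita search query covering one calendar month.

    Hypothetical helper: mirrors the query strings used in fetch_page.
    """
    # After December, the upper bound is January of the following year.
    if month == 12:
        next_year, next_month = year + 1, 1
    else:
        next_year, next_month = year, month + 1

    return (
        f"stocks:>{min_stocks} "
        f"created:>={year}-{month:02d} "
        f"created:<{next_year}-{next_month:02d}"
    )


print(build_query(2023, 1))   # stocks:>10 created:>=2023-01 created:<2023-02
print(build_query(2023, 12))  # stocks:>10 created:>=2023-12 created:<2024-01
```

With a helper like this, `fetch_page` would not need the `if month < 12` branch, and the script could be reused for another year by changing one argument.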

Repeat this until there are no more pages.

# Fetch one month of data
def fetch_month(month):
  # Word vector (tag -> count)
  wordvector = {}

  for page in range(1, 101):
    tags = fetch_page(month, page)

    # If no more tags are found...
    if len(tags) == 0:
        # ...we have gone past the last page, so stop
        break

Preparing the data

To pass the data to AnimatedWordCloud, it has to be a dictionary of
word -> weight

# (continuation of fetch_month(month))
    # Add to the word vector
    for tag in tags:
      wordvector[tag] = wordvector.get(tag, 0) + 1

  return wordvector
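
For reference, the same tally can be written with `collections.Counter` from the standard library. This is only an alternative sketch: `count_tags` is a hypothetical helper name, and the sample tag lists are made up for the demonstration.

```python
from collections import Counter


def count_tags(pages):
    """Tally tag occurrences across an iterable of per-page tag lists."""
    counts = Counter()
    for tags in pages:
        # Counter.update adds 1 for every occurrence of each tag
        counts.update(tags)
    # Convert to a plain dict of word -> weight
    return dict(counts)


# Made-up sample: two pages of tags
sample_pages = [["Python", "ChatGPT"], ["Python", "React"]]
print(count_tags(sample_pages))
```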

Note that the data returned by the API is quite heavy and each page takes a fair amount of time, so parallelizing the requests is recommended.
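
One possible way to parallelize is to fetch the months concurrently with a thread pool, since the work is dominated by waiting on the network. The sketch below assumes a `fetch_month(month)` function like the one above. `fetch_all_months` and `max_workers` are names chosen here for illustration, and the demo uses a stand-in function instead of calling the real API. Keeping the worker count small is probably wise so that the API's rate limit is not exceeded.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all_months(fetch_month, months=range(1, 13), max_workers=4):
    """Call fetch_month for every month in parallel threads.

    Returns a list of (label, word dictionary) tuples in month order.
    """
    months = list(months)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as its inputs
        results = list(executor.map(fetch_month, months))
    return [(f"{month}月", result) for month, result in zip(months, results)]


# Demo with a stand-in for the real fetch_month (no network access)
def fake_fetch_month(month):
    return {"Python": month}


print(fetch_all_months(fake_fetch_month, months=range(1, 4)))
```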

The time-series data finally passed to AnimatedWordCloud has to be
a list of tuples (time label, word dictionary)

from tqdm import tqdm

timelapse = []
for month in tqdm(range(1,12+1)):
    timelapse.append(
        # As a tuple
        (
            str(month)+"月",   # time label, e.g. "1月" (January)
            fetch_month(month) # word dictionary
        )
    )
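
Since fetching takes a while, it may be worth checking the shape of the data before handing it to the library. The following is a minimal sketch of such a check, based only on the structure described above (a list of `(time label, word dictionary)` tuples). `validate_timelapse` is a hypothetical helper, not something AnimatedWordCloud provides.

```python
def validate_timelapse(timelapse):
    """Return True if timelapse is a list of (str, dict[str, number]) tuples.

    Raises ValueError describing the first problem found otherwise.
    """
    if not isinstance(timelapse, list):
        raise ValueError("timelapse must be a list")
    for entry in timelapse:
        if not (isinstance(entry, tuple) and len(entry) == 2):
            raise ValueError(f"entry must be a (label, weights) tuple: {entry!r}")
        label, weights = entry
        if not isinstance(label, str):
            raise ValueError(f"time label must be a string: {label!r}")
        if not isinstance(weights, dict):
            raise ValueError(f"weights must be a dict: {weights!r}")
        for word, weight in weights.items():
            if not isinstance(word, str) or not isinstance(weight, (int, float)):
                raise ValueError(f"bad word/weight pair: {word!r} -> {weight!r}")
    return True


# Made-up sample data in the expected shape
print(validate_timelapse([("1月", {"Python": 3, "ChatGPT": 2})]))
```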

Visualization

To handle Japanese text, you first need to obtain a Japanese font.

I am not sure how well-mannered this is, but I download it directly from the japanize-matplotlib GitHub repository.

ttf_raw = requests.get("https://github.com/uehara1414/japanize-matplotlib/raw/master/japanize_matplotlib/fonts/ipaexg.ttf").content

with open("/content/jap.ttf", "wb") as f:
  f.write(ttf_raw)

Install AnimatedWordCloud...

pip install AnimatedWordCloudTimelapse

...and then visualize the data.

from AnimatedWordCloud import Config, animate

config = Config(
    max_words=40,
    font_path="/content/jap.ttf",  # specify the Japanese font
    output_path="/content/",
    min_font_size=20  # font size of the lowest-weighted word
)

animate(timelapse, config)

In Jupyter Notebook or Google Colab, you can display the result as follows.

from IPython.display import display, Image

with open('/content/output.gif','rb') as f:
    display(Image(data=f.read(), format='png'))

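Results

Here is the result.

Python sits at the center, while ChatGPT is quite strong.
React is fairly strong too. (Use Vue!!!)

You can also see how topics come and go: AzureOpenAIService, for example, was popular early in the year but had disappeared by the later months.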
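
The same observation can be checked against the data rather than by eye. As a rough sketch, the function below lists tags that rank in the top N in some month of the first half of the timeline but in no month of the second half. `vanished_tags` is a hypothetical helper, the default of 40 simply mirrors `max_words=40` above, and the sample data is made up.

```python
def vanished_tags(timelapse, top_n=40):
    """Tags in the top_n during the first half of the timeline only."""

    def top_tags(weights):
        # Rank tags by weight, highest first, and keep the top_n
        ranked = sorted(weights, key=weights.get, reverse=True)
        return set(ranked[:top_n])

    half = len(timelapse) // 2
    early = set().union(*(top_tags(w) for _, w in timelapse[:half]))
    late = set().union(*(top_tags(w) for _, w in timelapse[half:]))
    return sorted(early - late)


# Made-up sample: "A" is popular early, then drops out
sample = [("1月", {"A": 5, "B": 3}), ("2月", {"B": 4, "C": 2})]
print(vanished_tags(sample, top_n=2))
```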

Bonus

Here is the data by year.

So ポエム (poem) is a tag that only became common fairly recently.
