
Python: Keyword Search on Hatena Bookmark

Posted at 2021-02-11, last updated at 2021-02-11

Hatena Bookmark entries can be fetched through its RSS feeds.
Official documentation ... http://developer.hatena.ne.jp/ja/documents/bookmark/misc/feed

RSS

Parameters
q ... search keyword
sort ... sort order (newest first: recent, most popular first: popular)
threshold ... minimum number of bookmarks
date_begin ... start date {YYYY-MM-DD}
date_end ... end date {YYYY-MM-DD}
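
As a minimal sketch of how these parameters combine into a query string, the snippet below uses urlencode from the standard library. The helper name build_query and its default values are assumptions made for illustration, and mode=rss is taken from the example URLs in this article rather than from the parameter list.

```python
from datetime import date
from urllib.parse import urlencode

VALID_SORTS = ("recent", "popular")


def build_query(q, date_begin, date_end, sort="recent", threshold=1):
    """Build a query string for a Hatena Bookmark search feed (hypothetical helper)."""
    if sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {VALID_SORTS}")
    params = {
        "q": q,
        "mode": "rss",  # request the result as RSS
        "sort": sort,
        "threshold": threshold,
        # date.isoformat() yields the YYYY-MM-DD form the feed expects
        "date_begin": date_begin.isoformat(),
        "date_end": date_end.isoformat(),
    }
    # urlencode percent-encodes values, so non-ASCII keywords are safe
    return urlencode(params)


print(build_query("PHP", date(2021, 2, 1), date(2021, 2, 11)))
```

Passing date objects instead of raw strings keeps the YYYY-MM-DD format consistent without manual formatting.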

Keyword
https://b.hatena.ne.jp/search/text?q=PHP&mode=rss&date_begin=2021-02-01&date_end=2021-02-01

Tag
https://b.hatena.ne.jp/search/tag?q=PHP&mode=rss&date_begin=2021-02-01&date_end=2021-02-01

Title
https://b.hatena.ne.jp/search/title?q=PHP&mode=rss&date_begin=2021-02-01&date_end=2021-02-01
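
The three URLs differ only in the path segment after /search/. A small sketch of that mapping follows. The names SEARCH_PATHS and build_feed_url are hypothetical, introduced here only to illustrate the pattern.

```python
from urllib.parse import urlencode

BASE_URL = "https://b.hatena.ne.jp/search"

# Search target -> path segment, as seen in the three example URLs
SEARCH_PATHS = {"keyword": "text", "tag": "tag", "title": "title"}


def build_feed_url(target, q, date_begin, date_end):
    """Return the RSS feed URL for one search target (hypothetical helper)."""
    if target not in SEARCH_PATHS:
        raise ValueError(f"target must be one of {sorted(SEARCH_PATHS)}")
    query = urlencode(
        {"q": q, "mode": "rss", "date_begin": date_begin, "date_end": date_end}
    )
    return f"{BASE_URL}/{SEARCH_PATHS[target]}?{query}"


# Print the feed URL for each of the three search targets
for target in SEARCH_PATHS:
    print(build_feed_url(target, "PHP", "2021-02-01", "2021-02-01"))
```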

Python side

The RSS feed is fetched with feedparser.
feedparser ... https://github.com/kurtmckee/feedparser

Installation

$ pip install feedparser

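The function below searches by type, keyword, and start and end date.
It fetches the RSS feed with feedparser and stores the entries in a DataFrame.
The call below assumes the code is saved as a module named hatena.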

search = hatena.get_search("tag", "PHP", "2021-02-01", "2021-02-11")

from urllib.parse import urlencode

import feedparser
import pandas as pd

# Fields copied from each RSS entry into the DataFrame
COLUMNS = [
    "id",
    "title",
    "link",
    "summary",
    "updated",
    "hatena_bookmarkcount",
    "hatena_bookmarkcommentlistpageurl",
    "hatena_imageurl",
]


def get_search(search_type: str, q: str, start_date: str, end_date: str):
    # search_type is the path segment: "text" (keyword), "tag", or "title"
    # urlencode escapes the keyword so non-ASCII queries produce a valid URL
    params = urlencode(
        {"q": q, "mode": "rss", "date_begin": start_date, "date_end": end_date}
    )
    rss = f"https://b.hatena.ne.jp/search/{search_type}?{params}"
    d = feedparser.parse(rss)

    # DataFrame.append was removed in pandas 2.0, so collect the rows first.
    # entry.get() returns None for a field that an entry does not have.
    rows = [{col: entry.get(col) for col in COLUMNS} for entry in d.entries]

    return pd.DataFrame(rows, columns=COLUMNS)

Output

                                                  id  ...                                    hatena_imageurl
0  https://www.1st-net.jp/blog/2021/02/04/php_mai...  ...  https://www.1st-net.jp/blog/wp-content/uploads...
1  https://www.datadoghq.com/blog/engineering/php...  ...  https://imgix.datadoghq.com/img/blog/engineeri...
2  https://qiita.com/tajima_taso/items/18a2c593a3...  ...  https://qiita-user-contents.imgix.net/https%3A...
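
The returned DataFrame can be post-processed with ordinary pandas operations. The sketch below filters and sorts by bookmark count. The rows are made-up stand-ins for real results, and it assumes the count arrives as a string, which is worth verifying against actual feed data.

```python
import pandas as pd

# Made-up rows standing in for the DataFrame returned by get_search
df = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "hatena_bookmarkcount": ["12", "250", "3"],
    }
)

# Convert counts to integers so comparison and sorting are numeric
df["hatena_bookmarkcount"] = (
    pd.to_numeric(df["hatena_bookmarkcount"], errors="coerce").fillna(0).astype(int)
)

# Keep entries with at least 10 bookmarks, most bookmarked first
popular = df[df["hatena_bookmarkcount"] >= 10].sort_values(
    "hatena_bookmarkcount", ascending=False
)

# Show every column instead of eliding the middle ones with "..."
pd.set_option("display.max_columns", None)
print(popular)
```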

The Hatena Bookmark entries were retrieved successfully.
If you found this useful, please give it an LGTM.

[PR] I run an event called Weekend Hackathon! → https://weekend-hackathon.toyscreation.jp/about/
