
arXiv APIのPythonでの取得

Posted at 2017-07-28

arXiv API

The Twitter bot @astro_ph_EP seems to have stopped updating, and I wanted something like a bot that streams only specific categories, so I took a quick look at the arXiv API.

However, the code in the official user manual ( https://arxiv.org/help/api/user-manual ) does not work as written…

Fixing the official example

The error occurs on this line:

data = urllib.urlopen(url).read()

Is urllib just outdated? The official example uses the Python 2 API: in Python 3, urllib.urlopen moved to urllib.request.urlopen. Rewritten as follows, it works!

import urllib.request
url = 'http://export.arxiv.org/api/query?search_query=all:electron&start=0&max_results=1'
data = urllib.request.urlopen(url).read()
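The query string can also be assembled with urllib.parse.urlencode instead of concatenating it by hand. A minimal sketch, where build_query_url is a hypothetical helper (not part of the arXiv API); note that urlencode percent-encodes the colon in "all:electron", which the API accepts:

```python
import urllib.parse

def build_query_url(search_query, start=0, max_results=1):
    # urlencode handles escaping for us; the ':' in the query
    # becomes '%3A', which the arXiv API decodes normally.
    params = {
        'search_query': search_query,
        'start': start,
        'max_results': max_results,
    }
    return ('http://export.arxiv.org/api/query?'
            + urllib.parse.urlencode(params))

url = build_query_url('all:electron')
```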

Retrieval example

import urllib.request
import datetime as dt
import re


def main():
    basedate = dt.date.today() + dt.timedelta(days=-1)
    previousdate = basedate + dt.timedelta(days=-1)

    # astro-ph.EP submissions between 14:00 of the two dates, oldest first.
    url_q = ('http://export.arxiv.org/api/query?search_query=submittedDate:['
             + previousdate.strftime('%Y%m%d') + '1400+TO+'
             + basedate.strftime('%Y%m%d') + '1400]+AND+(cat:astro-ph.EP)'
             + '&start=0&sortBy=submittedDate&sortOrder=ascending')
    data = urllib.request.urlopen(url_q).read().decode('utf-8')

    # Crude tag extractor: returns the contents of every <b>...</b> in a.
    # Raw strings avoid the invalid "\/" escape warning; "</" needs no escape.
    parse = lambda a, b: re.findall(r"<" + b + r">([\s\S]*?)</" + b + r">", a)

    entries = parse(data, "entry")
    for entry in entries:
        url = parse(entry, "id")[0]
        title = parse(entry, "title")[0]
        author = ', '.join(parse(entry, "name"))
        summary = parse(entry, "summary")[0]
        print('%s\n%s\n%s\n%s' % (url, title, author, summary))


if __name__ == '__main__':
    main()
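The regex extractor works for this feed, but the response is Atom XML, so parsing it with the standard library's xml.etree.ElementTree is more robust (it handles attributes, nesting, and entity escaping for us). A sketch under that assumption, where parse_entries is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Atom namespace used by the arXiv API feed.
ATOM = '{http://www.w3.org/2005/Atom}'

def parse_entries(xml_text):
    # Yield one dict per <entry> element in the feed.
    root = ET.fromstring(xml_text)
    for entry in root.iter(ATOM + 'entry'):
        yield {
            'url': entry.findtext(ATOM + 'id'),
            'title': entry.findtext(ATOM + 'title'),
            'authors': ', '.join(
                name.text for name in entry.iter(ATOM + 'name')),
            'summary': entry.findtext(ATOM + 'summary'),
        }
```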

The count never quite matches arXiv's New submissions listing…
US holidays can be skipped by using holidays.US().
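The holiday handling above can be sketched as stepping the base date back over weekends and holidays. holidays.US() comes from the third-party "holidays" package; to keep this self-contained, a plain set of dates stands in for it here (an assumption, labeled in the code), and previous_posting_day is a hypothetical helper:

```python
import datetime as dt

# Stand-in for holidays.US() from the third-party "holidays" package:
# a set of dates on which no New submissions are posted.
US_HOLIDAYS = {dt.date(2017, 7, 4)}  # example: Independence Day

def previous_posting_day(base):
    # Step back one day at a time, skipping Saturdays/Sundays
    # (weekday() >= 5) and anything in the holiday set.
    day = base - dt.timedelta(days=1)
    while day.weekday() >= 5 or day in US_HOLIDAYS:
        day -= dt.timedelta(days=1)
    return day
```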
