Writing your latest Medium post into a Google Spreadsheet

Posted at 2019-06-01, last updated at 2019-06-02

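Introduction

Hello. I'm Takesue, a second-generation member of the Furuhashi Lab.
This year we started posting on Qiita as well, as a seminar activity called ギットハ部.
I hope to share what we study and the code we write.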

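As the title says, what I built this time is a script that writes the URL of the newest Medium post into a Google Spreadsheet. Fourth-year students in our seminar are required to report their progress in a blog post once a week. ~~Seriously, this is going to kill me~~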

After publishing a post on Medium, we write its URL into the spreadsheet. That step is easy to forget, and editing the sheet by hand after every single post isn't very smart.

Around then, I learned that

Medium has an RSS feed
Google Spreadsheet can be read and written through an API

so I figured I might as well build something, and that is what this article is about.

What I built

Code

reportmanager.py

import gspread
from oauth2client.service_account import ServiceAccountCredentials
from bs4 import BeautifulSoup
import requests

scope = ['https://spreadsheets.google.com/feeds',
         'https://www.googleapis.com/auth/drive']

credentials = ServiceAccountCredentials.from_json_keyfile_name('Xxxxxxxx.json', scope)
gc = gspread.authorize(credentials)
wks = gc.open_by_key('sheetID').get_worksheet(3)

url = 'https://medium.com/feed/@Xxxxxxxxx'
headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0",
        }
req = requests.get(url=url, headers=headers)

soup = BeautifulSoup(req.content, "xml")

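def alpha2num(alpha):
    # Convert a column label such as "AA" to its 1-based index (27).
    num=0
    for index, item in enumerate(list(alpha)):
        num += pow(26,len(alpha)-index-1)*(ord(item)-ord('A')+1)
    return num

def num2alpha(num):
    # Convert a 1-based column index back to its label (27 -> "AA").
    if num<=26:
        return chr(64+num)
    elif num%26==0:
        # Multiples of 26 end in "Z" and borrow one from the higher digits.
        return num2alpha(num//26-1)+chr(90)
    else:
        return num2alpha(num//26)+chr(64+num%26)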

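# Start scanning row 14 from column AA.
cell = alpha2num("AA")

# Treat a missing value as an empty string so the "in" check cannot fail on None.
check_data = wks.acell(num2alpha(cell) + "14").value or ""

# Move right until the first cell that does not already hold a URL.
while "http" in check_data:
    cell = cell + 1
    check_data = wks.acell(num2alpha(cell) + "14").value or ""

# Write the newest post's link (the first <item> in the feed) into that cell.
wks.update_acell(num2alpha(cell) + "14", str(soup.item.link.string))

print(num2alpha(cell) + "14" + " is complete.")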

GitHub

A sort-of explanation

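First, to run this you need to register with Google OAuth and obtain a key (the JSON key file that the code passes to ServiceAccountCredentials).
A lot of the information on Qiita is a bit dated, so I used this as my reference instead.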

For Medium's RSS feed, see here.

I borrowed the functions that handle the carry from column Z to column AA from this code.
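For reference, the same conversion can also be written without recursion. This is only a sketch: the names `column_to_index` and `index_to_column` are made up for this example and are not part of the borrowed code. The trick is to treat labels as "bijective base 26", subtracting 1 before each division so that Z rolls over to AA.

```python
def column_to_index(label: str) -> int:
    """Convert a column label ("A", "Z", "AA") to a 1-based index."""
    index = 0
    for char in label.upper():
        # Each letter is a digit from 1 to 26; there is no zero digit.
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index


def index_to_column(index: int) -> str:
    """Convert a 1-based column index back to its label."""
    if index < 1:
        raise ValueError("column index must be 1 or greater")
    label = ""
    while index > 0:
        # Subtracting 1 first maps 1..26 onto 0..25, which handles the Z -> AA carry.
        index, remainder = divmod(index - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


if __name__ == "__main__":
    for label in ("A", "Z", "AA", "AZ", "BA", "ZZ", "AAA"):
        print(label, column_to_index(label))
```

Converting a label to an index and back should always return the original label, which makes a handy sanity check before pointing the script at a real sheet.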

The RSS feed itself looks like this:

//medium.com/feed/@koukitakesue.xml

<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html" version="2.0">
<channel>
<title>
<![CDATA[ Stories by Kouki Takesue on Medium ]]>
</title>
<description>
<![CDATA[ Stories by Kouki Takesue on Medium ]]>
</description>
<link>
https://medium.com/@koukitakesue?source=rss-8b0a2cb3c3d------2
</link>
------------
<item>
<title>
<![CDATA[ Maprayを使ってみよう！（後編） ]]>
</title>
<link>
https://medium.com/furuhashilab/mapray%E3%82%92%E4%BD%BF%E3%81%A3%E3%81%A6%E3%81%BF%E3%82%88%E3%81%86-%E5%BE%8C%E7%B7%A8-ed2a76371b1b?source=rss-8b0a2cb3c3d------2
</link>

The newest post should come first in the feed, so reading the link of the first item gives you its URL.
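To make the "first item is the newest" idea concrete, here is a small sketch that does the same lookup using only the standard library's `xml.etree.ElementTree` instead of BeautifulSoup. The sample feed, its URLs, and the function name `latest_item_link` are all made up for illustration. It also assumes, as the script does, that the feed lists items newest first.

```python
import xml.etree.ElementTree as ET

# A tiny hypothetical feed with the same shape as the excerpt above.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Stories by Example Author on Medium</title>
    <link>https://medium.com/@example</link>
    <item>
      <title>Newest post</title>
      <link>https://medium.com/example-pub/newest-post-222</link>
    </item>
    <item>
      <title>Older post</title>
      <link>https://medium.com/example-pub/older-post-111</link>
    </item>
  </channel>
</rss>
"""


def latest_item_link(feed_xml: str) -> str:
    """Return the <link> of the first <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    # find() returns the first match, which skips the channel-level <link>.
    first_item = root.find("./channel/item")
    if first_item is None:
        raise ValueError("feed contains no <item> elements")
    link = first_item.findtext("link")
    if not link:
        raise ValueError("first <item> has no <link>")
    # Medium puts the URL on its own line, so trim the surrounding whitespace.
    return link.strip()


if __name__ == "__main__":
    print(latest_item_link(SAMPLE_FEED))
```

Looking up the item first and then its link matters, because the channel itself also has a `<link>` element that appears earlier in the document.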

By the way, the headers are set because a plain request just comes back with a 403.
It's not an attack or anything, so spoofing the UA is fine... right?
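Here is a sketch of the same idea with the standard library's `urllib.request` swapped in for `requests`. The helper names `build_feed_request` and `fetch_feed` are hypothetical, and I have not checked whether Medium treats a `urllib` request with this header the same way it treats one from `requests`. A response such as 403 surfaces as `urllib.error.HTTPError`, so a rejected request fails loudly instead of handing an error page to the parser.

```python
import urllib.request

FEED_URL_TEMPLATE = "https://medium.com/feed/@{username}"
# Same browser-like User-Agent string as in the script above.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) "
    "Gecko/20100101 Firefox/47.0"
)


def build_feed_request(username: str) -> urllib.request.Request:
    """Build (but do not send) a feed request with an explicit User-Agent."""
    url = FEED_URL_TEMPLATE.format(username=username)
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_USER_AGENT})


def fetch_feed(username: str, timeout: float = 10.0) -> bytes:
    """Download the feed body; raises urllib.error.HTTPError on 4xx/5xx."""
    request = build_feed_request(username)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping the request construction separate from the download makes it possible to inspect the URL and headers without touching the network.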

Issues

It doesn't distinguish Publications, so it can't cope when my own personal posts are mixed into the feed
~~Wouldn't this have been faster to write in JS?~~
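For the first issue, one possible direction is to skip items whose link does not point into the Publication. This is only a sketch and not something the script does today. It assumes that Publication posts carry the Publication slug as the first path segment of the link (as in the feed excerpt, `medium.com/furuhashilab/...`) while personal posts start with `@username`. The function name `latest_publication_link` and the sample feed are made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlparse

# Hypothetical feed: the newest item is a personal post, the next one is a report.
SAMPLE_MIXED_FEED = """<rss version="2.0">
  <channel>
    <item>
      <title>Personal post</title>
      <link>https://medium.com/@example/personal-post-333?source=rss</link>
    </item>
    <item>
      <title>Weekly report</title>
      <link>https://medium.com/example-pub/weekly-report-222?source=rss</link>
    </item>
  </channel>
</rss>
"""


def latest_publication_link(feed_xml: str, publication: str) -> Optional[str]:
    """Return the first item link whose path starts with the publication slug."""
    root = ET.fromstring(feed_xml)
    for item in root.iterfind("./channel/item"):
        link = (item.findtext("link") or "").strip()
        # Assumption: the first path segment identifies the Publication.
        first_segment = urlparse(link).path.strip("/").split("/")[0]
        if first_segment == publication:
            return link
    # No post from that Publication in the feed.
    return None


if __name__ == "__main__":
    print(latest_publication_link(SAMPLE_MIXED_FEED, "example-pub"))
```

A Publication served from a custom domain would not match this check, so the comparison would need adjusting in that case.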

I plan to fix the issues above going forward.
If you have corrections or suggestions for improvement, I'd be glad to hear from you.
