
[Python] Automatically adding university report deadlines to Google Calendar

Last updated at 2021-10-31 / Posted at 2021-10-23

#Motivation
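I am currently a third-year university student, and because of COVID-19 almost all classes at my university have moved online. Many courses post reports and other assignments on the university website at irregular times, so unless I checked frequently I would not notice them. More than once I missed that an assignment had been posted and only found out about it through a friend.
So I built this app so that, without relying on friends, I would notice assignments early, start on them sooner, and manage them in Google Calendar so that I never forget to submit anything.
If your university's website uses manaba, you may be able to use this app by changing only the URL and a few other details. Please give it a try.
The finished app is shown in the last section, so feel free to look at it first; it should give you a good idea of what the app does.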

#Implemented features

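1. Scrape course names and report deadlines from the university website
2. Fetch the events currently registered in Google Calendar
3. Add the deadlines from step 1 that are not yet registered to Google Calendar
4. Send a LINE notification whenever something new is added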
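Step 3 boils down to a set difference between what was scraped and what is already on the calendar. Below is a minimal standard-library sketch of that check, assuming both sides use the same `'YYYY-MM-DDTHH:MM:SS'` deadline format; `find_new_reports` is a hypothetical helper name, not one of the app's files.

```python
from typing import Iterable, List, Tuple

# (course name, deadline as 'YYYY-MM-DDTHH:MM:SS')
Report = Tuple[str, str]


def find_new_reports(report_list: Iterable[Report], events_start: Iterable[str]) -> List[Report]:
    """Return the reports whose deadline does not match any existing event start.

    Mirrors the duplicate check in app.py: a report counts as already
    registered when some calendar event starts at exactly its deadline.
    """
    existing = set(events_start)  # set lookup instead of scanning a list each time
    new_reports = []
    for class_name, deadline in report_list:
        if deadline not in existing:
            new_reports.append((class_name, deadline))
    return new_reports
```

Like app.py, this compares start times only, so any other event that happens to start at the same minute as a deadline will hide that report.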

#Preparation
Below are the things you need to install or register besides the code, together with the articles I referred to.

Installing ChromeDriver
- Reference: Easy Selenium installation even for beginners [Python] (初心者でも簡単にできるSeleniumのインストール【Python】)
Google API
- Reference: Mujigen Nikki (無次元日記)
- Google Cloud Platform: https://console.cloud.google.com/?hl=ja
Heroku
- References
  - https://devcenter.heroku.com/ja/articles/git (recommended)
  - https://qiita.com/1-row/items/80f89c8ada2e61f04446
- Articles that helped me fix Heroku-related errors
  - https://qiita.com/mizoe@github/items/0f7898fe026fa4cefe9d
  - https://qiita.com/taku_hito/items/52c6c52385386544aa62

#Scraping code

get_report.py

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from time import sleep
from selenium.webdriver.chrome.options import Options
from datetime import datetime as dt

options = Options()
options.add_argument('--headless')

First, set the path where ChromeDriver is saved, along with the USERNAME and PASSWORD used to log in.

get_report.py

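from selenium.webdriver.chrome.service import Service

# Pass the ChromeDriver path via Service and apply the headless options defined above
driver = webdriver.Chrome(service=Service('path/to/chromedriver'), options=options)
USERNAME = 'your login ID (e.g. email address)'
PASSWORD = 'your password'
# Explicit wait (up to 10 seconds) used in get_report_info()
wait = WebDriverWait(driver, 10)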
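Hard-coding the login ID and password works locally, but once the app runs on Heroku it is safer to keep them out of the source. A minimal sketch of reading them from environment variables follows; the variable names `MANABA_USERNAME`, `MANABA_PASSWORD` and `CHROMEDRIVER_PATH` and the helper `load_settings` are hypothetical choices, not something the original app defines.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names chosen for this sketch.
REQUIRED_VARS = ("MANABA_USERNAME", "MANABA_PASSWORD")


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Read the login settings from environment variables instead of hard-coding them.

    Raises RuntimeError naming every missing variable, so a misconfigured
    deployment fails with a readable message.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    settings: Dict[str, Optional[str]] = {name: env[name] for name in REQUIRED_VARS}
    # Optional ChromeDriver path; None when the variable is not set.
    settings["CHROMEDRIVER_PATH"] = env.get("CHROMEDRIVER_PATH")
    return settings
```

In get_report.py this would be used as `settings = load_settings()` followed by `USERNAME = settings["MANABA_USERNAME"]`; on Heroku the values can be set as config vars with `heroku config:set`.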

With the libraries imported and the settings entered, let's start writing the scraping code.

A function that logs in to the university website.

get_report.py

def get_report_info():
    # Open the manaba login page
    URL = 'URL of the site you want to access'
    driver.get(URL)
    driver.implicitly_wait(10)

    # Go to the login page
    login_page = driver.find_element(By.XPATH, '/html/body/div/div/section/div/div[1]/div/p[2]/button')
    login_page.click()
    driver.implicitly_wait(10)

    # Enter the email address on the login page and click "Next"
    sleep(5)
    mail_field = driver.find_element(By.XPATH, '//*[@id="i0116"]')
    mail_field.send_keys(USERNAME)
    next_button = driver.find_element(By.XPATH, '//*[@id="idSIButton9"]')
    next_button.click()
    driver.implicitly_wait(10)

    # Enter the password and click "Sign in"
    sleep(1)
    wait.until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@id="lightbox"]')))
    password_field = driver.find_element(By.XPATH, '//*[@id="i0118"]')
    password_field.send_keys(PASSWORD)
    driver.implicitly_wait(10)
    wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="idSIButton9"]')))
    signin_button = driver.find_element(By.XPATH, '//*[@id="idSIButton9"]')
    signin_button.click()
    driver.implicitly_wait(10)

    # Click "No"
    sleep(1)
    driver.implicitly_wait(10)
    refuse_button = driver.find_element(By.XPATH, '//*[@id="idBtn_Back"]')
    refuse_button.click()

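The sleep() calls prevent an error where find_element cannot find an element because it searches before the page has finished rendering.
Some articles say that **implicitly_wait()** or wait.until(EC.element_to_be_clickable(...)) prevents that error, so I included them anyway, but they made no difference in my case, so I rely on sleep(). Because sleep() always waits the full time, this is probably not the best approach in terms of speed. The app still works if you leave out **implicitly_wait()** and wait.until(EC.element_to_be_clickable(...)). Note that implicitly_wait() sets a timeout for the whole driver session, so calling it once after creating the driver is enough.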
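The difference between a fixed sleep() and an explicit wait is that the latter polls and returns as soon as a condition holds. The sketch below shows that polling idea with the standard library only; `wait_until` is a hypothetical helper that imitates what Selenium's `WebDriverWait(driver, timeout).until(...)` does, not a Selenium API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0, interval: float = 0.5) -> T:
    """Call `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value immediately, so the caller waits only as long
    as necessary instead of a fixed amount of time.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With Selenium, a condition could be `lambda: driver.find_elements(By.ID, 'i0118')`, since `find_elements` returns an empty (falsy) list instead of raising when nothing matches. In practice `WebDriverWait.until` already implements this loop and by default ignores `NoSuchElementException` while polling.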

A function that moves from each course page to that course's report list page.

get_report.py

    def access_to_report_page():
        # Wait until the page is displayed, then click the report icon to open the report list
        wait.until(EC.visibility_of_all_elements_located((By.XPATH, '/html')))
        report_button = driver.find_element(By.XPATH, '//*[@id="coursereport"]/img')
        report_button.click()

get_report.py

    # Collect the timetable cells of the courses you are enrolled in
    course_taken = driver.find_elements(By.CLASS_NAME, 'stdlist')
    course_taken2 = course_taken[0].find_elements(By.CSS_SELECTOR, '.course.course-cell')
    print("You are enrolled in %s class periods." % len(course_taken2))

    # Get the URL of each enrolled course from the timetable
    URLs = []
    for i in course_taken2:
        a_tag = i.find_elements(By.TAG_NAME, 'a')
        URL = a_tag[0].get_attribute('href')
        URLs.append(URL)

    report_info = []
    for i in URLs:
        # Open the enrolled course's page
        driver.get(i)
        sleep(1)
        access_to_report_page()
        # Get the course name
        class_name = driver.find_element(By.ID, 'coursename').text
        # For each unsubmitted ('未提出') report, add the course name and deadline to report_info
        tr_tags = driver.find_elements(By.TAG_NAME, 'tr')
        for tr_tag in tr_tags:
            tr_text = tr_tag.text
            if tr_text != '' and '未提出' in tr_text:
                tr_text_list = tr_text.split(' ')
                try:
                    # Convert the deadline to the format the Google Calendar API expects
                    deadline = tr_text_list[-2] + ' ' + tr_text_list[-1]
                    deadline = dt.strptime(deadline, '%Y-%m-%d %H:%M')
                    deadline = deadline.strftime('%Y-%m-%dT%H:%M:00')
                    report_info.append((class_name, deadline))
                except (IndexError, ValueError):
                    # Skip rows whose last two fields are not a 'YYYY-MM-DD HH:MM' deadline
                    pass
    # Return a list of (course name, deadline): [(course, deadline), (course, deadline), ..., (course, deadline)]
    return report_info

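**course_taken2** holds the enrolled courses taken from the timetable cells, and the next for loop extracts each course's URL from **course_taken2** into **URLs**. The following for loop then visits each enrolled course's URL, collects the course name and deadline of every unsubmitted report, and appends them to **report_info**, converting the deadline into the format the Google Calendar API expects as it goes. Finally, the function returns **report_info**, which completes this file.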
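The row-parsing step can be pulled out into a small pure function, which makes it possible to try the date handling without a browser. This is a sketch under the same assumption the scraper makes (an unsubmitted row contains '未提出' and ends with a `YYYY-MM-DD HH:MM` deadline); `parse_deadline_row` is a hypothetical name. It uses `split()` without an argument so that newlines or repeated spaces in the row text do not produce empty fields.

```python
from datetime import datetime
from typing import Optional


def parse_deadline_row(tr_text: str) -> Optional[str]:
    """Turn one report-table row into a 'YYYY-MM-DDTHH:MM:00' deadline string.

    Returns None for rows that are not unsubmitted reports or whose last two
    fields are not a 'YYYY-MM-DD HH:MM' timestamp.
    """
    if '未提出' not in tr_text:
        return None
    parts = tr_text.split()  # splits on any whitespace, ignoring empty fields
    if len(parts) < 2:
        return None
    try:
        deadline = datetime.strptime(parts[-2] + ' ' + parts[-1], '%Y-%m-%d %H:%M')
    except ValueError:
        return None
    return deadline.strftime('%Y-%m-%dT%H:%M:00')
```

For example, `parse_deadline_row('Report 1 未提出 2021-10-01 09:00 2021-10-30 23:59')` returns `'2021-10-30T23:59:00'`, taking the last timestamp in the row as the deadline.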

The complete scraping code.

get_report.py

from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.service import Service
from time import sleep
from selenium.webdriver.chrome.options import Options
from datetime import datetime as dt

# Run Chrome without opening a browser window
options = Options()
options.add_argument('--headless')


# Pass the ChromeDriver path via Service and apply the headless options
driver = webdriver.Chrome(service=Service('path/to/chromedriver'), options=options)
USERNAME = 'your login ID (e.g. email address)'
PASSWORD = 'your password'
wait = WebDriverWait(driver, 10)

def get_report_info():
    # Open the manaba login page
    URL = 'URL of the site you want to access'
    driver.get(URL)
    driver.implicitly_wait(10)

    # Go to the login page
    login_page = driver.find_element(By.XPATH, '/html/body/div/div/section/div/div[1]/div/p[2]/button')
    login_page.click()
    driver.implicitly_wait(10)

    # Enter the email address on the login page and click "Next"
    sleep(5)
    mail_field = driver.find_element(By.XPATH, '//*[@id="i0116"]')
    mail_field.send_keys(USERNAME)
    next_button = driver.find_element(By.XPATH, '//*[@id="idSIButton9"]')
    next_button.click()
    driver.implicitly_wait(10)

    # Enter the password and click "Sign in"
    sleep(1)
    wait.until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@id="lightbox"]')))
    password_field = driver.find_element(By.XPATH, '//*[@id="i0118"]')
    password_field.send_keys(PASSWORD)
    driver.implicitly_wait(10)
    wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="idSIButton9"]')))
    signin_button = driver.find_element(By.XPATH, '//*[@id="idSIButton9"]')
    signin_button.click()
    driver.implicitly_wait(10)

    # Click "No"
    sleep(1)
    driver.implicitly_wait(10)
    refuse_button = driver.find_element(By.XPATH, '//*[@id="idBtn_Back"]')
    refuse_button.click()


    def access_to_report_page():
        # Wait until the page is displayed, then click the report icon to open the report list
        wait.until(EC.visibility_of_all_elements_located((By.XPATH, '/html')))
        report_button = driver.find_element(By.XPATH, '//*[@id="coursereport"]/img')
        report_button.click()


    sleep(1)
    print("**************************************************")
    # Collect the timetable cells of the courses you are enrolled in
    course_taken = driver.find_elements(By.CLASS_NAME, 'stdlist')
    course_taken2 = course_taken[0].find_elements(By.CSS_SELECTOR, '.course.course-cell')
    print("You are enrolled in %s class periods." % len(course_taken2))

    # Get the URL of each enrolled course from the timetable
    URLs = []
    for i in course_taken2:
        a_tag = i.find_elements(By.TAG_NAME, 'a')
        URL = a_tag[0].get_attribute('href')
        URLs.append(URL)

    # Get the course name and deadline of every unsubmitted report
    report_info = []
    for i in URLs:
        # Open the enrolled course's page
        driver.get(i)
        sleep(1)
        access_to_report_page()
        # Get the course name
        class_name = driver.find_element(By.ID, 'coursename').text
        # For each unsubmitted ('未提出') report, add the course name and deadline to report_info
        tr_tags = driver.find_elements(By.TAG_NAME, 'tr')
        for tr_tag in tr_tags:
            tr_text = tr_tag.text
            if tr_text != '' and '未提出' in tr_text:
                tr_text_list = tr_text.split(' ')
                try:
                    # Convert the deadline to the format the Google Calendar API expects
                    deadline = tr_text_list[-2] + ' ' + tr_text_list[-1]
                    deadline = dt.strptime(deadline, '%Y-%m-%d %H:%M')
                    deadline = deadline.strftime('%Y-%m-%dT%H:%M:00')
                    report_info.append((class_name, deadline))
                except (IndexError, ValueError):
                    # Skip rows whose last two fields are not a 'YYYY-MM-DD HH:MM' deadline
                    pass
    # Return a list of (course name, deadline): [(course, deadline), (course, deadline), ..., (course, deadline)]
    return report_info

#Code to fetch events from Google Calendar

get_event.py

from __future__ import print_function
import datetime
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/calendar']

def get_event():
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('calendar', 'v3', credentials=creds)

    # Call the Calendar API (period to search; adjust it to your semester)
    timefrom = '2021/9/01'
    timeto = '2022/01/31'
    timefrom = datetime.datetime.strptime(timefrom, '%Y/%m/%d').isoformat()+'Z'
    timeto = datetime.datetime.strptime(timeto, '%Y/%m/%d').isoformat()+'Z'
    events_result = service.events().list(calendarId='your Google Calendar ID',
                                        timeMin=timefrom,
                                        timeMax=timeto,
                                        singleEvents=True,
                                        orderBy='startTime').execute()
    events = events_result.get('items', [])

    events_start = []
    for event in events:
        start = event['start'].get('dateTime', event['start'].get('date'))
        # Drop the UTC offset (e.g. '+09:00') so the value matches the
        # 'YYYY-MM-DDTHH:MM:SS' deadlines produced by get_report.py
        events_start.append(start[:-6])

    # Return an empty list (not a message string) when there are no events,
    # so that `deadline not in events_start` in app.py stays a list lookup
    return events_start
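`start[:-6]` removes a '+09:00' offset, but it also cuts an all-day date such as '2021-10-30' down to '2021'. Below is a sketch of a more explicit normalization using `datetime.fromisoformat`, assuming the API returns timed events with the calendar's local offset (Japan in this case); `normalize_start` is a hypothetical helper, not part of the original get_event.py.

```python
from datetime import datetime
from typing import Optional


def normalize_start(start: str) -> Optional[str]:
    """Convert a Calendar API start value into 'YYYY-MM-DDTHH:MM:SS'.

    Timed events look like '2021-10-30T23:59:00+09:00'; all-day events only
    carry a date ('2021-10-30') and can never equal a report deadline, so
    they map to None.
    """
    if 'T' not in start:
        return None
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    if start.endswith('Z'):
        start = start[:-1] + '+00:00'
    moment = datetime.fromisoformat(start)
    # Dropping tzinfo keeps the wall-clock time, just like start[:-6] does.
    return moment.replace(tzinfo=None).strftime('%Y-%m-%dT%H:%M:%S')
```

Inside get_event() the loop would then append `normalize_start(start)` and could skip `None` values.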

#Code to add report deadlines to Google Calendar

insert.py

from __future__ import print_function
import datetime
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

def authorize():
    # If modifying these scopes, delete the file token.pickle.
    SCOPES = ['https://www.googleapis.com/auth/calendar']


    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
        
    service = build('calendar', 'v3', credentials=creds)
    return service

def set_event(report_name, report_deadline, service):
    # A zero-length event that starts and ends at the report deadline
    event = {
        'summary': report_name,
        'description': 'Report deadline',
        'start': {
        'dateTime': report_deadline,
        'timeZone': 'Asia/Tokyo',
        },
        'end': {
        'dateTime': report_deadline,
        'timeZone': 'Asia/Tokyo',
        },
        # The Calendar API documents colorId as a string
        'colorId': '3'
    }

    event = service.events().insert(calendarId='your Google Calendar ID',
                                    body=event).execute()
    print(event['id'])

The two code blocks above are based on the Mujigen Nikki (無次元日記) article listed at the beginning.
That article is very clearly written, so I recommend following it as well.
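Building the event body in its own function makes it easy to check without calling the API. This sketch mirrors the dictionary in set_event(), assuming the IANA zone name 'Asia/Tokyo' and passing `colorId` as a string, the type the Calendar API reference documents for that field; `build_event_body` is a hypothetical name.

```python
from datetime import datetime


def build_event_body(report_name: str, report_deadline: str, time_zone: str = 'Asia/Tokyo') -> dict:
    """Build the body for events().insert() from a scraped report.

    `report_deadline` is 'YYYY-MM-DDTHH:MM:SS' without an offset, so the
    `timeZone` field tells Google Calendar how to interpret it.
    """
    # Parse first so a malformed deadline fails here with ValueError,
    # not later inside the API call.
    start = datetime.strptime(report_deadline, '%Y-%m-%dT%H:%M:%S')
    moment = {'dateTime': start.strftime('%Y-%m-%dT%H:%M:%S'), 'timeZone': time_zone}
    return {
        'summary': report_name,
        'description': 'Report deadline',
        'start': moment,
        'end': dict(moment),  # zero-length event at the deadline, as in set_event()
        'colorId': '3',
    }
```

It would be passed as `service.events().insert(calendarId='your Google Calendar ID', body=build_event_body(name, deadline)).execute()`.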

#Main code

app.py

import insert
import get_event
import get_report


report_list = get_report.get_report_info()
# Close the browser once scraping is finished
get_report.driver.quit()

# OAuth authorization for the Google Calendar API
service = insert.authorize()

events_start = get_event.get_event()

# Send the events to Google Calendar
result = []
for class_name, deadline in report_list:
    # Skip deadlines that are already on the calendar so nothing is registered twice
    if deadline not in events_start:
        insert.set_event(class_name, deadline, service)
        result.append((class_name, deadline))

if not result:
    print('No newly added reports')
else:
    print('--- Newly added reports ---')
    for class_name, deadline in result:
        print('%s : %s' % (class_name, deadline))

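With the code so far, the app that adds report deadlines to Google Calendar works in a local environment. As it stands, though, I would still have to run it myself regularly to check for new deadlines, which is not the goal I had in mind. So I deploy it to Heroku and use Heroku Scheduler to run it periodically.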

#Deploying to Heroku and scheduling periodic runs
Note: using Heroku Scheduler for periodic execution requires registering a credit card (you are not charged).

Deploying to Heroku is explained in detail in the official article listed at the beginning, so here I only show the commands with a brief explanation.

heroku login
heroku create

First, log in to Heroku and create the application.
Next, in addition to the app files written so far, prepare runtime.txt, requirements.txt, and a Procfile. When saving these files, make sure their character encoding is UTF-8.
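Before pushing, a quick check like the following can catch files accidentally saved in another encoding (UTF-16, for example). It is a small standard-library sketch; `non_utf8_files` is a hypothetical helper name, and it only tests whether the bytes decode as UTF-8.

```python
from pathlib import Path
from typing import Iterable, List


def non_utf8_files(paths: Iterable[str]) -> List[str]:
    """Return the paths whose contents cannot be decoded as UTF-8."""
    bad = []
    for path in paths:
        try:
            Path(path).read_bytes().decode('utf-8')
        except UnicodeDecodeError:
            bad.append(path)
    return bad
```

Running `print(non_utf8_files(['runtime.txt', 'requirements.txt', 'Procfile']))` in the project folder should print an empty list when all three files are UTF-8.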

git init
git add .
git commit -m "first commit"
git push heroku master

Then add all the files, commit, and push, and the deployment to Heroku is done.

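I could not use Selenium and ChromeDriver on Heroku as-is, so I followed the article "Building a scraping environment with Heroku + Python + Selenium (Heroku+Python+seleniumでスクレイピング環境を構築)" to install Selenium and ChromeDriver in the Heroku environment, after which the app ran correctly.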

Finally, for setting up Heroku Scheduler to run the app periodically, here is the article I read when configuring it: "How to set up Heroku Scheduler (Heroku Schedulerの設定方法)".

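#Adding LINE notifications
At this point the app that automatically adds report deadlines was complete. I considered stopping here, but thought it would be more convenient if the app told me when something was added, so I added this feature.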

notice.py

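import requests

def send_line_notify(notification_message):
    """
    Send a notification to LINE via LINE Notify
    """
    # Do not publish a real token; keep your own access token out of shared code
    line_notify_token = 'your LINE Notify access token'
    line_notify_api = 'https://notify-api.line.me/api/notify'
    headers = {'Authorization': f'Bearer {line_notify_token}'}
    data = {'message': notification_message}
    try:
        response = requests.post(line_notify_api, headers=headers, data=data, timeout=10)
        # Treat HTTP error responses (e.g. an invalid token) as failures too
        response.raise_for_status()
    except requests.RequestException as e:
        print('Failed to send LINE notification:', e)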

I wrote this code based on the article "Trying LINE Notify with Python (pythonでLINENotify使ってみる)". Have a look if you are interested.
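For reference, the same request can be built with the standard library alone. This sketch swaps `requests` for `urllib.request` (a substitution, not what notice.py uses) and only constructs the request, so the header and form body can be inspected without sending anything; `build_notify_request` is a hypothetical name and the token passed to it is a placeholder.

```python
import urllib.parse
import urllib.request

LINE_NOTIFY_API = 'https://notify-api.line.me/api/notify'


def build_notify_request(token: str, message: str) -> urllib.request.Request:
    """Build the POST request that send_line_notify() sends with requests."""
    # Form-encode the message exactly like requests does for a `data` dict
    data = urllib.parse.urlencode({'message': message}).encode('utf-8')
    return urllib.request.Request(
        LINE_NOTIFY_API,
        data=data,
        headers={'Authorization': f'Bearer {token}'},
        method='POST',
    )
```

Sending it would then be `urllib.request.urlopen(build_notify_request(token, 'test'), timeout=10)`.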

app.py

import notice

Import the notification module at the top of app.py as shown above, then rewrite the last part as below, and everything is complete.

app.py

if not result:
    print('No newly added reports')
else:
    print('--- Newly added reports ---')
    for class_name, deadline in result:
        print('%s : %s' % (class_name, deadline))
        # Send one LINE notification per newly added report
        notice.send_line_notify('%s:%s' % (class_name, deadline))
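The code above sends one LINE message per report. If several reports are added in one run, they could instead be combined into a single message. The sketch below shows that variation; `format_notification` is a hypothetical helper, not part of the original notice.py.

```python
from typing import Iterable, Tuple


def format_notification(result: Iterable[Tuple[str, str]]) -> str:
    """Combine every newly added (course name, deadline) pair into one message.

    Returns an empty string when nothing was added, so the caller can skip
    sending a notification.
    """
    lines = [f'{class_name}:{deadline}' for class_name, deadline in result]
    if not lines:
        return ''
    return '\n'.join(['New report deadlines:'] + lines)
```

In app.py this would replace the per-report call with `message = format_notification(result)` followed by `if message: notice.send_line_notify(message)`.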

#Final thoughts

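Here is what it looks like running on my smartphone.
In the first image, the purple events are university report deadlines, and the second image shows that the course name and deadline are correctly sent as a notification.
Since building this app I no longer have to visit the university website to check report deadlines, which saves that effort, and because a notification arrives within 30 minutes of a report being assigned, I can start on it early. As the third image shows, deadlines also appear on my iPhone home screen the day before and on the day itself, so I think I am less likely to forget to submit.
I was really happy to build an application that solves my original problem: "notice assignments early, start on them sooner, and manage them in Google Calendar so that I never forget to submit anything."
If you are struggling with the same kind of problem, I hope this article is useful to you.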
