
AtCoder Calendar

Posted at 2023-11-09, last updated at 2023-11-14

About AtCoder Calendar

I built an application that adds AtCoder contest schedules to Google Calendar in bulk. Because AtCoder does not publish an API, the data is obtained by scraping the target URL (https://atcoder.jp/contests/).
The frontend code is available on GitHub (https://github.com/Akito08/atcoder-calendar), so please take a look at it as well.

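Example usage

・Log in to the app with your Google account.
・Search for contests by the year and month in which they are held.
・Press the "Add events to Google Calendar" button (「Google Calendarに予定を追加」), and the events whose checkboxes are ticked are added to your Google Calendar.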
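
The last step above can be sketched as follows. The actual frontend is written in JavaScript, so this Python sketch only illustrates how a stored contest item could be mapped to the request body of the Google Calendar API `events.insert` method. The function name `to_calendar_event` and the `checked` flag are hypothetical and not taken from the repository.

```python
def to_calendar_event(item):
    """Map a contest item to a Google Calendar event body (assumed mapping)."""
    return {
        "summary": item["contest_name"],
        # dateTime expects an RFC 3339 timestamp; the stored ISO 8601
        # strings already carry a UTC offset, so no timeZone field is set.
        "start": {"dateTime": item["contest_start_time"]},
        "end": {"dateTime": item["contest_end_time"]},
    }


# Hypothetical search results; "checked" stands in for the checkbox state.
contests = [
    {
        "contest_name": "AtCoder Beginner Contest 999",
        "contest_start_time": "2023-11-11T21:00:00+09:00",
        "contest_end_time": "2023-11-11T22:40:00+09:00",
        "checked": True,
    },
    {
        "contest_name": "AtCoder Regular Contest 999",
        "contest_start_time": "2023-11-12T21:00:00+09:00",
        "contest_end_time": "2023-11-12T23:00:00+09:00",
        "checked": False,
    },
]

# Only the ticked contests become calendar events.
events = [to_calendar_event(c) for c in contests if c["checked"]]
```

Each resulting dictionary would then be sent as the body of one insert request against the user's calendar.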

Architecture

The backend uses EventBridge, Lambda, DynamoDB, and API Gateway. The frontend is written in JavaScript with the React library.

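Database

I used DynamoDB as the database. The scraped contest schedules are stored as follows.

Column name	Data type	Description
contest_year_month	String	Year and month in which the contest is held
contest_start_time	String	Contest start date and time, stored in ISO 8601 format
contest_category	String	Contest category (ABC, ARC, etc.)
contest_end_time	String	Contest end date and time, stored in ISO 8601 format
contest_name	String	Name of the contest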

Example:
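
As a rough illustration of the schema above, a stored item might look like the following. The contest name and times are hypothetical values chosen for illustration, not data read from the actual table.

```python
from datetime import datetime

# Hypothetical item; the values are made up to illustrate the schema.
example_item = {
    "contest_year_month": "2023-11",
    "contest_start_time": "2023-11-11T21:00:00+09:00",
    "contest_end_time": "2023-11-11T22:40:00+09:00",
    "contest_name": "AtCoder Beginner Contest 999",
    "contest_category": "ABC",
}

# The ISO 8601 strings parse back into datetime objects,
# so the contest length can be recovered from the two timestamps.
start = datetime.fromisoformat(example_item["contest_start_time"])
end = datetime.fromisoformat(example_item["contest_end_time"])
duration_minutes = int((end - start).total_seconds() // 60)
```

Note that `contest_year_month` is simply the first seven characters of the start time, which is what makes a per-month lookup possible.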

Web scraping

Because I use Beautiful Soup as the web scraping library, I chose Python as the Lambda runtime. The code below scrapes the page and saves the retrieved data to DynamoDB.

import os
import json
import boto3
from bs4 import BeautifulSoup
from datetime import datetime, timedelta
import requests

url = "https://atcoder.jp/contests/"
dynamodb = boto3.resource("dynamodb")
table_name = os.environ["TABLE_NAME"]
table = dynamodb.Table(table_name)

CONTEST_TYPES = {
    "AtCoder Beginner Contest": "ABC",
    "AtCoder Regular Contest": "ARC",
    "AtCoder Grand Contest": "AGC",
    "AtCoder Heuristic Contest": "AHC"
}

def add_contest_category(contest_name):
    for contest_type, category in CONTEST_TYPES.items():
        if contest_type in contest_name:
            return category
    return ""
        
def calculate_contest_end_time(start_time, duration):
    hours, minutes = map(int, duration.split(":"))
    end_time = start_time + timedelta(hours=hours, minutes=minutes)
    return end_time
    
def lambda_handler(event, context):
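    # Collect the names already stored. A single scan() call returns at most
    # 1 MB of data, so keep following LastEvaluatedKey until it is absent.
    existed_contest_names = set()
    scan_kwargs = {"ProjectionExpression": "contest_name"}
    while True:
        scan_response = table.scan(**scan_kwargs)
        existed_contest_names.update(item["contest_name"] for item in scan_response["Items"])
        if "LastEvaluatedKey" not in scan_response:
            break
        scan_kwargs["ExclusiveStartKey"] = scan_response["LastEvaluatedKey"]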

    try:
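        # Set a timeout so a stalled request fails instead of running until the Lambda times out.
        response = requests.get(url, timeout=10)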
        response.raise_for_status()
    except requests.RequestException as e:
        return {
            "statusCode": 500,
            "body": json.dumps(f"Failed to retrieve data: {str(e)}")
        }

    soup = BeautifulSoup(response.text, "html.parser")
    contests = soup.find_all("tr")
    
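    # Slicing instead of indexing avoids an IndexError when the page has fewer than 30 rows.
    for row in contests[4:30]:
        contest = list(row.stripped_strings)

        # Stop at the first row that yields exactly five strings.
        if len(contest) == 5:
            break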
        
        contest_name = contest[3]
        if contest_name in existed_contest_names:
            continue
    
        contest_year_month = contest[0][:7]
        contest_category = add_contest_category(contest_name)
        contest_start_time = datetime.fromisoformat(contest[0][:10] + "T" + contest[0][11:-2] + ":00")
        duration = contest[4]
        contest_end_time = calculate_contest_end_time(contest_start_time, duration)

        item = {
            "contest_year_month": contest_year_month, 
            "contest_start_time": contest_start_time.isoformat(),
            "contest_end_time" : contest_end_time.isoformat(),
            "contest_name": contest_name,
            "contest_category": contest_category
        }

        try:
            table.put_item(Item=item)
            print(f"Saved: {contest_name} - {contest_start_time}")
        except Exception as e:
            print(f"Error saving {contest_name}: {str(e)}")

    return {
        "statusCode": 200,
        "body": json.dumps("Data scraped and saved to DynamoDB")
    }
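
The row-parsing step above can also be separated from the network and database calls so that it can be checked locally. The sketch below assumes each row yields strings in the order start time, two symbol cells, contest name, duration, rated range, which is inferred from the indexes used in the handler. The function name `parse_contest_row` and the sample row are hypothetical.

```python
from datetime import datetime, timedelta


def parse_contest_row(strings):
    """Turn the stripped strings of one table row into a DynamoDB-style item."""
    start_raw = strings[0]  # e.g. "2023-11-11 21:00:00+0900"
    # Rewrite "+0900" as "+09:00" so that fromisoformat() accepts the offset.
    start = datetime.fromisoformat(start_raw[:10] + "T" + start_raw[11:-2] + ":00")
    # The duration cell is "HH:MM"; the hour part may exceed 24 for long contests.
    hours, minutes = map(int, strings[4].split(":"))
    end = start + timedelta(hours=hours, minutes=minutes)
    return {
        "contest_year_month": start_raw[:7],
        "contest_start_time": start.isoformat(),
        "contest_end_time": end.isoformat(),
        "contest_name": strings[3],
    }


# Hypothetical row, laid out the way the handler expects.
sample_row = [
    "2023-11-11 21:00:00+0900",
    "Ⓐ",
    "◉",
    "AtCoder Beginner Contest 999",
    "01:40",
    "- 1999",
]
item = parse_contest_row(sample_row)
```

Keeping this logic in a pure function means a change in the page layout can be diagnosed without invoking the Lambda function.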

EventBridge

EventBridge is used to run the web scraping, that is, the Lambda function above, on a schedule.
With the EventBridge configuration, the function runs every day at 1:00 a.m.
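
A daily schedule like this is typically written as a cron expression on an EventBridge rule. The sketch below assumes that 1:00 a.m. means Japan Standard Time (UTC+9), which the article does not state. Cron expressions on EventBridge rules are evaluated in UTC, so the hour has to be shifted. The helper name `daily_cron_utc` is hypothetical.

```python
def daily_cron_utc(hour_local, utc_offset_hours):
    """Build an EventBridge cron expression for a daily run at a local hour."""
    # Shift the local hour to UTC, wrapping around midnight.
    hour_utc = (hour_local - utc_offset_hours) % 24
    # Field order: minutes hours day-of-month month day-of-week year
    return f"cron(0 {hour_utc} * * ? *)"


# Assumption: 01:00 JST, which is 16:00 UTC on the previous day.
schedule = daily_cron_utc(1, 9)
```

The resulting string can be used as the schedule expression of the rule, for example via the `ScheduleExpression` parameter of boto3's `put_rule`.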

Function that returns contest schedules to the client

import json
import boto3
import os
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table_name = os.environ["TABLE_NAME"]
table = dynamodb.Table(table_name)

def lambda_handler(event, context):
    try:
        contest_year_month = event["queryStringParameters"]["contest_year_month"]
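    # queryStringParameters is None when the request has no query string,
    # which raises TypeError rather than KeyError.
    except (KeyError, TypeError):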
        return {
            "statusCode": 400,
            "body": json.dumps({"message": "contest_year_month parameter is missing"}),
            "headers": {
                "Access-Control-Allow-Origin": "*",
                "Content-Type": "application/json"
            },
        }

    try:
        response = table.query(
            KeyConditionExpression=Key("contest_year_month").eq(contest_year_month)
        )
    except Exception as e:
        return {
            "statusCode": 500,
            "body": json.dumps({"message": str(e)}),
            "headers": {
                "Access-Control-Allow-Origin": "*",
                "Content-Type": "application/json"
            },
        }

    items = response["Items"]
    return {
        "statusCode": 200,
        "body": json.dumps(items, ensure_ascii=False),
        "headers": {
            "Access-Control-Allow-Origin": "*",
            "Content-Type": "application/json"
        },
    }
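
The parameter handling in the function above can be sketched in isolation as follows. The helper name `extract_year_month` is hypothetical, and the `YYYY-MM` format check is an assumption based on how `contest_year_month` is derived in the scraper. The sample events mimic only the relevant part of an API Gateway proxy event.

```python
import re


def extract_year_month(event):
    """Return the contest_year_month query parameter, or None if missing or malformed."""
    # API Gateway sets queryStringParameters to None when there is no query string.
    params = event.get("queryStringParameters") or {}
    value = params.get("contest_year_month")
    if value is None or not re.fullmatch(r"\d{4}-\d{2}", value):
        return None
    return value


# Minimal stand-ins for API Gateway proxy events.
ok_event = {"queryStringParameters": {"contest_year_month": "2023-11"}}
empty_event = {"queryStringParameters": None}
bad_event = {"queryStringParameters": {"contest_year_month": "November"}}

results = [extract_year_month(e) for e in (ok_event, empty_event, bad_event)]
```

Returning `None` for both the missing and the malformed case lets the handler answer with a single 400 response before querying DynamoDB.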
