
A script to export the 英辞郎 on the Web Pro (Eijiro) word book

Python

Last updated at 2024-06-02. Posted at 2023-02-22.

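I wanted to export my 英辞郎 on the Web Pro word book and import it into Anki, but the browser only lets you export up to 30 words at a time, so I export it with a Python script instead.
Note: confirmed working as of 2024-05-24. The script will stop working if the layout of the Eijiro web pages changes, so please keep that in mind.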

Setup

Install the libraries.

pip install requests beautifulsoup4 lxml

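Code

When run, the script saves a CSV file (eow_flashcards_YYYYMMDD_hhmmss.csv) with the headword in the first column and the definition (as HTML) in the second. The columns are tab-separated despite the .csv extension.
You will be prompted for your Eijiro username (email address) and password; enter them.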

#!/usr/bin/env python
# coding: utf-8

import time
from datetime import datetime
import sys
import csv
import getpass

import requests
from bs4 import BeautifulSoup
import lxml

username = input("Username: ")
password = getpass.getpass("Password: ")
login_url = "https://eowpf.alc.co.jp/login"
now = datetime.now()
output = f"eow_flashcards_{now.strftime('%Y%m%d_%H%M%S')}.csv"

# Log in to Eijiro
print("Logging in to Eijiro...")
s = requests.Session()
r = s.post(
    url=login_url,
    data={
        "username": username,
        "password": password,
        "login-form-type": "pwd",
        "login": "ログイン"
    }
)

soup = BeautifulSoup(r.content, "lxml")
title = soup.find("title").text

if "ログインエラー" in title:
    sys.exit("Login failed. Check username & password.")
else:
    print("Logged in successfully.")

# Check how many words are registered in the word book
page = 1
flashcard_url = f"https://eowp.alc.co.jp/wordbook/ej?page={page}&col=1&sort=1"
r = s.get(url=flashcard_url)

soup = BeautifulSoup(r.content, "lxml")
fm_manage = soup.find(id="fmManage")
if fm_manage is None:
    sys.exit("Failed to get word list. Please report this problem to the developer.")
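total_word_number = int(fm_manage.find(id="tabcontent").find("span").contents[0])
# Round up so a final partial page is included, without requesting an extra
# empty page when the count is an exact multiple of 30
total_page_number = (total_word_number + 29) // 30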
cards = []

# Fetch the contents one page (30 words) at a time
for i in range(total_page_number):
    page = i + 1
    print(f"Getting cards in page {page}...")
    flashcard_url = f"https://eowp.alc.co.jp/wordbook/ej?page={page}&col=1&sort=1"

    r = s.get(url=flashcard_url)
    soup = BeautifulSoup(r.content, "lxml")
    fm_manage = soup.find(id="fmManage")
    if fm_manage is None:
        sys.exit(f"Failed to get word list on page {page}.")
    words = fm_manage.find_all(class_="direction_word nopadding toggler")
    words = [str(j.text) for j in words]
    definitions = fm_manage.find_all(class_="desc")
    definitions = [str(k.select("div:nth-of-type(1)")[0]) for k in definitions]

    for word, definition in zip(words, definitions):
        cards.append((word, definition))

    time.sleep(5)

# Remove duplicates
cards = list(set(cards))
# Sort alphabetically by headword
cards.sort(key=lambda x: x[0])
# Add the file headers for Anki
cards.insert(0, ("#html:true", ""))
cards.insert(0, ("#separator:tab", ""))

# Write out to CSV
with open(output, mode="w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerows(cards)

print(f"Exported card list to {output}!")
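The number of pages to request follows from the word count. Below is a minimal sketch of that calculation on its own, assuming 30 words per page as in the script; `page_count` is a hypothetical helper name, not part of the original code. Ceiling division covers a final partial page and does not add an empty page when the count is an exact multiple of 30.

```python
def page_count(total_words: int, per_page: int = 30) -> int:
    """Number of word book pages needed to list total_words entries.

    per_page=30 is an assumption taken from the export limit described above.
    """
    if total_words < 0:
        raise ValueError("total_words must be non-negative")
    # Integer ceiling division: add (per_page - 1) before flooring.
    return (total_words + per_page - 1) // per_page


# Print the page count for a few word counts around the page boundary.
for n in (0, 29, 30, 31, 60, 61):
    print(n, page_count(n))
```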

Exe file

For those who can't be bothered to set up a Python environment, I have prepared an exe file for Windows.
Run it from the Command Prompt.
https://gist.github.com/maruume/75d0fa1851b2b1a3d36e28500822010e/raw/7331dd4f3535fc2a5ba4e7e1af7ba74c58e817fc/eow_flashcards.exe

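Importing the CSV

Launch the desktop version of Anki
Select the Decks tab
Click the Import File button at the bottom right
Select the CSV file
Choose the following options and click the Import button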

Field separator: Tab
Allow HTML in fields: On
Notetype: Basic
Deck: choose the deck you want to import into
Existing notes: Update, if you want to overwrite existing notes
Match scope: Notetype and deck
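Before importing, it can help to check that the exported file parses as expected. The following is a rough sketch, assuming the layout described above (UTF-8, tab-separated, with leading `#...` header rows for Anki). `load_cards` is a hypothetical helper and the sample rows are made-up data, not real export output.

```python
import csv
import os
import tempfile


def load_cards(path):
    """Read an exported file and return a list of (word, definition) pairs."""
    cards = []
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            # Skip blank rows and Anki header rows such as '#separator:tab'.
            # Caveat: a headword that itself starts with '#' would be skipped too.
            if not row or row[0].startswith("#"):
                continue
            if len(row) != 2:
                raise ValueError(f"unexpected column count: {row!r}")
            cards.append((row[0], row[1]))
    return cards


# Demo on a tiny hand-made file in the same layout (sample data only).
sample_rows = [
    ("#separator:tab", ""),
    ("#html:true", ""),
    ("apple", '<div class="x">a fruit</div>'),
    ("run", "<div>to move\tquickly</div>"),
]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, mode="w", encoding="utf-8", newline="") as f:
        csv.writer(f, delimiter="\t").writerows(sample_rows)
    sample_cards = load_cards(sample_path)

print(len(sample_cards), "cards")
```

To check a real export, call `load_cards` with the path of the generated eow_flashcards_YYYYMMDD_hhmmss.csv file and compare the number of cards with the word count shown in the word book.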

If it doesn't work or you run into any trouble, please let me know.
