Build the ultimate English dictionary with an Alfred Workflow and a Notion database! (DeepL, Eijiro, Longman English-Japanese Dictionary, OCR)

Posted at 2022-11-03

TL;DR

I built an Alfred Workflow that runs
① translation with DeepL,
② a word lookup in the Longman English-Japanese Dictionary, and
③ a word lookup in Eijiro (英辞郎)
all at once, and saves the results to a Notion database so you can build your own personal dictionary.
It works with the paid version of Alfred, a Mac-only launcher app (a strict upgrade over Spotlight search). Type d, a space, and the word or sentence you want to translate into the Alfred search bar, and you get the DeepL translation alongside the lookup results from the Longman English-Japanese Dictionary and Eijiro. Both English→Japanese and Japanese→English are supported (OCR is English→Japanese only), and the results can be saved to a Notion database.
The DeepL results come from the DeepL API; Longman and Eijiro are scraped.

画面収録 2022-12-28 16.31.48 (1)_fps30_width640.gif

Features

  • Translate with DeepL
  • Word lookup in the Longman English-Japanese / Japanese-English dictionary
  • Word lookup in the Eijiro English-Japanese / Japanese-English dictionary
  • OCR a screenshot and translate it from English to Japanese
  • Save the results to a Notion database

Translate with DeepL

Here is what a DeepL translation looks like:
image.png
Existing DeepL Alfred Workflows required typing a trailing period (.) and returned only the translated text, which made them useless for studying. This one drops the period requirement and shows both sides: the translation in the Title and the original text in the Subtitle. In addition, DeepL.com sometimes mixes half-width commas and periods into its Japanese output; this Workflow unifies them to the Japanese 句読点 (、 and 。).
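The punctuation cleanup can be sketched in a couple of lines. This is a naive blanket replacement mirroring the sed pipeline in deepl.sh; the function name is mine, and it makes no attempt to protect decimals or abbreviations:

```python
def normalize_punctuation(ja: str) -> str:
    """Unify half-width ',' and '.' in DeepL's Japanese output into 、 and 。."""
    # Blanket replacement, same idea as the sed calls in deepl.sh.
    return ja.replace(",", "、").replace(".", "。")
```

For example, `normalize_punctuation("はい,そうです.")` gives `はい、そうです。`.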

Word lookup in the Longman English-Japanese / Japanese-English dictionary

Atsu of Atsueigo said the Longman English-Japanese dictionary is the best, so Longman it is.
The Workflow searches the Longman English-Japanese dictionary and also fetches example sentences.

image.png

Word lookup in the Eijiro English-Japanese / Japanese-English dictionary

The Longman English-Japanese dictionary does not list idioms such as "come up with" (edit: "come up with something" is actually listed). So I also wired in Eijiro, which has a rich stock of idioms. Example sentences are fetched here as well.

image.png

Because the pages are fetched with curl, even entries normally restricted to 英辞郎 Pro can still be viewed, although without their example sentences.

image.png

image.png

OCR a screenshot and translate it from English to Japanese

For cases where text cannot be copied (e.g. a locked PDF) or the text you want to translate is an image, I also built a mode that runs OCR on a screenshot and feeds the result to the DeepL API. Type dd into the search bar and OCR runs on the newest image in the folder where your screenshots are saved. It uses pyocr.
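Picking the newest screenshot can be sketched as below; the function name is mine, and the pattern is whatever you later configure in SCREENSHOT_PATH (e.g. /Users/kt/sccapture/*):

```python
import glob
import os

def latest_screenshot(pattern: str):
    """Return the most recently modified file matching the glob pattern, or None."""
    files = glob.glob(pattern)  # the trailing '*' expands to every file in the folder
    if not files:
        return None
    return max(files, key=os.path.getmtime)
```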

image.png

image.png

Save the results to a Notion database / copy them to the clipboard

⌘+Enter adds the entry to the Notion database.
Enter copies it to the clipboard. For DeepL results, pressing Enter on the top row of the list copies the full translation.
The word and its meanings are added to the database, which you can then review like a flash-card deck.

image.png

A full run (searching the word cocky)

  1. Type d cocky into the Alfred search bar
    image.png
  2. Press ⌘+Enter on the row that adds cocky to the database
  3. A notification appears
    20221103-030708.jpg
  4. The entry shows up in the Notion database
    image.png

You can pick either the Longman or the Eijiro result to add to the database. Check the Link carefully: if it starts with https://www.ldoceonline.com/~ the Longman result is saved; if it starts with https://eow.alc.co.jp/~ the Eijiro result is saved.
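The routing is just a prefix check on the Link value; a minimal sketch (function name mine):

```python
def dictionary_for_link(link: str) -> str:
    """Decide which dictionary's result gets saved, based on the row's Link."""
    if link.startswith("https://www.ldoceonline.com/"):
        return "longman"
    if link.startswith("https://eow.alc.co.jp/"):
        return "eijiro"
    return "deepl"  # rows with no dictionary link fall back to the DeepL translation
```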

20221228-154830.jpg

Motivation

I had already built a Workflow that scrapes the Longman English-Japanese dictionary, but

it is weak on idioms. Eijiro is strong on idioms, but even it obviously cannot translate phrases or full sentences. DeepL handles those, yet still cannot beat a dictionary on single words. So: why not cram all of them into one Workflow?

Procedure

What is Alfred?

A Mac-only launcher app and a strict upgrade over Spotlight search; essential for working efficiently on a Mac. The free tier is fairly usable, but Workflows require buying a license (a one-time purchase!).
Go to https://www.alfredapp.com/ and click Buy Powerpack.
image.png

Install this Workflow

To install, double-click the downloaded Workflow file, or use the link below.

Setting

In the top right of the Workflow
image.png
click the [x] button and, under Workflow Environment Variables, enter your own DEEPL_AUTH_KEY, NOTION_API_KEY, NOTION_DATABASE_URL, and SCREENSHOT_PATH.
IMG_4145.jpg
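post_notion.py reads these exact names from the environment, so any typo here breaks the Workflow; a sketch of the lookup (using .get with a default, unlike the script's os.environ[...], so a missing variable doesn't raise):

```python
import os

# The names must match the Workflow Environment Variables exactly.
DEEPL_AUTH_KEY = os.environ.get("DEEPL_AUTH_KEY", "")
NOTION_API_KEY = os.environ.get("NOTION_API_KEY", "")
NOTION_DATABASE_URL = os.environ.get("NOTION_DATABASE_URL", "")
SCREENSHOT_PATH = os.environ.get("SCREENSHOT_PATH", "")
```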

Details for each are below.

Get a DeepL API key

Free for up to 500,000 characters per month. You will need to enter your name, email address, password, postal address, and credit card number (again, the first 500,000 characters per month are free, so you will not be charged).
Create a key here:
https://www.deepl.com/pro#developer

20221225-163130.jpg

Copy the key of the form xxxxxxxxxxxxxxxxxxxxx:fx and put it into DEEPL_AUTH_KEY.
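The key is sent as the auth_key form field to the free-tier endpoint (free keys end in :fx and go to the api-free host); a sketch of the payload that deepl.sh and post_notion.py assemble (helper name mine):

```python
def build_deepl_request(auth_key: str, text: str, target_lang: str = "JA"):
    """Assemble the URL and form payload for DeepL's free-tier translate endpoint."""
    url = "https://api-free.deepl.com/v2/translate"
    data = {"auth_key": auth_key, "text": text, "target_lang": target_lang}
    return url, data
```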

Reference guide

Get a Notion API key

Create an integration and obtain the API key here:
https://www.notion.so/my-integrations

image.png
Copy secret_xxxxxxxxxxxxxxxxx and put it into NOTION_API_KEY.

Reference guide

Get Notion Database URL

Click 複製 (Duplicate) on the shared template (Notion Database 参考配布) to copy the database into your own Notion workspace.
Use Open as page to open the database as its own page.
image.png

Copy the resulting link, which looks like https://www.notion.so/XXXXXXXXXXX?v=?????????????????, and put it into NOTION_DATABASE_URL.
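post_notion.py slices the database ID out of this URL with fixed offsets; an equivalent sketch that is less sensitive to the exact prefix length (function name mine):

```python
from urllib.parse import urlparse

def extract_database_id(url: str) -> str:
    """Pull the 32-character database ID out of a Notion database URL."""
    # The ID is the last path segment; urlparse drops the '?v=...' view suffix.
    return urlparse(url).path.rsplit("/", 1)[-1]
```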

Path settings for OCR

Copy the path where your screenshots are saved into SCREENSHOT_PATH. (In my case, /Users/kt/sccapture/*. Don't forget the trailing *.)

Besides SCREENSHOT_PATH, the tesseract path must also be set up: pyocr requires tesseract to be installed and reachable. The default is /usr/local/bin/tesseract; if yours lives elsewhere, rewrite it under the [x] settings. Either add it to your PATH or hard-code the command inside the Workflow.

Distribution

The default layout is this:

image.png

Source

deepl.sh
#!/bin/bash
auth_key="${DEEPL_AUTH_KEY:-}"

PATH="$PATH:/usr/local/bin/"
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
PARSER="jq"
if ! type "$PARSER" >/dev/null 2>&1; then
  PARSER="${DIR}/jq-dist"
  xattr -dr com.apple.quarantine "$PARSER"
  chmod +x "$PARSER"
fi

function printJson() {
  echo '{"items": [{"uid": null,"arg": "'"$1"'","valid": "yes","autocomplete": "autocomplete","title": "'"$1"'"}]}'
}

POSITIONAL=()
while [[ $# -gt 0 ]]; do
  key="$1"
  case "$key" in
  -l | --lang)
    LANGUAGE="$2"
    shift 
    shift 
    ;;
  *)
    POSITIONAL+=("$1") 
    shift
    ;;
  esac
done
set -- "${POSITIONAL[@]:-}"


if [ -z "$1" ]; then
  echo "Home made DeepL CLI (${VERSION}; https://github.com/kentoak/deepLAlfred)"
  echo ""
  echo "SYNTAX : $0 [-l language] <query>" >&2
  echo "Example: $0 -l DE \"This is just an example.\""
  echo ""
  exit 1
fi

query="$1"
#query="$(echo "$query" | sed 's/\"/\\\"/g')" 
query="$(echo "$query" | gsed -E 's/\. ([a-z])/\. \U\1/g')"
query="$(echo "$query" | sed "s/\"/'/g")" 
query="$(echo $query | sed -e "s/[\r\n]\+/ /g")" # collapse newlines into single spaces
#query="$(echo "$query" | sed "s/'/\\\'/g")" 
query="$(echo "$query" | sed "s/& /%26%20/g")" 
query="$(echo "$query" | sed "s/% /%25%20/g")"
query="$(echo "$query" | sed "s/¼/=/g")"
query="$(echo "$query" | sed "s/´/'/g")"
query="$(echo $query | sed -e "s/[\r\n]\+//g")"
query="$(echo "$query" | iconv -f utf-8-mac -t utf-8 | xargs)"           

result=$(curl -H 'Content-Type:application/x-www-form-urlencoded' -X POST https://api-free.deepl.com/v2/translate -d "auth_key=${auth_key}" -d "text=${query}" -d "target_lang=${LANGUAGE:-EN}")

if [[ $result == *'"error":{"code":'* ]]; then
  message=$(echo "$result" | "$PARSER" -r '.["error"]|.message')
  printJson "Error: $message"
else
  sts=$(echo "$result" | "$PARSER" -r ".translations[0].text") 
  sts="$(echo "$sts" | sed 's/\"/\\\"/g')" 
  sts="$(echo "$sts" | sed 's/\./。/g' | sed 's/,/、/g')" # unify punctuation to 。 and 、 (the dot must be escaped or sed replaces every character)
  sts1="$sts"
  cnt1="$(echo "$sts" | wc -m | bc)"
  CNT=$cnt1
  myQuery=$query 
  myQuery="$(echo "$myQuery" | sed 's/%26/\&/g')" 
  myQuery="$(echo "$myQuery" | sed 's/%25/\%/g')" 
  myQuery="$(echo "$myQuery" | sed 's/%20/ /g')"
  myQuery="$(echo "$myQuery" | sed 's/\"/\\\"/g')"
  cnt2="$(echo "$myQuery" | wc -m | bc)"
  if [[ ${query:0:20} != ${sts:0:20} ]]; then
    if [[ ${LANGUAGE:-EN} == "JA" ]]; then 
      sts="$(echo "$sts" | sed 'y/ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ０１２３４５６７８９/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789/')" # convert full-width alphanumerics to half-width
      numForTitle=40
      if [[ $cnt1 -gt $(($numForTitle+1)) ]]; then 
        start=0
        startForSubtitle=0
        MM=()
        numForSubtitle=83
        numForTitle=40
        subtitleFinish=false
        cnt1LastFlag=false
        while [ 1 ]
        do
          cnt1=`expr $cnt1 - $numForTitle`
          if [[ $cnt1 -gt 0 ]]; then
            now=${sts:$((start)):$((numForTitle))} 
            now1=${sts1:$((start)):$((numForTitle))}
          else
            now=${sts:$((start))}
            now1=${sts1:$((start))}
          fi
          cntForEnd="$(echo "${myQuery:$((startForSubtitle))}" | wc -m | bc)"
          if [[ $cntForEnd -gt $numForSubtitle ]]; then
            endend=$numForSubtitle
            for ((i=0; i < $numForSubtitle; i++)); do
              if [[ $i -eq 0 ]]; then
                endbreak=${myQuery:$((startForSubtitle+endend-1)):1}
                if [[ $endbreak == " " ]]; then
                  break
                fi
              else
                endbreak=${myQuery:$((startForSubtitle+endend-1)):1}
                if [[ $endbreak == " " ]]; then
                  break
                fi
                endend=$((endend-1))
              fi
            done
            nowForSubtitle=${myQuery:$((startForSubtitle)):$((endend))}
          fi
          cnt2=`expr $cnt2 - $endend`
          if [[ $start == 0 ]]; then 
            if [[ $cnt2 -gt 0 ]]; then
              a='{"title":"'$now'","arg":"'$sts1'","subtitle":"'$nowForSubtitle'"},'
            else
              a='{"title":"'$now'","arg":"'$sts1'","subtitle":"'${myQuery:$((startForSubtitle))}'"},'
              subtitleFinish=true
            fi
          else 
            if [[ $cnt1 -gt 0 ]]; then
              if [[ $cnt2 -gt 0 ]]; then
                a='{"title":"'$now'","arg":"'$now1'","subtitle":"'$nowForSubtitle'"},'
              else
                a='{"title":"'$now'","arg":"'$now1'","subtitle":"'${myQuery:$((startForSubtitle))}'"},'
              fi
            else
              if [[ $cnt2 -gt 0 ]]; then
                if [[ ${cnt1LastFlag} == false ]]; then
                  a='{"title":"'$now'","arg":"'$now1'","subtitle":"'$nowForSubtitle'"},'
                  cnt1LastFlag=true
                else
                  a='{"title":"","arg":"","subtitle":"'$nowForSubtitle'"},'
                fi
              else
                if [[ $tmpStart == $start ]]; then 
                  a='{"title":"","arg":"","subtitle":"'${myQuery:$((startForSubtitle))}'"}'
                else
                  if "${subtitleFinish}"; then
                    a='{"title":"'$now'","arg":"'$now1'","subtitle":""}'
                  else
                    a='{"title":"'$now'","arg":"'$now1'","subtitle":"'${myQuery:$((startForSubtitle))}'"}'
                  fi
                fi
              fi
            fi
          fi
          startForSubtitle=`expr $startForSubtitle + $endend`
          tmpStart=$start
          if [[ `expr $tmpStart + $numForTitle` -le $CNT ]]; then
            start=`expr $tmpStart + $numForTitle`
          else
            start=$tmpStart
          fi
          MM+=($a)
          if [[ $cnt1 -lt 0 ]] && [[ $cnt2 -lt 0 ]]; then
            break
          fi
        done
        mo=${MM[@]}
        echo '{"items":['$mo']}' | "$PARSER" .
      else 
        numForSubtitle=83
        cnt2="$(echo "$myQuery" | wc -m | bc)"
        startForSubtitle=0
        if [[ $cnt2 -gt $numForSubtitle ]]; then 
          while [ 1 ]
          do
            cnt2=`expr $cnt2 - $numForSubtitle`
            if [[ $cnt2 -gt 0 ]]; then
              endend=$numForSubtitle
              for ((i=0; i < $numForSubtitle; i++)); do
                if [[ $i -eq 0 ]]; then
                  endbreak=${myQuery:$((startForSubtitle+endend-1)):1}
                  if [[ $endbreak == " " ]]; then
                    break
                  fi
                else
                  endbreak=${myQuery:$((startForSubtitle+endend-1)):1}
                  if [[ $endbreak == " " ]]; then
                    break
                  fi
                  endend=$((endend-1))
                fi
              done
              nowForSubtitle=${myQuery:$((startForSubtitle)):$((endend))}
            fi
            if [[ $cnt2 -gt 0 ]]; then 
              a='{"title":"'$sts'","arg":"'$sts'","subtitle":"'$nowForSubtitle'"},'
            else
              a='{"title":"","arg":"","subtitle":"'${myQuery:$((startForSubtitle))}'"}'
            fi
            startForSubtitle=`expr $startForSubtitle + $endend`
            MM+=($a)
            if [[ $cnt2 -lt 0 ]]; then
              break
            fi
          done
          mo=${MM[@]}
          echo '{"items":['$mo']}' | "$PARSER" .
        else
          a='{"title":"'$sts'","arg":"'$sts1'","subtitle":"'$myQuery'"}'
          echo '{"items":['$a']}' | "$PARSER" .
        fi
      fi
    else
      numForTitle=83
      numForSubtitle=39
      if [[ $cnt1 -gt $(($numForTitle+1)) ]]; then
        start=0
        MM=()
        u=-1
        while [ 1 ]
        do
          u=$((u+1))
          cnt1=`expr $cnt1 - $numForTitle`
          if [[ $cnt1 -gt 0 ]]; then
            endend=$numForTitle
            for ((i=0; i < $numForTitle; i++)); do
              if [[ $i -eq 0 ]]; then
                endbreak=${sts:$((start+endend-1)):$((1))}
                if [[ $endbreak == " " ]]; then
                  break
                fi
              else
                endbreak=${sts:$((start+endend-1)):$((1))}
                if [[ $endbreak == " " ]]; then
                  break
                fi
                endend=$((endend-1))
              fi
            done
            numForTitle=$endend
            now=${sts:$((start)):$((endend))}
          else
            now=${sts:$((start))}
          fi
          if [[ $start == 0 ]]; then
            a='{"title":"'$now'","arg":"'$sts'","subtitle":"'${myQuery:$numForSubtitle*u:$numForSubtitle}'"},'
          else
            if [[ $cnt1 -gt 0 ]]; then
              a='{"title":"'$now'","arg":"'$now'","subtitle":"'${myQuery:$numForSubtitle*u:$numForSubtitle}'"},'
            else
              a='{"title":"'$now'","arg":"'$now'","subtitle":"'${myQuery:$numForSubtitle*u:$numForSubtitle}'"}'
            fi
          fi
          start=`expr $start + $endend`
          MM+=($a)
          if [[ $cnt1 -lt 0 ]]; then
            break
          fi
        done
        mo=${MM[@]}
        echo '{"items":['$mo']}' | "$PARSER" .
      else
        a='{"title":"'$sts'","arg":"'$sts1'","subtitle":"'$myQuery'"}'
        echo '{"items":['$a']}' | "$PARSER" .
      fi
    fi
  fi
fi
post_notion.py
import requests
from pprint import pprint
import json
from bs4 import BeautifulSoup
from urllib.parse import urlencode
from urllib.request import urlopen, Request
import re
import sys
import os
import os.path
import datetime

notion_api_key = os.environ["NOTION_API_KEY"] or ""
notion_database_url = os.environ["NOTION_DATABASE_URL"] or ""
deepl_auth_key = os.environ["DEEPL_AUTH_KEY"] or ""

def longman(spell):
    spell = spell.replace(" ", "%20")  # URL-encode spaces in the query
    if onlyAlphabet(spell):
        spell = spell.lower()
        url = "https://www.ldoceonline.com/jp/dictionary/english-japanese/" + spell
    else:
        url = "https://www.ldoceonline.com/jp/dictionary/japanese-english/" + spell
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"
    }
    source = requests.get(url, headers=headers)
    data = BeautifulSoup(source.content, "html.parser")
    explanation_list = []
    if data.select(".lejEntry"):
        if data.select(".Translation"):
            for i in range(len(data.select(".Translation"))):
                if data.select(".Translation")[i].select(".BOXTRAN.TRAN"):
                    continue
                if data.select(".Translation")[i].select(".PRETRANCOM") or data.select(".Translation")[i].select(".COLL"):
                    explanation_list.append(
                        data.select(".Translation")[i].get_text())
                    continue
                tmp = ""
                if data.select(".Translation")[i].select(".TRAN"):
                    for j in range(len(data.select(".Translation")[i].select(".TRAN"))):
                        tmp += data.select(".Translation")[
                            i].select(".TRAN")[j].get_text()
                if tmp:
                    explanation_list.append(tmp)
    if data.select(".ljeEntry"):
        if data.select(".Subentry"):
            for i in range(len(data.select(".Subentry"))):
                t = ""
                if i == 0:
                    t += data.select(".HWD")[0].get_text()
                t += data.select(".Subentry")[i].get_text()
                explanation_list.append(t)
    result1 = [txt.strip() for txt in explanation_list]
    if len(explanation_list) == 0:
        return "(error) this word is not found"
    return result1

def eijiro(word):
    spell = word.replace(" ", "%20")
    if onlyAlphabet(spell[0]) or onlyAlphabet(spell[-1]):
        spell = spell.lower()  # lower-case only English queries
    url = "https://eow.alc.co.jp/search?q=" + spell
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"
    }
    source = requests.get(url, headers=headers)
    data = BeautifulSoup(source.content, "html.parser")
    explanation_list = []

    # Cache the lookups instead of re-running the same selector chain on every access.
    results = data.select("#resultsList")
    if results and results[0].find("ul"):
        li = results[0].find("ul").find("li")
        for div in li.find_all("div"):
            ols = div.find_all("ol")
            if ols:
                for ol in ols:
                    for item in ol.find_all("li"):
                        explanation_list.append(item.get_text())
            else:
                explanation_list.append(div.get_text())
    result1 = [txt.strip() for txt in explanation_list]
    if len(explanation_list) == 0:
        return "(error) this word is not found"
    return result1

def deepLMeaning(word, onlyAlphabetFlag):
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded'
    }
    if onlyAlphabetFlag:
        targetLang = "JA"
    else:
        targetLang = "EN"

    data = {
        "auth_key": deepl_auth_key,
        "text": word,
        "target_lang": targetLang
    }

    result = requests.post(
        'https://api-free.deepl.com/v2/translate', headers=headers, data=data)
    result = result.json()
    if result["translations"]:
        return result["translations"][0]["text"]

meaningList = []
links = []
useDeepL=[]
def main(word, onlyAlphabetFlag, link):
    if "https://www.ldoceonline.com/jp/dictionary" in link:
        useLongman = longman(word)
        for idx, i in enumerate(useLongman):
            if onlyAlphabetFlag:
                ken = i.split("• ")
            else:
                ken = i.split("‣")
            if len(ken) > 1:
                for idx, k in enumerate(ken):
                    if onlyAlphabetFlag:
                        if idx == 0:
                            meaningList.append(ken[0])
                    else:
                        if " → See English-Japanese Dictionary" in k:
                            k = k.split(
                                " → See English-Japanese Dictionary")[0]
                        if idx == 0:
                            meaningList.append(ken[0])
            else:
                meaningList.append(ken[0])
    elif "https://eow.alc.co.jp/search" in link:
        useEijiro = eijiro(word)
        if useEijiro != "(error) this word is not found":
            spell = word.replace(" ", "%20")
            if onlyAlphabet(spell):
                spell = spell.lower()  # lower-case only English queries
            url = "https://eow.alc.co.jp/search?q=" + spell
            links.append(url)
            for idx, i in enumerate(useEijiro):
                if ":" in i:
                    ken = i.split("・")
                else:
                    ken = [i]
                meaningList.append(ken[0])
    else:
        useDeepL.append(1)
        meaningList.append(deepLMeaning(word, onlyAlphabetFlag))


def onlyAlphabet(text):
    re_roman = re.compile(r'^[a-zA-Z\.]+$')
    return re_roman.fullmatch(text[0])


def get_request_url(end_point):
    return f'https://api.notion.com/v1/{end_point}'


if "https://www.ldoceonline.com/jp/dictionary" in sys.argv[1:][0]:
    link = sys.argv[1:][0]
    word = link[59:].replace("%20"," ")
elif "https://eow.alc.co.jp/search" in sys.argv[1:][0]:
    link = sys.argv[1:][0]
    word = link[31:].replace("%20"," ")
else:
    link = "なし"
    word = " ".join(sys.argv[1:])
onlyAlphabet_ = onlyAlphabet(sys.argv[1:][0][0]) or onlyAlphabet(sys.argv[1:][0][-1])
main(word, onlyAlphabet_, link)
if len(useDeepL)==0:
    if onlyAlphabet(word[0]) or onlyAlphabet(word[-1]):
        word = word.lower()
if links:
    link=links[0]
headers_for_notion = {"Authorization": f"Bearer {notion_api_key}",
                      "Content-Type": "application/json",
                      "Notion-Version": "2021-05-13"}
databases_ids = [notion_database_url]
databases_id = databases_ids[0][22:][:databases_ids[0][22:].find('?')]
response = requests.request('GET', url=get_request_url(
    f'databases/{databases_id}'), headers=headers_for_notion)
headers = {
    "accept": "application/json",
    "Content-Type": "application/x-www-form-urlencoded"
}

now = datetime.datetime.now()

# Build Meaning1..Meaning7 in a loop instead of one branch per list length;
# everything beyond the seventh meaning is joined into Meaning7.
properties = {
    "Word": {"title": [{"text": {"content": word}}]},
    "Link": {"url": link},
    "Date": {"rich_text": [{"text": {"content": now.strftime('%Y/%m/%d %H:%M:%S')}}]},
}
for i, meaning in enumerate(meaningList[:7], start=1):
    if i == 7:
        meaning = ",".join(meaningList[6:])
    properties[f"Meaning{i}"] = {"rich_text": [{"text": {"content": meaning}}]}
body_for_notion = {
    "parent": {"database_id": databases_id},
    "properties": properties,
}

res = requests.request('POST', url=get_request_url(
    'pages'), headers=headers_for_notion, data=json.dumps(body_for_notion))

if res.status_code == 200:
    print("データベースに", word, "を追加しました!")
else:
    print("失敗しました...")


scrayping_eijiro.py
# coding: utf-8
import requests
from bs4 import BeautifulSoup
import sys
import json
import re


def main(spell):
    spell = spell.replace(" ", "%20")
    if onlyAlphabet(spell[0]) or onlyAlphabet(spell[-1]):
        spell = spell.lower()  # lower-case only English queries
    url = "https://eow.alc.co.jp/search?q=" + spell
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"
    }
    source = requests.get(url, headers=headers)
    data = BeautifulSoup(source.content, "html.parser")
    explanation_list = []

    # Cache the lookups instead of re-running the same selector chain on every access.
    results = data.select("#resultsList")
    if results and results[0].find("ul"):
        li = results[0].find("ul").find("li")
        for div in li.find_all("div"):
            ols = div.find_all("ol")
            if ols:
                for ol in ols:
                    items = ol.find_all("li")
                    if items:
                        for item in items:
                            explanation_list.append(item.get_text())
                    else:
                        explanation_list.append(ol.get_text())
            else:
                explanation_list.append(div.get_text())
    result1 = [txt.strip() for txt in explanation_list]
    if len(explanation_list) == 0:
        return "(error) this word is not found"
    return result1


def onlyAlphabet(text):
    re_roman = re.compile(r'^[a-zA-Z\.]+$') 
    return re_roman.fullmatch(text)


def onlyJa(text):
    re_ja = re.compile(
        r'^[\u4E00-\u9FFF|\u3040-\u309F|\u30A0-\u30FF]{1,10}[\u4E00-\u9FFF|\u3040-\u309F|\u30A0-\u30FF]{1,10}$')
    return re_ja.fullmatch(text)


if __name__ == '__main__':
    spell = " ".join(sys.argv[1:]).strip()
    out = main(spell)
    obj = []
    if out == "(error) this word is not found":
        tao = {
            'title': "error",
            'subtitle': "(error) This word is not found",
            'arg': "error"
        }
        obj.append(tao)
    else:
        for idx, i in enumerate(out):
            if "◆" in i:
                i=i[:i.find("◆")]
            if ":" in i:
                if i.find('・')>0:
                    tmp = i.split("・")
                    ken = []
                    tao = ""
                    for k in range(len(tmp)):
                        if onlyAlphabet(tmp[k][0]):
                            ken.append(tao[:-1])
                            tao=tmp[k]
                            tao+="・"
                        else:
                            tao+=tmp[k]
                            tao+="・"
                    if tao != "":
                        ken.append(tao)
                else: 
                    ken = [i]
            else:
                ken = [i]
            if len(ken) > 1:
                for idx, k in enumerate(ken):
                    if onlyAlphabet(spell.split()[0]):
                        if idx == 0:
                            tao = {
                                'title': ken[0],
                                'arg': k
                            }
                        else:
                            u = k.split(":")
                            reibun_j = ""
                            reibun_e = ""
                            if u[0]:
                                for now in u:
                                    if now:
                                        if onlyAlphabet(now[0]) or onlyAlphabet(now[-1]):
                                            reibun_e += now
                                            reibun_e += " "
                                        else:
                                            reibun_j += now
                                
                                tao = {
                                    'title': "  ‣ "+reibun_e,
                                    'subtitle': "    "+reibun_j,
                                    'arg': k
                                }
                            else:
                                continue
                        obj.append(tao)
                    else:
                        print("アルファベットじゃない")
            else:
                tao = {
                    'title': ken[0],
                    'arg': ken[0]
                }
                obj.append(tao)
    jso = {'items': obj}
    sys.stdout.write(json.dumps(jso, ensure_ascii=False))
scrapying_longman.py
# coding: utf-8
import requests
from bs4 import BeautifulSoup
import sys
import json
import re


def main(spell):
    spell = spell.replace(" ", "-")  # Longman URLs join multi-word queries with hyphens
    if onlyAlphabet(spell[0]) or onlyAlphabet(spell[-1]):
        spell = spell.lower()
        url = "https://www.ldoceonline.com/jp/dictionary/english-japanese/" + spell
    else:
        url = "https://www.ldoceonline.com/jp/dictionary/japanese-english/" + spell
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"
    }
    source = requests.get(url, headers=headers)
    data = BeautifulSoup(source.content, "html.parser")
    explanation_list = []
    if data.select(".lejEntry"):
        if data.select(".Translation"):
            for i in range(len(data.select(".Translation"))):
                if data.select(".Translation")[i].select(".BOXTRAN.TRAN"):
                    continue
                if data.select(".Translation")[i].select(".PRETRANCOM") or data.select(".Translation")[i].select(".COLL"):
                    explanation_list.append(
                        data.select(".Translation")[i].get_text())
                    continue
                tmp = ""
                if data.select(".Translation")[i].select(".TRAN"):
                    for j in range(len(data.select(".Translation")[i].select(".TRAN"))):
                        tmp += data.select(".Translation")[
                            i].select(".TRAN")[j].get_text()
                if tmp:
                    explanation_list.append(tmp)
    if data.select(".ljeEntry"): 
        if data.select(".Subentry"):
            for i in range(len(data.select(".Subentry"))):
                t = ""
                if i == 0:
                    t += data.select(".HWD")[0].get_text()
                t += data.select(".Subentry")[i].get_text()
                explanation_list.append(t)
    if not explanation_list:
        return "(error) this word is not found"
    return [txt.strip() for txt in explanation_list]


def onlyAlphabet(text):
    re_roman = re.compile(r'^[a-zA-Z\.]+$')
    return re_roman.fullmatch(text)


def onlyJa(text):
    re_ja = re.compile(
        r'^[\u4E00-\u9FFF|\u3040-\u309F|\u30A0-\u30FF]{1,10}[\u4E00-\u9FFF|\u3040-\u309F|\u30A0-\u30FF]{1,10}$')
    return re_ja.fullmatch(text)


if __name__ == '__main__':
    spell = " ".join(sys.argv[1:]).strip()
    out = main(spell)
    obj = []
    if out == "(error) this word is not found":
        tao = {
            'title': "error",
            'subtitle': "(error) This word is not found",
            'arg': "error"
        }
        obj.append(tao)
    else:
        for idx, i in enumerate(out):
            if onlyAlphabet(spell):
                ken = i.split("• ")
            else:
                ken = i.split("‣")
            if len(ken) > 1:
                for idx, k in enumerate(ken):
                    if onlyAlphabet(spell):
                        if idx == 0:
                            tao = {
                                'title': ken[0],
                                'arg': k
                            }
                        else:
                            u = k.split(" ")
                            reibun_j = ""
                            reibun_e = ""
                            # use a fresh loop variable so the outer "i" isn't shadowed
                            for w in u:
                                if onlyAlphabet(w):
                                    reibun_e += w + " "
                                else:
                                    reibun_j += w
                            tao = {
                                'title': "  ‣ "+reibun_e,
                                'subtitle': "    "+reibun_j,
                                'arg': k
                            }
                        obj.append(tao)
                    else:
                        if " → See English-Japanese Dictionary" in k:
                            k = k.split(
                                " → See English-Japanese Dictionary")[0]
                        if idx == 0:
                            tao = {
                                'title': ken[0],
                            }
                        else:
                            tao = {
                                'title': "‣ "+k[2:],
                                'subtitle': k[2:].split(" 〘")[0],
                                'arg': k[2:].split(" 〘")[0]
                            }
                        obj.append(tao)
            else:
                tao = {
                    'title': ken[0],
                    'arg': ken[0]
                }
                obj.append(tao)
    jso = {'items': obj}
    sys.stdout.write(json.dumps(jso, ensure_ascii=False))
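The choice between the English-to-Japanese and Japanese-to-English URL in `main` comes down to whether the first or last character of the query matches `onlyAlphabet`. A standalone sketch of that routing (URL prefixes as used in the script above):

```python
import re

# same pattern as onlyAlphabet() in scrapying_longman.py
re_roman = re.compile(r'^[a-zA-Z\.]+$')

def longman_url(spell):
    # spaces become hyphens so multi-word phrases form a single URL segment
    spell = spell.replace(" ", "-")
    if re_roman.fullmatch(spell[0]) or re_roman.fullmatch(spell[-1]):
        # looks like English: search the English-Japanese side (lowercased)
        return "https://www.ldoceonline.com/jp/dictionary/english-japanese/" + spell.lower()
    # otherwise assume Japanese input
    return "https://www.ldoceonline.com/jp/dictionary/japanese-english/" + spell
```

This is why a phrase like "come up with" is looked up as `english-japanese/come-up-with`.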
ocr.py
import os
import sys
sys.path.append('/usr/local/lib/python3.9/site-packages')
import glob
import json
from PIL import Image
import pyocr
import requests

# os.environ.get() avoids a KeyError when a variable is unset
deepl_auth_key = os.environ.get("DEEPL_AUTH_KEY", "")
screenshot_path = os.environ.get("SCREENSHOT_PATH", "")

pyocr.tesseract.TESSERACT_CMD = os.environ.get("OCR_PATH") or r'/usr/local/bin/tesseract'
tools = pyocr.get_available_tools()
tool = tools[0]

list_of_files = glob.glob(screenshot_path) 
latest_file = max(list_of_files, key=os.path.getctime)
img = Image.open(latest_file)

builder = pyocr.builders.TextBuilder(tesseract_layout=6)
text = tool.image_to_string(img, lang="eng", builder=builder)
text = text.replace('\"','\\\"')
text = text.replace('\'',"\\'")
text = text.replace('&','%26')
text = text.replace('\n',' ')


source_lang = 'EN'
target_lang = 'JA'
param = {
    'auth_key': deepl_auth_key,
    'text': text,
    'source_lang': source_lang,
    'target_lang': target_lang
}

request = requests.post("https://api-free.deepl.com/v2/translate", data=param)
result = request.json()
resultText = result['translations'][0]['text']
# str.replace returns a new string, so the result must be reassigned
resultText = resultText.replace('\\"', '\"')
resultText = resultText.replace(" ", "")
resultText = resultText.replace('.', '。').replace(',', '、')

sts=resultText
cnt = len(sts)
CNT=cnt
start=0
tao=[]
subtex=text
subtex=subtex.replace("\\'","\'")
cnt2=len(subtex)
numForTitle=40
if cnt > numForTitle+1:
    start=0
    tmpStart=start
    startForSubtitle=0
    MM=[]
    numForSubtitle=83
    endend=0
    subtitleFinish=False
    while True:
        numForTitle=40
        cnt-=numForTitle
        cnt2-=numForSubtitle
        if cnt > 0:
            now = sts[start:start+numForTitle]
        else:
            now = sts[start:]
        if cnt2>0:
            endend=numForSubtitle
            # shrink the window until it ends on a space (word boundary)
            for _ in range(numForSubtitle - 1):
                if subtex[startForSubtitle+endend-1:startForSubtitle+endend] == " ":
                    break
                endend -= 1
            nowForSubtitle=subtex[startForSubtitle:startForSubtitle+endend]
        if start == 0:
            if cnt2>0:
                a={"title":now,"arg":sts,"subtitle":nowForSubtitle}
            else:
                a={"title":now,"arg":sts,"subtitle":subtex[startForSubtitle:]}
                subtitleFinish=True
        else:
            if cnt>0:
                if cnt2>0:
                    a={"title":now,"arg":now,"subtitle":nowForSubtitle}
                else:
                    a={"title":now,"arg":now,"subtitle":subtex[startForSubtitle:]}
            else:
                if cnt2>0:
                    a={"title":now,"arg":now,"subtitle":nowForSubtitle}
                else:
                    if tmpStart == start:
                        a={"title":"","arg":"","subtitle":subtex[startForSubtitle:]}
                    else:
                        if subtitleFinish:
                            a={"title":now,"arg":now,"subtitle":""}
                        else:
                            a={"title":now,"arg":now,"subtitle":subtex[startForSubtitle:]}
        startForSubtitle+=endend
        tmpStart=start
        if tmpStart+numForTitle<CNT:
            start=tmpStart+numForTitle
        else:
            start=tmpStart
        tao.append(a)
        if cnt < 0 and cnt2 < 0:
            break
else:
    numForSubtitle=83
    startForSubtitle=0
    if cnt2>numForSubtitle:
        while True:
            cnt2 -= numForSubtitle
            if cnt2>0:
                endend=numForSubtitle
                # shrink the window until it ends on a space (word boundary)
                for _ in range(numForSubtitle - 1):
                    if subtex[startForSubtitle+endend-1:startForSubtitle+endend] == " ":
                        break
                    endend -= 1
                nowForSubtitle=subtex[startForSubtitle:startForSubtitle+endend]
            if cnt2>0:
                a={"title":sts,"arg":sts,"subtitle":nowForSubtitle}
            else:
                a={"title":"","arg":"","subtitle":subtex[startForSubtitle:]}
            startForSubtitle+=endend
            tao.append(a)
            if cnt2 < 0:
                break
    else:
        a={"title":sts,"arg":sts,"subtitle":subtex}
        tao.append(a)

sys.stdout.write(json.dumps({'items': tao}, ensure_ascii=False))

#os.remove(latest_file)  # delete the screenshot permanently
#shutil.move(latest_file, '/Users/kt/.Trash/')  # or move it to the Trash (requires "import shutil")
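The window-sliding logic above hand-rolls word-boundary wrapping for the 40-character titles and 83-character subtitles. For reference, a much shorter equivalent can be built on the standard library's `textwrap` (a simplified sketch, not a drop-in replacement: it pairs title and subtitle chunks index-wise instead of tracking two separate counters):

```python
import textwrap
from itertools import zip_longest

def to_items(translated, original, title_w=40, sub_w=83):
    # fixed-width slices for the (Japanese) translation shown in the title,
    # word-boundary wrapping for the (English) source shown in the subtitle
    titles = [translated[i:i + title_w] for i in range(0, len(translated), title_w)]
    subtitles = textwrap.wrap(original, width=sub_w)
    return [
        {"title": t, "arg": translated, "subtitle": s}
        for t, s in zip_longest(titles, subtitles, fillvalue="")
    ]
```

`textwrap.wrap` is appropriate for the space-separated English source text; the Japanese translation has no spaces, so plain fixed-width slicing is used for the titles.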

Closing thoughts

GitHub

Packal

Being able to see both dictionaries (Eijiro and the Longman English-Japanese Dictionary) plus a translation all at once with a single command is great.
Alfred is the best, as always.
Notion, please get a little easier on the eyes...
