TL;DR
I built an Alfred Workflow that simultaneously runs
① translation with DeepL,
② a lookup in the Longman English-Japanese Dictionary, and
③ a lookup in Eijiro,
and saves the results to a Notion Database, so you can build your own personal dictionary.
It requires the paid version of Alfred, a Mac-only launcher app (a strict upgrade over Spotlight search). In the Alfred search bar, type d, then a half-width space, then the word (or sentence) you want to translate; the DeepL translation and the lookup results from the Longman English-Japanese Dictionary and Eijiro are shown together. Both English-to-Japanese and Japanese-to-English are supported (OCR is English-to-Japanese only), and results can be saved to a Notion Database.
The DeepL results come from the DeepL API; the Longman English-Japanese Dictionary and Eijiro results are scraped.
Features
- Translation with DeepL
- Word lookup in the Longman English-Japanese / Japanese-English Dictionary
- Word lookup in Eijiro (English-Japanese / Japanese-English)
- OCR a screenshot and translate it from English to Japanese
- Save results to a Notion Database
Translation with DeepL
Below is a translation produced with DeepL.
Existing DeepL Alfred Workflows required typing a trailing period (.), and they returned only the translated text, which made them useless for studying. This Workflow drops the trailing-period requirement and shows both texts: the translation in the Title and the original in the Subtitle. In addition, DeepL.com sometimes mixes Western commas and periods with Japanese punctuation in its Japanese output; this Workflow normalizes everything to Japanese punctuation (、 and 。).
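As a rough sketch, the normalization boils down to two character replacements (the Workflow itself does this with sed inside deepl.sh, shown later):

# Minimal sketch of the punctuation normalization applied to Japanese translations.
def normalize_ja_punctuation(text: str) -> str:
    """Replace Western '.' and ',' with Japanese '。' and '、'."""
    return text.replace(".", "。").replace(",", "、")

print(normalize_ja_punctuation("これはペンです. つまり,ペンです"))  # これはペンです。 つまり、ペンです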
Word lookup in the Longman English-Japanese / Japanese-English Dictionary
Atsu of Atsueigo said "Longman English-Japanese is the strongest!", so I use Longman.
The Workflow searches the Longman English-Japanese Dictionary and also retrieves example sentences.
Word lookup in Eijiro (English-Japanese / Japanese-English)
The Longman English-Japanese Dictionary does not list idioms such as "come up with" (update: "come up with something" is in fact listed). So I also added Eijiro, which covers idioms extensively. Example sentences are retrieved here as well.
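A minimal sketch of the Eijiro lookup (the full scraper, scrayping_eijiro.py, is in the Source section below): the Workflow requests the search page and walks the #resultsList element.

# Minimal Eijiro lookup sketch; the query format and selector mirror the full scraper below.
import requests
from bs4 import BeautifulSoup

url = "https://eow.alc.co.jp/search?q=" + "come%20up%20with"
headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
results = soup.select("#resultsList")  # meanings and examples live under this element
print(results[0].get_text()[:200] if results else "(error) this word is not found")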
OCR a screenshot and translate it from English to Japanese
For cases where copying is impossible (e.g. a PDF's restrictions) or the text you want to translate is an image, I also built a mode that OCRs a screenshot into text and feeds it to the DeepL API. Type dd in the search bar and OCR runs on the newest image in the folder where your screenshots are saved. It uses pyocr.
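A minimal sketch of that dd mode, assuming tesseract is installed and SCREENSHOT_PATH is a glob such as /Users/you/screenshots/* (the full version is ocr.py in the Source section):

# Minimal OCR sketch: grab the newest screenshot and run tesseract on it via pyocr.
import glob
import os
from PIL import Image
import pyocr

tool = pyocr.get_available_tools()[0]  # the tesseract backend
latest = max(glob.glob(os.environ["SCREENSHOT_PATH"]), key=os.path.getctime)  # newest screenshot
builder = pyocr.builders.TextBuilder(tesseract_layout=6)  # treat the image as one uniform text block
print(tool.image_to_string(Image.open(latest), lang="eng", builder=builder))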
Save results to a Notion Database / copy to the clipboard
⌘+Enter adds the entry to the Notion Database.
Enter copies it to the clipboard. For DeepL results, pressing Enter on the top item of the list copies the entire translation.
The word and its meanings are added to the database, so you can review them like a flashcard deck.
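Under the hood, saving is a single POST to the Notion API's pages endpoint. A minimal sketch, assuming NOTION_API_KEY is set and the database ID has already been extracted from NOTION_DATABASE_URL ("<your-database-id>" is a placeholder; the full payload with Meaning1-7 and Date is in post_notion.py below):

# Minimal sketch of the Notion save step.
import json
import os
import requests

headers = {
    "Authorization": f"Bearer {os.environ['NOTION_API_KEY']}",
    "Content-Type": "application/json",
    "Notion-Version": "2021-05-13",
}
body = {
    "parent": {"database_id": "<your-database-id>"},
    "properties": {
        "Word": {"title": [{"text": {"content": "cocky"}}]},
        "Meaning1": {"rich_text": [{"text": {"content": "うぬぼれた"}}]},
    },
}
res = requests.post("https://api.notion.com/v1/pages", headers=headers, data=json.dumps(body))
print(res.status_code)  # 200 on success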
The flow from start to finish (searching for the word "cocky")
You can choose either the Longman or the Eijiro result to add to the Database. Check the Link: if it starts with https://www.ldoceonline.com/ the Longman result is saved to the Database; if it starts with https://eow.alc.co.jp/ the Eijiro result is saved.
Motivation
I had already built a Workflow that scrapes the Longman English-Japanese Dictionary, but it is weak on idioms. Eijiro is strong on idioms, yet neither dictionary can translate phrases or whole sentences. DeepL is good, but for single words it still loses to a dictionary. So I decided to build one Workflow that packs all of them in.
Procedure
What is Alfred?
A Mac-only launcher app and a strict upgrade over Spotlight search; essential for any Mac user who cares about productivity. The free version is reasonably useful, but Workflows require buying a license (a one-time purchase!).
Go to https://www.alfredapp.com/ and choose Buy Powerpack.
Install this Workflow
To install, double-click the downloaded Workflow file, or get it from the links below.
Setting
Click [x] at the top right of the Workflow to open Workflow Environment Variables, and enter your own DEEPL_AUTH_KEY, NOTION_API_KEY, NOTION_DATABASE_URL, and SCREENSHOT_PATH.
Details follow.
Get a DeepL API key
The free tier allows up to 500,000 characters per month. You have to enter your name, email address, password, street address, credit card number, and so on (to repeat: up to 500,000 characters per month is free, so you will not be billed).
You can create a key here:
https://www.deepl.com/pro#developer
Copy the key, which looks like xxxxxxxxxxxxxxxxxxxxx:fx, and put it in DEEPL_AUTH_KEY.
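A minimal sketch of the API call the Workflow makes with this key (the same endpoint and parameters appear in deepl.sh and post_notion.py below):

# Minimal DeepL free-tier translation call.
import os
import requests

res = requests.post(
    "https://api-free.deepl.com/v2/translate",
    data={
        "auth_key": os.environ["DEEPL_AUTH_KEY"],
        "text": "This is just an example.",
        "target_lang": "JA",
    },
)
print(res.json()["translations"][0]["text"])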
Get a Notion API key
Get a Notion API key from:
https://www.notion.so/my-integrations
Copy the secret_xxxxxxxxxxxxxxxxx token and put it in NOTION_API_KEY.
Get the Notion Database URL
From the distributed Notion Database template (see Distribution below), click Duplicate to copy the Database into your own Notion workspace.
Open the Database as its own page via Open as page.
Copy the resulting link, which looks like https://www.notion.so/XXXXXXXXXXX?v=?????????????????, and put it in NOTION_DATABASE_URL.
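For reference, post_notion.py derives the database ID from that URL by dropping the 22-character prefix https://www.notion.so/ and cutting at the first "?":

# How the database ID is extracted from NOTION_DATABASE_URL (mirrors post_notion.py below).
import os

url = os.environ["NOTION_DATABASE_URL"]
tail = url[22:]                        # drop "https://www.notion.so/"
database_id = tail[:tail.find("?")]    # keep everything before "?v=..."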
Path settings for OCR
Copy the path where your screenshots are saved into SCREENSHOT_PATH. (In my case, /Users/kt/sccapture/*. Don't forget the trailing *.)
Besides SCREENSHOT_PATH, the tesseract path must also be set: pyocr requires tesseract to be installed and reachable. The default is /usr/local/bin/tesseract; if yours is elsewhere, change it in the [x] settings (ocr.py reads it from OCR_PATH). Alternatively, add it to your PATH, or put the command directly inside the Workflow.
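If your tesseract lives elsewhere (the Homebrew path below is only an example for Apple Silicon machines), you can also point pyocr at it explicitly, which is what ocr.py does via OCR_PATH:

# Example only: override the tesseract binary that pyocr shells out to.
import pyocr
pyocr.tesseract.TESSERACT_CMD = "/opt/homebrew/bin/tesseract"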
Distribution
- Notion Database template
It has the properties Word, Meaning1 through Meaning7, Date, and Link. When using it, sort by Date in descending order so the newest entries come first.
Source
deepl.sh
#!/bin/bash
auth_key="${DEEPL_AUTH_KEY:-}"
PATH="$PATH:/usr/local/bin/"
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
PARSER="jq"
if ! type "$PARSER" >/dev/null 2>&1; then
PARSER="${DIR}/jq-dist"
xattr -dr com.apple.quarantine "$PARSER"
chmod +x "$PARSER"
fi
function printJson() {
echo '{"items": [{"uid": null,"arg": "'"$1"'","valid": "yes","autocomplete": "autocomplete","title": "'"$1"'"}]}'
}
POSITIONAL=()
while [[ $# -gt 0 ]]; do
key="$1"
case "$key" in
-l | --lang)
LANGUAGE="$2"
shift
shift
;;
*)
POSITIONAL+=("$1")
shift
;;
esac
done
set -- "${POSITIONAL[@]:-}"
if [ -z "$1" ]; then
echo "Home made DeepL CLI (${VERSION}; https://github.com/kentoak/deepLAlfred)"
echo ""
echo "SYNTAX : $0 [-l language] <query>" >&2
echo "Example: $0 -l DE \"This is just an example.\""
echo ""
exit 1
fi
query="$1"
#query="$(echo "$query" | sed 's/\"/\\\"/g')"
query="$(echo "$query" | gsed -E 's/\. ([a-z])/\. \U\1/g')"
query="$(echo "$query" | sed "s/\"/'/g")"
query="$(echo $query | sed -e "s/[\r\n]\+/ /g")" #改行を半角スペースに
#query="$(echo "$query" | sed "s/'/\\\'/g")"
query="$(echo "$query" | sed "s/& /%26%20/g")"
query="$(echo "$query" | sed "s/% /%25%20/g")"
query="$(echo "$query" | sed "s/¼/=/g")"
query="$(echo "$query" | sed "s/´/'/g")"
query="$(echo $query | sed -e "s/[\r\n]\+//g")"
query="$(echo "$query" | iconv -f utf-8-mac -t utf-8 | xargs)"
result=$(curl -H 'Content-Type:application/x-www-form-urlencoded' -POST https://api-free.deepl.com/v2/translate -d "auth_key=${auth_key}" -d "text=${query}" -d "target_lang=${LANGUAGE:-EN}")
if [[ $result == *'"error":{"code":'* ]]; then
message=$(echo "$result" | "$PARSER" -r '.["error"]|.message')
printJson "Error: $message"
else
sts=$(echo "$result" | "$PARSER" -r ".translations[0].text")
sts="$(echo "$sts" | sed 's/\"/\\\"/g')"
sts="$(echo "$sts" | sed 's/./。/g' | sed 's/,/、/g')"
sts1="$sts"
cnt1="$(echo "$sts" | wc -m | bc)"
CNT=$cnt1
myQuery=$query
myQuery="$(echo "$myQuery" | sed 's/%26/\&/g')"
myQuery="$(echo "$myQuery" | sed 's/%25/\%/g')"
myQuery="$(echo "$myQuery" | sed 's/%20/ /g')"
myQuery="$(echo "$myQuery" | sed 's/\"/\\\"/g')"
cnt2="$(echo "$myQuery" | wc -m | bc)"
if [[ ${query:0:20} != ${sts:0:20} ]]; then
if [[ ${LANGUAGE:-EN} == "JA" ]]; then
sts="$(echo "$sts" | sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789/')"
numForTitle=40
if [[ $cnt1 -gt $(($numForTitle+1)) ]]; then
start=0
startForSubtitle=0
MM=()
numForSubtitle=83
numForTitle=40
subtitleFinish=false
cnt1LastFlag=false
while [ 1 ]
do
cnt1=`expr $cnt1 - $numForTitle`
if [[ $cnt1 -gt 0 ]]; then
now=${sts:$((start)):$((numForTitle))}
now1=${sts1:$((start)):$((numForTitle))}
else
now=${sts:$((start))}
now1=${sts1:$((start))}
fi
cntForEnd="$(echo "${myQuery:$((startForSubtitle))}" | wc -m | bc)"
if [[ $cntForEnd -gt $numForSubtitle ]]; then
endend=$numForSubtitle
for ((i=0; i < $numForSubtitle; i++)); do
if [[ $i -eq 0 ]]; then
endbreak=${myQuery:$((startForSubtitle+endend-1)):1}
if [[ $endbreak == " " ]]; then
break
fi
else
endbreak=${myQuery:$((startForSubtitle+endend-1)):1}
if [[ $endbreak == " " ]]; then
break
fi
endend=$((endend-1))
fi
done
nowForSubtitle=${myQuery:$((startForSubtitle)):$((endend))}
fi
cnt2=`expr $cnt2 - $endend`
if [[ $start == 0 ]]; then
if [[ $cnt2 -gt 0 ]]; then
a='{"title":"'$now'","arg":"'$sts1'","subtitle":"'$nowForSubtitle'"},'
else
a='{"title":"'$now'","arg":"'$sts1'","subtitle":"'${myQuery:$((startForSubtitle))}'"},'
subtitleFinish=true
fi
else
if [[ $cnt1 -gt 0 ]]; then
if [[ $cnt2 -gt 0 ]]; then
a='{"title":"'$now'","arg":"'$now1'","subtitle":"'$nowForSubtitle'"},'
else
a='{"title":"'$now'","arg":"'$now1'","subtitle":"'${myQuery:$((startForSubtitle))}'"},'
fi
else
if [[ $cnt2 -gt 0 ]]; then
if [[ ${cnt1LastFlag} == false ]]; then
a='{"title":"'$now'","arg":"'$now1'","subtitle":"'$nowForSubtitle'"},'
cnt1LastFlag=true
else
a='{"title":"","arg":"","subtitle":"'$nowForSubtitle'"},'
fi
else
if [[ $tmpStart == $start ]]; then
a='{"title":"","arg":"","subtitle":"'${myQuery:$((startForSubtitle))}'"}'
else
if "${subtitleFinish}"; then
a='{"title":"'$now'","arg":"'$now1'","subtitle":""}'
else
a='{"title":"'$now'","arg":"'$now1'","subtitle":"'${myQuery:$((startForSubtitle))}'"}'
fi
fi
fi
fi
fi
startForSubtitle=`expr $startForSubtitle + $endend`
tmpStart=$start
if [[ $(expr $tmpStart + $numForTitle) -le $CNT ]]; then
start=`expr $tmpStart + $numForTitle`
else
start=$tmpStart
fi
MM+=($a)
if [[ $cnt1 -lt 0 ]] && [[ $cnt2 -lt 0 ]]; then
break
fi
done
mo=${MM[@]}
echo '{"items":['$mo']}' | "$PARSER" .
else
numForSubtitle=83
cnt2="$(echo "$myQuery" | wc -m | bc)"
startForSubtitle=0
if [[ $cnt2 -gt $numForSubtitle ]]; then
while [ 1 ]
do
cnt2=`expr $cnt2 - $numForSubtitle`
if [[ $cnt2 -gt 0 ]]; then
endend=$numForSubtitle
for ((i=0; i < $numForSubtitle; i++)); do
if [[ $i -eq 0 ]]; then
endbreak=${myQuery:$((startForSubtitle+endend-1)):1}
if [[ $endbreak == " " ]]; then
break
fi
else
endbreak=${myQuery:$((startForSubtitle+endend-1)):1}
if [[ $endbreak == " " ]]; then
break
fi
endend=$((endend-1))
fi
done
nowForSubtitle=${myQuery:$((startForSubtitle)):$((endend))}
fi
if [[ $cnt2 -gt 0 ]]; then
a='{"title":"'$sts'","arg":"'$sts'","subtitle":"'$nowForSubtitle'"},'
else
a='{"title":"","arg":"","subtitle":"'${myQuery:$((startForSubtitle))}'"}'
fi
startForSubtitle=`expr $startForSubtitle + $endend`
MM+=($a)
if [[ $cnt2 -lt 0 ]]; then
break
fi
done
mo=${MM[@]}
echo '{"items":['$mo']}' | "$PARSER" .
else
a='{"title":"'$sts'","arg":"'$sts1'","subtitle":"'$myQuery'"}'
echo '{"items":['$a']}' | "$PARSER" .
fi
fi
else
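# Non-JA target (English output): titles hold ~83 characters, subtitles ~39.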
numForTitle=83
numForSubtitle=39
if [[ $cnt1 -gt $(($numForTitle+1)) ]]; then
start=0
MM=()
u=-1
while [ 1 ]
do
u=$((u+1))
cnt1=`expr $cnt1 - $numForTitle`
if [[ $cnt1 -gt 0 ]]; then
endend=$numForTitle
for ((i=0; i < $numForTitle; i++)); do
if [[ $i -eq 0 ]]; then
endbreak=${sts:$((start+endend-1)):$((1))}
if [[ $endbreak == " " ]]; then
break
fi
else
endbreak=${sts:$((start+endend-1)):$((1))}
if [[ $endbreak == " " ]]; then
break
fi
endend=$((endend-1))
fi
done
numForTitle=$endend
now=${sts:$((start)):$((endend))}
else
now=${sts:$((start))}
fi
if [[ $start == 0 ]]; then
a='{"title":"'$now'","arg":"'$sts'","subtitle":"'${myQuery:$numForSubtitle*u:$numForSubtitle}'"},'
else
if [[ $cnt1 -gt 0 ]]; then
a='{"title":"'$now'","arg":"'$now'","subtitle":"'${myQuery:$numForSubtitle*u:$numForSubtitle}'"},'
else
a='{"title":"'$now'","arg":"'$now'","subtitle":"'${myQuery:$numForSubtitle*u:$numForSubtitle}'"}'
fi
fi
start=`expr $start + $endend`
MM+=($a)
if [[ $cnt1 -lt 0 ]]; then
break
fi
done
mo=${MM[@]}
echo '{"items":['$mo']}' | "$PARSER" .
else
a='{"title":"'$sts'","arg":"'$sts1'","subtitle":"'$myQuery'"}'
echo '{"items":['$a']}' | "$PARSER" .
fi
fi
fi
fi
post_notion.py
import requests
from pprint import pprint
import json
from bs4 import BeautifulSoup
from urllib.parse import urlencode
from urllib.request import urlopen, Request
import re
import sys
import os
import os.path
import datetime
# .get avoids a KeyError when an environment variable is unset
notion_api_key = os.environ.get("NOTION_API_KEY", "")
notion_database_url = os.environ.get("NOTION_DATABASE_URL", "")
deepl_auth_key = os.environ.get("DEEPL_AUTH_KEY", "")
def longman(spell):
    spell = spell.replace(" ", "%20")
if onlyAlphabet(spell):
spell = spell.lower()
url = "https://www.ldoceonline.com/jp/dictionary/english-japanese/" + spell
else:
url = "https://www.ldoceonline.com/jp/dictionary/japanese-english/" + spell
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"
}
source = requests.get(url, headers=headers)
data = BeautifulSoup(source.content, "html.parser")
explanation_list = []
if data.select(".lejEntry"):
if data.select(".Translation"):
for i in range(len(data.select(".Translation"))):
if data.select(".Translation")[i].select(".BOXTRAN.TRAN"):
continue
if data.select(".Translation")[i].select(".PRETRANCOM") or data.select(".Translation")[i].select(".COLL"):
explanation_list.append(
data.select(".Translation")[i].get_text())
continue
tmp = ""
if data.select(".Translation")[i].select(".TRAN"):
for j in range(len(data.select(".Translation")[i].select(".TRAN"))):
tmp += data.select(".Translation")[
i].select(".TRAN")[j].get_text()
if tmp:
explanation_list.append(tmp)
if data.select(".ljeEntry"):
if data.select(".Subentry"):
for i in range(len(data.select(".Subentry"))):
t = ""
if i == 0:
t += data.select(".HWD")[0].get_text()
t += data.select(".Subentry")[i].get_text()
explanation_list.append(t)
tao = explanation_list
result = ""
for idx, txt in enumerate(tao):
if idx < len(tao)-1:
tmp = tao[idx]
result += tmp.strip()+", "
else:
tmp = tao[idx]
result += tmp.strip()
result1 = []
for txt in tao:
result1.append(txt.strip())
if len(explanation_list) == 0:
return "(error) this word is not found"
return result1
def eijiro(word):
spell=word.replace(" ","%20")
    if onlyAlphabet(spell[0]) or onlyAlphabet(spell[-1]):
        spell = spell.lower()
    url = "https://eow.alc.co.jp/search?q=" + spell
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"
}
source = requests.get(url, headers=headers)
data = BeautifulSoup(source.content, "html.parser")
explanation_list = []
if data.select("#resultsList"):
if data.select("#resultsList")[0].find("ul"):
if data.select("#resultsList")[0].find("ul").find("li").find_all("div"):
for i in range(len(data.select("#resultsList")[0].find("ul").find("li").find_all("div"))):
if data.select("#resultsList")[0].find("ul").find("li").find_all("div")[i].find_all("ol"):
for j in range(len(data.select("#resultsList")[0].find("ul").find("li").find_all("div")[i].find_all("ol"))):
if data.select("#resultsList")[0].find("ul").find("li").find_all("div")[i].find_all("ol")[j].find_all("li"):
for k in range(len(data.select("#resultsList")[0].find("ul").find("li").find_all("div")[i].find_all("ol")[j].find_all("li"))):
explanation_list.append(data.select("#resultsList")[0].find("ul").find("li").find_all("div")[i].find_all("ol")[j].find_all("li")[k].get_text())
else:
explanation_list.append(data.select("#resultsList")[0].find("ul").find("li").find_all("div")[i].get_text())
tao = explanation_list
result1 = []
for txt in tao:
result1.append(txt.strip())
if len(explanation_list) == 0:
return "(error) this word is not found"
return result1
def deepLMeaning(word, onlyAlphabetFlag):
headers = {
'Content-Type': 'application/x-www-form-urlencoded'
}
if onlyAlphabetFlag:
targetLang = "JA"
else:
targetLang = "EN"
data = {
"auth_key": deepl_auth_key,
"text": word,
"target_lang": targetLang
}
result = requests.post(
'https://api-free.deepl.com/v2/translate', headers=headers, data=data)
result = result.json()
if result["translations"]:
return result["translations"][0]["text"]
meaningList = []
links = []
useDeepL=[]
def main(word, onlyAlphabetFlag, link):
if "https://www.ldoceonline.com/jp/dictionary" in link:
useLongman = longman(word)
for idx, i in enumerate(useLongman):
if onlyAlphabetFlag:
ken = i.split("• ")
else:
ken = i.split("‣")
if len(ken) > 1:
for idx, k in enumerate(ken):
if onlyAlphabetFlag:
if idx == 0:
meaningList.append(ken[0])
else:
if " → See English-Japanese Dictionary" in k:
k = k.split(
" → See English-Japanese Dictionary")[0]
                        if idx == 0:
                            meaningList.append(ken[0])
                        else:
                            meaningList.append(k)
elif "https://eow.alc.co.jp/search" in link:
useEijiro = eijiro(word)
if useEijiro != "(error) this word is not found":
            spell = word.replace(" ", "%20")
            if onlyAlphabet(spell):
                spell = spell.lower()
            url = "https://eow.alc.co.jp/search?q=" + spell
links.append(url)
for idx, i in enumerate(useEijiro):
if ":" in i:
ken = i.split("・")
else:
ken = [i]
meaningList.append(ken[0])
else:
useDeepL.append(1)
meaningList.append(deepLMeaning(word, onlyAlphabetFlag))
def onlyAlphabet(text):
re_roman = re.compile(r'^[a-zA-Z\.]+$')
return re_roman.fullmatch(text[0])
def get_request_url(end_point):
return f'https://api.notion.com/v1/{end_point}'
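# Alfred passes either a dictionary-result URL (Longman/Eijiro) or raw text; extract the word and source link accordingly.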
if "https://www.ldoceonline.com/jp/dictionary" in sys.argv[1:][0]:
link = sys.argv[1:][0]
word = link[59:].replace("%20"," ")
elif "https://eow.alc.co.jp/search" in sys.argv[1:][0]:
link = sys.argv[1:][0]
word = link[31:].replace("%20"," ")
else:
link = "なし"
word = " ".join(sys.argv[1:])
onlyAlphabet_ = onlyAlphabet(sys.argv[1:][0][0]) or onlyAlphabet(sys.argv[1:][0][-1])
main(word, onlyAlphabet_, link)
if len(useDeepL)==0:
if onlyAlphabet(word[0]) or onlyAlphabet(word[-1]):
word = word.lower()
if links:
link=links[0]
headers_for_notion = {"Authorization": f"Bearer {notion_api_key}",
"Content-Type": "application/json",
"Notion-Version": "2021-05-13"}
databases_ids = [notion_database_url]
databases_id = databases_ids[0][22:][:databases_ids[0][22:].find('?')]
response = requests.request('GET', url=get_request_url(
f'databases/{databases_id}'), headers=headers_for_notion)
headers = {
"accept": "application/json",
"Content-Type": "application/x-www-form-urlencoded"
}
now = datetime.datetime.now()
# Build the page properties in one loop instead of one branch per meaning count:
# Meaning1-Meaning6 hold one meaning each, and any overflow is joined into Meaning7.
properties = {
    "Word": {"title": [{"text": {"content": word}}]},
    "Link": {"url": link},
    "Date": {"rich_text": [{"text": {"content": now.strftime('%Y/%m/%d %H:%M:%S')}}]},
}
for i in range(min(len(meaningList), 7)):
    content = ",".join(meaningList[6:]) if i == 6 else meaningList[i]
    properties["Meaning" + str(i + 1)] = {"rich_text": [{"text": {"content": content}}]}
body_for_notion = {
    "parent": {"database_id": databases_id},
    "properties": properties,
}
res = requests.request('POST', url=get_request_url(
'pages'), headers=headers_for_notion, data=json.dumps(body_for_notion))
if res.status_code == 200:
    print("Added", word, "to the database!")
else:
    print("Failed...")
scrayping_eijiro.py
# coding: utf-8
import requests
from bs4 import BeautifulSoup
import sys
import json
import re
def main(spell):
spell=spell.replace(" ","%20")
    if onlyAlphabet(spell[0]) or onlyAlphabet(spell[-1]):
        spell = spell.lower()
    url = "https://eow.alc.co.jp/search?q=" + spell
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"
}
source = requests.get(url, headers=headers)
data = BeautifulSoup(source.content, "html.parser")
explanation_list = []
if data.select("#resultsList"):
if data.select("#resultsList")[0].find("ul"):
if data.select("#resultsList")[0].find("ul").find("li").find_all("div"):
for i in range(len(data.select("#resultsList")[0].find("ul").find("li").find_all("div"))):
if data.select("#resultsList")[0].find("ul").find("li").find_all("div")[i].find_all("ol"):
for j in range(len(data.select("#resultsList")[0].find("ul").find("li").find_all("div")[i].find_all("ol"))):
if data.select("#resultsList")[0].find("ul").find("li").find_all("div")[i].find_all("ol")[j].find_all("li"):
for k in range(len(data.select("#resultsList")[0].find("ul").find("li").find_all("div")[i].find_all("ol")[j].find_all("li"))):
explanation_list.append(data.select("#resultsList")[0].find("ul").find("li").find_all("div")[i].find_all("ol")[j].find_all("li")[k].get_text())
else:
explanation_list.append(data.select("#resultsList")[0].find("ul").find("li").find_all("div")[i].find_all("ol")[j].get_text())
else:
explanation_list.append(data.select("#resultsList")[0].find("ul").find("li").find_all("div")[i].get_text())
tao = explanation_list
result1 = []
for txt in tao:
result1.append(txt.strip())
if len(explanation_list) == 0:
return "(error) this word is not found"
return result1
def onlyAlphabet(text):
re_roman = re.compile(r'^[a-zA-Z\.]+$')
return re_roman.fullmatch(text)
def onlyJa(text):
re_ja = re.compile(
r'^[\u4E00-\u9FFF|\u3040-\u309F|\u30A0-\u30FF]{1,10}[\u4E00-\u9FFF|\u3040-\u309F|\u30A0-\u30FF]{1,10}$')
return re_ja.fullmatch(text)
if __name__ == '__main__':
spell = " ".join(sys.argv[1:]).strip()
out = main(spell)
obj = []
if out == "(error) this word is not found":
tao = {
'title': "error",
'subtitle': "(error) This word is not found",
'arg': "error"
}
obj.append(tao)
else:
for idx, i in enumerate(out):
if "◆" in i:
i=i[:i.find("◆")]
if ":" in i:
if i.find('・')>0:
tmp = i.split("・")
ken = []
tao = ""
for k in range(len(tmp)):
if onlyAlphabet(tmp[k][0]):
ken.append(tao[:-1])
tao=tmp[k]
tao+="・"
else:
tao+=tmp[k]
tao+="・"
if tao != "":
ken.append(tao)
else:
ken = [i]
else:
ken = [i]
if len(ken) > 1:
for idx, k in enumerate(ken):
if onlyAlphabet(spell.split()[0]):
if idx == 0:
tao = {
'title': ken[0],
'arg': k
}
else:
u = k.split(":")
reibun_j = ""
reibun_e = ""
if u[0]:
for now in u:
if now:
if onlyAlphabet(now[0]) or onlyAlphabet(now[-1]):
reibun_e += now
reibun_e += " "
else:
reibun_j += now
tao = {
'title': " ‣ "+reibun_e,
'subtitle': " "+reibun_j,
'arg': k
}
else:
continue
obj.append(tao)
else:
print("アルファベットじゃない")
else:
tao = {
'title': ken[0],
'arg': ken[0]
}
obj.append(tao)
jso = {'items': obj}
sys.stdout.write(json.dumps(jso, ensure_ascii=False))
scrapying_longman.py
# coding: utf-8
import requests
from bs4 import BeautifulSoup
import sys
import json
import re
def main(spell):
spell=spell.replace(" ","-")
if onlyAlphabet(spell[0]) or onlyAlphabet(spell[-1]):
spell = spell.lower()
url = "https://www.ldoceonline.com/jp/dictionary/english-japanese/" + spell
else:
url = "https://www.ldoceonline.com/jp/dictionary/japanese-english/" + spell
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"
}
source = requests.get(url, headers=headers)
data = BeautifulSoup(source.content, "html.parser")
explanation_list = []
if data.select(".lejEntry"):
if data.select(".Translation"):
for i in range(len(data.select(".Translation"))):
if data.select(".Translation")[i].select(".BOXTRAN.TRAN"):
continue
if data.select(".Translation")[i].select(".PRETRANCOM") or data.select(".Translation")[i].select(".COLL"):
explanation_list.append(
data.select(".Translation")[i].get_text())
continue
tmp = ""
if data.select(".Translation")[i].select(".TRAN"):
for j in range(len(data.select(".Translation")[i].select(".TRAN"))):
tmp += data.select(".Translation")[
i].select(".TRAN")[j].get_text()
if tmp:
explanation_list.append(tmp)
if data.select(".ljeEntry"):
if data.select(".Subentry"):
for i in range(len(data.select(".Subentry"))):
t = ""
if i == 0:
t += data.select(".HWD")[0].get_text()
t += data.select(".Subentry")[i].get_text()
explanation_list.append(t)
tao = explanation_list
result = ""
for idx, txt in enumerate(tao):
if idx < len(tao)-1:
tmp = tao[idx]
result += tmp.strip()+", "
else:
tmp = tao[idx]
result += tmp.strip()
result1 = []
for txt in tao:
result1.append(txt.strip())
if len(explanation_list) == 0:
return "(error) this word is not found"
return result1
def onlyAlphabet(text):
re_roman = re.compile(r'^[a-zA-Z\.]+$')
return re_roman.fullmatch(text)
def onlyJa(text):
re_ja = re.compile(
r'^[\u4E00-\u9FFF|\u3040-\u309F|\u30A0-\u30FF]{1,10}[\u4E00-\u9FFF|\u3040-\u309F|\u30A0-\u30FF]{1,10}$')
return re_ja.fullmatch(text)
if __name__ == '__main__':
spell = " ".join(sys.argv[1:]).strip()
out = main(spell)
obj = []
if out == "(error) this word is not found":
tao = {
'title': "error",
'subtitle': "(error) This word is not found",
'arg': "error"
}
obj.append(tao)
else:
for idx, i in enumerate(out):
if onlyAlphabet(spell):
ken = i.split("• ")
else:
ken = i.split("‣")
if len(ken) > 1:
for idx, k in enumerate(ken):
if onlyAlphabet(spell):
if idx == 0:
tao = {
'title': ken[0],
'arg': k
}
else:
u = k.split(" ")
reibun_j = ""
reibun_e = ""
for i in u:
if onlyAlphabet(i):
reibun_e += i
reibun_e += " "
else:
reibun_j += i
tao = {
'title': " ‣ "+reibun_e,
'subtitle': " "+reibun_j,
'arg': k
}
obj.append(tao)
else:
if " → See English-Japanese Dictionary" in k:
k = k.split(
" → See English-Japanese Dictionary")[0]
if idx == 0:
tao = {
'title': ken[0],
}
else:
tao = {
'title': "‣ "+k[2:],
'subtitle': k[2:].split(" 〘")[0],
'arg': k[2:].split(" 〘")[0]
}
obj.append(tao)
else:
tao = {
'title': ken[0],
'arg': ken[0]
}
obj.append(tao)
jso = {'items': obj}
sys.stdout.write(json.dumps(jso, ensure_ascii=False))
ocr.py
import os
import sys
sys.path.append('/usr/local/lib/python3.9/site-packages')
import glob
import json
from PIL import Image
import pyocr
import requests
import random
# .get avoids a KeyError when an environment variable is unset
deepl_auth_key = os.environ.get("DEEPL_AUTH_KEY", "")
screenshot_path = os.environ.get("SCREENSHOT_PATH", "")
pyocr.tesseract.TESSERACT_CMD = os.environ.get("OCR_PATH") or r'/usr/local/bin/tesseract'
tools = pyocr.get_available_tools()
tool = tools[0]
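# OCR the most recently created file in the screenshot folder.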
list_of_files = glob.glob(screenshot_path)
latest_file = max(list_of_files, key=os.path.getctime)
img = Image.open(latest_file)
builder = pyocr.builders.TextBuilder(tesseract_layout=6)
text = tool.image_to_string(img, lang="eng", builder=builder)
text = text.replace('\"','\\\"')
text = text.replace('\'',"\\'")
text = text.replace('&','%26')
text = text.replace('\n',' ')
deepl_token=deepl_auth_key
source_lang = 'EN'
target_lang = 'JA'
param = {
'auth_key' : deepl_token,
'text' : text,
'source_lang' : source_lang,
"target_lang": target_lang
}
request = requests.post("https://api-free.deepl.com/v2/translate", data=param)
result = request.json()
resultText = result['translations'][0]['text']
# str.replace returns a new string, so the result must be assigned back
resultText = resultText.replace('\\"', '\"')
resultText = resultText.replace(" ", "")
resultText = resultText.replace('.', '。').replace(',', '、')
sts=resultText
cnt = len(sts)
CNT=cnt
start=0
tao=[]
subtex=text
subtex=subtex.replace("\\'","\'")
cnt2=len(subtex)
numForTitle=40
if cnt > numForTitle+1:
start=0
tmpStart=start
startForSubtitle=0
MM=[]
numForSubtitle=83
endend=0
subtitleFinish=False
while True:
numForTitle=40
cnt-=numForTitle
cnt2-=numForSubtitle
if cnt > 0:
now = sts[start:start+numForTitle]
else:
now = sts[start:]
if cnt2>0:
endend=numForSubtitle
for i in range(numForSubtitle):
if i==0:
endbreak=subtex[startForSubtitle+endend-1:startForSubtitle+endend]
if endbreak == " ":
break
else:
endbreak=subtex[startForSubtitle+endend-1:startForSubtitle+endend]
if endbreak == " ":
break
endend-=1
nowForSubtitle=subtex[startForSubtitle:startForSubtitle+endend]
if start == 0:
if cnt2>0:
a={"title":now,"arg":sts,"subtitle":nowForSubtitle}
else:
a={"title":now,"arg":sts,"subtitle":subtex[startForSubtitle:]}
subtitleFinish=True
else:
if cnt>0:
if cnt2>0:
a={"title":now,"arg":now,"subtitle":nowForSubtitle}
else:
a={"title":now,"arg":now,"subtitle":subtex[startForSubtitle:]}
else:
if cnt2>0:
a={"title":now,"arg":now,"subtitle":nowForSubtitle}
else:
if tmpStart == start:
a={"title":"","arg":"","subtitle":subtex[startForSubtitle:]}
else:
if subtitleFinish:
a={"title":now,"arg":now,"subtitle":""}
else:
a={"title":now,"arg":now,"subtitle":subtex[startForSubtitle:]}
startForSubtitle+=endend
tmpStart=start
if tmpStart+numForTitle<CNT:
start=tmpStart+numForTitle
else:
start=tmpStart
tao.append(a)
if cnt < 0 and cnt2 < 0:
break
else:
numForSubtitle=83
startForSubtitle=0
if cnt2>numForSubtitle:
while True:
cnt2 -= numForSubtitle
if cnt2>0:
endend=numForSubtitle
for i in range(numForSubtitle):
if i==0:
endbreak=subtex[startForSubtitle+endend-1:startForSubtitle+endend]
if endbreak == " ":
break
else:
endbreak=subtex[startForSubtitle+endend-1:startForSubtitle+endend]
if endbreak==" ":
break
endend-=1
nowForSubtitle=subtex[startForSubtitle:startForSubtitle+endend]
if cnt2>0:
a={"title":sts,"arg":sts,"subtitle":nowForSubtitle}
else:
a={"title":"","arg":"","subtitle":subtex[startForSubtitle:]}
startForSubtitle+=endend
tao.append(a)
if cnt2 < 0:
break
else:
a={"title":sts,"arg":sts,"subtitle":subtex}
tao.append(a)
sys.stdout.write(json.dumps({'items': tao}, ensure_ascii=False))
#os.remove(latest_file) # delete the screenshot permanently
#shutil.move(latest_file,'/Users/kt/.Trash/') # or move it to the Trash instead
Closing thoughts
GitHub
Packal
Being able to see the dictionaries (Eijiro and the Longman English-Japanese Dictionary) and a translation all at once with a single command is great.
Alfred is the best, as always.
Notion, please become a little easier to read...