
Translating with DeepL via Selenium

Posted at 2022-01-15, last updated at 2022-01-15

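Introduction

This article uses Selenium to run translations on DeepL ( https://www.deepl.com/ja/translator ).

As an example, we translate the English titles of articles fetched from a news site ( https://news.google.com/rss?hl=en-US&gl=US&ceid=US:en ). This picks up where the previous article left off, so see the article below for the scraping part.

Previous article: Scraping news articles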

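Overview of the translation

We use Selenium to translate with DeepL. For installing Selenium, see an article such as the one below.
Reference: 10分で理解する Selenium - Qiita

We operate the DeepL page with Selenium. First, open the DeepL page and try translating any phrase you like. I will enter the article title I fetched in the previous article.

The translation works. The thing to notice here is the URL at this point.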

https://www.deepl.com/ja/translator#en/ja/Biden%20says%20he's%20%22not%20sure%22%20about%20voting%20bills'%20future%20after%20Sinema%20reiterates%20opposition%20to%20rule%20change%20-%20CBS%20News

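The original URL was https://www.deepl.com/ja/translator .
So the URL has the form: original URL + #source language/target language/text to translate. Note that the text to translate is percent-encoded, so characters such as spaces appear as sequences like %20.

The flow of operations this time is therefore:

Start Google Chrome
Access the URL that includes the text to translate
Retrieve the translated text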
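As a minimal sketch of the URL structure described above, the snippet below assembles such a URL with the standard library. The helper name `build_deepl_url` and the constant `DEEPL_BASE_URL` are hypothetical names introduced for illustration, and the URL format itself is only what was observed in the browser, not a documented DeepL interface.

```python
import urllib.parse

# Base URL quoted in this article (observed in the browser, not a documented API)
DEEPL_BASE_URL = "https://www.deepl.com/ja/translator"


def build_deepl_url(text: str, from_lang: str = "en", to_lang: str = "ja") -> str:
    """Return base URL + '#' + source language + '/' + target language + '/' + encoded text."""
    encoded = urllib.parse.quote(text)  # percent-encode, e.g. a space becomes %20
    return f"{DEEPL_BASE_URL}#{from_lang}/{to_lang}/{encoded}"


url = build_deepl_url('he is "not sure"')
print(url)
```

Note that `urllib.parse.quote` also encodes the apostrophe as %27, while the URL copied from the browser leaves it as is. Both forms decode to the same text.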

Code for the translation

Running the translation requires ChromeDriver, which can be downloaded here.
(→ https://chromedriver.chromium.org/downloads )

The code below translates the English sentence "Biden says he's "not sure" about voting bills' future after Sinema reiterates opposition to rule change - CBS News" into Japanese.

translate.py

import time
import urllib.parse

from bs4 import BeautifulSoup  # features='lxml' below requires the lxml package
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

# text = the sentence to translate (inner double quotes must be escaped)
text = "Biden says he's \"not sure\" about voting bills' future after Sinema reiterates opposition to rule change - CBS News"
text = urllib.parse.quote(text)

# Build the URL
from_lang = 'en'
to_lang = 'ja'
url = 'https://www.deepl.com/translator#' + from_lang + '/' + to_lang + '/' + text

# Run the browser in headless mode
options = Options()
options.add_argument('--headless')

# Start the browser
DRIVER_PATH = "./chromedriver"  # path to your own chromedriver
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(DRIVER_PATH), options=options)
driver.implicitly_wait(30)  # wait up to 30 seconds when an element is not found
driver.get(url)

# Read the translation from the DeepL page
sleep_time = 10
try_max_count = 30
translated_text = ''
for i in range(try_max_count):
    # Wait for the given time
    time.sleep(sleep_time)
    html = driver.page_source
    soup = BeautifulSoup(html, features='lxml')
    target_elem = soup.find(class_="lmt__translations_as_text__text_btn")
    # The element may not be rendered yet, so guard against None
    translated_text = target_elem.text if target_elem else ''
    # Stop once DeepL has finished translating
    if translated_text:
        break

# Stop the browser
driver.quit()
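The wait-and-check loop in the script can be factored into a small reusable helper. The sketch below is one possible way to do that, not part of the original script: `poll_until_text` and its parameters are hypothetical names, and `fetch` stands for any function that returns the translated text, or an empty string while the page is still loading.

```python
import time
from typing import Callable


def poll_until_text(fetch: Callable[[], str],
                    max_tries: int = 30,
                    interval: float = 10.0) -> str:
    """Call fetch() repeatedly until it returns non-empty text or the tries run out."""
    for _ in range(max_tries):
        time.sleep(interval)  # give the page time to render before each check
        text = fetch()
        if text:
            return text
    return ""  # nothing appeared within max_tries attempts
```

With the script above, `fetch` would be a function that parses `driver.page_source` and returns the text of the target element. A shorter interval with the same number of tries makes the loop return sooner when the translation is ready early.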

References
