
I Don't Understand English, So I Built a Real-Time Automatic Translator with Azure Speech Services

Last updated at 2021-12-03 / Posted at 2021-05-28

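I don't understand English

English is hard to follow, isn't it?
If only videos of people abroad talking could play back in Japanese, that would be the best...

With that in mind, I decided to build a real-time automatic translator using Speech Servicies, Azure's speech recognition service.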

Azure Speech Servicies
https://azure.microsoft.com/ja-jp/services/cognitive-services/speech-services/

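References

Microsoft's official documentation includes the following Speech Servicies quickstarts.

How to convert speech to text

How to translate

How to convert text to speech

If I combine these with a little ingenuity, couldn't I build a real-time automatic translator!?
I'll try building a mechanism that continuously recognizes speech, translates it, and keeps reading the result aloud.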

Development environment

Windows 10 64-bit
Python 3.7 (with 3.8 or later, the audio library pyaudio did not work properly for me)
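
If you want the script to fail early on an interpreter where pyaudio may not work, a small guard like the one below could help. This is only a sketch: the function name `pyaudio_known_good` is made up for illustration, and the 3.7 cutoff simply encodes what I observed in my environment, not an official compatibility limit.

```python
import sys


def pyaudio_known_good(version_info=sys.version_info):
    """Return True if the interpreter is at most Python 3.7.

    The cutoff reflects the author's own observation only (assumption),
    not a documented pyaudio requirement.
    """
    return tuple(version_info[:2]) <= (3, 7)


if __name__ == "__main__":
    if not pyaudio_known_good():
        print("Warning: pyaudio was only confirmed to work up to Python 3.7 here.")
```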

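Processing

The main thread is written to accept audio input at all times, while a sub-thread catches events and processes them, moving through the following states in order:

Speech-to-text conversion
Translation
Text-to-speech conversion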
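
The flow described above can be sketched with just the standard library. Everything here is a stand-in: `speech_to_text`, `translate`, `text_to_speech`, and `run_pipeline` are hypothetical names, and the three stage functions fake what the Azure SDK would actually do. The point is only the shape: the main thread keeps feeding input while a sub-thread takes each event through the three stages in order.

```python
import queue
import threading


# Hypothetical stand-ins for the three stages; a real program would call
# the Azure SDK here instead of these toy functions.
def speech_to_text(audio):
    return audio["spoken"]


def translate(text):
    return {"hello": "こんにちは"}.get(text, text)


def text_to_speech(text):
    return text.encode("utf-8")


def worker(events, played):
    """Sub-thread: take each input event through the three stages in order."""
    while True:
        audio = events.get()
        if audio is None:  # sentinel: no more input
            break
        text = speech_to_text(audio)
        translated = translate(text)
        played.append(text_to_speech(translated))


def run_pipeline(inputs):
    """Main thread: keep accepting input and hand it to the sub-thread."""
    events = queue.Queue()
    played = []
    thread = threading.Thread(target=worker, args=(events, played))
    thread.start()
    for item in inputs:
        events.put(item)
    events.put(None)  # tell the worker to stop
    thread.join()
    return played
```

In the actual program below, the SDK raises the events itself, so the callbacks connected to the recognizer play the role of `worker`.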

import time
import azure.cognitiveservices.speech as speechsdk
import pyaudio

speech_key, service_region = "--------", "---------"  # Set your own authentication key and region
# English speech -> Japanese speech
from_language, to_language, voince_name = 'en-US', 'ja', 'ja-JP-HarukaRUS'
# Japanese speech -> English speech
# from_language, to_language, voince_name = 'ja-JP', 'en', 'en-US-ZiraRUS'

# - Language and voice support for the Speech service (documentation listing the values for voince_name)
# https://docs.microsoft.com/ja-jp/azure/cognitive-services/speech-service/language-support#standard-voices

def translation_continuous():
    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription=speech_key, region=service_region,
        speech_recognition_language=from_language,
        target_languages=[to_language], voice_name=voince_name)

    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config, audio_config=audio_config)

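    done = False

    def stop_cb(evt):
        # Leave the wait loop once the session stops or is canceled
        nonlocal done
        done = True

    recognizer.session_started.connect(lambda evt: print('SESSION STARTED {}'.format(evt)))
    recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    recognizer.session_stopped.connect(stop_cb)
    recognizer.canceled.connect(stop_cb)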
    recognizer.recognized.connect(lambda evt: print('RECOGNIZED {}'.format(evt.result.translations[to_language])))

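    def synthesis_callback(evt):
        # Play the synthesized translation audio delivered with each event
        size = len(evt.result.audio)
        print('AUDIO SYNTHESIZED: {} byte(s) {}'.format(size, '(COMPLETED)' if size == 0 else ''))
        if size == 0:
            return
        p = pyaudio.PyAudio()
        # Format: 16-bit, channels: mono, sampling rate: 16000 Hz
        out_stream = p.open(format=p.get_format_from_width(2), channels=1, rate=16000,
                            output=True)
        try:
            print('PLAYING AUDIO')
            out_stream.write(evt.result.audio)
        finally:
            # Release the audio device so streams do not pile up across callbacks
            out_stream.stop_stream()
            out_stream.close()
            p.terminate()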

    # connect callback to the synthesis event
    recognizer.synthesizing.connect(synthesis_callback)

    # start translation
    recognizer.start_continuous_recognition()

    while not done:
        time.sleep(5)

translation_continuous()
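
One thing I have not verified is whether the bytes in `evt.result.audio` start with a RIFF/WAV header. If they do (an assumption, not something the code above checks), writing them straight to the output stream would play the header as a short burst of noise. The hypothetical helper `strip_wav_header` below sketches how the raw PCM frames could be extracted with the standard library alone; chunks without a header pass through unchanged.

```python
import struct


def strip_wav_header(chunk: bytes) -> bytes:
    """Return the raw PCM frames if chunk starts with a RIFF/WAVE header.

    Chunks that do not start with such a header are returned unchanged.
    Whether the Azure SDK actually includes a header is an assumption.
    """
    if len(chunk) < 12 or chunk[:4] != b"RIFF" or chunk[8:12] != b"WAVE":
        return chunk
    pos = 12
    # Walk the RIFF sub-chunks until the "data" chunk is found
    while pos + 8 <= len(chunk):
        chunk_id = chunk[pos:pos + 4]
        (chunk_size,) = struct.unpack("<I", chunk[pos + 4:pos + 8])
        pos += 8
        if chunk_id == b"data":
            return chunk[pos:]
        pos += chunk_size + (chunk_size & 1)  # sub-chunks are word-aligned
    return b""  # header present but no data chunk: nothing to play
```

In `synthesis_callback`, this would mean calling `out_stream.write(strip_wav_header(evt.result.audio))` instead of writing the bytes directly.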

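Result

You end up with something that makes you think, "Huh, this might actually be usable."

By the way, "Services" is misspelled everywhere in this article except the title.
Did you notice?
English is hard.
That's all.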
