
Making the Raspberry Pi Talk

Posted at 2019-05-03, last updated at 2019-05-05

With the Google Cloud Text-to-Speech API, it is easy to add Japanese speech synthesis to a Raspberry Pi.

Preparation: installing the API

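sudo pip install "google-cloud-texttospeech<2.0"

The scripts in this article use the tts.types and tts.enums modules, which were removed in version 2.0 of the library, so the install is pinned below 2.0.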
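The client library also needs Google Cloud credentials before any of the scripts can call the API, a step the article does not show. The following is a minimal sketch, assuming a service-account key file referenced through the standard GOOGLE_APPLICATION_CREDENTIALS environment variable; the helper name `credentials_ready` is hypothetical and exists only for illustration.

```python
import os


def credentials_ready(environ=os.environ):
    """Return True if GOOGLE_APPLICATION_CREDENTIALS names an existing file.

    Hypothetical helper for illustration: it only checks that the key file
    is present, not that the key is valid or that the API is enabled.
    """
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    return bool(path) and os.path.isfile(path)


if __name__ == "__main__":
    if credentials_ready():
        print("credentials file found")
    else:
        print("set GOOGLE_APPLICATION_CREDENTIALS to your key file path")
```

Running a check like this before creating the client tends to give a clearer message than the authentication error raised on the first request.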

Four Japanese voices are supported: ja-JP-Wavenet-A, ja-JP-Wavenet-B, ja-JP-Wavenet-C, and ja-JP-Wavenet-D.

tts1.py

#!/usr/local/bin/python3.6
# Speak one sentence with each of the four Japanese WaveNet voices.
import subprocess as proc

from google.cloud import texttospeech as tts

client = tts.TextToSpeechClient()
# LINEAR16 is requested so that the result can be saved as a .wav file for aplay.
audio_config = tts.types.AudioConfig(audio_encoding=tts.enums.AudioEncoding.LINEAR16)
# Japanese tongue twister: "This nail is a hard nail to pull out."
text = "この釘はひきぬきにくい釘だ"
itext = tts.types.SynthesisInput(text=text)
for nm in 'ABCD':
    name = 'ja-JP-Wavenet-' + nm
    print(name)
    voice = tts.types.VoiceSelectionParams(language_code='ja-JP', name=name)
    resp = client.synthesize_speech(itext, voice, audio_config)
    # Save the synthesized audio, then play it through the default ALSA device.
    with open('temp.wav', 'wb') as out:
        out.write(resp.audio_content)
    proc.call(["aplay", "-q", "temp.wav"])
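Because the script saves the response as temp.wav and aplay plays it directly, the file can also be inspected with Python's standard wave module. The sketch below is illustrative only: `describe_wav` is a hypothetical helper, the demo writes its own stand-in file instead of calling the API, and it assumes the length field in the WAV header is accurate.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, frames / float(rate)


if __name__ == "__main__":
    # Stand-in for temp.wav: one second of 16-bit mono silence at 24 kHz.
    # The 24 kHz figure is an assumption for the demo, not a measured value.
    with wave.open("demo.wav", "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(24000)
        wav.writeframes(b"\x00\x00" * 24000)
    print(describe_wav("demo.wav"))
```

Pointing `describe_wav` at the temp.wav written by tts1.py would show the sample rate and length the service actually returned for each voice.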

Sample audio

Writing the program pic.twitter.com/EQmUNMteyV
— utaca.rich (@RichUtaka) May 3, 2019

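Additional settings

1. speaking_rate: optional speaking rate in the range [0.25, 4.0]. 1.0 is the normal native speed supported by the specific voice; 2.0 is twice as fast and 0.5 is half as fast. If unset (0.0), it defaults to the native 1.0 speed. Any value below 0.25 or above 4.0 returns an error.
2. pitch: optional speaking pitch in the range [-20.0, 20.0]. 20 means an increase of 20 semitones from the original pitch; -20 means a decrease of 20 semitones from the original pitch.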
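Since out-of-range values are rejected by the API, it can help to check them locally before sending a request. The following is a minimal sketch using only the limits quoted above; `check_settings` and `semitones_to_ratio` are hypothetical helpers, and the ratio formula assumes ordinary equal-tempered semitones, which the article does not state explicitly.

```python
RATE_RANGE = (0.25, 4.0)     # speaking_rate limits quoted above
PITCH_RANGE = (-20.0, 20.0)  # pitch limits in semitones quoted above


def check_settings(speaking_rate=1.0, pitch=0.0):
    """Raise ValueError if a setting is outside the documented range."""
    if speaking_rate == 0.0:
        speaking_rate = 1.0  # unset (0.0) means the native 1.0 speed
    if not RATE_RANGE[0] <= speaking_rate <= RATE_RANGE[1]:
        raise ValueError("speaking_rate out of range: %r" % speaking_rate)
    if not PITCH_RANGE[0] <= pitch <= PITCH_RANGE[1]:
        raise ValueError("pitch out of range: %r" % pitch)
    return speaking_rate, pitch


def semitones_to_ratio(semitones):
    """Frequency ratio of a shift by the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    print(check_settings(speaking_rate=2.0, pitch=-3.0))
    print(semitones_to_ratio(12))  # one octave up doubles the frequency
```

Under that assumption, the maximum pitch of 20 semitones corresponds to a shift of more than one and a half octaves, which is why extreme values sound unnatural.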

tts2.py

#!/usr/local/bin/python3.6
# Speak the same sentence at speaking rates from 0.25 up to 4.0.
import subprocess as proc

from google.cloud import texttospeech as tts

client = tts.TextToSpeechClient()
p = 0        # pitch in semitones, allowed range [-20.0, 20.0]
rate = 0.25  # speaking rate, allowed range [0.25, 4.0]
nm = "C"
# Japanese tongue twister about hopping frogs.
text = 'かえるぴょこぴょこみ ぴょこぴょこ あわせてぴょこぴょこむぴょこぴょ'
itext = tts.types.SynthesisInput(text=text)
voice = tts.types.VoiceSelectionParams(language_code='ja-JP', name='ja-JP-Wavenet-' + nm)
try:
    # "<=" so that the upper limit 4.0 itself is also played.
    while rate <= 4.0:
        print("rate=", rate)  # print the rate that is about to be spoken
        audio_config = tts.types.AudioConfig(
            audio_encoding=tts.enums.AudioEncoding.LINEAR16,
            pitch=p, speaking_rate=rate)
        resp = client.synthesize_speech(itext, voice, audio_config)
        with open(nm + '.wav', 'wb') as out:
            out.write(resp.audio_content)
        proc.call(["aplay", "-q", nm + ".wav"])
        rate += 0.25
except KeyboardInterrupt:
    pass  # Ctrl-C stops the demo
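The loop above steps the rate by repeated addition, which is exact for 0.25 but can drift for steps such as 0.1. As an illustrative alternative, the list of rates can be computed from an integer step count; `rate_steps` is a hypothetical helper and not part of the original script.

```python
def rate_steps(start=0.25, stop=4.0, step=0.25):
    """Return the rates from start to stop (inclusive) in increments of step.

    Each value is computed from an integer index and rounded, so small
    floating-point errors do not accumulate across iterations.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    count = int(round((stop - start) / step))
    return [round(start + i * step, 6) for i in range(count + 1)]


if __name__ == "__main__":
    for rate in rate_steps():
        print("rate=", rate)
```

Iterating over such a list with a for loop would replace the manual `rate += 0.25` bookkeeping.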

Example: varying the speaking rate

Testing Google Cloud Text-to-Speech pic.twitter.com/0Sot02DlET
— utaca.rich (@RichUtaka) May 4, 2019

Multilingual support

This example synthesizes speech in US English, Indian English, and Japanese.

tts3.py

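#!/usr/local/bin/python3.6
# Speak a voicemail greeting in US English, Indian English, and Japanese.
import subprocess as proc

from google.cloud import texttospeech as tts

# Each entry is (language_code, voice_name, text to speak).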
ttsdt=[
    ("en-US",'en-US-Wavenet-A',"Thank you for calling. I'm sorry but I can't answer your call right now. Please leave a message at (the sound of) the tone [after the tone]. I'll get back to you as soon as possible."),
    ("en-IN",'en-IN-Wavenet-A',"Thank you for calling. I'm sorry but I can't answer your call right now. Please leave a message at (the sound of) the tone [after the tone]. I'll get back to you as soon as possible."),
    ("ja-JP",'ja-JP-Wavenet-A',"お電話ありがとうございます。ただ今電話に出ることができません。ピーッという音が鳴りましたら[発信音の後に]、メッセージをお願いします。こちらからすぐにお電話いたします。")
]

# One client and one audio configuration are enough for all requests.
client = tts.TextToSpeechClient()
pitch, rate = 0, 1.0
audio_config = tts.types.AudioConfig(audio_encoding=tts.enums.AudioEncoding.LINEAR16,
                                     pitch=pitch, speaking_rate=rate)
for language_code, voice_name, text in ttsdt:
    itext = tts.types.SynthesisInput(text=text)
    voice = tts.types.VoiceSelectionParams(language_code=language_code, name=voice_name)
    resp = client.synthesize_speech(itext, voice, audio_config)
    with open('temp.wav', 'wb') as out:
        out.write(resp.audio_content)
    proc.call(["aplay", "-q", "temp.wav"])
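In the table used by tts3.py, every language code is simply the prefix of its voice name, so the code could be derived instead of typed twice. This is a small sketch under that assumption; `language_code_of` is a hypothetical helper, and the naming pattern is inferred only from the voice names that appear in this article.

```python
def language_code_of(voice_name):
    """Return the leading 'xx-YY' part of a name like 'ja-JP-Wavenet-A'.

    Assumes the '<language>-<REGION>-<model>-<variant>' naming pattern
    seen in the voice names used in this article.
    """
    parts = voice_name.split("-")
    if len(parts) < 4:
        raise ValueError("unexpected voice name: %r" % voice_name)
    return "-".join(parts[:2])


if __name__ == "__main__":
    for name in ("en-US-Wavenet-A", "en-IN-Wavenet-A", "ja-JP-Wavenet-A"):
        print(name, "->", language_code_of(name))
```

With such a helper, each table entry would only need the voice name and the text, at the cost of relying on a naming convention rather than an explicit value.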
