Creating synthesized speech with Azure Text to Speech in Python

Last updated at 2019-11-06, posted at 2019-04-10

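What is this?

This is code that creates a synthesized-speech audio file with Azure Text to Speech from Python (Jupyter) in the environment listed below.
It could just as well live in a Gist, but I am posting it on Qiita as a note to myself and in case it helps someone.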

Jupyter notebook
Python3
Azure Text to Speech, Japan East region

How to sign up for Azure and configure the service is easy to find with a search. If you set the region to Japan East, the code below works as is.

The code

text = '音声ファイルにしたい日本語文章を入れてください' # the Japanese text to turn into an audio file
subscription_key = 'xxxxxxxxxx' # put your API key here

import requests
import xml.etree.ElementTree as ElementTree

fetch_token_url = 'https://japaneast.api.cognitive.microsoft.com/sts/v1.0/issuetoken'
headers = {
    'Ocp-Apim-Subscription-Key': subscription_key
}
response = requests.post(fetch_token_url, headers=headers)
response.raise_for_status() # fail early if the key or region is rejected
access_token = response.text
print(access_token)

constructed_url = 'https://japaneast.tts.speech.microsoft.com/cognitiveservices/v1'

headers = {
    'Authorization': 'Bearer ' + access_token,
    'Content-Type': 'application/ssml+xml',
    'X-Microsoft-OutputFormat': 'audio-16khz-128kbitrate-mono-mp3',
}

xml_body = ElementTree.Element('speak', version='1.0')
xml_body.set('{http://www.w3.org/XML/1998/namespace}lang', 'ja-JP')
voice = ElementTree.SubElement(xml_body, 'voice')
voice.set('{http://www.w3.org/XML/1998/namespace}lang', 'ja-JP')
voice.set('name', 'Microsoft Server Speech Text to Speech Voice (ja-JP, Ayumi, Apollo)')
prosody = ElementTree.SubElement(voice, 'prosody')
prosody.set('pitch','medium') # e.g. 'high' for a higher voice
prosody.set('rate','medium') # e.g. 'fast' for faster speech
prosody.text = text
body = ElementTree.tostring(xml_body)

response = requests.post(constructed_url, headers=headers, data=body)
if response.status_code == 200:
    with open('sample.mp3', 'wb') as audio:
        audio.write(response.content)
        print("\nStatus code: " + str(response.status_code) + "\nYour TTS is ready for playback.\n")
else:
    print("\nStatus code: " + str(response.status_code) + "\nSomething went wrong. Check your subscription key and headers.\n")

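This is basically the same as the reference sample on the Azure site; I only rewrote it slightly for Jupyter and changed the region.
Paste the code above into a Jupyter cell and run it, and an audio file named sample.mp3 is created.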
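
Since the region is the only part of the two URLs that was changed, it can be pulled out into a variable. The following is a minimal sketch, assuming that other regions follow the same host pattern as the two Japan East URLs used above; `speech_endpoints` is a hypothetical helper name, not part of any Azure library.

```python
def speech_endpoints(region):
    """Return (token_url, tts_url) for a region identifier such as 'japaneast'.

    Assumption: the host names follow the same pattern as the
    Japan East URLs used in the article.
    """
    token_url = f'https://{region}.api.cognitive.microsoft.com/sts/v1.0/issuetoken'
    tts_url = f'https://{region}.tts.speech.microsoft.com/cognitiveservices/v1'
    return token_url, tts_url


# Reproduces the two URLs from the code above.
fetch_token_url, constructed_url = speech_endpoints('japaneast')
print(fetch_token_url)
print(constructed_url)
```

With this in place, switching regions would only mean changing the one string passed to the helper.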

* Notes *

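Change the string assigned to text on line 1. This is the text that gets read aloud.
Set subscription_key on line 2 to an API key obtained from the Azure Portal (either key 1 or key 2 works).
Near the end, voice.set('name', 'Microsoft Server Speech Text to Speech Voice (ja-JP, Ayumi, Apollo)') selects the voice. Pick another name from the URL below to get, for example, a male voice.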
https://docs.microsoft.com/bs-latn-ba/azure/cognitive-services/speech-service/language-support
Also near the end, changing values such as prosody.set('rate','fast') and prosody.set('pitch','medium') lets you, for example, make the voice speak slowly. The page at the URL below lists the available settings.
https://docs.microsoft.com/bs-latn-ba/azure/cognitive-services/speech-service/speech-synthesis-markup
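
The voice, pitch, and rate settings mentioned in the notes can be gathered into one function so that they are easy to vary between runs. This is only a sketch built from the same ElementTree calls as the code above; `build_ssml` and `XML_LANG` are names introduced here for illustration, and `'slow'` is one of the standard SSML rate values.

```python
import xml.etree.ElementTree as ElementTree

# Qualified name that ElementTree serializes as the xml:lang attribute.
XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'


def build_ssml(text, voice_name, lang='ja-JP', pitch='medium', rate='medium'):
    """Build the same speak/voice/prosody document as the article's code."""
    speak = ElementTree.Element('speak', version='1.0')
    speak.set(XML_LANG, lang)
    voice = ElementTree.SubElement(speak, 'voice')
    voice.set(XML_LANG, lang)
    voice.set('name', voice_name)
    prosody = ElementTree.SubElement(voice, 'prosody')
    prosody.set('pitch', pitch)
    prosody.set('rate', rate)
    prosody.text = text
    # tostring() returns bytes, ready to pass as data= to requests.post.
    return ElementTree.tostring(speak)


body = build_ssml(
    'こんにちは',
    'Microsoft Server Speech Text to Speech Voice (ja-JP, Ayumi, Apollo)',
    rate='slow',
)
print(body.decode('utf-8'))
```

The returned bytes can be sent in place of `body` in the request above; printing them is a quick way to check the markup before calling the service.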

Speech services use a schema called **SSML (Speech Synthesis Markup Language)**, so a little searching turns up plenty of material. Getting comfortable with it also pays off for VUIs such as Amazon Echo.
