
GPT-4o is amazing, but let me deliberately introduce "Text to speech with the Azure OpenAI Service" instead!

Posted at 2024-05-17, last updated at 2024-05-17

Introduction

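GPT-4o is seriously amazing.
I don't make a living out of being amazed by every AI announcement, but this one really is stunning.

This is going to be a big deal.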

Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: https://t.co/MYHZB79UqN

Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks. pic.twitter.com/uuthKZyzYx
— OpenAI (@OpenAI) May 13, 2024

Among the demos posted on X, what struck me most was how rich the conversational interaction was.

As it happens, I had recently come across Text to speech with the Azure OpenAI Service and was trying it out with Power Apps & Power Automate.

So, amid all this "GPT-4o is amazing" excitement, I'm going to write an article about how good Text to speech with the Azure OpenAI Service is!

Azure OpenAI's Text to Speech reads aloud so naturally!!
A Japanese version will probably come out someday too.

nova's speech is great! https://t.co/DVbPzqhrRP #Azure #AzureOpenAI #tts #PowerApps pic.twitter.com/e4ySfZuPsT
— 出戻りガツオ🐟 Microsoft MVP (@DemodoriGatsuo) May 12, 2024

I also deployed a GitHub Pages site so you can compare the voices side by side!

Text to speech with Azure OpenAI Service

Azure already offers Text to speech as part of its Speech service.

If you use Edge, you are probably familiar with its Read Aloud feature.
Read-aloud features like this are used all over the place, including in Microsoft Learn Blog articles.

The feature introduced here, Text to speech with the Azure OpenAI Service, is, as the name says, part of Azure OpenAI Service, so it comes from a different source than the Speech service.

It is a preview feature, and at the time of writing only the following regions are supported:

Region
North Central US
Sweden Central

0. Prerequisites

The steps are almost the same as the resource deployment in the Whisper article I wrote recently.

In a supported region, tts and tts-hd appear under [ Models ] in Azure OpenAI Studio. Deploy one of them; its deployment name is used in the request URI.

■ 1. Power Automate...!

This time, the flow also uses Azure Key Vault.

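1. Receive the voice and the text to read aloud from Power Apps via the PowerApps (V2) trigger
2. Get the endpoint and API key stored in Azure Key Vault
3. Set the URI with a Compose action
4. Send an HTTP request to Text to speech with the Azure OpenAI Service
5. Return the audio data to Power Apps
6. In parallel with (5), save the MPEG audio file on SharePoint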

■ Get the API key, and the endpoint needed for the URI, from [ Keys and Endpoint ]

HTTP request

The URI is built as follows.

URI (Power Automate expression)

@{outputs('AZURE-OPENAI-ENDPOINT')?['body/value']}/openai/deployments/@{outputs('TTS-DEPLOYMENTNAME')?['body/value']}/audio/speech?api-version=2024-02-15-preview

URI (general form)

https://{resource name}.openai.azure.com/openai/deployments/{model deployment name}/audio/speech?api-version=2024-02-15-preview
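For reference, here is the same URI assembled outside Power Automate. This is a minimal Python sketch, not part of the flow. `build_speech_uri` is a hypothetical helper. It assumes the endpoint is the value shown under [ Keys and Endpoint ]; the example uses `https://my-resource.openai.azure.com/`, where `my-resource` is a placeholder. The helper strips a trailing slash so the path does not end up with `//openai`.

```python
from urllib.parse import quote

API_VERSION = "2024-02-15-preview"  # preview API version used in this article


def build_speech_uri(endpoint: str, deployment: str, api_version: str = API_VERSION) -> str:
    """Build the Azure OpenAI text-to-speech URI for a deployment (hypothetical helper)."""
    # Drop any trailing slash from the endpoint so the joined path has a single "/".
    base = endpoint.rstrip("/")
    # Percent-encode the deployment name in case it contains unusual characters.
    return (
        f"{base}/openai/deployments/{quote(deployment, safe='')}"
        f"/audio/speech?api-version={api_version}"
    )


if __name__ == "__main__":
    print(build_speech_uri("https://my-resource.openai.azure.com/", "tts"))
```

Calling `build_speech_uri("https://my-resource.openai.azure.com/", "tts")` yields `https://my-resource.openai.azure.com/openai/deployments/tts/audio/speech?api-version=2024-02-15-preview`.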

Method: POST

Header:

Key	Value
api-key	API key retrieved via the Azure Key Vault connector
Content-Type	application/json

Header

{
 "api-key" : "@{outputs('AZURE-OPENAI-APIKEY')?['body/value']}",
 "Content-Type" : "application/json"
}

Body

{
  "model": "@{outputs('TTS-DEPLOYMENTNAME')?['body/value']}",
  "input": "@{triggerBody()?['text']}",
  "voice": "@{triggerBody()?['text_1']}"
}
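The Method, Header, and Body above combine into a single POST request. Here is a rough Python equivalent under some assumptions:

- It uses only the standard library.
- `build_speech_request` is a hypothetical helper name.
- `uri` is the speech URI for your deployment.

Building the JSON with `json.dumps` takes care of escaping quotes and line breaks in the text. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_speech_request(uri: str, api_key: str, deployment: str, text: str, voice: str) -> urllib.request.Request:
    """Create (but do not send) the text-to-speech POST request (hypothetical helper)."""
    # Same three fields as the Power Automate Body; json.dumps escapes quotes/newlines in text.
    body = json.dumps({"model": deployment, "input": text, "voice": voice}, ensure_ascii=False)
    return urllib.request.Request(
        uri,
        data=body.encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Sending it returns the raw audio bytes (MP3 unless another format is requested), e.g.:
# with urllib.request.urlopen(req) as resp:
#     audio = resp.read()
```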

■ Value returned to Power Apps

In the action that returns a value to Power Apps, set the response output to:

@{body('HTTP')?['$content']}
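The `$content` part is needed because of how Power Automate handles binary response bodies, such as this audio. It represents them as an object holding a `$content-type` and a base64-encoded `$content`, and that base64 string is what Power Apps receives. The Python sketch below unwraps an object of that shape; the helper name and sample bytes are hypothetical.

```python
import base64


def decode_flow_binary(body: dict) -> tuple[str, bytes]:
    """Unwrap a Power Automate-style binary body into (content type, raw bytes)."""
    # Fall back to a generic type if "$content-type" is missing.
    content_type = body.get("$content-type", "application/octet-stream")
    # "$content" holds the payload as base64 text; decode it back to bytes.
    raw = base64.b64decode(body["$content"])
    return content_type, raw
```

You can write `raw` to a file opened with `open("speech.mp3", "wb")`. That gives you a local copy of the same kind of file the flow saves to SharePoint.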

■ Create the file in SharePoint

Set File Content to the Body of the HTTP action.

File Content

@{body('HTTP')}

■ 2. Power Apps

Let's listen to the audio with the following app.

With a voice selected in the combo box and text entered in the text input,
press the button control to run the Power Automate flow.

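OnSelect

// Run the flow with the text and the selected voice, and store the base64 audio it returns
UpdateContext({AudioData: TextToSpeechOpenAI.Run(TextInput.Value, cmbVoice.Selected.Value).response});
// Toggle ResetStart from false to true so the Audio control resets and picks up the new audio
UpdateContext({ResetStart: false});
UpdateContext({ResetStart: true});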

Set the Audio control's Media property to the following:

Media

"data:audio/mpeg;base64," & AudioData

This plays the audio.
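The Media formula simply prefixes the base64 audio with a `data:audio/mpeg;base64,` header. A Python equivalent, as a sketch with a hypothetical helper name, also strips stray whitespace. It rejects input that is not valid base64 before building the URI.

```python
import base64


def to_audio_data_uri(audio_base64: str, mime_type: str = "audio/mpeg") -> str:
    """Build the same kind of data URI that the Audio control's Media formula produces."""
    # Remove whitespace/newlines that some tools insert into base64 output.
    cleaned = "".join(audio_base64.split())
    # Raises binascii.Error (a ValueError) if the text is not valid base64.
    base64.b64decode(cleaned, validate=True)
    return f"data:{mime_type};base64,{cleaned}"
```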

If you set the Audio control's AutoStart to true and its Reset to ResetStart,
the text is read aloud as soon as the value comes back!

It really feels like a person is reading!

Give the voices a listen on the comparison GitHub Pages site!

Voice	Impression
alloy	Gender-neutral
echo	Young male
fable	Young male (2)
onyx	Male voice with a lot of atmosphere
nova	Bright female voice
shimmer	Seasoned-sounding female voice

(Personal impressions)
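The voice name travels from the combo box to the API as plain text, so a typo would most likely be rejected by the service. If you call the API from your own code, a small check against the six voices in the table can catch that early. This is a hypothetical Python helper, not part of the article's flow.

```python
# The six voices listed in the table above.
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def normalize_voice(voice: str) -> str:
    """Return a clean voice name, rejecting anything outside the six listed voices."""
    name = voice.strip().lower()
    if name not in VOICES:
        raise ValueError(f"unsupported voice {voice!r}; expected one of {', '.join(VOICES)}")
    return name
```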

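By the way, the intonation is different every time you generate the audio!
For Long speech, I had it read a speech I created with GPT-4o!

This is so much fun I can't stop!!
I can't wait for the GPT-4o API on Azure OpenAI either!!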
