Just as you can use GPT models like ChatGPT, you can try TTS (Text to Speech, generating audio from text) with the same OpenAI SDK.
Incidentally, the reverse direction, STT (Speech to Text), is covered here.
the quick brown fox jumped over the lazy dogs
I had it read the text above aloud. If you specify mp3 or another format, the audio is generated in that format.
It sounds like this.
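The output format is chosen via the `response_format` parameter. As a hedged sketch (the list may have grown since), the formats in the docs at the time of writing were `mp3`, `opus`, `aac`, and `flac`:

```javascript
// A sketch, not an exhaustive list: output formats per the docs at the time.
const formats = ['mp3', 'opus', 'aac', 'flac'];

// Example request body you could pass to openai.audio.speech.create():
const request = {
  model: 'tts-1',
  voice: 'alloy',
  input: 'the quick brown fox jumped over the lazy dogs',
  response_format: 'flac', // any value from `formats`
};

console.log(formats.includes(request.response_format)); // true
```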
The tts-1 model
It looks like two models are available: tts-1 and tts-1-hd.
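Switching between them is just a different `model` string. Here is a minimal sketch (`buildRequest` is a hypothetical helper, not part of the SDK), assuming, per the docs, that tts-1 favors latency and tts-1-hd favors quality:

```javascript
// Hypothetical helper: pick tts-1 (lower latency) or tts-1-hd (higher quality).
const MODELS = { standard: 'tts-1', hd: 'tts-1-hd' };

function buildRequest(quality, voice, input) {
  // quality: 'standard' | 'hd'
  return { model: MODELS[quality], voice, input };
}

console.log(buildRequest('hd', 'alloy', 'hello').model); // tts-1-hd
```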
Japanese support
The following languages are supported, and Japanese is among them.
https://platform.openai.com/docs/guides/text-to-speech/supported-languages
Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.
Six voices are available
At the time of writing, six voices (six variations) are available.
https://platform.openai.com/docs/guides/text-to-speech/voice-options
Experiment with different voices (alloy, echo, fable, onyx, nova, and shimmer) to find one that matches your desired tone and audience. The current voices are optimized for English.
The following code, based on this example, worked for me.
https://github.com/openai/openai-node/blob/master/examples/audio.ts#L14C1-L16C1
const { OpenAI, toFile } = require('openai');
const fs = require('node:fs/promises');
const path = require('node:path');

// Reads the API key from the OPENAI_API_KEY environment variable
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const speechFile = path.resolve(__dirname, './out.mp3');

async function main() {
  // Text to speech: generate an mp3 from the input text
  const mp3 = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'the quick brown fox jumped over the lazy dogs',
  });
  const buffer = Buffer.from(await mp3.arrayBuffer());
  await fs.writeFile(speechFile, buffer);

  // Speech to text: transcribe the generated audio back with Whisper
  const transcription = await openai.audio.transcriptions.create({
    file: await toFile(buffer, 'out.mp3'),
    model: 'whisper-1',
  });
  console.log(transcription.text);

  // Translate the audio into English with Whisper
  const translation = await openai.audio.translations.create({
    file: await toFile(buffer, 'out.mp3'),
    model: 'whisper-1',
  });
  console.log(translation.text);
}

main();
Japanese with the nova voice
const OpenAI = require('openai');
const fs = require('node:fs/promises');
const path = require('node:path');

// Reads the API key from the OPENAI_API_KEY environment variable
// (never hardcode your secret key in source code)
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const speechFile = path.resolve(__dirname, './out.mp3');

async function main() {
  const mp3 = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'nova',
    // "Hello, I'm at the Electric Town exit in Akihabara."
    input: 'こんにちは、秋葉原の電気街口にいます。',
  });
  const buffer = Buffer.from(await mp3.arrayBuffer());
  await fs.writeFile(speechFile, buffer);
}

main();
It doesn't seem possible to embed Gyazo audio directly, so here is a link instead.