
Building a voice changer with the Whisper API and COEIROINK

Posted at 2023-05-23, last updated at 2023-05-23

Introduction

This article explains how to build a voice changer with the Whisper API and COEIROINK.

Prerequisites

- Download COEIROINK
- Obtain an OpenAI API key

Processing flow

- Capture speech from the microphone with SpeechRecognition.
- Transcribe the captured audio into text with the Whisper API.
- Synthesize speech from that text with COEIROINK and play it back.
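The three steps form a simple loop. As a minimal sketch (not the author's code), one cycle can be written with each stage passed in as a function, so the control flow is visible without a microphone, API key, or running engine. The names voice_change_once, Capture, Transcribe, Synthesize, and Play are hypothetical; in a real program the stages would wrap r.listen, the Whisper call, the COEIROINK requests, and playback.

```python
from typing import Callable, Optional

# Hypothetical stage signatures; none of these names come from the article.
Capture = Callable[[], bytes]                  # microphone -> WAV bytes
Transcribe = Callable[[bytes], str]            # WAV bytes -> Japanese text (Whisper API)
Synthesize = Callable[[str], Optional[bytes]]  # text -> WAV bytes (COEIROINK), None on failure
Play = Callable[[bytes], None]                 # plays WAV bytes


def voice_change_once(
    capture: Capture,
    transcribe: Transcribe,
    synthesize: Synthesize,
    play: Play,
) -> Optional[str]:
    """Run one capture -> transcribe -> synthesize -> play cycle.

    Returns the recognized text, or None if nothing usable was heard
    or synthesis failed.
    """
    wav = capture()
    text = transcribe(wav).strip()
    if not text:
        return None  # silence or noise: skip synthesis entirely
    voice = synthesize(text)
    if voice is None:
        return None  # e.g. the COEIROINK engine is not running
    play(voice)
    return text
```

Because each stage is injected, the transcription or synthesis backend can be swapped (or stubbed for testing) without touching the loop.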

Capturing audio from the microphone

This step uses SpeechRecognition, a Python module.
The implementation is based on examples/microphone_recognition.py from that project.

import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)
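r.listen returns an AudioData object whose frame_data, sample_rate, and sample_width attributes describe raw PCM audio. A minimal sketch, assuming mono audio, of computing a clip's duration from those values and skipping very short captures (such as a cough) before sending them to the API. MIN_SECONDS, clip_seconds, and worth_transcribing are hypothetical, and the 0.5 s threshold is an arbitrary example value.

```python
# Hypothetical threshold: clips shorter than this are treated as noise and skipped.
MIN_SECONDS = 0.5


def clip_seconds(frame_data_len: int, sample_rate: int, sample_width: int) -> float:
    """Duration of a mono PCM clip: bytes / (samples per second * bytes per sample)."""
    return frame_data_len / (sample_rate * sample_width)


def worth_transcribing(
    frame_data_len: int,
    sample_rate: int,
    sample_width: int,
    min_seconds: float = MIN_SECONDS,
) -> bool:
    """Return True if the clip is long enough to be worth sending to the API."""
    return clip_seconds(frame_data_len, sample_rate, sample_width) >= min_seconds
```

Usage: worth_transcribing(len(audio.frame_data), audio.sample_rate, audio.sample_width). At 16,000 Hz with 2-byte samples, one second is 32,000 bytes, so a 96,000-byte clip lasts 3.0 s.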

Transcribing the audio with the Whisper API

SpeechRecognition already provides a recognize_whisper_api method, but I wanted to pass language: ja, so I wrote my own version based on speech_recognition/recognizers/whisper.py.

from io import BytesIO

import openai  # uses the pre-1.0 openai package API (openai.Audio.transcribe)
from speech_recognition.audio import AudioData


def recognize_whisper_api(
    audio_data: AudioData,
    api_key: str,
    model: str = "whisper-1",
) -> str:
    # Wrap the captured audio in an in-memory WAV file
    wav_data = BytesIO(audio_data.get_wav_data())
    # Give the file a .wav name so the upload carries a recognizable format
    wav_data.name = "SpeechRecognition_audio.wav"

    # language="ja" tells Whisper the speech is Japanese
    transcript = openai.Audio.transcribe(model, wav_data, api_key=api_key, language="ja")
    return transcript["text"]
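audio_data.get_wav_data() wraps the raw PCM from the microphone in a WAV container, and the .name attribute is set presumably so that the multipart upload carries a .wav filename the API can use to identify the format. A minimal standard-library sketch of the same idea, using the wave module on an in-memory buffer; pcm_to_named_wav is a hypothetical helper, not part of SpeechRecognition or openai.

```python
import io
import wave


def pcm_to_named_wav(
    pcm: bytes,
    sample_rate: int,
    sample_width: int,
    channels: int = 1,
    name: str = "audio.wav",
) -> io.BytesIO:
    """Wrap raw PCM in an in-memory WAV file that has a file name."""
    buf = io.BytesIO()
    # wave writes the RIFF header and frames; closing it does not close buf
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    buf.seek(0)     # rewind so the uploader reads from the start
    buf.name = name  # file-like objects with a .name are treated as named files
    return buf
```

Setting the name on the buffer avoids writing a temporary file to disk for every utterance.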

Synthesizing speech with COEIROINK

With COEIROINK running, I implemented this step using the API documentation served at http://localhost:50031/docs#/.
- It uses the /audio_query and /synthesis endpoints.
- This part is written in Node.js.

import axios from 'axios'
import type { AudioQuery } from '../types'

// Local COEIROINK engine
const coeiroinkHost = 'http://127.0.0.1:50031'
const speaker = 0

// POST /audio_query: build synthesis parameters for the given text
export const getAudioQuery = async (text: string): Promise<AudioQuery | undefined> => {
  try {
    const res = await axios.post<AudioQuery>(
      `${coeiroinkHost}/audio_query`,
      null, // no request body; text and speaker go in the query string
      {
        params: { text, speaker }, // axios URL-encodes these
        headers: { Accept: 'application/json' }
      }
    )
    return res.data
  } catch (err) {
    console.error(err)
    return undefined
  }
}

// POST /synthesis: render the AudioQuery to WAV bytes
export const synthesisVoice = async (audioQuery: AudioQuery): Promise<Buffer | undefined> => {
  try {
    const res = await axios.post<ArrayBuffer>(
      `${coeiroinkHost}/synthesis`,
      audioQuery,
      {
        params: { speaker },
        responseType: 'arraybuffer',
        headers: {
          'Content-Type': 'application/json',
          Accept: 'audio/wav'
        }
      }
    )
    return Buffer.from(res.data)
  } catch (err) {
    console.error(err)
    return undefined
  }
}
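For comparison, here is a minimal sketch of the same two calls using only Python's standard library, which would let the whole pipeline stay in Python. The endpoint paths, query parameters, and headers mirror the Node.js code above; the function names are hypothetical. The requests are built separately so they can be inspected without a running engine, while synthesize needs COEIROINK listening on port 50031.

```python
import json
import urllib.parse
import urllib.request

COEIROINK_HOST = "http://127.0.0.1:50031"  # same host and port as the article


def audio_query_request(text: str, speaker: int = 0) -> urllib.request.Request:
    """POST /audio_query?text=...&speaker=... with no body."""
    qs = urllib.parse.urlencode({"text": text, "speaker": speaker})
    return urllib.request.Request(
        f"{COEIROINK_HOST}/audio_query?{qs}",
        method="POST",
        headers={"Accept": "application/json"},
    )


def synthesis_request(audio_query: dict, speaker: int = 0) -> urllib.request.Request:
    """POST /synthesis?speaker=... with the AudioQuery as a JSON body."""
    qs = urllib.parse.urlencode({"speaker": speaker})
    return urllib.request.Request(
        f"{COEIROINK_HOST}/synthesis?{qs}",
        data=json.dumps(audio_query).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "audio/wav"},
    )


def synthesize(text: str, speaker: int = 0) -> bytes:
    """Text -> WAV bytes. Requires a running COEIROINK engine."""
    with urllib.request.urlopen(audio_query_request(text, speaker)) as res:
        query = json.load(res)
    with urllib.request.urlopen(synthesis_request(query, speaker)) as res:
        return res.read()
```

urlencode percent-encodes the Japanese text as UTF-8, which plays the same role as encodeURIComponent in the original Node.js code.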

Playing the synthesized audio

Playback is implemented with node-speaker.

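import { PassThrough } from 'stream'
import Speaker from 'speaker'

// Plays mono 16-bit PCM through the default output device.
// `audio` should hold PCM samples only; a WAV header left in the buffer
// would be played as a brief burst of noise.
export const playAudio = (audio: Buffer, sampleRate: number): void => {
  const speaker = new Speaker({
    channels: 1,
    bitDepth: 16,
    sampleRate
  })
  // Wrap the buffer in a stream and pipe it into the speaker
  const bufferStream = new PassThrough()
  bufferStream.end(audio)
  bufferStream.pipe(speaker)
}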
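playAudio takes raw PCM plus a sample rate, while /synthesis returns a complete WAV file, so the header has to be read and stripped somewhere. A minimal sketch in Python with the standard wave module, assuming the engine returns uncompressed PCM WAV (the only kind wave can read). PcmAudio and split_wav are hypothetical names.

```python
import io
import wave
from dataclasses import dataclass


@dataclass
class PcmAudio:
    pcm: bytes         # raw interleaved samples, header removed
    sample_rate: int   # Hz
    channels: int
    sample_width: int  # bytes per sample (2 = 16-bit)


def split_wav(wav_bytes: bytes) -> PcmAudio:
    """Separate a PCM WAV file into its format fields and sample data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        return PcmAudio(
            pcm=r.readframes(r.getnframes()),
            sample_rate=r.getframerate(),
            channels=r.getnchannels(),
            sample_width=r.getsampwidth(),
        )
```

The pcm and sample_rate fields correspond to the audio and sampleRate arguments of playAudio; channels and sample_width should be checked against the channels: 1 and bitDepth: 16 that the Speaker is configured with.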

Demo screenshot

Example: saying "おはようございます" ("Good morning")

Summary

This article explained how to build a voice changer with the Whisper API and COEIROINK.
The full code is available on GitHub; feel free to use it as a reference.

