Summoning an English Conversation Teacher with the OpenAI API #OpenAI

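Introduction

Goal: Build Your Own English Conversation Teacher

This article explains how to build your own English conversation teacher with the OpenAI API. It covers the whole pipeline: speech input, text generation, and finally speech output.

Note: For now the project only covers the basic flow of receiving speech and replying with speech, so there is still plenty of room for refinement.
Note: I will update this article whenever I develop it further.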

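Overview of the OpenAI API

The OpenAI API provides a wide range of AI features, including natural language processing, text generation, and speech recognition.

Reference: OpenAI Platform

About TTS (Text-to-Speech) and Text Generation

TTS converts text into speech. Text generation, on the other hand, produces new text based on a given prompt and context. This project uses both.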

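A Pitfall When Setting Up the OpenAI API Environment

openai must be version 1.2.0 or later

The APIs used in this article require version 1.2.0 or later of the openai library. The import below fails on older versions of the library that do not yet provide the OpenAI client class, so upgrade first.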

import_openai.py

from openai import OpenAI
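
Before running the rest of the code, it can help to confirm which version of openai is installed. The following is a minimal sketch and not part of the original project: the helper names `parse_version`, `version_at_least`, and `check_openai` are hypothetical, and the parser only handles plain numeric versions such as 1.2.0.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = "1.2.0"  # minimum version stated in this article


def parse_version(text):
    """Turn a plain numeric version such as '1.10.0' into (1, 10, 0)."""
    return tuple(int(part) for part in text.split(".")[:3])


def version_at_least(installed, required=REQUIRED):
    """Return True when `installed` is the same as or newer than `required`."""
    return parse_version(installed) >= parse_version(required)


def check_openai():
    """Describe whether the installed openai package looks new enough."""
    try:
        installed = version("openai")
    except PackageNotFoundError:
        return "openai is not installed"
    try:
        ok = version_at_least(installed)
    except ValueError:
        # Pre-release strings such as '1.2.0rc1' are outside this simple parser.
        return f"openai {installed}: could not compare automatically"
    return f"openai {installed}: {'OK' if ok else 'please upgrade'}"


if __name__ == "__main__":
    print(check_openai())
```

Comparing tuples of integers rather than raw strings matters here, because a string comparison would wrongly rank "1.10.0" below "1.2.0".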

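Capturing Speech and Transcribing It

This step does not use the OpenAI API yet.
First, the speech I utter is transcribed (by SpeechRecognition's recognize_google) and stored in a variable as a string.

Reference: [Python] Recording and transcribing microphone audio with SpeechRecognition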

recording.py

# Import lib
import speech_recognition as sr

# Stop recording after 2 seconds of silence
PAUSE_THRESHOLD = 2

# Main function
def main():
    # Create the recognizer that handles recording
    recognizer = sr.Recognizer()
    # Keep recording until N seconds of silence have passed
    recognizer.pause_threshold = PAUSE_THRESHOLD

    try:
        with sr.Microphone() as source:
            print("Please speak something")
            # Start recording
            audio = recognizer.listen(source)
            # Transcribe the recorded audio
            text = recognizer.recognize_google(audio, language="en-US")
            print("Finish recording!!")

            return text

    except Exception as e:
        # Print any exception raised during recording or transcription
        print(e)
        return None

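Generating a Reply from the English Teacher's Perspective

This step uses the OpenAI API.
I tweaked the messages a little, so here is a brief explanation.

In the messages argument, an entry with the system role spells out the role GPT should play this time. Here it is told to respond concisely as an English teacher.

The text I spoke is passed in as an entry with the user role. I checked the actual output, and the replies sounded like an English teacher (not that I have ever talked to one...), so I think it is good enough as a baseline. There still seems to be a lot of room for improvement here, and I would like to refine it when I find the time.

Reference: Text generation models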

text_generation.py

# Import libs
from openai import OpenAI
import src.env as env

def main(text):
    # Define OpenAI client with API key
    client = OpenAI(
        api_key=env.KEY,
    )

    # Create a chat completion with the gpt-3.5-turbo model
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            # system: the role the model should play
            {"role": "system", "content": "You are a professional English teacher. Please respond concisely."},
            # user: the transcribed speech
            {"role": "user", "content": text},
        ]
    )

    # Return the content of the first choice from the response
    return response.choices[0].message.content
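
As one possible refinement of the messages described above, the list could be built by a small helper that also carries earlier turns, so the teacher can refer back to what was said. This is a rough sketch under assumptions rather than the article's implementation: `build_messages`, `remember_turn`, and the `history` list are hypothetical names, and the only API detail relied on is that chat messages use the roles system, user, and assistant.

```python
SYSTEM_PROMPT = "You are a professional English teacher. Please respond concisely."


def build_messages(user_text, history=None, system_prompt=SYSTEM_PROMPT):
    """Build the messages list: system role first, earlier turns, then the new user text."""
    messages = [{"role": "system", "content": system_prompt}]
    # `history` is a hypothetical list of earlier {"role": ..., "content": ...} dicts.
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_text})
    return messages


def remember_turn(history, user_text, reply_text):
    """Return a new history list with the latest exchange appended."""
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": reply_text},
    ]
```

With no history, `build_messages(text)` produces the same two-entry list used in text_generation.py, so it could be passed directly as the `messages` argument.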

Converting Text to Speech (MP3 File)

This step also uses the OpenAI API. The code is taken as-is from the official docs.

gen_mp3_file.py

# Import libs
from openai import OpenAI
from pathlib import Path
import src.env as env

# main function
def main(text):
    # define client
    client = OpenAI(
        api_key=env.KEY,
    )
    # Execution
    speech_file_path = Path().resolve().parent / env.OUTPUT_FOLDER / env.OUTPUT_FILE
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=text,
    )
    response.stream_to_file(speech_file_path)
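
If the output folder does not exist yet, writing the MP3 file would most likely fail with a file-not-found error. The sketch below shows one way to guard against that with pathlib only. `prepare_output_path` is a hypothetical helper, not something from the official docs or the original project.

```python
from pathlib import Path


def prepare_output_path(base_dir, folder, filename):
    """Return base_dir/folder/filename, creating the folder if it is missing."""
    output_dir = Path(base_dir) / folder
    # parents=True also creates missing intermediate folders;
    # exist_ok=True makes the call a no-op when the folder already exists.
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir / filename
```

In gen_mp3_file.py this could replace the path expression, for example `prepare_output_path(Path().resolve().parent, env.OUTPUT_FOLDER, env.OUTPUT_FILE)`.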

Playing the MP3 File as Audio

This is the code that plays the MP3 file.
~~I could not read enough of the docs in one day to get OpenAI itself to do the speaking, so~~ for now I use pygame to play the MP3 file.

speak.py

import pygame
import src.env as env
from pathlib import Path

def main():
    # Initialize pygame
    pygame.mixer.init()

    # Load the MP3 file
    pygame.mixer.music.load(Path().resolve().parent / env.OUTPUT_FOLDER / env.OUTPUT_FILE)

    # Play the MP3 file
    pygame.mixer.music.play()
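
Note that `pygame.mixer.music.play()` returns right away while the audio keeps playing in the background. That is fine in a notebook, where the kernel stays alive, but a plain script could exit before the reply has finished. A minimal sketch of a wait loop follows. `wait_until_done` is a hypothetical helper that takes the busy check as an argument, so it can be tried without pygame installed.

```python
import time


def wait_until_done(is_busy, poll_interval=0.1, timeout=60.0):
    """Poll `is_busy()` until it returns False or `timeout` seconds pass.

    Returns True when playback finished, False when the timeout was hit.
    """
    deadline = time.monotonic() + timeout
    while is_busy():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

In speak.py it would be called right after `play()`, as `wait_until_done(pygame.mixer.music.get_busy)`.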

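Conclusion

Running the code below makes everything in this article work together.
I am not yet skilled enough to quickly set up the API key through zshrc, so for now it is written in env.py. Hard-coding the key like this is bad practice, though, so please put it in your zshrc instead.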
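
For reference, here is a minimal sketch of reading the key from an environment variable instead of env.py. It assumes a line such as `export OPENAI_API_KEY=...` has been added to ~/.zshrc. `load_api_key` is a hypothetical helper. The OpenAI client also reads `OPENAI_API_KEY` by itself when `api_key` is omitted, so the helper mainly serves to give a clearer error message.

```python
import os


def load_api_key(var_name="OPENAI_API_KEY", environ=None):
    """Read the API key from an environment variable instead of a source file."""
    environ = os.environ if environ is None else environ
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Add `export {var_name}=...` to ~/.zshrc "
            "and restart the shell."
        )
    return key
```

The client could then be created with `OpenAI(api_key=load_api_key())`, which keeps the key out of the repository.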

main.ipynb

# ignore warnings
import warnings
warnings.filterwarnings('ignore')

# import my functions
from src import (
    recording,
    text_generation,
    gen_mp3_file,
    speak,
)

def main():
    # Capture speech -> str
    input_text = recording.main()

    # Generate the reply text -> str
    print(f"generating conversation with your speech [] {input_text} ]")
    output_text = text_generation.main(input_text)

    # Write the reply as speech to an MP3 file -> .mp3 file
    print(f"generated conversation is [] {output_text} ]")
    gen_mp3_file.main(output_text)

    # Play the generated file as audio
    speak.main()

# Run
main()
>>>Please speak something
>>>Finish recording!!
>>>generating conversation with your speech [] I want to practice I want to practice the trip situation ]
>>>generated conversation is [] Great! To practice the trip situation, start by imagining a specific trip or vacation scenario. Then, describe and discuss it using relevant vocabulary, grammar, and expressions. You can also role-play different travel-related scenarios, such as booking a hotel, asking for directions, or ordering food in a restaurant. Remember to focus on practicing your speaking and listening skills to enhance your fluency. ]
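
One edge case in the flow above: when recording or transcription fails, the recording step prints the exception and returns nothing, and that empty value would then be sent on to the text generation step. The sketch below shows one way to skip the later stages in that case. `run_turn` is a hypothetical wrapper that takes the four stage functions as arguments so it can be exercised with stand-ins.

```python
def run_turn(listen, generate, synthesize, play):
    """Run one conversation turn; return the reply text, or None if nothing was heard."""
    input_text = listen()
    if not input_text:
        # The recording step returns None when an exception was caught,
        # so skip the API calls instead of sending an empty prompt.
        print("No speech was recognized. Please try again.")
        return None
    output_text = generate(input_text)
    synthesize(output_text)
    play()
    return output_text
```

In the notebook it would be called as `run_turn(recording.main, text_generation.main, gen_mp3_file.main, speak.main)`.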

It is easy on the wallet, too.