Trying voice dialogue with ChatGPT × Raspberry Pi!

Posted at 2023-03-03 (last updated at 2023-03-03)

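Introduction

I wanted a companion that would chat with me at any time on a Raspberry Pi I already had, so I built a spoken chit-chat dialogue system using the ChatGPT API. If you own a Raspberry Pi, please give it a try.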

What I used

Raspberry Pi 4 8GB
Microphone (USB)
Speaker (3.5 mm jack)
Open JTalk
Julius
ChatGPT (gpt-3.5-turbo)

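Preparing the Raspberry Pi

Install Raspberry Pi OS (32-bit) on the Raspberry Pi and complete the setup up to the point where it is connected to the network.

Also configure the audio output so that sound is played through the speaker.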

sudo raspi-config
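
The playback command in the program later in this article addresses the speaker as hw:0,0 (card 0, device 0). To check which card and device numbers your speaker actually has, the output of `aplay -l` can be parsed. The following is a minimal sketch: the helper names `list_playback_devices` and `show_devices` are hypothetical, and the device descriptions on your board will differ.

```python
import re
import subprocess
from typing import List, Tuple

# Matches lines of `aplay -l` output such as:
#   card 0: Headphones [bcm2835 Headphones], device 0: bcm2835 Headphones [...]
CARD_LINE = re.compile(r'^card (\d+): (.+?), device (\d+):', re.MULTILINE)

def list_playback_devices(aplay_output: str) -> List[Tuple[str, str]]:
    """Return (ALSA device string, card description) pairs found in the text."""
    return [(f"hw:{card},{dev}", name)
            for card, name, dev in CARD_LINE.findall(aplay_output)]

def show_devices() -> None:
    """Run `aplay -l` and print one line per playback device (hypothetical helper)."""
    out = subprocess.run(["aplay", "-l"], capture_output=True, text=True).stdout
    for device, name in list_playback_devices(out):
        print(device, name)
```

Calling `show_devices()` on the Raspberry Pi prints candidates for the value passed to aplay's -D option.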

Open JTalk

Open JTalk is used for speech synthesis.
First, install the Open JTalk packages.

sudo apt-get install open-jtalk open-jtalk-mecab-naist-jdic hts-voice-nitech-jp-atr503-m001

If you prefer a female voice, also fetch the voice files with the following commands.

wget https://ja.osdn.net/projects/sfnet_mmdagent/downloads/MMDAgent_Example/MMDAgent_Example-1.8/MMDAgent_Example-1.8.zip
unzip MMDAgent_Example-1.8.zip
sudo cp -r MMDAgent_Example-1.8/Voice/mei /usr/share/hts-voice/
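
The program later in this article hard-codes the voice file path. As one possible alternative, the voice could be chosen at run time: prefer the female voice when it has been copied, and fall back to the default voice otherwise. This is only a sketch; `pick_voice` is a hypothetical helper, and the two paths are the ones used in this article.

```python
import os

MEI_VOICE = '/usr/share/hts-voice/mei/mei_bashful.htsvoice'
DEFAULT_VOICE = '/usr/share/hts-voice/nitech-jp-atr503-m001/nitech_jp_atr503_m001.htsvoice'

def pick_voice(candidates=(MEI_VOICE, DEFAULT_VOICE)):
    """Return the first voice file that exists, or None if none is installed."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

The returned path can be passed to open_jtalk with the -m option.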

Julius

Julius is used for speech recognition.
Set up the environment with the following commands.

mkdir ~/julius
cd ~/julius
wget https://github.com/julius-speech/julius/archive/v4.4.2.1.tar.gz
tar xvzf v4.4.2.1.tar.gz
cd julius-4.4.2.1
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install libasound2-dev libesd0-dev libsndfile1-dev
./configure --with-mictype=alsa
make
sudo make install
cd ~/julius
mkdir julius-kit
cd julius-kit
wget https://osdn.net/dl/julius/dictation-kit-v4.4.zip
unzip dictation-kit-v4.4.zip
cd ~

ChatGPT API

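To use the ChatGPT API from Python, install the following library. The program in this article calls openai.ChatCompletion, which was removed in version 1.0 of the library, so pin an earlier version.

pip install "openai<1.0"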

Also obtain an OpenAI API key.
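
Rather than pasting the key into the source file, one option is to read it from an environment variable. The following is a minimal sketch under that assumption; `load_api_key` is a hypothetical helper, and the variable name OPENAI_API_KEY is a common convention rather than something this article prescribes.

```python
import os

def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from an environment variable and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

With this in place, the program could use `openai.api_key = load_api_key()` after running `export OPENAI_API_KEY=...` in the shell.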

Program

The following is the program for the whole system.
Set openai.api_key to the API key you obtained earlier.

dialog_system.py

import openai
import subprocess
import socket
import re
import time

openai.api_key = "<OpenAI API KEY>"

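# Julius runs as a server in module mode and listens on this address
host = '127.0.0.1'
port = 10500
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, port))
time.sleep(3)
# Recognized words arrive as WORD="..." attributes in Julius's output
re_word = re.compile('WORD="([^"]+)"')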

def jtalk(t):
    # Synthesize the text t to test.wav with Open JTalk, then play it
    open_jtalk = ['open_jtalk']
    mech = ['-x', '/var/lib/mecab/dic/open-jtalk/naist-jdic']
    htsvoice = ['-m', '/usr/share/hts-voice/nitech-jp-atr503-m001/nitech_jp_atr503_m001.htsvoice']
    # htsvoice = ['-m', '/usr/share/hts-voice/mei/mei_bashful.htsvoice']  # uncomment this line if you prefer the female voice
    speed = ['-r', '1.0']        # speech speed rate
    quality = ['-a', '0.5']      # all-pass constant
    tone = ['-fm', '0.2']        # additional half-tone
    intonation = ['-jf', '1.0']  # weight of GV for log F0
    outwav = ['-ow', 'test.wav']
    cmd = open_jtalk + mech + htsvoice + speed + quality + tone + intonation + outwav
    c = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    c.stdin.write(t.encode('utf-8'))
    c.stdin.close()
    c.wait()
    aplay = ['aplay', '-q', 'test.wav', '-Dhw:0,0']
    wr = subprocess.Popen(aplay)

    # Stop recognition during playback so Julius does not pick up the speaker output
    command = b'TERMINATE\n'
    client.sendall(command)
    wr.wait()  # wait until playback has finished
    command = b'RESUME\n'
    client.sendall(command)

def completion(new_message_text: str, settings_text: str = '', past_messages: list = None):
    # Avoid a shared mutable default: start a fresh history when none is given
    if past_messages is None:
        past_messages = []
    # On the first turn, put the system prompt at the head of the history
    if len(past_messages) == 0 and len(settings_text) != 0:
        system = {"role": "system", "content": settings_text}
        past_messages.append(system)
    new_message = {"role": "user", "content": new_message_text}
    past_messages.append(new_message)

    result = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=past_messages
    )
    response_message_text = result.choices[0].message.content
    # Keep the reply in the history so the next turn has the full context
    response_message = {"role": "assistant", "content": response_message_text}
    past_messages.append(response_message)
    return response_message_text, past_messages

def dialog():
    # Change system_settings below to change the character
    # (the prompt says: "You are very kind and the best possible friend to talk to.")
    system_settings = """
    あなたはとてもやさしく、話し相手として最高の友人です。
    """
    messages = []
    recog_text = ""
    data = ""
    # Ctrl+C is handled in main(), which calls this function inside its try block
    # "終了。" ("end") leaves the loop after ChatGPT has replied to it
    while recog_text != "終了。":
        print("Listening...")
        # Read until Julius has sent a complete recognition result
        while data.find("</RECOGOUT>\n.") == -1:
            data += client.recv(1024).decode('shift_jis')

        recog_text = ""  # join the recognized words
        for word in filter(bool, re_word.findall(data)):
            recog_text += word

        print("Recognition result: " + recog_text)
        if recog_text == "リセット。":  # "reset": clear the conversation history
            messages.clear()
            print("messages:", messages)
            recog_text = ""
            data = ""
            jtalk("システムをリセットしたよ。")  # "The system has been reset."
            continue

        new_message, messages = completion(recog_text, system_settings, messages)
        print("new_message:", new_message)

        jtalk(new_message)
        data = ""

def main():
    data = ""
    try:
        while True:
            print("Listening...")
            # Read until Julius has sent a complete recognition result
            while data.find("</RECOGOUT>\n.") == -1:
                data += client.recv(1024).decode('shift_jis')

            recog_text = ""  # join the recognized words
            for word in filter(bool, re_word.findall(data)):
                recog_text += word

            print("Recognition result: " + recog_text)
            if recog_text == "スタート。":  # "start": begin a dialogue
                jtalk("システムを起動したよ。")  # "The system has started."
                dialog()

            data = ""
    except KeyboardInterrupt:
        print('PROCESS END')
        # Ask Julius to shut down, then close the connection
        command = b'DIE\n'
        client.send(command)
        client.close()


if __name__ == '__main__':
    main()
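
The recognition step in the program can be looked at in isolation. Julius in module mode sends XML-like text over the socket, the program waits for the closing RECOGOUT tag followed by a line containing only a period, and the recognized words appear as WORD="..." attributes. The sketch below applies the same regular expression to a hand-written sample. The sample message is an illustrative assumption, not captured Julius output, and `extract_text` is a hypothetical helper name.

```python
import re

RE_WORD = re.compile(r'WORD="([^"]+)"')

def extract_text(recogout: str) -> str:
    """Concatenate every non-empty WORD attribute in one recognition result."""
    return "".join(RE_WORD.findall(recogout))

# Hand-written sample in the general shape of a Julius module-mode message
SAMPLE = '''<RECOGOUT>
  <SHYPO RANK="1" SCORE="-2500.0">
    <WHYPO WORD="" PHONE="silB"/>
    <WHYPO WORD="スタート" PHONE="s u t a: t o"/>
    <WHYPO WORD="。" PHONE="silE"/>
  </SHYPO>
</RECOGOUT>
.
'''

print(extract_text(SAMPLE))
```

Because the pattern requires at least one character between the quotes, empty WORD attributes are skipped, and the trailing "。" is why the program compares against strings such as "スタート。".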

Running it

Start the Julius server

julius -C ~/julius/julius-kit/dictation-kit-v4.4/main.jconf -C ~/julius/julius-kit/dictation-kit-v4.4/am-gmm.jconf -nostrip -input mic -module -charconv utf-8 sjis
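
The program connects to port 10500 as soon as it starts, so it fails if Julius has not finished loading its models yet. One way to make startup less order-sensitive is to retry the connection. This is a sketch only: `connect_with_retry`, the retry count, and the delay are assumptions for illustration, while the host and port match the ones in the program.

```python
import socket
import time

def connect_with_retry(host='127.0.0.1', port=10500, retries=10, delay=1.0):
    """Try to connect to the Julius module port, waiting between attempts."""
    last_error = None
    for _ in range(retries):
        try:
            sock = socket.create_connection((host, port), timeout=5)
            sock.settimeout(None)  # back to blocking mode for later recv() calls
            return sock
        except OSError as e:
            # Julius is probably still starting up; wait and try again
            last_error = e
            time.sleep(delay)
    raise ConnectionError(f"could not reach {host}:{port}") from last_error
```

In the program, `client = connect_with_retry()` could stand in for the socket creation and the `client.connect(...)` call.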

Run the dialogue program

python dialog_system.py

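Now let's actually have a conversation.
Say 「スタート」 ("start") into the microphone connected to the Raspberry Pi, and the dialogue begins. During the dialogue, say 「リセット」 ("reset") to clear the history, or 「終了」 ("end") to finish the dialogue. To start a new dialogue after finishing, say 「スタート」 again.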
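
The three spoken commands form a small state machine: the system idles until it hears the start command, and while in a dialogue it either resets the history, ends the dialogue, or forwards the utterance to ChatGPT. The following sketch separates that decision from the audio and network code. `next_action` and the action names are hypothetical, and the command strings include the trailing "。" because that is what the program compares against.

```python
START, RESET, END = "スタート。", "リセット。", "終了。"

def next_action(in_dialog: bool, recog_text: str):
    """Return (new_in_dialog, action) for one recognized utterance."""
    if not in_dialog:
        # Outside a dialogue, only the start command has any effect
        return (True, "start") if recog_text == START else (False, "ignore")
    if recog_text == RESET:
        return True, "reset"   # clear the history, stay in the dialogue
    if recog_text == END:
        return False, "end"    # go back to waiting for the start command
    return True, "chat"        # anything else is sent to ChatGPT
```

Keeping this logic in a pure function makes it possible to check the command handling without a microphone or a running Julius server.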

Conclusion

How was it? This time I built a voice dialogue system on a Raspberry Pi. The implementation is simple, so please give it a try.
