
Making Pepper repeat what I say using Julius

Last updated at 2017-01-31, posted at 2017-01-30

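I ran the speech recognition engine Julius on a machine outside Pepper to turn what a person says into text, and then had Pepper speak that text. The idea is that Pepper and I are in different places and Pepper talks as my stand-in. Julius's accuracy is so-so, though, so it doesn't repeat me very accurately... lol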

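Environment

Pepper
NAOqi 2.4.3
Machine running Julius
I have confirmed that it also works on Windows 10, but this article basically assumes CentOS 7 and Ubuntu 16. (On Sierra, the latest macOS at the time of writing, the NAOqi Python SDK does not work properly.)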

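Steps

1. Running the Julius dictation kit

You could install Julius itself, but building your own acoustic model, word dictionary, and language model is a lot of work, so let's get it running quickly with the dictation kit.

Download the dictation kit from GitHub. I picked v4.4 from the Branch button at the top left. (master felt a bit risky to me.)
Extract the archive and run one of the shell scripts in dictation-kit-v4.4, such as run-linux-dnn.sh. (On Windows, use a batch file such as run-win-dnn.bat.)

You should now be able to try speech recognition. Incidentally, on my Linux environment the acoustic model in dictation-kit-v4.4/model/phone_m/ ended up broken after extracting the archive, so I replaced that folder with the same folder extracted on Windows or Mac.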

2. Installing Python 2.7 and the NAOqi Python SDK

The procedure is described on the Aldebaran site. Python 2.7 should be installed by default on Linux. To check, type python --version in a terminal. On Windows you need to install it yourself. For the NAOqi Python SDK, I downloaded and installed 2.4.3 to match the NAOqi version of the Pepper I use.
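
Both prerequisites can also be checked from inside the interpreter. The following is a minimal sketch; `check_environment` is a hypothetical helper name introduced here for illustration, not part of the SDK. It only reports the interpreter version and whether the `naoqi` module can be imported.

```python
from __future__ import print_function
import sys


def check_environment():
    """Report the Python version and whether the NAOqi SDK is importable."""
    version = "%d.%d" % (sys.version_info[0], sys.version_info[1])
    try:
        import naoqi  # module provided by the NAOqi Python SDK
        has_naoqi = True
    except ImportError:
        has_naoqi = False
    return {"python": version, "naoqi": has_naoqi}


if __name__ == "__main__":
    info = check_environment()
    print("Python version:", info["python"])
    print("naoqi importable:", info["naoqi"])
```

If `naoqi` is reported as not importable, the SDK directory has most likely not been added to PYTHONPATH yet.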

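3. Writing the Python program

To send text to Pepper, another program has to receive Julius's output and use the API to make Pepper speak. This section builds that program.

To receive Julius's output from another program, Julius has to be started in module mode. Just add the -module option when running the shell script.
Julius's output is XML that contains the recognized text, so the program has to extract only the text for Pepper to speak and send it to Pepper. That is what this program does.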
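
Because the output arrives over a TCP socket, one message may be split across several reads, and one read may contain several messages. Here is a minimal sketch of the buffering this requires, assuming (as julius2pepper.py does) that each message ends with a period followed by a newline. `split_messages` is a hypothetical helper name, and the message fragments in the usage lines are made up for illustration.

```python
def split_messages(buffer):
    """Split a receive buffer into complete messages and the leftover tail.

    A message counts as complete once its terminating period-plus-newline
    has arrived; anything after the last terminator is returned unchanged
    so the caller can prepend it to the next chunk it receives.
    """
    messages = []
    while True:
        place = buffer.find(".\n")
        if place == -1:   # no complete message left in the buffer
            break
        messages.append(buffer[:place])
        buffer = buffer[place + 2:]
    return messages, buffer


# Usage: the second message is cut in half by the first read.
msgs, rest = split_messages("<STARTRECOG/>\n.\n<RECOGOUT>\n  <SHY")
print(msgs)   # one complete message
print(rest)   # incomplete tail, kept for the next read
```

Keeping the tail instead of discarding it is the design point: the caller appends the next `recv` result to `rest` and calls the function again.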

The program I wrote, julius2pepper.py, is below. (2017/1/31: fixed a bug.)

julius2pepper.py

#
# Simple client program for Julius module mode
#

from __future__ import print_function
from contextlib import closing
from naoqi import ALProxy
import xml.etree.ElementTree as ET
import socket
import sys

# ---- constants ----
ARGNUM = 4
BUFSIZE = 4096

# ---- check args ---- 
args = sys.argv   #1: server ipaddr  2: server port  3: pepper ipaddr  4: pepper port
argc = len(args) - 1
if argc != ARGNUM:
    print("Usage: python julius2pepper.py server_ipaddr server_port pepper_ipaddr pepper_port")
    sys.exit(1)

# ---- variables ----
jipaddr = args[1]   # Julius server ipaddr
jport = int(args[2])   # Julius server port (default 10500)
pipaddr = args[3]   # pepper ipaddr
pport = int(args[4])   # pepper port
tts = ALProxy("ALTextToSpeech", pipaddr, pport)

# ---- communicate with Julius ----  
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
with closing(sock):
    sock.connect((jipaddr, jport))   #connect to Julius
    print("[connected]")
    # ---- receive and cut at ".\n" ----
    chunk = ""
    count = 0
    while True:
        count += 1
        print("")
        print(count)
        print("[waiting]")
        data = sock.recv(BUFSIZE)   # receive string
        if not data:   # empty string means Julius closed the connection
            break
        print("[received]")
        chunk += data   # keep any incomplete message from the previous read

        # ---- handle every complete message (terminated by ".\n") ----
        while True:
            place = chunk.find(".\n")
            if place == -1:   # no complete message yet: wait for more data
                break
            xml = chunk[0:place]
            chunk = chunk[place+2:]

            # ---- parse XML ----
            xml = xml.replace("\"<", "\"")   # to remove <s> or </s> in CLASSID
            xml = xml.replace(">\"", "\"")
            speech = ""   # this will contain a sentence spoken by pepper
            print("[parsing xml]")
            try:
                root = ET.fromstring(xml)
                for whypo in root.iter("WHYPO"):
                    speech += whypo.get("WORD") or ""
            except Exception as e:
                print(e)

            # ---- speak pepper ----
            if speech:
                print("[sending to pepper]: "+speech)
                tts.say(speech.encode("utf-8"))
            else:
                print("[no text]")
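
The XML parsing step of the program above can be tried on its own, without Julius or Pepper. This is a minimal sketch: `extract_sentence` is a hypothetical helper name, and `SAMPLE` is an invented, simplified message that uses ASCII words to stay encoding-neutral (real dictation kit output contains Japanese words and different attribute values).

```python
from __future__ import print_function
import xml.etree.ElementTree as ET

# Invented sample in the shape of a Julius recognition result (assumption).
SAMPLE = (
    '<RECOGOUT>\n'
    '  <SHYPO RANK="1" SCORE="-100.0">\n'
    '    <WHYPO WORD="" CLASSID="<s>" PHONE="silB" CM="0.9"/>\n'
    '    <WHYPO WORD="hello" CLASSID="hello" PHONE="h e l o" CM="0.8"/>\n'
    '    <WHYPO WORD="pepper" CLASSID="pepper" PHONE="p e p a" CM="0.7"/>\n'
    '    <WHYPO WORD="" CLASSID="</s>" PHONE="silE" CM="1.0"/>\n'
    '  </SHYPO>\n'
    '</RECOGOUT>\n'
)


def extract_sentence(xml_text):
    """Concatenate the WORD attribute of every WHYPO element."""
    # A raw "<" inside an attribute value (CLASSID="<s>") is not
    # well-formed XML, so strip the angle brackets before parsing.
    cleaned = xml_text.replace('"<', '"').replace('>"', '"')
    root = ET.fromstring(cleaned)
    return "".join(w.get("WORD") or "" for w in root.iter("WHYPO"))


print(extract_sentence(SAMPLE))   # prints "hellopepper"
```

The words are joined without spaces because the dictation kit targets Japanese, where words are not separated by spaces.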

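To run it, type the following in a terminal:
python julius2pepper.py [server_ipaddr] [server_port] [pepper_ipaddr] [pepper_port]
The arguments are:

IP address of the machine running Julius. 127.0.0.1 is fine if it is the same machine.
Port number on the machine running Julius. When you start Julius in module mode, it should print the port number. The default is 10500.
Pepper's IP address. Pepper tells you when you press the button on its chest.
Pepper's port number. The default is 9559.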
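
Since both ports have well-known defaults, they could be made optional. The following is a sketch of that idea using the standard `argparse` module; it is not how julius2pepper.py works (it requires all four positional arguments), and the option names `--server-port` and `--pepper-port` as well as the address 192.168.1.10 are made up for illustration.

```python
from __future__ import print_function
import argparse


def build_parser():
    """Build a parser where only the two IP addresses are mandatory."""
    parser = argparse.ArgumentParser(
        description="Relay Julius results to Pepper (argument sketch)")
    parser.add_argument("server_ipaddr", help="machine running Julius")
    parser.add_argument("pepper_ipaddr", help="Pepper's IP address")
    parser.add_argument("--server-port", type=int, default=10500,
                        help="Julius module-mode port")
    parser.add_argument("--pepper-port", type=int, default=9559,
                        help="NAOqi port on Pepper")
    return parser


# Usage: omit both ports and fall back to the defaults.
args = build_parser().parse_args(["127.0.0.1", "192.168.1.10"])
print(args.server_port, args.pepper_port)   # prints "10500 9559"
```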

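On Windows, calling Python as-is from the Command Prompt or Cygwin probably won't give you access to the API, so some extra work is needed. (It's a hassle, so let's just use Linux lol)

Incidentally, if you add the option -input adinnet when starting Julius, you can send audio from another machine over the network. On the sending side, install Julius itself properly and run adintool -in mic -out adinnet -server [server_ip].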
