Implementing English Spoken Dialogue in Python [Offline]

Last updated at 2020-11-15, posted at 2020-08-16

About pocketsphinx

pocketsphinx is a module that enables offline English speech recognition.
How to install and use pocketsphinx: see here
Environment setup is also covered on this page, so you can probably skip the site above.

Speech recognition with a custom dictionary (speech to text)

Environment

ubuntu 18.04
python3

Environment setup

I have collected the samples in a git repository, so clone it and use it.
"https://github.com/hir-osechi/pocketsphinx_sample"

git clone https://github.com/hir-osechi/pocketsphinx_sample.git

The repository contains code that uses pocketsphinx and svoxpico,
so if they are not installed yet, run the following.

cd
cd pocketsphinx_sample/
sh setup.sh

If you are curious about how to use svoxpico

With no options set, pocketsphinx can be used with the following code.

pocket_test.py

from pocketsphinx import LiveSpeech
for phrase in LiveSpeech():
    print(phrase)

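From here, options can be added inside the parentheses of LiveSpeech().
To use a custom dictionary, add

lm = False
dic = path to the custom dictionary you created (.dict file)
jsgf = path to the custom dictionary you created (.gram file)

Note: lm is the flag for using a custom dictionary; set it to False.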
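
As a rough sketch of how these three options fit together, the helper below collects them into keyword arguments for LiveSpeech(). The helper names (custom_dict_options, listen) and the directory and dictionary names are hypothetical and not part of the sample repository. Running listen() requires pocketsphinx to be installed and a microphone to be connected.

```python
import os


def custom_dict_options(dict_dir, name):
    """Build LiveSpeech keyword arguments for a custom dictionary.

    dict_dir and name are hypothetical: point them at wherever your
    .dict and .gram files actually live.
    """
    return {
        "lm": False,  # per the note above: the flag for using a custom dictionary
        "dic": os.path.join(dict_dir, name + ".dict"),   # words and their pronunciations
        "jsgf": os.path.join(dict_dir, name + ".gram"),  # grammar listing the allowed sentences
    }


def listen(options):
    """Print each recognized phrase (needs pocketsphinx and a microphone)."""
    # Imported here so the helper above can be used without pocketsphinx installed.
    from pocketsphinx import LiveSpeech

    for phrase in LiveSpeech(**options):
        print(phrase)


# Example call (commented out because it opens the microphone):
# listen(custom_dict_options("dictionary", "test"))
```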

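Creating a custom dictionary

About the dict file

pocketsphinx has a word dictionary with the ".dict" extension, which records tens of thousands of words together with their pronunciation symbols (phonemes).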

Example
weather W EH DH ER
were W ER
what W AH T
what(2) HH W AH T
where W EH R
where(2) HH W EH R

All of these words are stored in the dict file at the following path.
/usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict

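By default, the recognizer searches this dictionary of tens of thousands of words for the word it heard,
so narrowing the dictionary down to the words you actually need improves recognition accuracy.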
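
To illustrate the idea of narrowing the dictionary, here is a minimal sketch that keeps only the entries whose word appears in a given set of sentences. It is not the repository's tool; the function name subset_dict_lines is hypothetical, and the sample entries are the ones from the example above.

```python
def subset_dict_lines(full_dict_lines, sentences):
    """Keep only the dictionary entries whose word occurs in the sentences.

    Alternate pronunciations such as "what(2)" are kept along with "what".
    """
    wanted = {word for s in sentences for word in s.lower().split()}
    kept = []
    for line in full_dict_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        base = parts[0].split("(")[0]  # "what(2)" -> "what"
        if base in wanted:
            kept.append(" ".join(parts))
    return kept


sample = [
    "weather W EH DH ER",
    "were W ER",
    "what W AH T",
    "what(2) HH W AH T",
    "where W EH R",
    "where(2) HH W EH R",
]
for entry in subset_dict_lines(sample, ["what food do you like", "where do you live in"]):
    print(entry)
```

To apply this to the real dictionary, read the lines of cmudict-en-us.dict from the path above and write the returned entries to your own .dict file.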

About the gram file

A gram file lets you specify the grammar, that is, which sentences are allowed.
For example, if you create a gram file like the one that follows,

What food do you like ?
Where do you live in ?

only these two sentences will be recognized.

#JSGF V1.0;
grammar test;
public <rule> = <command>;
<command> = what food do you like | where do you live in;
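
A grammar of this shape can also be generated from a list of sentences. The following is a minimal sketch under that assumption, not the repository's generator; make_jsgf is a hypothetical name. The header line is written as "#JSGF V1.0;", the form used in the JSGF specification.

```python
def make_jsgf(name, sentences):
    """Return JSGF grammar text that accepts exactly the given sentences."""
    # Each sentence becomes one alternative of the <command> rule.
    alternatives = " | ".join(s.lower().strip() for s in sentences)
    lines = [
        "#JSGF V1.0;",
        "grammar {};".format(name),
        "public <rule> = <command>;",
        "<command> = {};".format(alternatives),
    ]
    return "\n".join(lines) + "\n"


print(make_jsgf("test", ["what food do you like", "where do you live in"]))
```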

Writing the dict and gram files by hand every time is tedious,
so I wrote a script that generates them automatically from sentences you type in.
It is in the git repository.

cd
cd pocketsphinx_sample/tools
python3 gram_maker_by_input.py

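Enter input so that the session looks like the following (the script's prompts are in Japanese; they are shown here in translation).

Enter the name of the dictionary you want to create:test
Enter a sentence + Enter
(press Ctrl-C to finish)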
===============================================================
do you like apple
i want to play tennis
please tell me the way to the kyoto station
let me know what i can do for you

This gives you speech recognition that responds only to these 4 sentences.
However, as it stands, even slight noise may be assigned to one of the 4 sentences, so we add noise entries.

cd
cd pocketsphinx_sample/tools
python3 gram_noise_changer.py

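Enter input so that the session looks like the following (the script's prompts are in Japanese; they are shown here in translation).

Enter the name of the dictionary whose noise you want to change:test
Enter the name of the txt file with the noise list to apply (without .txt):noise_sample
===============================================================
The noise of this dictionary will be changed.
===============================================================
Done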
===============================================================

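If you are curious about what this does, take a look at test.gram.
(The noise list contains words that tended to be recognized when nothing was being said; adjust it as needed.)

The preparation is now complete!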

Running

If the following commands recognize only the sentences you specified earlier, it worked.

cd
cd pocketsphinx_sample/
python3 dic_test.py

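Spoken dialogue

As an example application, I wrote a program that can answer questions.
The questions and answers are stored in pocketsphinx_sample/dictionary/QandA/QandA.txt, with each question and its answer separated by ",".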
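
The lookup behind such a question-and-answer program can be sketched as follows. Only the comma separator comes from the description above; the one-pair-per-line layout, the sample lines, and the function names load_qa and normalize are assumptions made for illustration and may differ from the actual QandA.txt and QA_test.py.

```python
def normalize(text):
    """Lowercase and drop '?' so recognizer output matches the stored question."""
    return " ".join(text.lower().replace("?", " ").split())


def load_qa(lines):
    """Parse "question,answer" lines into a dict keyed by normalized question."""
    qa = {}
    for line in lines:
        line = line.strip()
        if not line or "," not in line:
            continue  # skip blank or malformed lines
        question, answer = line.split(",", 1)  # split at the first comma only
        qa[normalize(question)] = answer.strip()
    return qa


# Hypothetical file contents, assuming one pair per line.
sample_lines = ["are you happy ?,yes", "what food do you like ?,I like apples."]
qa = load_qa(sample_lines)
print(qa.get(normalize(" what food do you like ?"), "(no answer)"))
```

Splitting at the first comma only lets an answer contain commas of its own.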

To create a custom dictionary from QandA.txt, run gram_maker_from_txt.py.

cd pocketsphinx_sample/tools
python3 gram_maker_from_txt.py

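Enter input so that the session looks like the following (the script's prompts are in Japanese; they are shown here in translation).

Enter the name of the dictionary you want to create:QA_sample
Enter the name of the txt file to turn into a dictionary (without .txt):QandA
Enter the name of the txt file with the noise list to add (without .txt):noise_sample
Dictionary created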

Running the following should give you a working question-and-answer dialogue.

cd
cd pocketsphinx_sample/
python3 QA_test.py

Result:

[*] START RECOGNITION
----------------------------------
 are you happy ?
[*] SPEAK : yes
----------------------------------

[*] START RECOGNITION
----------------------------------
 what food do you like ?
[*] SPEAK : I like apples.
----------------------------------

Improving accuracy further

To raise recognition accuracy further, specify the noise more strictly.
For example, if "what food do you like" tends to be misrecognized, add

what
what food
what food do
what food do you

to noise_sample.txt and rebuild the dictionary (gram_maker_from_txt.py). After that, nothing is output unless the utterance matches the sentence completely.
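
The prefix entries listed above can be generated rather than typed by hand. This is a minimal sketch; prefix_noise is a hypothetical helper and is not part of the sample repository.

```python
def prefix_noise(sentence):
    """Return every proper prefix of the sentence, shortest first."""
    words = sentence.split()
    # Stop one word short so the full sentence itself is not treated as noise.
    return [" ".join(words[:i]) for i in range(1, len(words))]


for entry in prefix_noise("what food do you like"):
    print(entry)
```

The printed lines can then be appended to noise_sample.txt before rebuilding the dictionary.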

Official pocketsphinx samples
"https://pypi.org/project/pocketsphinx/"
