
Running the Julius word/phoneme segmentation kit sample on a Mac

Julius

Posted at 2014-11-02. Last updated at 2015-04-04.

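Overview

With the large-vocabulary continuous speech recognition engine Julius, you can automatically detect the start and end times of the phonemes contained in an audio file (this is called automatic phoneme labeling [1]).
Automatic phoneme labeling makes it possible to build, for example, a karaoke-style application that changes the color of the displayed lyrics as the singing progresses.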
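
To make the karaoke idea concrete, here is a minimal Python sketch of how labeled segments could drive a highlight. The segment times, the function names, and the bracket notation (a stand-in for recoloring) are all hypothetical and only illustrate the lookup, not any Julius API.

```python
from bisect import bisect_right

# Hypothetical alignment: (start_sec, end_sec, label), sorted by start time.
segments = [
    (0.00, 0.23, "silB"),
    (0.23, 0.32, "ky"),
    (0.32, 0.56, "o:"),
]

def reached_count(segments, t):
    """Return how many segments have already started at playback time t."""
    starts = [start for start, _end, _label in segments]
    return bisect_right(starts, t)

def render(segments, t):
    """Wrap every label reached by time t in brackets (stand-in for a color change)."""
    n = reached_count(segments, t)
    return " ".join(
        f"[{label}]" if k < n else label
        for k, (_start, _end, label) in enumerate(segments)
    )

print(render(segments, 0.30))
```

Calling render repeatedly with the current playback position would update the highlight as the audio advances.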

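The Julius "word/phoneme segmentation kit" makes it easy to try automatic phoneme labeling. This article summarizes the steps for running the sample as described in the readme bundled with the segmentation kit, along with the pitfalls to watch for.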

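Preparation

Install Julius using one of the following two methods.

1. Install Julius manually

2. Install Julius with Homebrew

The resulting Julius configuration depends on the method. For example, the primary A/D-in driver under Audio input is portaudio with method 1 but MacOSX CoreAudio with method 2. For the detailed differences, compare the outputs listed under "Check that it runs" for each method.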

Method 1. Install Julius manually

Prepare by following "libjulius を Mac で使ってみた - 凹みTips".

Installing portaudio

Install portaudio, the library Julius needs on a Mac for audio recording and playback. Use Homebrew.

$ brew install portaudio

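Installing Julius

Downloading Julius

On the latest Julius version page (Julius最新版), choose "Source(tarball)" from Quick download in the right column, then extract the archive.

Building and installing Julius

If necessary, check the installation directory beforehand with ./configure --help. Then build and install.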

$ ./configure --with-mictype=portaudio
$ make

This produces an executable named julius.dSYM in /usr/local/bin; use it to run Julius.

Check that it runs

Unless you change the settings, the output looks like this.

$ julius.dSYM -version
JuliusLib rev.4.3.1 (fast)

Engine specification:
 -  Base setup   : fast
 -  Supported LM : DFA, N-gram, Word
 -  Extension    :
 -  Compiled by  : gcc -g -O2

Library configuration: version 4.3.1
 - Audio input
    primary A/D-in driver   : libportaudio (PortAudio library (external))
    available drivers       :
    wavefile formats        : RAW and WAV only
    max. length of an input : 320000 samples, 150 words
 - Language Model
    class N-gram support    : yes
    MBR weight support      : yes
    word id unit            : short (2 bytes)
 - Acoustic Model
    multi-path treatment    : autodetect
 - External library
    file decompression by   : zlib library
 - Process hangling
    fork on adinnet input   : no

Method 2. Install Julius with Homebrew

Follow "HomebrewでOSXに音声解析エンジンJuliusを入れる - Qiita".

Installing Julius

$ brew tap oame/nlp # As of 2015-04-04 this repository no longer exists.
$ brew install julius

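(Added 2015-04-04) oame/nlp is gone, but Julius has since been added to the Homebrew formulas, so a plain brew install julius is enough. The information below is therefore outdated; treat it as a rough reference only.

Check that it runs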

$ julius -version
JuliusLib rev.4.3.1 (fast)

Engine specification:
 -  Base setup   : fast
 -  Supported LM : DFA, N-gram, Word
 -  Extension    : WordsInt
 -  Compiled by  : clang -DNDEBUG -O3

Library configuration: version 4.3.1
 - Audio input
    primary A/D-in driver   : coreaudio (MacOSX CoreAudio)
    available drivers       :
    wavefile formats        : RAW and WAV only
    max. length of an input : 320000 samples, 150 words
 - Language Model
    class N-gram support    : yes
    MBR weight support      : yes
    word id unit            : integer (4 bytes)
 - Acoustic Model
    multi-path treatment    : autodetect
 - External library
    file decompression by   : zlib library
 - Process hangling
    fork on adinnet input   : no
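
To compare the two -version outputs without reading them side by side, the "key : value" lines can be diffed programmatically. The following is a minimal Python sketch; the function names are hypothetical, and the two strings are shortened excerpts of the outputs listed in this article.

```python
def parse_settings(text):
    """Collect 'key : value' pairs from `julius -version` style output."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # lines without a colon carry no setting
        settings[key.strip().lstrip("- ").strip()] = value.strip()
    return settings

def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }

# Excerpts from the outputs in this article (method 1 and method 2).
manual = """\
 -  Extension    :
 -  Compiled by  : gcc -g -O2
    primary A/D-in driver   : libportaudio (PortAudio library (external))
    word id unit            : short (2 bytes)
    multi-path treatment    : autodetect
"""
homebrew = """\
 -  Extension    : WordsInt
 -  Compiled by  : clang -DNDEBUG -O3
    primary A/D-in driver   : coreaudio (MacOSX CoreAudio)
    word id unit            : integer (4 bytes)
    multi-path treatment    : autodetect
"""

differences = diff_settings(parse_settings(manual), parse_settings(homebrew))
for key, (left, right) in differences.items():
    print(f"{key}: {left!r} vs {right!r}")
```

With these excerpts, the differing keys are the extension, the compiler flags, the A/D-in driver, and the word id unit.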

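Downloading the word/phoneme segmentation kit

Download the "word/phoneme segmentation kit" (単語・音素セグメンテーションキット) from the Julius application kits page (Julius 応用キット) and extract it.
As of 2014-11-03, that page is marked "2011.03.03更新" (updated 2011-03-03).

The SHA-256 checksum is as follows.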

$ shasum -a 256 segmentation-kit-v4.0.tar.gz
6fa27efc69b2b09f30050a3e2d2f3e5bbc64f462e8a4457d2cf87323d45a4e7d  segmentation-kit-v4.0.tar.gz
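
The same check can be scripted. This is a minimal Python sketch using the standard hashlib module; the function names are hypothetical, and the expected digest is the one printed above.

```python
import hashlib

# Digest reported by shasum above for segmentation-kit-v4.0.tar.gz.
EXPECTED = "6fa27efc69b2b09f30050a3e2d2f3e5bbc64f462e8a4457d2cf87323d45a4e7d"

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected=EXPECTED):
    """Return True if the file's SHA-256 digest equals the expected value."""
    return sha256_of(path) == expected.lower()

# Usage (run in the directory that holds the download):
#   matches_expected("segmentation-kit-v4.0.tar.gz")
```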

Running the word/phoneme segmentation kit sample

Fixing errors in the sample code

Enter the sample directory of the extracted folder.

$ cd segmentation-kit-v4.0/sample

Spotting the error in the readme

sample/00readme-ja.txt contains the following text (translated here), part of which is wrong.

# Incorrect
○ Run segmentation based on sample.trans:

  % ../segment_julius4.pl sample.raw sample.trans

○ Run segmentation based on sample.trans_line:

  % ../segment_julius4.pl sample.raw sample.trans_line

The audio file actually shipped is sample.wav, not sample.raw, so substitute the file name accordingly.
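
Because the readme and the shipped file disagree about the format, it may help to look at what sample.wav actually contains before running anything. This is a minimal Python sketch using the standard wave module; the function name and the dictionary keys are hypothetical.

```python
import wave

def describe_wav(path):
    """Return basic format information read from a WAV file header."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "sample_rate_hz": w.getframerate(),
            "duration_sec": w.getnframes() / w.getframerate(),
        }

# Usage (run inside segmentation-kit-v4.0/sample):
#   print(describe_wav("sample.wav"))
```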

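Fixing segment_julius4.pl

Running the command above in the sample directory fails even with the file name corrected, because the paths configured in ../segment_julius4.pl are wrong. Open the file in an editor and fix it as follows.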

#### user configuration

## julius4 executable
# $julius4bin="./bin/julius-4.1.5" # before the fix
# Keep only the line that matches your installation.
$julius4bin="/usr/local/bin/julius.dSYM"; # installed manually (method 1)
# $julius4bin="/usr/local/bin/julius"; # installed with Homebrew (method 2)

## acoustic model
# $hmmdefs="./models/hmmdefs_monof_mix16_gid.binhmm"; # before the fix
$hmmdefs="../models/hmmdefs_monof_mix16_gid.binhmm"; # the directory level was wrong
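
The reason the original model path fails is that a relative path is resolved against the directory the command is run from, which here is sample. The following Python sketch illustrates this; the /work prefix is a hypothetical location for the extracted kit, and it assumes, as the fix above implies, that models sits next to sample.

```python
import posixpath

def resolve(cwd, path):
    """Resolve a relative path the way a process started in `cwd` would see it."""
    return posixpath.normpath(posixpath.join(cwd, path))

# Hypothetical location of the extracted kit; only the relative layout matters.
cwd = "/work/segmentation-kit-v4.0/sample"

before = resolve(cwd, "./models/hmmdefs_monof_mix16_gid.binhmm")
after = resolve(cwd, "../models/hmmdefs_monof_mix16_gid.binhmm")

print(before)  # points inside sample/, where no models directory exists
print(after)   # points at the kit's top-level models directory
```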

Running automatic phoneme labeling

Once the fixes are done, run segment_julius4.pl in the sample directory.

$ ../segment_julius4.pl sample.wav sample.trans
Array @words missing the @ in argument 1 of push() at ../segment_julius4.pl line 71.
enter filename->............................................................................................................................................................................................................enter filename->1 files processed
Log saved in "sample.trans.log"
Result saved in "sample.trans.align".

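The result of automatic phoneme labeling with Julius 4.3.1 is written to sample.trans.align.
The expected result ships in sample/result, so let's check the difference.
The diff is shown below. Lines marked "<" are the Julius 4.3.1 result; lines marked ">" are the bundled result (probably from Julius 4.1.5).
The 4.3.1 output is easier to parse: it has one line per phoneme, whereas the bundled result lists each phoneme's states (#1 to #3) separately.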

$ diff sample.trans.align result/sample.trans.align
11c11
< input speechfile: sample.wav
---
> input speechfile: ./sample/sample.wav
14c14
< -- phoneme alignment --
---
> -- state alignment --
17,31c17,61
< [   0   22]  -19.594141  silB
< [  23   31]  -25.734667  ky
< [  32   55]  -20.839853  o:
< [  56   67]  -23.585581  w
< [  68   75]  -27.564865  a
< [  76   87]  -26.436462  i
< [  88   97]  -23.435047  i
< [  98  106]  -23.542698  t
< [ 107  116]  -24.730494  e
< [ 117  126]  -24.588940  N
< [ 127  138]  -24.642313  k
< [ 139  143]  -25.039257  i
< [ 144  149]  -24.957642  d
< [ 150  159]  -24.341919  a
< [ 160  203]  -19.249168  silE
---
> [   0    2]  -18.646193  silB #1
> [   3   21]  -19.418312  silB #2
> [  22   22]  -25.778687  silB #3
> [  23   24]  -26.296310  ky #1
> [  25   27]  -25.316650  ky #2
> [  28   31]  -25.767365  ky #3
> [  32   35]  -22.096512  o: #1
> [  36   54]  -20.393854  o: #2
> [  55   55]  -24.287109  o: #3
> [  56   56]  -24.699951  w #1
> [  57   59]  -23.869141  w #2
> [  60   67]  -23.339951  w #3
> [  68   68]  -26.522949  a #1
> [  69   70]  -26.470642  a #2
> [  71   75]  -28.210987  a #3
> [  76   83]  -27.251038  i #1
> [  84   86]  -25.292196  i #2
> [  87   87]  -23.352539  i #3
> [  88   91]  -22.180389  i #1
> [  92   94]  -23.187662  i #2
> [  95   97]  -25.355307  i #3
> [  98  104]  -22.420271  t #1
> [ 105  105]  -26.627441  t #2
> [ 106  106]  -28.314941  t #3
> [ 107  108]  -26.578857  e #1
> [ 109  115]  -23.518240  e #2
> [ 116  116]  -29.519531  e #3
> [ 117  120]  -26.904724  N #1
> [ 121  124]  -22.301331  N #2
> [ 125  126]  -24.532593  N #3
> [ 127  132]  -23.014811  k #1
> [ 133  134]  -26.071655  k #2
> [ 135  138]  -26.368896  k #3
> [ 139  139]  -26.820312  i #1
> [ 140  141]  -24.574707  i #2
> [ 142  143]  -24.613281  i #3
> [ 144  144]  -23.970703  d #1
> [ 145  146]  -24.219849  d #2
> [ 147  149]  -25.778482  d #3
> [ 150  157]  -23.779327  a #1
> [ 158  158]  -25.658447  a #2
> [ 159  159]  -27.526123  a #3
> [ 160  162]  -27.743002  silE #1
> [ 163  167]  -19.885839  silE #2
> [ 168  203]  -18.452921  silE #3

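Since each alignment line has the form "[start end] score label", it can be turned into timed segments with a short parser. This is a minimal Python sketch, not part of the kit: the function name is hypothetical, and the 10 ms frame length is an assumption that should be checked against your Julius settings.

```python
import re

FRAME_SEC = 0.01  # ASSUMPTION: one frame is 10 ms; verify for your configuration

# Matches "[  23   31]  -25.734667  ky" and also "[   0    2]  -18.646193  silB #1".
LINE_RE = re.compile(
    r"^\[\s*(\d+)\s+(\d+)\]\s+(-?\d+\.\d+)\s+(\S+)(?:\s+#(\d+))?\s*$"
)

def parse_alignment(text):
    """Parse alignment lines from a .align file into a list of dicts."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip headers and any other non-alignment lines
        start, end = int(m.group(1)), int(m.group(2))
        rows.append({
            "start_frame": start,
            "end_frame": end,  # inclusive, as in the file
            "start_sec": start * FRAME_SEC,
            "end_sec": (end + 1) * FRAME_SEC,
            "score": float(m.group(3)),
            "label": m.group(4),
            "state": int(m.group(5)) if m.group(5) else None,
        })
    return rows

# Usage:
#   with open("sample.trans.align") as f:
#       for row in parse_alignment(f.read()):
#           print(row["label"], row["start_sec"], row["end_sec"])
```

The optional "#n" group lets the same parser read the state-level format of the bundled result as well.
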
References

[1] JuliusとJulian/音素自動ラベリング - Miyazawa’s Pukiwiki 公開版
