
Using the OMRON Environment Sensor (2JCIE-BU) with a Raspberry Pi (8)


Running Julius as a server

Starting Julius with the -module option makes it output recognition results as XML-formatted data to applications that connect over a socket.

pi@raspberrypi:~ $ julius -C ~/julius/dictation-kit-4.5/am-gmm.jconf -nostrip -gram ~/julius/dict/sensor -module


Execution log:
pi@raspberrypi:~ $ julius -C ~/julius/dictation-kit-4.5/am-gmm.jconf -nostrip -gram ~/julius/dict/sensor -module
STAT: include config: /home/pi/julius/dictation-kit-4.5/am-gmm.jconf
WARNING: m_chkparam: "-lmp" only for N-gram, ignored
WARNING: m_chkparam: "-lmp2" only for N-gram, ignored
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
Stat: init_phmm: Reading in HMM definition
Stat: binhmm-header: variance inversed
Stat: read_binhmm: has inversed variances
Stat: read_binhmm: binary format HMM definition
Stat: read_binhmm: this HMM does not need multipath handling
Stat: init_phmm: defined HMMs:  8443
Stat: init_phmm: loading binary hmmlist
Stat: load_hmmlist_bin: reading hmmlist
Stat: aptree_read: 42857 nodes (21428 branch + 21429 data)
Stat: load_hmmlist_bin: reading pseudo phone set
Stat: aptree_read: 3253 nodes (1626 branch + 1627 data)
Stat: init_phmm: logical names: 21429 in HMMList
Stat: init_phmm: base phones:    43 used in logical
Stat: init_phmm: finished reading HMM definitions
STAT: pseudo phones are loaded from binary hmmlist file
Stat: hmm_lookup: 12 pseudo phones are added to logical HMM list
STAT: *** AM00 _default loaded
STAT: *** loading LM00 _default
STAT: reading [/home/pi/julius/dict/sensor.dfa] and [/home/pi/julius/dict/sensor.dict]...
Stat: init_voca: read 8 words
STAT: reading additional forward dfa [/home/pi/julius/dict/sensor.dfa.forward]
STAT: done
STAT: Gram #0 sensor registered
STAT: Gram #0 sensor: new grammar loaded, now mash it up for recognition
STAT: Gram #0 sensor: extracting category-pair constraint for the 1st pass
STAT: Gram #0 sensor: installed
STAT: Gram #0 sensor: turn on active
STAT: grammar update completed
STAT: *** LM00 _default loaded
STAT: ------
STAT: All models are ready, go for final fusion
STAT: [1] create MFCC extraction instance(s)
STAT: *** create MFCC calculation modules from AM
STAT: AM 0 _default: create a new module MFCC01
STAT: 1 MFCC modules created
STAT: [2] create recognition processing instance(s) with AM and LM
STAT: composing recognizer instance SR00 _default (AM00 _default, LM00 _default)
STAT: Building HMM lexicon tree
STAT: lexicon size: 120+0=120
STAT: coordination check passed
STAT: multi-gram: beam width set to 120 (guess) by lexicon change
STAT: wchmm (re)build completed
STAT: SR00 _default composed
STAT: [3] initialize for acoustic HMM calculation
Stat: outprob_init: state-level mixture PDFs, use calc_mix()
Stat: addlog: generating addlog table (size = 1953 kB)
Stat: addlog: addlog table generated
STAT: [4] prepare MFCC storage(s)
STAT: [5] prepare for real-time decoding
STAT: All init successfully done

Stat: server-client: socket ready as server
///////////////////////////////
///  Module mode ready
///  waiting client at 10500
///////////////////////////////
///  Stat: server-client: connect from 127.0.0.1
STAT: ###### initialize input device
----------------------- System Information begin ---------------------
JuliusLib rev.4.6 (fast)

Engine specification:
 -  Base setup   : fast
 -  Supported LM : DFA, N-gram, Word
 -  Extension    : LibSndFile
 -  Compiled by  : gcc -g -O2 -fPIC
Library configuration: version 4.6
 - Audio input
    primary A/D-in driver   : alsa (Advanced Linux Sound Architecture)
    available drivers       : alsa
    wavefile formats        : various formats by libsndfile ver.1
    max. length of an input : 320000 samples, 150 words
 - Language Model
    class N-gram support    : yes
    MBR weight support      : yes
    word id unit            : short (2 bytes)
 - Acoustic Model
    multi-path treatment    : autodetect
 - External library
    file decompression by   : zlib library
 - Process hangling
    fork on adinnet input   : no
 - built-in SIMD instruction set for DNN

    NONE AVAILABLE, DNN computation may be too slow!
 - built-in CUDA support: no


------------------------------------------------------------
Configuration of Modules

 Number of defined modules: AM=1, LM=1, SR=1

 Acoustic Model (with input parameter spec.):
 - AM00 "_default"
    hmmfilename=/home/pi/julius/dictation-kit-4.5/model/phone_m/jnas-tri-3k16-gid.binhmm
    hmmmapfilename=/home/pi/julius/dictation-kit-4.5/model/phone_m/logicalTri-3k16-gid.bin

 Language Model:
 - LM00 "_default"
    grammar #1:
        dfa  = /home/pi/julius/dict/sensor.dfa
        dict = /home/pi/julius/dict/sensor.dict

 Recognizer:
 - SR00 "_default" (AM00, LM00)

------------------------------------------------------------
Speech Analysis Module(s)

[MFCC01]  for [AM00 _default]

 Acoustic analysis condition:
           parameter = MFCC_E_D_N_Z (25 dim. from 12 cepstrum + energy, abs energy supressed with CMN)
    sample frequency = 16000 Hz
       sample period =  625  (1 = 100ns)
         window size =  400 samples (25.0 ms)
         frame shift =  160 samples (10.0 ms)
        pre-emphasis = 0.97
        # filterbank = 24
       cepst. lifter = 22
          raw energy = False
    energy normalize = False
        delta window = 2 frames (20.0 ms) around
         hi freq cut = OFF
         lo freq cut = OFF
     zero mean frame = OFF
           use power = OFF
                 CVN = OFF
                VTLN = OFF

    spectral subtraction = off

 cep. mean normalization = yes, real-time MAP-CMN, updating initial mean with last 500 input frames
  initial mean from file = N/A
   beginning data weight = 100.00
 cep. var. normalization = no

     base setup from = Julius defaults

------------------------------------------------------------
Acoustic Model(s)

[AM00 "_default"]

 HMM Info:
    8443 models, 3090 states, 3090 mpdfs, 49440 Gaussians are defined
          model type = context dependency handling ON
      training parameter = MFCC_E_N_D_Z
       vector length = 25
    number of stream = 1
         stream info = [0-24]
    cov. matrix type = DIAGC
       duration type = NULLD
    max mixture size = 16 Gaussians
     max length of model = 5 states
     logical base phones = 43
       model skip trans. = not exist, no multi-path handling

 AM Parameters:
        Gaussian pruning = none (full computation)  (-gprune)
    short pause HMM name = "sp" specified, "sp" applied (physical)  (-sp)
  cross-word CD on pass1 = handle by approx. (use average prob. of same LC)

------------------------------------------------------------
Language Model(s)

[LM00 "_default"] type=grammar

 DFA grammar info:
      4 nodes, 8 arcs, 8 terminal(category) symbols
      category-pair matrix: 56 bytes (896 bytes allocated)

 additional forward DFA grammar info:
      4 nodes, 8 arcs, 8 terminal(category) symbols
      category-pair matrix: 0 bytes (0 bytes allocated)

 Vocabulary Info:
        vocabulary size  = 8 words, 40 models
        average word len = 5.0 models, 15.0 states
       maximum state num = 27 nodes per word
       transparent words = not exist
       words under class = not exist

 Parameters:
   found sp category IDs =

------------------------------------------------------------
Recognizer(s)

[SR00 "_default"]  AM00 "_default"  +  LM00 "_default"

 Lexicon tree:
     total node num =    120
      root node num =      8
      leaf node num =      8

    (-penalty1) IW penalty1 = +0.0
    (-penalty2) IW penalty2 = +0.0
    (-cmalpha)CM alpha coef = 0.050000

 Search parameters: 
        multi-path handling = no
    (-b) trellis beam width = 120 (-1 or not specified - guessed)
    (-bs)score pruning thres= disabled
    (-n)search candidate num= 1
    (-s)  search stack size = 500
    (-m)    search overflow = after 2000 hypothesis poped
            2nd pass method = searching sentence, generating N-best
    (-b2)  pass2 beam width = 30
    (-lookuprange)lookup range= 5  (tm-5 <= t <tm+5)
    (-sb)2nd scan beamthres = 80.0 (in logscore)
    (-n)        search till = 1 candidates found
    (-output)    and output = 1 candidates out of above
     IWCD handling:
       1st pass: approximation (use average prob. of same LC)
       2nd pass: loose (apply when hypo. is popped and scanned)
     all possible words will be expanded in 2nd pass
     build_wchmm2() used
     lcdset limited by word-pair constraint
    short pause segmentation = off
    fall back on search fail = off, returns search failure

------------------------------------------------------------
Decoding algorithm:

    1st pass input processing = real time, on-the-fly
    1st pass method = 1-best approx. generating indexed trellis
    output word confidence measure based on search-time scores

------------------------------------------------------------
FrontEnd:

 Input stream:
                 input type = waveform
               input source = microphone
        device API          = default
              sampling freq. = 16000 Hz
             threaded A/D-in = supported, on
       zero frames stripping = off
             silence cutting = on
                 level thres = 2000 / 32767
             zerocross thres = 60 / sec.
                 head margin = 300 msec.
                 tail margin = 400 msec.
                  chunk size = 1000 samples
           FVAD switch value = -1 (disabled)
        long-term DC removal = off
        level scaling factor = 1.00 (disabled)
          reject short input = off
          reject  long input = off

----------------------- System Information end -----------------------

Notice for feature extraction (01),
    *************************************************************
    * Cepstral mean normalization for real-time decoding:       *
    * NOTICE: The first input may not be recognized, since      *
    *         no initial mean is available on startup.          *
    *************************************************************

------
### read waveform input
Stat: capture audio at 16000Hz
Stat: adin_alsa: latency set to 32 msec (chunk = 512 bytes)
Error: adin_alsa: unable to get pcm info from card control
Warning: adin_alsa: skip output of detailed audio device info
STAT: AD-in thread created
STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 94

STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 69

STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 78

STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 115

STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 111

socket error, connection closed


Since the command is long, I put it in a shell script as shown below.

pi@raspberrypi:~/julius $ vi run.sh
pi@raspberrypi:~/julius $ chmod +x run.sh

```bash:run.sh
#!/usr/bin/bash

julius -C ~/julius/dictation-kit-4.5/am-gmm.jconf -nostrip -gram ~/julius/dict/sensor -module
```
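The server can then be started from its own terminal with a single command (Julius stays in the foreground while serving clients):

pi@raspberrypi:~/julius $ ./run.sh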

Creating a client app to receive the data

When Julius is started as a server, it prints the following banner, so the client connects to that machine on port number 10500.

///////////////////////////////
///  Module mode ready
///  waiting client at 10500
///////////////////////////////

This time, Julius and the client script both run on the same Raspberry Pi, so the socket is created with localhost as the IP address.
Start Julius first, then run the sample script below.
When Julius recognizes speech, the result is sent to the script as XML data, which is parsed with xml.etree.ElementTree.
The recognized word arrives in the WORD attribute of a WHYPO element, so the sample script extracts and prints it.
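For reference, a recognition result arrives as a RECOGOUT message terminated by a line containing a single period. It looks roughly like the following; the word, score, and attribute values here are illustrative, not actual output from this setup:

```xml
<RECOGOUT>
  <SHYPO RANK="1" SCORE="-2876.29">
    <WHYPO WORD="[s]" CLASSID="2" PHONE="silB" CM="1.000"/>
    <WHYPO WORD="温度" CLASSID="0" PHONE="o N d o" CM="0.985"/>
    <WHYPO WORD="[/s]" CLASSID="3" PHONE="silE" CM="1.000"/>
  </SHYPO>
</RECOGOUT>
.
```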

```python:sensor.py
# -*- coding: utf-8 -*-
import socket
import xml.etree.ElementTree as ET

host = 'localhost'
port = 10500

# Connect to the Julius module-mode server
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))

while True:
    # Julius terminates each message with a line containing only '.',
    # so keep reading until that delimiter appears
    recv_data = ''
    while recv_data.find('\n.') == -1:
        recv_data += sock.recv(1024).decode()
    recv_data = recv_data.strip('.\n')
    #print(recv_data)
    root = ET.fromstring(recv_data)

    # The recognized word is in the WORD attribute of each WHYPO element;
    # skip the sentence start/end markers [s] and [/s]
    for i in root.iter('WHYPO'):
        word = i.attrib['WORD']
        cm = i.attrib['CM']
        if word != '[s]' and word != '[/s]':
            print('WORD = ' + word + '  : CM = ' + cm)
```
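Start Julius (run.sh) first, then run the script from another terminal; it assumes Python 3, since the received bytes are decoded to a string:

pi@raspberrypi:~ $ python3 sensor.py

Each time one of the 8 registered words is recognized, its WORD and confidence score (CM) are printed.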

Next time, I plan to take the item received in WORD, pull the corresponding data from the sensor, and have OpenJTalk speak it.
