Run Julius as a server.
When you start Julius with the -module option, it sends recognition results as XML-formatted data to applications connected over a socket.
pi@raspberrypi:~ $ julius -C ~/julius/dictation-kit-4.5/am-gmm.jconf -nostrip -gram ~/julius/dict/sensor -module
Execution log:
pi@raspberrypi:~ $ julius -C ~/julius/dictation-kit-4.5/am-gmm.jconf -nostrip -gram ~/julius/dict/sensor -module
STAT: include config: /home/pi/julius/dictation-kit-4.5/am-gmm.jconf
WARNING: m_chkparam: "-lmp" only for N-gram, ignored
WARNING: m_chkparam: "-lmp2" only for N-gram, ignored
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
Stat: init_phmm: Reading in HMM definition
Stat: binhmm-header: variance inversed
Stat: read_binhmm: has inversed variances
Stat: read_binhmm: binary format HMM definition
Stat: read_binhmm: this HMM does not need multipath handling
Stat: init_phmm: defined HMMs: 8443
Stat: init_phmm: loading binary hmmlist
Stat: load_hmmlist_bin: reading hmmlist
Stat: aptree_read: 42857 nodes (21428 branch + 21429 data)
Stat: load_hmmlist_bin: reading pseudo phone set
Stat: aptree_read: 3253 nodes (1626 branch + 1627 data)
Stat: init_phmm: logical names: 21429 in HMMList
Stat: init_phmm: base phones: 43 used in logical
Stat: init_phmm: finished reading HMM definitions
STAT: pseudo phones are loaded from binary hmmlist file
Stat: hmm_lookup: 12 pseudo phones are added to logical HMM list
STAT: *** AM00 _default loaded
STAT: *** loading LM00 _default
STAT: reading [/home/pi/julius/dict/sensor.dfa] and [/home/pi/julius/dict/sensor.dict]...
Stat: init_voca: read 8 words
STAT: reading additional forward dfa [/home/pi/julius/dict/sensor.dfa.forward]
STAT: done
STAT: Gram #0 sensor registered
STAT: Gram #0 sensor: new grammar loaded, now mash it up for recognition
STAT: Gram #0 sensor: extracting category-pair constraint for the 1st pass
STAT: Gram #0 sensor: installed
STAT: Gram #0 sensor: turn on active
STAT: grammar update completed
STAT: *** LM00 _default loaded
STAT: ------
STAT: All models are ready, go for final fusion
STAT: [1] create MFCC extraction instance(s)
STAT: *** create MFCC calculation modules from AM
STAT: AM 0 _default: create a new module MFCC01
STAT: 1 MFCC modules created
STAT: [2] create recognition processing instance(s) with AM and LM
STAT: composing recognizer instance SR00 _default (AM00 _default, LM00 _default)
STAT: Building HMM lexicon tree
STAT: lexicon size: 120+0=120
STAT: coordination check passed
STAT: multi-gram: beam width set to 120 (guess) by lexicon change
STAT: wchmm (re)build completed
STAT: SR00 _default composed
STAT: [3] initialize for acoustic HMM calculation
Stat: outprob_init: state-level mixture PDFs, use calc_mix()
Stat: addlog: generating addlog table (size = 1953 kB)
Stat: addlog: addlog table generated
STAT: [4] prepare MFCC storage(s)
STAT: [5] prepare for real-time decoding
STAT: All init successfully done
Stat: server-client: socket ready as server
///////////////////////////////
/// Module mode ready
/// waiting client at 10500
///////////////////////////////
/// Stat: server-client: connect from 127.0.0.1
STAT: ###### initialize input device
----------------------- System Information begin ---------------------
JuliusLib rev.4.6 (fast)
Engine specification:
- Base setup : fast
- Supported LM : DFA, N-gram, Word
- Extension : LibSndFile
- Compiled by : gcc -g -O2 -fPIC
Library configuration: version 4.6
- Audio input
primary A/D-in driver : alsa (Advanced Linux Sound Architecture)
available drivers : alsa
wavefile formats : various formats by libsndfile ver.1
max. length of an input : 320000 samples, 150 words
- Language Model
class N-gram support : yes
MBR weight support : yes
word id unit : short (2 bytes)
- Acoustic Model
multi-path treatment : autodetect
- External library
file decompression by : zlib library
- Process hangling
fork on adinnet input : no
- built-in SIMD instruction set for DNN
NONE AVAILABLE, DNN computation may be too slow!
- built-in CUDA support: no
------------------------------------------------------------
Configuration of Modules
Number of defined modules: AM=1, LM=1, SR=1
Acoustic Model (with input parameter spec.):
- AM00 "_default"
hmmfilename=/home/pi/julius/dictation-kit-4.5/model/phone_m/jnas-tri-3k16-gid.binhmm
hmmmapfilename=/home/pi/julius/dictation-kit-4.5/model/phone_m/logicalTri-3k16-gid.bin
Language Model:
- LM00 "_default"
grammar #1:
dfa = /home/pi/julius/dict/sensor.dfa
dict = /home/pi/julius/dict/sensor.dict
Recognizer:
- SR00 "_default" (AM00, LM00)
------------------------------------------------------------
Speech Analysis Module(s)
[MFCC01] for [AM00 _default]
Acoustic analysis condition:
parameter = MFCC_E_D_N_Z (25 dim. from 12 cepstrum + energy, abs energy supressed with CMN)
sample frequency = 16000 Hz
sample period = 625 (1 = 100ns)
window size = 400 samples (25.0 ms)
frame shift = 160 samples (10.0 ms)
pre-emphasis = 0.97
# filterbank = 24
cepst. lifter = 22
raw energy = False
energy normalize = False
delta window = 2 frames (20.0 ms) around
hi freq cut = OFF
lo freq cut = OFF
zero mean frame = OFF
use power = OFF
CVN = OFF
VTLN = OFF
spectral subtraction = off
cep. mean normalization = yes, real-time MAP-CMN, updating initial mean with last 500 input frames
initial mean from file = N/A
beginning data weight = 100.00
cep. var. normalization = no
base setup from = Julius defaults
------------------------------------------------------------
Acoustic Model(s)
[AM00 "_default"]
HMM Info:
8443 models, 3090 states, 3090 mpdfs, 49440 Gaussians are defined
model type = context dependency handling ON
training parameter = MFCC_E_N_D_Z
vector length = 25
number of stream = 1
stream info = [0-24]
cov. matrix type = DIAGC
duration type = NULLD
max mixture size = 16 Gaussians
max length of model = 5 states
logical base phones = 43
model skip trans. = not exist, no multi-path handling
AM Parameters:
Gaussian pruning = none (full computation) (-gprune)
short pause HMM name = "sp" specified, "sp" applied (physical) (-sp)
cross-word CD on pass1 = handle by approx. (use average prob. of same LC)
------------------------------------------------------------
Language Model(s)
[LM00 "_default"] type=grammar
DFA grammar info:
4 nodes, 8 arcs, 8 terminal(category) symbols
category-pair matrix: 56 bytes (896 bytes allocated)
additional forward DFA grammar info:
4 nodes, 8 arcs, 8 terminal(category) symbols
category-pair matrix: 0 bytes (0 bytes allocated)
Vocabulary Info:
vocabulary size = 8 words, 40 models
average word len = 5.0 models, 15.0 states
maximum state num = 27 nodes per word
transparent words = not exist
words under class = not exist
Parameters:
found sp category IDs =
------------------------------------------------------------
Recognizer(s)
[SR00 "_default"] AM00 "_default" + LM00 "_default"
Lexicon tree:
total node num = 120
root node num = 8
leaf node num = 8
(-penalty1) IW penalty1 = +0.0
(-penalty2) IW penalty2 = +0.0
(-cmalpha)CM alpha coef = 0.050000
Search parameters:
multi-path handling = no
(-b) trellis beam width = 120 (-1 or not specified - guessed)
(-bs)score pruning thres= disabled
(-n)search candidate num= 1
(-s) search stack size = 500
(-m) search overflow = after 2000 hypothesis poped
2nd pass method = searching sentence, generating N-best
(-b2) pass2 beam width = 30
(-lookuprange)lookup range= 5 (tm-5 <= t <tm+5)
(-sb)2nd scan beamthres = 80.0 (in logscore)
(-n) search till = 1 candidates found
(-output) and output = 1 candidates out of above
IWCD handling:
1st pass: approximation (use average prob. of same LC)
2nd pass: loose (apply when hypo. is popped and scanned)
all possible words will be expanded in 2nd pass
build_wchmm2() used
lcdset limited by word-pair constraint
short pause segmentation = off
fall back on search fail = off, returns search failure
------------------------------------------------------------
Decoding algorithm:
1st pass input processing = real time, on-the-fly
1st pass method = 1-best approx. generating indexed trellis
output word confidence measure based on search-time scores
------------------------------------------------------------
FrontEnd:
Input stream:
input type = waveform
input source = microphone
device API = default
sampling freq. = 16000 Hz
threaded A/D-in = supported, on
zero frames stripping = off
silence cutting = on
level thres = 2000 / 32767
zerocross thres = 60 / sec.
head margin = 300 msec.
tail margin = 400 msec.
chunk size = 1000 samples
FVAD switch value = -1 (disabled)
long-term DC removal = off
level scaling factor = 1.00 (disabled)
reject short input = off
reject long input = off
----------------------- System Information end -----------------------
Notice for feature extraction (01),
*************************************************************
* Cepstral mean normalization for real-time decoding: *
* NOTICE: The first input may not be recognized, since *
* no initial mean is available on startup. *
*************************************************************
------
### read waveform input
Stat: capture audio at 16000Hz
Stat: adin_alsa: latency set to 32 msec (chunk = 512 bytes)
Error: adin_alsa: unable to get pcm info from card control
Warning: adin_alsa: skip output of detailed audio device info
STAT: AD-in thread created
STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 94
STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 69
STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 78
STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 115
STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 111
socket error, connection closed
Since the command is long, I saved it as a shell script as shown below.
pi@raspberrypi:~/julius $ vi run.sh
pi@raspberrypi:~/julius $ chmod +x run.sh
```bash:run.sh
#!/usr/bin/bash
julius -C ~/julius/dictation-kit-4.5/am-gmm.jconf -nostrip -gram ~/julius/dict/sensor -module
```
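The server can now be started with:
pi@raspberrypi:~/julius $ ./run.sh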
Create a client application that receives the data
When Julius is started as a server, it prints the banner below, so the client connects to the machine where it is running on port 10500.
///////////////////////////////
/// Module mode ready
/// waiting client at 10500
///////////////////////////////
This time, Julius and the client script run on the same Raspberry Pi, so the socket is created with the host set to localhost.
After starting Julius, run the following sample script.
When Julius recognizes speech, it sends the result to the connected client as XML data, which the sample script parses with xml.etree.ElementTree.
The recognized word arrives in the WORD attribute of a WHYPO element, and the sample script simply extracts and prints it.
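For reference, a recognition result arrives roughly in the shape sketched below. This is a minimal, self-contained example; the word, scores, and CM values are illustrative rather than taken from an actual run:

```python
import xml.etree.ElementTree as ET

# Illustrative RECOGOUT message in the shape Julius sends in module mode;
# the word and attribute values here are made up for demonstration.
sample = '''<RECOGOUT>
  <SHYPO RANK="1" SCORE="-1234.5">
    <WHYPO WORD="[s]" CLASSID="0" PHONE="silB" CM="1.000"/>
    <WHYPO WORD="温度" CLASSID="1" PHONE="o N d o" CM="0.972"/>
    <WHYPO WORD="[/s]" CLASSID="2" PHONE="silE" CM="1.000"/>
  </SHYPO>
</RECOGOUT>'''

root = ET.fromstring(sample)
for whypo in root.iter('WHYPO'):
    print(whypo.attrib['WORD'], whypo.attrib['CM'])
```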
```python:sensor.py
# -*- coding: utf-8 -*-
import socket
import xml.etree.ElementTree as ET

host = 'localhost'
port = 10500

# connect to the Julius module server
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))

while True:
    recv_data = ''
    # accumulate until the "." line that terminates each Julius message
    while recv_data.find('\n.') == -1:
        recv_data += sock.recv(1024).decode()
    # drop the "." terminator before parsing
    recv_data = recv_data.strip('.\n')
    #print(recv_data)
    root = ET.fromstring(recv_data)
    for i in root.iter('WHYPO'):
        word = i.attrib['WORD']
        cm = i.attrib['CM']
        # skip the sentence start/end markers defined in the grammar
        if word != '[s]' and word != '[/s]':
            print('WORD = ' + word + ' : CM = ' + cm)
```
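With Julius running (e.g. via run.sh), start the script in another terminal. Speaking one of the registered words then prints a line like the following (the word and CM value here are illustrative):
pi@raspberrypi:~/julius $ python3 sensor.py
WORD = 温度 : CM = 0.972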
Next time, I plan to use the item received in WORD to pull the corresponding sensor data and have OpenJTalk speak it.