pi@raspberrypi:~ $ julius -C ~/julius/dictation-kit-4.5/am-gmm.jconf -nostrip -gram ~/julius/dict/sensor -module

STAT: include config: /home/pi/julius/dictation-kit-4.5/am-gmm.jconf
WARNING: m_chkparam: "-lmp" only for N-gram, ignored
WARNING: m_chkparam: "-lmp2" only for N-gram, ignored
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
Stat: init_phmm: Reading in HMM definition
Stat: binhmm-header: variance inversed
Stat: read_binhmm: has inversed variances
Stat: read_binhmm: binary format HMM definition
Stat: read_binhmm: this HMM does not need multipath handling
Stat: init_phmm: defined HMMs:  8443
Stat: init_phmm: loading binary hmmlist
Stat: load_hmmlist_bin: reading hmmlist
Stat: aptree_read: 42857 nodes (21428 branch + 21429 data)
Stat: load_hmmlist_bin: reading pseudo phone set
Stat: aptree_read: 3253 nodes (1626 branch + 1627 data)
Stat: init_phmm: logical names: 21429 in HMMList
Stat: init_phmm: base phones:    43 used in logical
Stat: init_phmm: finished reading HMM definitions
STAT: pseudo phones are loaded from binary hmmlist file
Stat: hmm_lookup: 12 pseudo phones are added to logical HMM list
STAT: *** AM00 _default loaded
STAT: *** loading LM00 _default
STAT: reading [/home/pi/julius/dict/sensor.dfa] and [/home/pi/julius/dict/sensor.dict]...
Stat: init_voca: read 8 words
STAT: reading additional forward dfa [/home/pi/julius/dict/sensor.dfa.forward]
STAT: done
STAT: Gram #0 sensor registered
STAT: Gram #0 sensor: new grammar loaded, now mash it up for recognition
STAT: Gram #0 sensor: extracting category-pair constraint for the 1st pass
STAT: Gram #0 sensor: installed
STAT: Gram #0 sensor: turn on active
STAT: grammar update completed
STAT: *** LM00 _default loaded
STAT: ------
STAT: All models are ready, go for final fusion
STAT: [1] create MFCC extraction instance(s)
STAT: *** create MFCC calculation modules from AM
STAT: AM 0 _default: create a new module MFCC01
STAT: 1 MFCC modules created
STAT: [2] create recognition processing instance(s) with AM and LM
STAT: composing recognizer instance SR00 _default (AM00 _default, LM00 _default)
STAT: Building HMM lexicon tree
STAT: lexicon size: 120+0=120
STAT: coordination check passed
STAT: multi-gram: beam width set to 120 (guess) by lexicon change
STAT: wchmm (re)build completed
STAT: SR00 _default composed
STAT: [3] initialize for acoustic HMM calculation
Stat: outprob_init: state-level mixture PDFs, use calc_mix()
Stat: addlog: generating addlog table (size = 1953 kB)
Stat: addlog: addlog table generated
STAT: [4] prepare MFCC storage(s)
STAT: [5] prepare for real-time decoding
STAT: All init successfully done

Stat: server-client: socket ready as server
///  Module mode ready
///  waiting client at 10500
///  Stat: server-client: connect from
STAT: ###### initialize input device
----------------------- System Information begin ---------------------
JuliusLib rev.4.6 (fast)

Engine specification:
 -  Base setup   : fast
 -  Supported LM : DFA, N-gram, Word
 -  Extension    : LibSndFile
 -  Compiled by  : gcc -g -O2 -fPIC
Library configuration: version 4.6
 - Audio input
    primary A/D-in driver   : alsa (Advanced Linux Sound Architecture)
    available drivers       : alsa
    wavefile formats        : various formats by libsndfile ver.1
    max. length of an input : 320000 samples, 150 words
 - Language Model
    class N-gram support    : yes
    MBR weight support      : yes
    word id unit            : short (2 bytes)
 - Acoustic Model
    multi-path treatment    : autodetect
 - External library
    file decompression by   : zlib library
 - Process hangling
    fork on adinnet input   : no
 - built-in SIMD instruction set for DNN

    NONE AVAILABLE, DNN computation may be too slow!
 - built-in CUDA support: no

Configuration of Modules

 Number of defined modules: AM=1, LM=1, SR=1

 Acoustic Model (with input parameter spec.):
 - AM00 "_default"

 Language Model:
 - LM00 "_default"
    grammar #1:
        dfa  = /home/pi/julius/dict/sensor.dfa
        dict = /home/pi/julius/dict/sensor.dict

 - SR00 "_default" (AM00, LM00)

Speech Analysis Module(s)

[MFCC01]  for [AM00 _default]

 Acoustic analysis condition:
           parameter = MFCC_E_D_N_Z (25 dim. from 12 cepstrum + energy, abs energy supressed with CMN)
    sample frequency = 16000 Hz
       sample period =  625  (1 = 100ns)
         window size =  400 samples (25.0 ms)
         frame shift =  160 samples (10.0 ms)
        pre-emphasis = 0.97
        # filterbank = 24
       cepst. lifter = 22
          raw energy = False
    energy normalize = False
        delta window = 2 frames (20.0 ms) around
         hi freq cut = OFF
         lo freq cut = OFF
     zero mean frame = OFF
           use power = OFF
                 CVN = OFF
                VTLN = OFF

    spectral subtraction = off

 cep. mean normalization = yes, real-time MAP-CMN, updating initial mean with last 500 input frames
  initial mean from file = N/A
   beginning data weight = 100.00
 cep. var. normalization = no

     base setup from = Julius defaults

Acoustic Model(s)

[AM00 "_default"]

 HMM Info:
    8443 models, 3090 states, 3090 mpdfs, 49440 Gaussians are defined
          model type = context dependency handling ON
      training parameter = MFCC_E_N_D_Z
       vector length = 25
    number of stream = 1
         stream info = [0-24]
    cov. matrix type = DIAGC
       duration type = NULLD
    max mixture size = 16 Gaussians
     max length of model = 5 states
     logical base phones = 43
       model skip trans. = not exist, no multi-path handling

 AM Parameters:
        Gaussian pruning = none (full computation)  (-gprune)
    short pause HMM name = "sp" specified, "sp" applied (physical)  (-sp)
  cross-word CD on pass1 = handle by approx. (use average prob. of same LC)

Language Model(s)

[LM00 "_default"] type=grammar

 DFA grammar info:
      4 nodes, 8 arcs, 8 terminal(category) symbols
      category-pair matrix: 56 bytes (896 bytes allocated)

 additional forward DFA grammar info:
      4 nodes, 8 arcs, 8 terminal(category) symbols
      category-pair matrix: 0 bytes (0 bytes allocated)

 Vocabulary Info:
        vocabulary size  = 8 words, 40 models
        average word len = 5.0 models, 15.0 states
       maximum state num = 27 nodes per word
       transparent words = not exist
       words under class = not exist

   found sp category IDs =


[SR00 "_default"]  AM00 "_default"  +  LM00 "_default"

 Lexicon tree:
     total node num =    120
      root node num =      8
      leaf node num =      8

    (-penalty1) IW penalty1 = +0.0
    (-penalty2) IW penalty2 = +0.0
    (-cmalpha)CM alpha coef = 0.050000

 Search parameters: 
        multi-path handling = no
    (-b) trellis beam width = 120 (-1 or not specified - guessed)
    (-bs)score pruning thres= disabled
    (-n)search candidate num= 1
    (-s)  search stack size = 500
    (-m)    search overflow = after 2000 hypothesis poped
            2nd pass method = searching sentence, generating N-best
    (-b2)  pass2 beam width = 30
    (-lookuprange)lookup range= 5  (tm-5 <= t <tm+5)
    (-sb)2nd scan beamthres = 80.0 (in logscore)
    (-n)        search till = 1 candidates found
    (-output)    and output = 1 candidates out of above
     IWCD handling:
       1st pass: approximation (use average prob. of same LC)
       2nd pass: loose (apply when hypo. is popped and scanned)
     all possible words will be expanded in 2nd pass
     build_wchmm2() used
     lcdset limited by word-pair constraint
    short pause segmentation = off
    fall back on search fail = off, returns search failure

Decoding algorithm:

    1st pass input processing = real time, on-the-fly
    1st pass method = 1-best approx. generating indexed trellis
    output word confidence measure based on search-time scores


 Input stream:
                 input type = waveform
               input source = microphone
        device API          = default
              sampling freq. = 16000 Hz
             threaded A/D-in = supported, on
       zero frames stripping = off
             silence cutting = on
                 level thres = 2000 / 32767
             zerocross thres = 60 / sec.
                 head margin = 300 msec.
                 tail margin = 400 msec.
                  chunk size = 1000 samples
           FVAD switch value = -1 (disabled)
        long-term DC removal = off
        level scaling factor = 1.00 (disabled)
          reject short input = off
          reject  long input = off

----------------------- System Information end -----------------------

Notice for feature extraction (01),
    * Cepstral mean normalization for real-time decoding:       *
    * NOTICE: The first input may not be recognized, since      *
    *         no initial mean is available on startup.          *

### read waveform input
Stat: capture audio at 16000Hz
Stat: adin_alsa: latency set to 32 msec (chunk = 512 bytes)
Error: adin_alsa: unable to get pcm info from card control
Warning: adin_alsa: skip output of detailed audio device info
STAT: AD-in thread created
STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 94

STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 69

STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 78

STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 115

STAT: 00 _default: 8 generated, 8 pushed, 4 nodes popped in 111

socket error, connection closed


pi@raspberrypi:~/julius $ vi run.sh
pi@raspberrypi:~/julius $ chmod +x run.sh
```


julius -C ~/julius/dictation-kit-4.5/am-gmm.jconf -nostrip -gram ~/julius/dict/sensor -module



///  Module mode ready
///  waiting client at 10500

When Julius recognizes speech, it sends the result to the sample script as XML, which the script parses with xml.etree.ElementTree.

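# -*- coding: utf-8 -*-
import socket
import xml.etree.ElementTree as ET

host = 'localhost'
port = 10500

# connect to Julius running in module mode
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))

buf = b''
while True:
  # every message from Julius ends with a line containing only '.'
  while b'\n.\n' not in buf:
    chunk = sock.recv(1024)
    if not chunk:
      raise SystemExit('connection closed by Julius')
    buf += chunk
  # take one complete message and keep the remainder for the next pass
  message, buf = buf.split(b'\n.\n', 1)
  try:
    root = ET.fromstring(message.decode())
  except ET.ParseError:
    continue  # skip anything that is not well-formed XML

  # print each recognized word with its confidence measure (CM)
  for i in root.iter('WHYPO'):
    word = i.attrib['WORD']
    cm = i.attrib['CM']
    if word != '[s]' and word != '[/s]':
      print('WORD = ' + word + '  : CM = ' + cm )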
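
To try the parsing logic without a microphone or a running Julius process, the sketch below feeds a hand-written sample through the same steps. The `SAMPLE` string is an assumption made up for illustration: its element and attribute names (`RECOGOUT`, `SHYPO`, `WHYPO`, `WORD`, `CM`) follow the module-mode output that the script above reads, but the word, phones, and scores are invented, and `extract_words` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a module-mode stream; all values are invented.
# Each message is followed by a line containing only '.'.
SAMPLE = (
    '<INPUT STATUS="LISTEN" TIME="0"/>\n'
    '.\n'
    '<RECOGOUT>\n'
    '  <SHYPO RANK="1" SCORE="-1234.5" GRAM="0">\n'
    '    <WHYPO WORD="[s]" CLASSID="0" PHONE="silB" CM="1.000"/>\n'
    '    <WHYPO WORD="ondo" CLASSID="1" PHONE="o N d o" CM="0.912"/>\n'
    '    <WHYPO WORD="[/s]" CLASSID="2" PHONE="silE" CM="1.000"/>\n'
    '  </SHYPO>\n'
    '</RECOGOUT>\n'
    '.\n'
)


def extract_words(stream):
    """Return (word, confidence) pairs from a module-mode text stream."""
    results = []
    # Split the stream into individual messages on the '.' terminator line.
    for message in stream.split('\n.\n'):
        message = message.strip()
        if not message:
            continue
        try:
            root = ET.fromstring(message)
        except ET.ParseError:
            continue  # ignore fragments that are not well-formed XML
        for whypo in root.iter('WHYPO'):
            word = whypo.attrib['WORD']
            # Drop the sentence start/end markers, as the script above does.
            if word not in ('[s]', '[/s]'):
                results.append((word, float(whypo.attrib['CM'])))
    return results


print(extract_words(SAMPLE))
```

Converting `CM` to a float makes it easy to ignore low-confidence words later, for example by keeping only pairs whose score exceeds a threshold of your choosing.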



