
VOSK test_simple.py on GoogleColaboratory [002]

Posted at 2021-02-24

The previous article in this series is 'VOSK test_simple.py on GoogleColaboratory [001]'.


The Google Colab file for this article is linked at the bottom of the page.1

Chinese speech-to-text, with translation via the Google API:

https://youtu.be/qtNFAPnqkhQ

Install VOSK on GoogleColaboratory

GoogleColab
# install the VOSK Python package
!pip install vosk

# clone the vosk-api repository to get the example scripts
!git clone https://github.com/alphacep/vosk-api
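To confirm the installation before going further, a quick import check (a minimal sketch):

GoogleColab
# verify that the package imports cleanly
from vosk import Model, KaldiRecognizer, SetLogLevel
print('vosk import OK')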

Download Language Model

Download via https://alphacephei.com/vosk/models

a: English ASR testing

GoogleColab
%cd vosk-api/python/example
# English language model (small)
!wget https://alphacephei.com/kaldi/models/vosk-model-small-en-us-0.15.zip
!unzip vosk-model-small-en-us-0.15.zip
# the example script expects the model directory to be named 'model'
%mv vosk-model-small-en-us-0.15 model

b: Chinese ASR testing

GoogleColab
%cd vosk-api/python/example
# Chinese language model (small)
!wget https://alphacephei.com/vosk/models/vosk-model-small-cn-0.3.zip
!unzip vosk-model-small-cn-0.3.zip
# the example script expects the model directory to be named 'model'
%mv vosk-model-small-cn-0.3 model
!rm -rf vosk-model-small-cn-0.3.zip
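The same three steps (download, unzip, rename to 'model') repeat for every model, so they can be wrapped in a small helper. A minimal sketch using only the standard library; fetch_model is a hypothetical name, not part of vosk-api:

GoogleColab
import os, shutil, urllib.request, zipfile

def fetch_model(name, base='https://alphacephei.com/vosk/models/'):
    # hypothetical helper: download a model zip and unpack it as ./model
    zip_name = name + '.zip'
    urllib.request.urlretrieve(base + zip_name, zip_name)
    with zipfile.ZipFile(zip_name) as z:
        z.extractall('.')
    if os.path.isdir('model'):
        shutil.rmtree('model')   # replace any previously unpacked model
    os.rename(name, 'model')
    os.remove(zip_name)

# e.g. fetch_model('vosk-model-small-cn-0.3')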

Model structure

Once you have trained a model, arrange the files according to the following layout (see en-us-aspire for details). A quick layout check in Python follows the list:

  • am/final.mdl - acoustic model
  • conf/mfcc.conf - MFCC config file. Make sure you take the mfcc_hires.conf version if you are using a hires model (most external ones are)
  • conf/model.conf - provides default decoding beams and silence phones. You have to create this file yourself; it is not present in the Kaldi model
  • ivector/final.dubm - take the ivector files from the ivector extractor (the folder is optional; only present if the model is trained with ivectors)
  • ivector/final.ie
  • ivector/final.mat
  • ivector/splice.conf
  • ivector/global_cmvn.stats
  • ivector/online_cmvn.conf
  • graph/phones/word_boundary.int - from the graph
  • graph/HCLG.fst - the decoding graph, if you are not using lookahead
  • graph/HCLr.fst - use Gr.fst and HCLr.fst instead of one big HCLG.fst if you want to run rescoring
  • graph/Gr.fst
  • graph/phones.txt - from the graph
  • graph/words.txt - from the graph
  • rescore/G.carpa - CARPA rescoring is optional but helpful for big models. Usually located inside data/lang_test_rescore
  • rescore/G.fst - also optional, if you want to use rescoring
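A minimal sketch that spot-checks an unpacked model against this layout (assumes the model sits in ./model; which optional files exist depends on how the model was built):

GoogleColab
import os

# core files that every layout above includes
required = ['am/final.mdl', 'conf/mfcc.conf']
# files that depend on how the model was built
optional = ['conf/model.conf', 'graph/HCLG.fst', 'graph/HCLr.fst',
            'graph/words.txt', 'ivector/final.ie', 'rescore/G.carpa']

for f in required:
    print(('OK      ' if os.path.exists(os.path.join('model', f)) else 'MISSING ') + f)
for f in optional:
    if os.path.exists(os.path.join('model', f)):
        print('found optional ' + f)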

Directory check

GoogleColab
!pwd

Test Audio sampling

case b: YouTube

GoogleColab
urltext ='https://youtu.be/cNSq5RdVf28' # Chinese YouTube Clip with no captions
GoogleColab
from urllib.parse import urlparse, parse_qs

args = [urltext]
video_id = ''


def extract_video_id(url):
    query = urlparse(url)
    if query.hostname == 'youtu.be': return query.path[1:]
    if query.hostname in {'www.youtube.com', 'youtube.com'}:
        if query.path == '/watch': return parse_qs(query.query)['v'][0]
        if query.path[:7] == '/embed/': return query.path.split('/')[2]
        if query.path[:3] == '/v/': return query.path.split('/')[2]
    # fail?
    return None

for url in args:
    video_id = extract_video_id(url)
    print('youtube video_id:', video_id)

from IPython.display import YouTubeVideo

YouTubeVideo(video_id)
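For reference, the same function also handles standard watch URLs via the parse_qs branch; a quick check with an equivalent URL for the same clip:

GoogleColab
# the '/watch' branch extracts the v= parameter
print(extract_video_id('https://www.youtube.com/watch?v=cNSq5RdVf28'))  # -> cNSq5RdVf28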
GoogleColab
# remove any previous extract, then download the audio track as WAV
!rm -rf e*.wav
!pip install -q youtube-dl
!youtube-dl --extract-audio --audio-format wav --output "extract.%(ext)s" {urltext}
GoogleColab
!apt install -y ffmpeg

# convert to 16 kHz mono 16-bit PCM, the format VOSK expects
!ffmpeg -i extract.wav -vn -acodec pcm_s16le -ac 1 -ar 16000 -f wav test1.wav
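Before running recognition, it is worth confirming that the converted file really is mono 16-bit PCM at 16 kHz, since the script below rejects anything else. A minimal sketch using the standard wave module:

GoogleColab
import wave

# confirm test1.wav matches what the recognizer expects
with wave.open('test1.wav', 'rb') as wf:
    print('channels :', wf.getnchannels())  # expect 1 (mono)
    print('sampwidth:', wf.getsampwidth())  # expect 2 bytes (16-bit)
    print('framerate:', wf.getframerate())  # expect 16000
    print('comptype :', wf.getcomptype())   # expect 'NONE' (plain PCM)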

ASR test_simple.py ...

Speech to text

GoogleColab
#!/usr/bin/env python3

from vosk import Model, KaldiRecognizer, SetLogLevel
import sys
import os
import wave

path = '/content/vosk-api/python/example/'

SetLogLevel(0)

if not os.path.exists("model"):
    print("Please download a model from https://alphacephei.com/vosk/models and unpack it as 'model' in the current folder.")
    sys.exit(1)

#wf = wave.open(path + 'test.wav', "rb")  # a: English test sample
wf = wave.open(path + 'test1.wav', "rb")  # b: Chinese test sample
if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
    print("Audio file must be WAV format mono PCM.")
    sys.exit(1)

model = Model("model")
rec = KaldiRecognizer(model, wf.getframerate())

# feed the audio in 4000-frame chunks; print a full result at each
# utterance boundary and a partial hypothesis otherwise
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        print(rec.Result())
    else:
        print(rec.PartialResult())

print(rec.FinalResult())
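Result(), PartialResult() and FinalResult() all return JSON strings. To collect just the recognized text, the file can be decoded again and the JSON parsed; a minimal sketch assuming wf and model from the cell above are still in scope:

GoogleColab
import json

# decode again, collecting only the 'text' field from each result
wf.rewind()
rec = KaldiRecognizer(model, wf.getframerate())
texts = []
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        texts.append(json.loads(rec.Result()).get('text', ''))
texts.append(json.loads(rec.FinalResult()).get('text', ''))
print(' '.join(t for t in texts if t))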
GoogleColab
from IPython.display import Audio

#Audio(path + 'test.wav')  # a: English
Audio(path + 'test1.wav')  # b: Chinese

Original 'test_simple.py'

GoogleColab
%%bash
cat -n /content/vosk-api/python/example/test_simple.py

test_ffmpeg.py

GoogleColab
%%bash
cat -n /content/vosk-api/python/example/test_ffmpeg.py

check:
YouTube, Deepspeech, with Google Colaboratory [testing_0003]

YoavRamon/awesome-kaldi
https://github.com/YoavRamon/awesome-kaldi


  1. test [googlecolab] removed 
