The previous article is 'VOSK test_simple.py on GoogleColaboratory [001]'.
The Google Colab files for this article are linked at the bottom of the page. [1]
Chinese Speech to text and translate by Google API
https://youtu.be/qtNFAPnqkhQ
##Install VOSK on GoogleColaboratory
!pip install vosk
!git clone https://github.com/alphacep/vosk-api
##Download Language Model
- case a: English Language Model ... see previous "VOSK test_simple.py on GoogleColaboratory [001]"
- case b: Chinese Language Model
Download via https://alphacephei.com/vosk/models
###a:English ASR testing
%cd /content/vosk-api/python/example
#English lang model
!wget https://alphacephei.com/kaldi/models/vosk-model-small-en-us-0.15.zip
!unzip vosk-model-small-en-us-0.15.zip
%mv vosk-model-small-en-us-0.15 model
###b:Chinese ASR testing
%cd /content/vosk-api/python/example
#Chinese lang model
!wget https://alphacephei.com/vosk/models/vosk-model-small-cn-0.3.zip
!unzip vosk-model-small-cn-0.3.zip
%mv vosk-model-small-cn-0.3 model
!rm -rf vosk-model-small-cn-0.3.zip
##Model structure
Once you have trained a model, arrange the files according to the following layout (see en-us-aspire for details):
- am/final.mdl - acoustic model
- conf/mfcc.conf - MFCC config file. Make sure you take the mfcc_hires.conf version if you are using a hires model (most external ones)
- conf/model.conf - provides default decoding beams and silence phones. You have to create this file yourself; it is not present in a Kaldi model
- ivector/final.dubm - take ivector files from ivector extractor (optional folder if the model is trained with ivectors)
- ivector/final.ie
- ivector/final.mat
- ivector/splice.conf
- ivector/global_cmvn.stats
- ivector/online_cmvn.conf
- graph/phones/word_boundary.int - from the graph
- graph/HCLG.fst - this is the decoding graph, if you are not using lookahead
- graph/HCLr.fst - use Gr.fst and HCLr.fst instead of one big HCLG.fst if you want to run rescoring
- graph/Gr.fst
- graph/phones.txt - from the graph
- graph/words.txt - from the graph
- rescore/G.carpa - carpa rescoring is optional but helpful in big models. Usually located inside data/lang_test_rescore
- rescore/G.fst - also optional if you want to use rescoring
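Before running the recognizer it can help to confirm that the unpacked `model` folder matches this layout. The following is a minimal sketch, not part of VOSK itself: the helper name `missing_model_files` and the choice of which entries count as required are assumptions made here from the list above (the `ivector/` and `rescore/` entries are described as optional, so they are not checked).

```python
from pathlib import Path

# Entries treated as required here (an assumption based on the layout list;
# ivector/ and rescore/ are described as optional and are skipped).
REQUIRED_FILES = [
    "am/final.mdl",
    "conf/mfcc.conf",
    "conf/model.conf",
    "graph/phones/word_boundary.int",
]


def missing_model_files(model_dir):
    """Return the layout entries that are absent from model_dir."""
    root = Path(model_dir)
    missing = [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
    # The graph is either one HCLG.fst or the HCLr.fst + Gr.fst pair.
    has_hclg = (root / "graph/HCLG.fst").is_file()
    has_pair = ((root / "graph/HCLr.fst").is_file()
                and (root / "graph/Gr.fst").is_file())
    if not (has_hclg or has_pair):
        missing.append("graph/HCLG.fst (or graph/HCLr.fst + graph/Gr.fst)")
    return missing


if __name__ == "__main__":
    # "model" is the folder name used by the cells above.
    print(missing_model_files("model"))
```

An empty list means every checked entry was found; anything else names the files to look for in the downloaded archive.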
##Directory check
!pwd
##Test Audio sampling
###case b: YouTube
urltext = 'https://youtu.be/cNSq5RdVf28'  # Chinese YouTube clip with no captions
from urllib.parse import urlparse, parse_qs
args = [urltext]
video_id = ''
def extract_video_id(url):
    query = urlparse(url)
    if query.hostname == 'youtu.be': return query.path[1:]
    if query.hostname in {'www.youtube.com', 'youtube.com'}:
        if query.path == '/watch': return parse_qs(query.query)['v'][0]
        if query.path[:7] == '/embed/': return query.path.split('/')[2]
        if query.path[:3] == '/v/': return query.path.split('/')[2]
    # fail?
    return None
for url in args:
    video_id = extract_video_id(url)
    print('youtube video_id:', video_id)
from IPython.display import YouTubeVideo
YouTubeVideo(video_id)
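!rm -rf e*.wav
# ffmpeg is needed by youtube-dl for audio extraction, so install it first
!apt install -y ffmpeg
!pip install -q youtube-dl
!youtube-dl --extract-audio --audio-format wav --output "extract.%(ext)s" {urltext}
# convert to mono, 16-bit PCM, 16 kHz
!ffmpeg -i extract.wav -vn -acodec pcm_s16le -ac 1 -ar 16000 -f wav test1.wav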
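The ffmpeg options above ask for mono (`-ac 1`), 16-bit PCM (`-acodec pcm_s16le`) audio at 16 kHz (`-ar 16000`). A small sketch using only the standard `wave` module can confirm that the converted file really has that shape before it is passed to the recognizer. The helper names `describe_wav` and `is_vosk_ready` are made up for this illustration, and the 16000 default simply mirrors the command above.

```python
import wave


def describe_wav(wav_path):
    """Return (channels, sample_width_bytes, frame_rate) for a WAV file."""
    with wave.open(wav_path, "rb") as wf:
        return wf.getnchannels(), wf.getsampwidth(), wf.getframerate()


def is_vosk_ready(wav_path, expected_rate=16000):
    """True if the file is mono, 16-bit, at expected_rate.

    expected_rate mirrors the -ar 16000 option used above; it is an
    assumption of this sketch, not a limit read from the library.
    """
    channels, width, rate = describe_wav(wav_path)
    return channels == 1 and width == 2 and rate == expected_rate
```

In the notebook, `is_vosk_ready('test1.wav')` would be expected to return `True` if the conversion step succeeded.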
##ASR test_simple.py ...
Speech to text
#!/usr/bin/env python3
from vosk import Model, KaldiRecognizer, SetLogLevel
import sys
import os
import wave
path = '/content/vosk-api/python/example/'
SetLogLevel(0)
if not os.path.exists("model"):
    print("Please download the model from https://alphacephei.com/vosk/models and unpack as 'model' in the current folder.")
    exit(1)
#wf = wave.open(path+'/test.wav',"rb")# a:English test sample
wf = wave.open(path+'/test1.wav',"rb")# b:Chinese lang test sample
if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
    print("Audio file must be WAV format mono PCM.")
    exit(1)
model = Model("model")
rec = KaldiRecognizer(model, wf.getframerate())
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        print(rec.Result())
    else:
        print(rec.PartialResult())
print(rec.FinalResult())
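The script prints every result as a raw JSON string. `Result()` and `FinalResult()` return JSON with a `text` field, while `PartialResult()` returns a `partial` field instead. To get one transcript out of a run, the strings can be collected and joined. This is a minimal sketch: `collect_text` is a hypothetical helper and the sample strings are invented stand-ins for real recognizer output.

```python
import json


def collect_text(result_jsons):
    """Join the 'text' fields of result JSON strings, skipping empty ones."""
    pieces = []
    for raw in result_jsons:
        # Partial results carry a "partial" key and no "text", so .get()
        # makes them fall through as empty.
        text = json.loads(raw).get("text", "")
        if text:
            pieces.append(text)
    return " ".join(pieces)


# Invented sample strings, shaped like Result() / PartialResult() output.
samples = ['{"text": "one two"}', '{"partial": "thr"}', '{"text": "three"}']
print(collect_text(samples))
```

In the loop above, this would mean appending `rec.Result()` and `rec.FinalResult()` to a list instead of printing them, then calling `collect_text` on that list once the file has been read.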
from IPython.display import Audio
#Audio(path+'/test.wav') # a:English
Audio(path+'/test1.wav') # b:Chinese
##Original 'test_simple.py'
%%bash
cat -n /content/vosk-api/python/example/test_simple.py
##test_ffmpeg.py
%%bash
cat -n /content/vosk-api/python/example/test_ffmpeg.py
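The idea behind an ffmpeg-based variant is that the separate conversion step can be skipped: ffmpeg decodes any input to raw mono 16-bit PCM on its standard output, and the recognizer reads from that pipe instead of a WAV file. The sketch below illustrates that approach and is not a copy of test_ffmpeg.py. The helper names are hypothetical, and the listing printed above remains the reference for the exact code.

```python
import subprocess


def ffmpeg_pcm_command(input_path, sample_rate=16000):
    """Build an ffmpeg command that writes raw mono s16le PCM to stdout."""
    return [
        "ffmpeg", "-loglevel", "quiet",
        "-i", input_path,
        "-ar", str(sample_rate),  # resample to the rate given to the recognizer
        "-ac", "1",               # downmix to mono
        "-f", "s16le",            # raw 16-bit little-endian samples, no header
        "-",                      # write to stdout
    ]


def open_pcm_stream(input_path, sample_rate=16000):
    """Start ffmpeg and return the process; read audio from process.stdout."""
    return subprocess.Popen(
        ffmpeg_pcm_command(input_path, sample_rate),
        stdout=subprocess.PIPE,
    )
```

A recognition loop would then read chunks with `process.stdout.read(4000)` and pass them to `rec.AcceptWaveform(data)`, with the recognizer created for the same `sample_rate`.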
See also:
- YouTube, Deepspeech, with Google Colaboratory [testing_0003]
- YoavRamon/awesome-kaldi: https://github.com/YoavRamon/awesome-kaldi
[1] test [googlecolab] removed