Trying Vosk, a fully offline speech recognition model with Japanese support

Posted at 2022-02-27

Overview

Steps

Prepare the runtime environment
Prepare an audio file
Run

Prepare the runtime environment

cd $work_dir
pip3 install vosk
wget https://alphacephei.com/vosk/models/vosk-model-small-ja-0.22.zip
unzip vosk-model-small-ja-0.22.zip
mv vosk-model-small-ja-0.22 model
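
If unzip is not available, the unpack-and-rename step can also be done from Python. The following is a minimal sketch using only the standard library; the helper name unpack_model is hypothetical, and it assumes the archive contains exactly one top-level folder, as the commands above imply.

```python
import os
import shutil
import tempfile
import zipfile


def unpack_model(zip_path, target_dir="model"):
    """Extract a model archive and move its single top-level folder to target_dir."""
    if os.path.exists(target_dir):
        raise FileExistsError(f"{target_dir} already exists")
    parent = os.path.dirname(os.path.abspath(target_dir))
    # Extract next to the target so the final move stays on the same filesystem.
    with tempfile.TemporaryDirectory(dir=parent) as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        entries = os.listdir(tmp)
        if len(entries) != 1:
            raise ValueError("expected exactly one top-level folder in the archive")
        shutil.move(os.path.join(tmp, entries[0]), target_dir)
    return target_dir
```

Calling unpack_model("vosk-model-small-ja-0.22.zip") from the working directory should leave the same "model" folder that the mv command produces.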

Prepare an audio file

Prepare any mp3 file, then convert it to a mono 16-bit PCM WAV file:

ffmpeg -i "target.mp3" -vn -ac 1 -ar 44100 -acodec pcm_s16le -f wav "output.wav"
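
Before running recognition, it can help to confirm that the converted file has the format run.py checks for (mono, 16-bit, uncompressed PCM). A minimal sketch using only the standard-library wave module; the helper names describe_wav and is_mono_pcm16 are made up for illustration.

```python
import wave


def describe_wav(path):
    """Return the WAV header fields that matter for recognition."""
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "frame_rate": wf.getframerate(),
            "compression": wf.getcomptype(),
        }


def is_mono_pcm16(info):
    """True when the header describes mono, 16-bit, uncompressed PCM."""
    return (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and info["compression"] == "NONE"
    )
```

For the ffmpeg command above, describe_wav("output.wav") should report one channel, a sample width of 2 bytes, and a frame rate of 44100.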

Run

run.py

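#!/usr/bin/env python3

import os
import sys
import wave

from vosk import Model, KaldiRecognizer, SetLogLevel

# Log level for Vosk/Kaldi messages.
SetLogLevel(0)

if len(sys.argv) != 2:
    print("Usage: run.py <audio.wav>")
    sys.exit(1)

if not os.path.exists("model"):
    print("Please download the model from https://alphacephei.com/vosk/models and unpack as 'model' in the current folder.")
    sys.exit(1)

# The recognizer expects mono, 16-bit, uncompressed PCM audio.
wf = wave.open(sys.argv[1], "rb")
if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
    print("Audio file must be WAV format mono PCM.")
    sys.exit(1)

model = Model("model")
# Pass the file's own sample rate to the recognizer.
rec = KaldiRecognizer(model, wf.getframerate())
# Include per-word details in the results.
rec.SetWords(True)

# Feed the audio in chunks of 4000 frames.
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        # An utterance is complete: print its result.
        print(rec.Result())
    else:
        # Still inside an utterance: print the partial hypothesis.
        print(rec.PartialResult())

# Flush whatever is left after the last chunk.
print(rec.FinalResult())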

python3 ./run.py output.wav
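
The script prints each result as a JSON string. With SetWords(True), a completed result is expected to carry a "text" field plus a per-word "result" list with word, start, end, and conf entries. The sketch below pulls those fields out; parse_result is a hypothetical helper, and the sample string is hand-written for illustration rather than real Vosk output.

```python
import json


def parse_result(result_json):
    """Split one result JSON string into its text and per-word tuples."""
    data = json.loads(result_json)
    text = data.get("text", "")
    # "result" may be absent, for example when nothing was recognized.
    words = [
        (w["word"], w["start"], w["end"], w["conf"])
        for w in data.get("result", [])
    ]
    return text, words


# Hand-written sample in the expected shape (not actual recognizer output).
sample = '{"result": [{"conf": 1.0, "end": 0.9, "start": 0.3, "word": "こんにちは"}], "text": "こんにちは"}'
text, words = parse_result(sample)
print(text)
for word, start, end, conf in words:
    print(f"{start:.2f}-{end:.2f} {word} (conf {conf:.2f})")
```

In run.py, the same helper could be applied to rec.Result() and rec.FinalResult() in place of printing the raw JSON.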
