Trying out Vosk, a fully offline speech recognition model with Japanese support

Overview

This article tries out Vosk, a speech recognition toolkit that runs completely offline and provides a Japanese model.

Steps

  1. Set up the environment
  2. Prepare an audio file
  3. Run

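Set up the environment

Install the vosk package, download the small Japanese model, and unpack it as a folder named model in the working directory.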

cd $work_dir
pip3 install vosk
wget https://alphacephei.com/vosk/models/vosk-model-small-ja-0.22.zip
unzip vosk-model-small-ja-0.22.zip
mv vosk-model-small-ja-0.22 model
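
The unpack-and-rename step can also be done from Python, which may be convenient on systems without unzip. The following is a minimal sketch and not part of the original procedure: the helper name unpack_model is hypothetical, and it assumes the archive keeps everything under a single top-level folder, as the mv command above implies.

```python
import shutil
import zipfile
from pathlib import Path


def unpack_model(zip_path, dest="model"):
    """Extract a model archive and rename its top-level folder to `dest`.

    Hypothetical helper: assumes every entry in the archive lives under
    exactly one top-level folder.
    """
    dest = Path(dest)
    if dest.exists():
        return dest  # already unpacked; leave it alone
    with zipfile.ZipFile(zip_path) as zf:
        # Collect the first path component of every entry in the archive.
        roots = {name.split("/")[0] for name in zf.namelist() if name.strip("/")}
        if len(roots) != 1:
            raise ValueError(f"expected one top-level folder, found: {sorted(roots)}")
        zf.extractall(dest.parent)
    extracted = dest.parent / roots.pop()
    shutil.move(str(extracted), str(dest))
    return dest
```

With the zip file already downloaded into the working directory, calling unpack_model("vosk-model-small-ja-0.22.zip") should leave a model folder next to it.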

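Prepare an audio file

Take any mp3 file and convert it with ffmpeg to a mono, 16-bit PCM WAV file, which is the format run.py checks for: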

ffmpeg -i "target.mp3" -vn -ac 1 -ar 44100 -acodec pcm_s16le -f wav "output.wav"
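
To confirm that the conversion produced what run.py expects, the file can be inspected with the standard wave module before running recognition. This is a small sketch under the assumption that the same three conditions as in run.py apply (one channel, 2-byte samples, no compression); the function name describe_wav is made up for illustration.

```python
import wave


def describe_wav(path):
    """Return (ok, info); ok is True for mono, 16-bit, uncompressed PCM."""
    with wave.open(path, "rb") as wf:
        info = {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": wf.getframerate(),
            "compression": wf.getcomptype(),
            "duration_sec": wf.getnframes() / wf.getframerate(),
        }
    # Same conditions that run.py checks before starting recognition.
    ok = (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and info["compression"] == "NONE"
    )
    return ok, info
```

For example, ok, info = describe_wav("output.wav") returns the verdict together with the sample rate and duration, so a wrong channel count or sample width is visible at a glance.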

Run

Save the following as run.py:
#!/usr/bin/env python3

from vosk import Model, KaldiRecognizer, SetLogLevel
import sys
import os
import wave

# Log level 0 is the default; pass -1 to suppress the recognizer's log output.
SetLogLevel(0)

# The model must be unpacked as a folder named "model" in the current directory.
if not os.path.exists("model"):
    print("Please download the model from https://alphacephei.com/vosk/models and unpack as 'model' in the current folder.")
    sys.exit(1)

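# Open the WAV file given on the command line and reject anything
# that is not mono, 16-bit, uncompressed PCM.
wf = wave.open(sys.argv[1], "rb")
if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
    print("Audio file must be WAV format mono PCM.")
    sys.exit(1)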

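# Load the model and create a recognizer at the file's own sample rate.
model = Model("model")
rec = KaldiRecognizer(model, wf.getframerate())
rec.SetWords(True)  # include per-word entries in the results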

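# Feed the audio in chunks of 4000 frames until the file is exhausted.
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        # An utterance was completed: print its result.
        print(rec.Result())
    else:
        # Still inside an utterance: print the partial hypothesis so far.
        print(rec.PartialResult())

# Flush whatever remains after the last chunk.
print(rec.FinalResult())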

Then run the script, passing the converted WAV file as its argument:

python3 ./run.py output.wav
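
Result() and FinalResult() return JSON strings, so the printed output is easier to work with once parsed. Below is a minimal sketch of pulling out the recognized text and the word timings that SetWords(True) enables. The helper name summarize_result is hypothetical, and the sample string is hand-written in the shape Vosk emits, not actual recognizer output.

```python
import json


def summarize_result(result_json):
    """Return (text, words) from a Result()/FinalResult() JSON string.

    words is a list of (word, start_sec, end_sec) tuples; it is empty when
    word-level output is absent or nothing was recognized.
    """
    data = json.loads(result_json)
    text = data.get("text", "")
    words = [(w["word"], w["start"], w["end"]) for w in data.get("result", [])]
    return text, words


# Hand-written sample for illustration only; NOT real recognizer output.
sample = '{"result": [{"conf": 1.0, "start": 0.5, "end": 0.9, "word": "hello"}], "text": "hello"}'
text, words = summarize_result(sample)
print(text)
for word, start, end in words:
    print(f"{start:.2f}-{end:.2f} {word}")
```

In run.py, the same helper could be applied to rec.Result() and rec.FinalResult() in place of printing the raw JSON. PartialResult() uses a different key ("partial"), so it would need separate handling.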