Speech recognition on a Mac with Python's SpeechRecognition library

Posted at 2019-09-07

Purpose

These are my notes from running speech recognition on a Mac with Python's SpeechRecognition library.

Setup

Install the libraries needed on a Mac, following the official site below.

SpeechRecognition 3.8.1

$ pip install SpeechRecognition
$ brew install portaudio
$ pip install pyaudio
$ pip install google-api-python-client
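
Before moving on, it can help to confirm that the packages are importable from the Python you intend to use. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the list holds the import names of the packages installed above (they differ from the pip names).

```python
import importlib.util

# Import names of the packages installed above (not the pip package names).
REQUIRED_MODULES = ["speech_recognition", "pyaudio", "googleapiclient"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", missing if missing else "none")
```

If anything is reported as missing, the usual cause is that `pip` and `python` point at different interpreters.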

Code

The script below is adapted from the following sample code and generates text from WAV data.

audio_transcribe.py

import speech_recognition as sr

AUDIO_FILE = "./aps-smp.wav"

# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file

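# recognize speech using Google Speech Recognition (Japanese);
# the request sits inside the try block so the handlers below can catch its errors
try:
    result = r.recognize_google(audio, language='ja-JP')
    print("Google Speech Recognition thinks you said " + result)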
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

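Test

From コーパス開発センター　音声・転記テキストサンプル (Center for Corpus Development: audio and transcript samples), download the sample audio listed under 学会講演[音声] (academic presentation [audio]), aps-smp.mp3.

The downloaded file is in mp3 format, so convert it to wav.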

$ ffmpeg -i aps-smp.mp3 aps-smp.wav
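
To check that the conversion produced a PCM WAV file that can be read, the header can be inspected with Python's standard `wave` module. This is a minimal sketch; the helper name `describe_wav` is hypothetical and not part of SpeechRecognition.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file.

    wave.open raises wave.Error if the file is not a PCM WAV file.
    """
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate,
        }
```

For the file above, this would be called as `describe_wav("./aps-smp.wav")`.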

Run audio_transcribe.py on the converted file to transcribe it.

$ python audio_transcribe.py
Google Speech Recognition thinks you said パラ言語情報ということなんですが 簡単に最初に復習をしておきたいと思います まああのー こうやって話しておりますと それはもちろんあの言語的情報を伝えるという事が一つの重要な目的 なんでありますが同時に パラ言語情報 そして 非言語情報が伝わっていますが この散文方は 藤崎先生によるものでして パラ言語情報というのは 要は 意図的に作業できるわしゃがちゃんとコントロールして出してるんだけども 言語情報と違って 連続的に変化するから カテゴライズすることが やや難しい そういった状況であります

The transcription is mostly correct.

Notes

The demo bundled with speech_recognition, on the other hand, has a hard time recognizing my voice.

$ python -m speech_recognition
A moment of silence, please...
Set minimum energy threshold to 124.235981999
Say something!
Got it! Now to recognize it...
Oops! Didn't catch that
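
One possible reason, which this post did not verify: the bundled demo calls `recognize_google` without a language argument, and the library then defaults to US English (`en-US`), so Japanese speech may come back as unrecognized. Below is a minimal sketch of a one-shot microphone recognizer that passes `ja-JP` instead. The function name `recognize_from_microphone` and its parameters are hypothetical, and it assumes a working microphone and network access.

```python
def recognize_from_microphone(language="ja-JP", calibration_seconds=1.0):
    """Listen once on the default microphone and return the transcript.

    Returns None when the service cannot understand the audio.
    sr.RequestError (network or API failure) is left to propagate.
    """
    # Imported here so that defining the function does not require PyAudio.
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate the energy threshold against background noise first.
        r.adjust_for_ambient_noise(source, duration=calibration_seconds)
        print("Say something!")
        audio = r.listen(source)

    try:
        return r.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None
```

Calling `print(recognize_from_microphone())` runs one listen-and-recognize cycle; whether this actually resolves the problem seen above is untested here.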

References

pythonで音声認識を使う – SpeechRecognitionを試してみる
Pythonで音声入力に入門しよう(SpeechRecognition)
SpeechRecognition 3.8.1
コーパス開発センター　音声・転記テキストサンプル
audio_transcribe.py
