
Trying automatic speech recognition (Google Speech, Amazon Transcribe)

Posted at 2021-11-07

Preparing an audio file

Record audio with the iPhone "Voice Memos" app and send it to a Mac via AirDrop
Convert the m4a file to wav

afconvert -f WAVE -d LEI16 -c 1  XXXXX.m4a XXXXX.wav
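
The later steps need the sample rate of the converted file, so it can help to inspect the WAV header before uploading. The following is a minimal sketch using Python's standard `wave` module; the function name `describe_wav` is a placeholder chosen for illustration and is not part of the original procedure.

```python
import wave


def describe_wav(path):
    """Return basic header information for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hertz": frame_rate,
            "duration_seconds": wav.getnframes() / frame_rate,
        }

# Example call (XXXXX.wav is the placeholder file name used in this article):
# print(describe_wav("XXXXX.wav"))
```

With the `afconvert` options above (`-d LEI16 -c 1`), one would expect 1 channel and a sample width of 2 bytes. The reported sample rate is the value to pass to the recognition settings later on.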

Trying Google Speech-to-Text

Run the sample code locally, following the manual
The Cloud SDK is already installed
For testing, set up credentials for which Google Speech-to-Text is enabled

export GOOGLE_APPLICATION_CREDENTIALS=XXXXX.json

Upload the audio file to GCS beforehand

gsutil cp XXXXX.wav gs://bucketsXXX/XXXXX.wav

Download the sample code locally

git clone https://github.com/googleapis/python-speech.git

Create a virtual environment and install the required modules

$ virtualenv env
$ source env/bin/activate
$ pip install -r requirements.txt

Modify the sample code

-        sample_rate_hertz=16000,
-        language_code="en-US",
-        model=model,
+        sample_rate_hertz=48000,
+        language_code="ja-JP",
+        #model=model,
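
To show where these settings end up, here is a rough sketch of a long-running recognition call with the `google-cloud-speech` Python client. It is not the sample script itself: the helper names `build_config_kwargs` and `transcribe_gcs` and the one-hour timeout are assumptions made for illustration, and LINEAR16 is assumed because the file was converted with `-d LEI16`.

```python
def build_config_kwargs(sample_rate_hertz=48000, language_code="ja-JP"):
    """Collect the recognition settings changed in the diff above."""
    return {
        "sample_rate_hertz": sample_rate_hertz,
        "language_code": language_code,
        # "model" is left out on purpose, mirroring the commented-out line.
    }


def transcribe_gcs(gcs_uri, timeout=3600):
    """Run long-running recognition on a GCS file and return the transcripts.

    The timeout (seconds) is an illustrative value, not one from the article.
    """
    # Imported lazily so this module loads without google-cloud-speech installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        **build_config_kwargs(),
    )
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout)
    # Each result carries ranked alternatives; keep the top one.
    return [result.alternatives[0].transcript for result in response.results]
```

Calling `transcribe_gcs("gs://bucketsXXX/XXXXX.wav")` requires the credentials set earlier via `GOOGLE_APPLICATION_CREDENTIALS`.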

Transcribe the audio file on GCS

python transcribe_model_selection.py gs://bucketsXXX/XXXXX.wav > interview.txt

Trying the Google Speech API

Process the file asynchronously

$ gcloud ml speech recognize-long-running 'gs://bucketsXXX/XXXXX.wav' --language-code='ja-JP' --sample-rate=48000 --async

Check operation [operations/XXXXXXXXXXXXX] for status.
{
  "name": "XXXXXXXXXXXXX"
}

Check the processing status with the command below.

Once processing has finished, progressPercent becomes 100 and the text is returned in response

$ gcloud ml speech operations describe XXXXXXXXXXXXX


{
  "done": true,
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "lastUpdateTime": "2021-09-XXT09:15:37.404790Z",
    "progressPercent": 100,
    "startTime": "2021-09-XXT08:59:39.288851Z",
    "uri": "gs://bucketsXXX/XXXXX.wav"
  },
  "name": "XXXXXXXXXXXXX",
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      {
        "alternatives": [
          {
            "confidence": 0.8462801,
            "transcript": "XXXXXXX"
          }
        ]
      },
      {
        "alternatives": [
          {
            "confidence": 0.9102794,
            "transcript": "XXXXXX"
          }
        ]
      },
・・・・・
}
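
The output above nests each transcript inside `response.results[].alternatives[]`, so a small script can flatten it into plain text. This is a minimal sketch that assumes the JSON has been saved to a file; the function names and the file name `operation.json` are hypothetical.

```python
import json


def extract_transcripts(operation):
    """Return (transcript, confidence) pairs from a finished operation dict."""
    if not operation.get("done"):
        # Still running: there is no "response" field to read yet.
        return []
    results = operation.get("response", {}).get("results", [])
    pairs = []
    for result in results:
        alternatives = result.get("alternatives", [])
        if alternatives:
            top = alternatives[0]  # keep only the first alternative
            pairs.append((top.get("transcript", ""), top.get("confidence")))
    return pairs


def load_operation(path):
    """Load the saved output of `gcloud ml speech operations describe`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Example (operation.json is a hypothetical file name):
# for text, confidence in extract_transcripts(load_operation("operation.json")):
#     print(confidence, text)
```

One way to produce such a file is to redirect the describe command's output, for example `gcloud ml speech operations describe XXXXXXXXXXXXX --format=json > operation.json`.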

The result appears to be the same as with Google Speech-to-Text

Trying Amazon Transcribe

This is easy: just follow the tutorial and upload the file from the console

After processing completes, the result can be downloaded as JSON
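
The downloaded JSON can be reduced to plain text in a few lines. The sketch below assumes the usual Amazon Transcribe output layout, where the full text sits under `results.transcripts[].transcript`; the function name and the file name `transcribe_result.json` are hypothetical.

```python
import json


def transcribe_full_text(result):
    """Join the transcript strings found in an Amazon Transcribe result dict."""
    transcripts = result.get("results", {}).get("transcripts", [])
    return "\n".join(t.get("transcript", "") for t in transcripts)


def load_result(path):
    """Load a result file downloaded from the Transcribe console."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Example (transcribe_result.json is a hypothetical file name):
# print(transcribe_full_text(load_result("transcribe_result.json")))
```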
