
Trying out Google Speech-to-Text's direct GCS upload feature with Python


Posted at 2021-04-06 (last updated 2021-04-06)

Background

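Google Speech-to-Text recently released a feature that uploads transcription results directly to GCS (Google Cloud Storage).

Until now, transcribing a large audio file meant calling longRunningRecognize and then polling the operation to retrieve the transcript.
With this feature, polling is no longer required: the result is written straight to GCS.
Version 2.2.0 of the Python client library (google-cloud-speech) added support for this release, but I could not get it to work through the library, so I called the REST API directly over HTTP instead.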
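At the request level, the two modes differ only in whether output_config is present. The following minimal sketch builds both request bodies with the field names used in this article; build_request is a hypothetical helper for illustration, not part of any Google library.

```python
import json
from typing import Optional


def build_request(audio_uri: str, language_code: str,
                  output_uri: Optional[str] = None) -> dict:
    """Build a longrunningrecognize request body (hypothetical helper)."""
    body = {
        "config": {"language_code": language_code},
        "audio": {"uri": audio_uri},
    }
    if output_uri is not None:
        # With output_config, the finished transcript is written to this GCS object
        body["output_config"] = {"gcs_uri": output_uri}
    return body


# Polling mode: the transcript is read from the finished operation
print(json.dumps(build_request("gs://YOUR_BUCKET/audio.wav", "ja-JP")))
# Direct-upload mode: the transcript lands in GCS
print(json.dumps(build_request("gs://YOUR_BUCKET/audio.wav", "ja-JP",
                               "gs://YOUR_BUCKET/out.json")))
```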

Implementation

The implementation follows the official guide "Uploading your transcription results to a Cloud Storage bucket".
Authentication follows the article "Getting a Google Cloud API access token in Python (the equivalent of gcloud auth application-default print-access-token)".

import json

import google.auth
import google.auth.transport.requests
import requests


if __name__ == '__main__':
    # Get an access token from Application Default Credentials
    credentials, _project_id = google.auth.default(
        scopes=['https://www.googleapis.com/auth/cloud-platform'])
    auth_req = google.auth.transport.requests.Request()
    credentials.refresh(auth_req)

    url = 'https://speech.googleapis.com/v1p1beta1/speech:longrunningrecognize'
    payload = {
        'config': {
            'language_code': 'ja-JP',
            'enable_word_time_offsets': True,  # JSON boolean, not the string 'True'
        },
        # The transcription result is written to this GCS object
        'output_config': {
            'gcs_uri': 'gs://YOUR_BUCKET/PATH_TO_OUTPUT/direct.json',
        },
        'audio': {
            'uri': 'gs://YOUR_BUCKET/PATH_TO_AUDIO_FILE/audio.wav',
        },
    }

    headers = {'content-type': 'application/json',
               'Authorization': f'Bearer {credentials.token}'}
    r = requests.post(url, data=json.dumps(payload), headers=headers)
    r.raise_for_status()
    # The response is a long-running Operation; its "name" identifies the job
    print(r.json())

This worked without any problems.
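The POST returns a long-running Operation object rather than the transcript. Even with direct upload, you may want to confirm the job has finished before reading the output file. The sketch below assumes the Speech API's v1p1beta1 operations.get endpoint; operation_status and get_operation are hypothetical helpers, and urllib is used in place of requests only to keep the sketch standard-library-only.

```python
import json
import urllib.request

# Assumed endpoint: operations.get of the Speech REST API (v1p1beta1)
OPERATIONS_URL = "https://speech.googleapis.com/v1p1beta1/operations/{}"


def operation_status(op: dict) -> str:
    """Classify an Operation JSON object as 'running', 'failed' or 'succeeded'."""
    # In proto3 JSON, "done" is omitted while it is still false
    if not op.get("done", False):
        return "running"
    if "error" in op:
        return "failed"
    return "succeeded"


def get_operation(name: str, token: str) -> dict:
    """Fetch the current Operation JSON (requires network access and a valid token)."""
    req = urllib.request.Request(
        OPERATIONS_URL.format(name),
        headers={"Authorization": f"Bearer {token}"},
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

With the script above, a check might look like operation_status(get_operation(r.json()["name"], credentials.token)).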
For reference, I am also including the implementation that did not work and its error output.

Failed implementation with the Python library, and its error

Implementation

google-cloud-speech version: 2.2.1
Python 3.9

main.py

from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

audio = speech.RecognitionAudio(uri='gs://YOUR_BUCKET/PATH_TO_AUDIO/audio.wav')
config = speech.RecognitionConfig(
        language_code="ja-JP",
        enable_word_time_offsets=True
        )
output = speech.TranscriptOutputConfig(
        gcs_uri="gs://YOUR_BUCKET/PATH_TO_OUTPUT/direct.json"
        )
operation = client.long_running_recognize(config=config, audio=audio, output_config=output)
print("Waiting for operation to complete...")
response = operation.result(timeout=1000)
for result in response.results:
    print(u"Transcript: {}".format(result.alternatives[0].transcript))
    print("Confidence: {}".format(result.alternatives[0].confidence))
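The loop above reads the transcript from the operation result. With direct upload, the transcript you actually want is the file written to gcs_uri. The sketch below downloads that object through the Cloud Storage JSON API and extracts the top alternative of each result. It assumes the file follows the LongRunningRecognizeResponse JSON shape (results, then alternatives with transcript and confidence); parse_gs_uri, extract_transcripts and download_gcs_text are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request


def parse_gs_uri(uri: str) -> tuple:
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {uri}")
    bucket, _, name = uri[len("gs://"):].partition("/")
    if not bucket or not name:
        raise ValueError(f"missing bucket or object name: {uri}")
    return bucket, name


def extract_transcripts(result_json: str) -> list:
    """Return (transcript, confidence) for the top alternative of each result.

    Assumes the LongRunningRecognizeResponse JSON shape; missing fields are tolerated.
    """
    data = json.loads(result_json)
    pairs = []
    for result in data.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            top = alternatives[0]
            pairs.append((top.get("transcript", ""), top.get("confidence")))
    return pairs


def download_gcs_text(uri: str, token: str) -> str:
    """Download an object via the Cloud Storage JSON API (needs network + token)."""
    bucket, name = parse_gs_uri(uri)
    # Object names must be fully URL-encoded, including "/" as %2F
    url = ("https://storage.googleapis.com/storage/v1/b/"
           f"{urllib.parse.quote(bucket, safe='')}/o/"
           f"{urllib.parse.quote(name, safe='')}?alt=media")
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For example, extract_transcripts(download_gcs_text("gs://YOUR_BUCKET/PATH_TO_OUTPUT/direct.json", token)) would return the transcript pieces once the operation has finished.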

Error output (excerpt)

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INTERNAL
	details = "Internal error encountered. Please try again."
	debug_error_string = "{"created":"@1616664073.940171000","description":"Error received from peer ipv6:[xxx:xxx:xxx:xxx]:443","file":"src/core/lib/surface/call.cc","file_line":1068,"grpc_message":"Internal error encountered. Please try again.","grpc_status":13}"
