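Trying out the GCP Speech to Text sample #C#

These are my notes on getting the GCP Speech to Text sample app running.

Pages I referred to
- Speech to Text documentation (official GCP reference)
- Sample app (GitHub)

The steps follow what is written in the README on GitHub.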

1) Install the Cloud SDK

Google Cloud SDK documentation
→ It apparently bundles various libraries and command sets for working with GCP. Install it with the installer.
Maybe it isn't needed if you only want to run the samples?

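2) Get a service account key and set the environment variable

Work through the steps in "Setting up a .NET development environment".

1. Issue a service account key
A service account key is the credential the app uses to authenticate.
You create it in GCP, then issue and download it as JSON.
The JSON holds credentials, so be careful about where you store it.

2. Set the environment variable
Set the path of the downloaded JSON file in the environment variable GOOGLE_APPLICATION_CREDENTIALS.
Registering it as a user environment variable is enough.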

powershell


PS > $env:GOOGLE_APPLICATION_CREDENTIALS = "[YOUR DIR]\YOUR_PROJECT_ID-xxxxxxxxxxxx.json"
PS > [Environment]::SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", "[YOUR DIR]\YOUR_PROJECT_ID-xxxxxxxxxxxx.json", "User")
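Before running the samples it can help to confirm that the variable is actually visible to a .NET process. The following is a minimal sketch of my own, not part of the sample repository; the class name CredentialCheck and its Describe method are hypothetical helpers. It only checks that the variable is set and that the file exists, not that the key itself is valid.

```csharp
using System;
using System.IO;

public static class CredentialCheck
{
    // Hypothetical helper: returns null when the path looks usable,
    // otherwise a short description of what is wrong.
    public static string Describe(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
        {
            return "GOOGLE_APPLICATION_CREDENTIALS is not set.";
        }
        if (!File.Exists(path))
        {
            return "Key file not found: " + path;
        }
        return null;
    }

    public static void Main()
    {
        // Reads the same variable that was set in PowerShell above.
        string path = Environment.GetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS");
        Console.WriteLine(Describe(path) ?? "Key file found: " + path);
    }
}
```

If you set the variable with SetEnvironmentVariable(..., "User"), a shell that was already open will not see it until it is restarted, which is one case this check makes obvious.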

3) Enable the API

Enable it from the GCP Speech to Text API page.
→ Enabling it requires registering a credit card.

4) Clone the repository

Clone the sample app.

terminal

git clone https://github.com/GoogleCloudPlatform/dotnet-docs-samples

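5) Run the sample app

The sample accepts input from either an audio file or a microphone.
File input is sent in a single synchronous request and microphone input is streamed over gRPC (I think). The .NET client library used here (Google.Cloud.Speech.V1) appears to use gRPC under the hood in both cases rather than plain REST.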

Quick Start Sample

This sample sends an audio file that ships with the repository to the API in a single synchronous request.
The path is as follows.

powershell

PS > cd "YOUR_DIR\dotnet-docs-samples\speech\api\QuickStart"

From there, run it as follows.

powershell

dotnet-docs-samples\speech\api\QuickStart > dotnet restore

Welcome to .NET Core!
---------------------
Learn more about .NET Core: https://aka.ms/dotnet-docs
Use 'dotnet --help' to see available commands or visit: https://aka.ms/dotnet-cli-docs

Telemetry
---------
The .NET Core tools collect usage data in order to help us improve your experience. The data is anonymous and does not include command-line arguments. The data is collected by Microsoft and shared with the community. You can opt-out of telemetry by setting the DOTNET_CLI_TELEMETRY_OPTOUT environment variable to '1' or 'true' using your favorite shell.

Read more about .NET Core CLI Tools telemetry: https://aka.ms/dotnet-cli-telemetry

ASP.NET Core
------------
Successfully installed the ASP.NET Core HTTPS Development Certificate.
To trust the certificate run 'dotnet dev-certs https --trust' (Windows and macOS only). For establishing trust on other platforms refer to the platform specific documentation.
For more information on configuring HTTPS see https://go.microsoft.com/fwlink/?linkid=848054.
  Restoring packages for C:\Users\Admin\Develop\GCP\Repositorys\dotnet-docs-samples\speech\api\QuickStart\QuickStart.csproj...
  Generating MSBuild file C:\Users\Admin\Develop\GCP\Repositorys\dotnet-docs-samples\speech\api\QuickStart\obj\QuickStart.csproj.nuget.g.props.
  Generating MSBuild file C:\Users\Admin\Develop\GCP\Repositorys\dotnet-docs-samples\speech\api\QuickStart\obj\QuickStart.csproj.nuget.g.targets.
  Restore completed in 1.38 sec for C:\Users\Admin\Develop\GCP\Repositorys\dotnet-docs-samples\speech\api\QuickStart\QuickStart.csproj.

dotnet-docs-samples\speech\api\QuickStart > dotnet run
how old is the Brooklyn Bridge

When it succeeded, the following response came back.

how old is the Brooklyn Bridge

Recognize

The Recognize sample can run speech-to-text (STT) on a few tens of seconds of streamed audio.
To run it, move to the following directory.

powershell

PS > cd "YOUR_DIR\dotnet-docs-samples\speech\api\Recognize"

With the following option, it runs streaming STT on microphone input captured through NAudio.

powershell

dotnet-docs-samples\speech\api\Recognize > dotnet run listen 10

listen is the option for stream input from the microphone, and the number specifies how many seconds to run STT.
The other options can be listed as follows.

powershell

dotnet-docs-samples\speech\api\Recognize > dotnet run help
Recognize 1.0.0
Copyright (C) 2018 Recognize

  sync            Detects speech in an audio file.
  async           Creates a job to detect speech in an audio file, and waits for the job to complete.
  stream          Detects speech in an audio file by streaming it to the Speech API.
  listen          Detects speech in a microphone input stream.
  rec             Detects speech in an audio file. Supports other file formats.
  sync-creds      Detects speech in an audio file.
  with-context    Detects speech in an audio file. Add additional context on stdin.
  help            Display more information on a specific command.
  version         Display version information.

Note that it recognizes English by default.
To switch to Japanese, open Recognize.cs and change

LanguageCode = "en"

to

LanguageCode = "ja-JP"

(It appears to be set separately for each option.)
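To show where LanguageCode sits in a request, here is a minimal sketch using the Google.Cloud.Speech.V1 NuGet package. It is my own reduced example, not the code from Recognize.cs: the class name, the BuildConfig helper, and the assumption that the input is 16 kHz LINEAR16 raw audio are all mine, and the audio path is whatever file you pass on the command line.

```csharp
using System;
using Google.Cloud.Speech.V1;

public static class JapaneseRecognizeSketch
{
    // Hypothetical helper: builds a config for 16 kHz LINEAR16 audio
    // (an assumption about the input file) in the given language.
    public static RecognitionConfig BuildConfig(string languageCode)
    {
        return new RecognitionConfig
        {
            Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
            SampleRateHertz = 16000,
            LanguageCode = languageCode,
        };
    }

    public static void Main(string[] args)
    {
        // args[0]: path to a raw audio file that you supply yourself.
        // The client picks up the key via GOOGLE_APPLICATION_CREDENTIALS.
        var client = SpeechClient.Create();
        var response = client.Recognize(BuildConfig("ja-JP"), RecognitionAudio.FromFile(args[0]));
        foreach (var result in response.Results)
        {
            foreach (var alternative in result.Alternatives)
            {
                Console.WriteLine(alternative.Transcript);
            }
        }
    }
}
```

Keeping the language in one helper like this is just one way to avoid editing the same string in several places; the sample itself sets it per option.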