Building a Meeting-Minutes App with Azure Speech Services (Part 1)

Posted at 2019-05-21 (last updated at 2019-05-21)

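In this article I try building a meeting-minutes app with Azure Speech Services.

On the way to the minutes app, I tried the following two things.

Write a .NET Framework program that recognizes speech with the Speech SDK (this article)
Write a JavaScript version and deploy it as a Web App (next article)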

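Writing a .NET Framework program that recognizes speech with the Speech SDK

I built the speech recognition program in the following steps.
① Install Visual Studio 2019
② Create a .NET Framework Console App project
③ Install the Speech SDK NuGet package
④ Configure the debug build
⑤ Deploy Speech Services in the Azure Portal
⑥ Write the code
⑦ Debug / recognize speech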

Install Visual Studio 2019

Install Visual Studio from the link below (while we are at it, let's use 2019).
https://visualstudio.microsoft.com/downloads/?utm_medium=microsoft&utm_source=docs.microsoft.com&utm_campaign=button+cta&utm_content=download+vs2019

Create a .NET Framework Console App project

From Create a new project, select Console App (.NET Framework).

Install the Speech SDK NuGet package

Open Manage NuGet Packages for Solution from the solution in Solution Explorer (top right), search for [Microsoft.CognitiveServices.Speech], and install it into the project.

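Configure the debug build

To build correctly, set the build configuration to match your environment.
(Set it under Build > Configuration Manager.)
64-bit environment → x64
32-bit environment → x86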

Deploy Speech Services in the Azure Portal

Search for Speech in the Azure Portal and deploy it. Note down the Key and Region.

The Key and Region go into the code later.
For the Region, use the correct value from the guideline below.
https://docs.microsoft.com/ja-jp/azure/cognitive-services/speech-service/regions
I deployed to Central US, so I enter "centralus".
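Since the Key is a secret, one option is to keep it out of the source file and read it from environment variables. The following is only a minimal sketch of that idea and is not part of the original sample. The helper class SpeechSettings and the variable names SPEECH_KEY and SPEECH_REGION are hypothetical names chosen for illustration.

```csharp
using System;

// Hypothetical helper (not part of the Speech SDK): reads the Speech
// settings from environment variables instead of hardcoding them.
static class SpeechSettings
{
    // Assumed variable names; choose whatever fits your environment.
    public const string KeyVariable = "SPEECH_KEY";
    public const string RegionVariable = "SPEECH_REGION";

    // Returns the trimmed value of the variable, or throws if it is missing or blank.
    public static string Require(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{name}' is not set.");
        }
        return value.Trim();
    }
}
```

With this in place, the call in the sample could become SpeechConfig.FromSubscription(SpeechSettings.Require(SpeechSettings.KeyVariable), SpeechSettings.Require(SpeechSettings.RegionVariable)), where the region variable would hold a value such as "centralus".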

Write the code

This is the MS Docs sample code pasted as-is; copy the code below into "Program.cs".
(When you do, replace the Key and Region with your own.)

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace helloworld
{
    class Program
    {
        public static async Task RecognizeSpeechAsync()
        {
            // Creates an instance of a speech config with specified subscription key and service region.
            // Replace with your own subscription key and service region (e.g., "westus").
            var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

            // Creates a speech recognizer.
            using (var recognizer = new SpeechRecognizer(config))
            {
                Console.WriteLine("Say something...");

                // Starts speech recognition, and returns after a single utterance is recognized. The end of a
                // single utterance is determined by listening for silence at the end or until a maximum of 15
                // seconds of audio is processed.  The task returns the recognition text as result. 
                // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
                // shot recognition like command or query. 
                // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
                var result = await recognizer.RecognizeOnceAsync();

                // Checks result.
                if (result.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine($"We recognized: {result.Text}");
                }
                else if (result.Reason == ResultReason.NoMatch)
                {
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                }
                else if (result.Reason == ResultReason.Canceled)
                {
                    var cancellation = CancellationDetails.FromResult(result);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                    if (cancellation.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                        Console.WriteLine($"CANCELED: Did you update the subscription info?");
                    }
                }
            }
        }

        static void Main()
        {
            RecognizeSpeechAsync().Wait();
            Console.WriteLine("Please press a key to continue.");
            Console.ReadLine();
        }
    }
}
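The comment in the sample points to StartContinuousRecognitionAsync() for long-running, multi-utterance recognition, which is closer to what a minutes app needs. The following is a minimal sketch of that variant, not part of the original sample. It assumes the same NuGet package and the same placeholder Key and Region. The class name ContinuousProgram, the timestamp format, and the "press Enter to stop" flow are my own choices for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace helloworld
{
    class ContinuousProgram
    {
        public static async Task RecognizeContinuouslyAsync()
        {
            // Replace with your own subscription key and service region.
            var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

            using (var recognizer = new SpeechRecognizer(config))
            {
                // Raised each time a complete utterance has been recognized.
                recognizer.Recognized += (s, e) =>
                {
                    if (e.Result.Reason == ResultReason.RecognizedSpeech)
                    {
                        Console.WriteLine($"{DateTime.Now:HH:mm:ss} {e.Result.Text}");
                    }
                };

                // Raised when recognition is canceled, for example on a bad key or region.
                recognizer.Canceled += (s, e) =>
                {
                    Console.WriteLine($"CANCELED: Reason={e.Reason}");
                    if (e.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                    }
                };

                await recognizer.StartContinuousRecognitionAsync();
                Console.WriteLine("Listening. Press Enter to stop.");
                Console.ReadLine();
                await recognizer.StopContinuousRecognitionAsync();
            }
        }

        static void Main()
        {
            RecognizeContinuouslyAsync().GetAwaiter().GetResult();
        }
    }
}
```

Because this class has its own Main, it is meant to replace the contents of Program.cs rather than sit next to the sample in the same project.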

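Debug / recognize speech

Once you have come this far, start debugging.
When "Say something..." appears, say something in English and it will be recognized.
(Japanese seems to work too, but the accuracy is not very good.)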
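One possible reason for the weak Japanese results is that the sample never sets a recognition language, so the service falls back to its default (en-US, as far as I know). This is my assumption and was not verified in this article. A minimal sketch of setting the language explicitly through the SpeechRecognitionLanguage property of SpeechConfig, with a hypothetical class name JapaneseProgram:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace helloworld
{
    class JapaneseProgram
    {
        public static async Task RecognizeJapaneseAsync()
        {
            // Replace with your own subscription key and service region.
            var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");

            // Ask the service to recognize Japanese instead of the default language.
            config.SpeechRecognitionLanguage = "ja-JP";

            using (var recognizer = new SpeechRecognizer(config))
            {
                Console.WriteLine("Say something in Japanese...");
                var result = await recognizer.RecognizeOnceAsync();

                if (result.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine($"We recognized: {result.Text}");
                }
                else
                {
                    Console.WriteLine($"Not recognized: Reason={result.Reason}");
                }
            }
        }

        static void Main()
        {
            RecognizeJapaneseAsync().GetAwaiter().GetResult();
        }
    }
}
```

Whether this actually improves accuracy would need to be checked against your own audio.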

Next time, I will write a JavaScript-based Web App that runs on mobile.

References

Regions supported by the Speech Service
https://docs.microsoft.com/ja-jp/azure/cognitive-services/speech-service/regions

Quickstart: Recognize speech with the Speech SDK for .NET Framework (Windows)
https://docs.microsoft.com/ja-jp/azure/cognitive-services/speech-service/quickstart-csharp-dotnet-windows
