Recommended Speech-to-Text (STT) Libraries for Unity

Posted at 2024-06-09, last updated at 2024-08-02

📃oshoham - UnityGoogleStreamingSpeechToText

Supported platform: Windows only
License: MIT

📃yasirkula - UnitySpeechToText

Supported platforms: Android (all versions) and iOS 10+
License: MIT
Known issue: on one Android test device, the speech session could not be started and returned error code 12, regardless of the target language.
How to use
Note: The codebase is documented with XML comments, so this section describes the functions only briefly.

First, initialize the plugin with SpeechToText.Initialize(string preferredLanguage = null). If you don't pass a preferred language (in "en-US" format), the device's default language is used. You can check whether a language is supported with SpeechToText.IsLanguageSupported(string language).
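If an app wants to try several languages in order of preference, the check and the fallback can be combined. The sketch below is a hypothetical helper (LanguagePicker is not part of the plugin): it returns the first candidate that a supplied predicate accepts, or null so that Initialize falls back to the device's default language. It takes the predicate as a Func<string, bool> so it compiles without Unity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the plugin: picks the first candidate
// language (e.g. "ja-JP", "en-US") that the predicate reports as supported.
public static class LanguagePicker
{
    // Returns null when nothing matches; passing null to
    // SpeechToText.Initialize then selects the device's default language.
    public static string Pick(IEnumerable<string> candidates, Func<string, bool> isSupported)
    {
        if (candidates == null) throw new ArgumentNullException(nameof(candidates));
        if (isSupported == null) throw new ArgumentNullException(nameof(isSupported));

        foreach (string language in candidates)
        {
            // Skip blank entries instead of asking the predicate about them.
            if (!string.IsNullOrEmpty(language) && isSupported(language))
                return language;
        }
        return null;
    }
}
```

In a Unity project this might be wired as `SpeechToText.Initialize( LanguagePicker.Pick( new[] { "ja-JP", "en-US" }, SpeechToText.IsLanguageSupported ) );`, assuming IsLanguageSupported can be queried before Initialize has been called; check the plugin's XML comments to confirm.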

After initialization, you can query SpeechToText.IsServiceAvailable(bool preferOfflineRecognition = false) and SpeechToText.IsBusy() to check whether a speech recognition session can be started. Most operations fail while the service is unavailable or busy.
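The two queries answer different questions, so an app may want to tell the user *why* a session cannot start instead of only disabling a button. A minimal sketch, using hypothetical names (StartGate, StartBlocker) that are not part of the plugin; the caller passes in the results of IsServiceAvailable and IsBusy.

```csharp
// Hypothetical helper, not part of the plugin: reports why a speech
// session cannot be started right now, or None if it can.
public enum StartBlocker { None, ServiceUnavailable, Busy }

public static class StartGate
{
    public static StartBlocker Check(bool serviceAvailable, bool busy)
    {
        // An unavailable service is reported first, since waiting will not fix it.
        if (!serviceAvailable) return StartBlocker.ServiceUnavailable;
        if (busy) return StartBlocker.Busy;
        return StartBlocker.None;
    }
}
```

In Unity the call would look like `StartGate.Check( SpeechToText.IsServiceAvailable( PreferOfflineRecognition ), SpeechToText.IsBusy() )`, evaluated at the moment the user presses the start button.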

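Before starting a speech recognition session, make sure the required permissions have been granted, using SpeechToText.CheckPermission() and SpeechToText.RequestPermissionAsync(PermissionCallback callback). If a permission is denied, you can call SpeechToText.OpenSettings() to open the app's settings, where the user can grant the required permissions manually (Android: Microphone; iOS: Microphone and Speech Recognition). On Android, the speech recognition system also requires the Google app to have Microphone permission; otherwise, the result callback returns error code 9. In that case, notify the user and call SpeechToText.OpenGoogleAppSettings() to open the Google app's settings so that the user can grant Microphone permission manually.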
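The permission flow described above (check first, request only if needed, then fall back to the settings screen) can be sketched as a small class. PermissionGate is a hypothetical name, not part of the plugin; the two delegates stand in for CheckPermission and RequestPermissionAsync so the logic can be exercised without Unity.

```csharp
using System;

// Hypothetical helper, not part of the plugin. The delegates are seams:
// isGranted wraps CheckPermission(), requestAsync wraps RequestPermissionAsync(...).
public sealed class PermissionGate
{
    private readonly Func<bool> isGranted;
    private readonly Action<Action<bool>> requestAsync;

    public PermissionGate(Func<bool> isGranted, Action<Action<bool>> requestAsync)
    {
        this.isGranted = isGranted ?? throw new ArgumentNullException(nameof(isGranted));
        this.requestAsync = requestAsync ?? throw new ArgumentNullException(nameof(requestAsync));
    }

    // Runs onGranted immediately if permission is already granted;
    // otherwise requests it and runs onGranted or onDenied from the callback.
    public void Ensure(Action onGranted, Action onDenied)
    {
        if (isGranted())
        {
            onGranted();
            return;
        }

        requestAsync(granted =>
        {
            if (granted)
                onGranted();
            else
                onDenied(); // e.g. explain why, then call SpeechToText.OpenSettings()
        });
    }
}
```

Wiring it to the plugin might look like `new PermissionGate( () => SpeechToText.CheckPermission() == SpeechToText.Permission.Granted, done => SpeechToText.RequestPermissionAsync( p => done( p == SpeechToText.Permission.Granted ) ) )`, assuming CheckPermission returns the same SpeechToText.Permission enum that the example code's permission callback receives. The Google app's Microphone permission (error code 9) cannot be checked this way; it only surfaces in the result callback.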

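To start a speech recognition session, call SpeechToText.Start(ISpeechToTextListener listener, bool useFreeFormLanguageModel = true, bool preferOfflineRecognition = false). A session normally ends automatically after a short pause in speech, but you can also stop it manually with SpeechToText.ForceStop() (processes the speech input received so far) or SpeechToText.Cancel() (discards the speech input and immediately invokes the result callback with error code 0). The ISpeechToTextListener interface has the following functions: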
- OnReadyForSpeech()
- OnBeginningOfSpeech()
- OnVoiceLevelChanged(float normalizedVoiceLevel)
- OnPartialResultReceived(string spokenText)
- OnResultReceived(string spokenText, int? errorCode)
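A listener often needs a little state across these callbacks: the latest text, a voice level that ignores the first moments of a session (the example code's OnVoiceLevelChanged comment mentions a ~0.5 s start beep on Android), and the session duration for later error handling. The sketch below is a hypothetical plain C# class, not part of the plugin, that an ISpeechToTextListener implementation could forward its callbacks to. The clock is injected; in Unity it could be `() => Time.realtimeSinceStartup`.

```csharp
using System;

// Hypothetical helper, not part of the plugin: plain C# state that an
// ISpeechToTextListener implementation could forward its callbacks to.
public sealed class SpeechSessionTracker
{
    private readonly Func<double> clock;            // returns the current time in seconds
    private readonly double voiceLevelGraceSeconds; // early voice-level updates to ignore
    private double startTime;

    public string LatestText { get; private set; } = "";
    public float VoiceLevel { get; private set; }
    public double? LastSessionSeconds { get; private set; }

    public SpeechSessionTracker(Func<double> clock, double voiceLevelGraceSeconds = 0.5)
    {
        this.clock = clock ?? throw new ArgumentNullException(nameof(clock));
        this.voiceLevelGraceSeconds = voiceLevelGraceSeconds;
    }

    // Call after SpeechToText.Start(...) has returned true.
    public void OnSessionStarted()
    {
        startTime = clock();
        LatestText = "";
        VoiceLevel = 0f;
        LastSessionSeconds = null;
    }

    // Drops updates inside the grace window (e.g. the Android start beep).
    public void OnVoiceLevelChanged(float normalizedVoiceLevel)
    {
        if (clock() - startTime < voiceLevelGraceSeconds)
            return;
        VoiceLevel = normalizedVoiceLevel;
    }

    public void OnPartialResultReceived(string spokenText) => LatestText = spokenText ?? "";

    // Records the final text and how long the session lasted.
    public void OnResultReceived(string spokenText)
    {
        LatestText = spokenText ?? "";
        VoiceLevel = 0f;
        LastSessionSeconds = clock() - startTime;
    }
}
```

Keeping this state outside the MonoBehaviour is a design choice for testability; the listener methods themselves stay one-line forwards.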

EXAMPLE CODE

using UnityEngine;
using UnityEngine.UI;

public class SpeechToTextDemo : MonoBehaviour, ISpeechToTextListener
{
	public Text SpeechText;
	public Button StartSpeechToTextButton, StopSpeechToTextButton;
	public Slider VoiceLevelSlider;
	public bool PreferOfflineRecognition;

	private float normalizedVoiceLevel;

	private void Awake()
	{
		SpeechToText.Initialize( "en-US" );

		StartSpeechToTextButton.onClick.AddListener( StartSpeechToText );
		StopSpeechToTextButton.onClick.AddListener( StopSpeechToText );
	}

	private void Update()
	{
		StartSpeechToTextButton.interactable = SpeechToText.IsServiceAvailable( PreferOfflineRecognition ) && !SpeechToText.IsBusy();
		StopSpeechToTextButton.interactable = SpeechToText.IsBusy();

		// You may also apply some noise to the voice level for a more fluid animation (e.g. via Mathf.PerlinNoise)
		VoiceLevelSlider.value = Mathf.Lerp( VoiceLevelSlider.value, normalizedVoiceLevel, 15f * Time.unscaledDeltaTime );
	}

	public void ChangeLanguage( string preferredLanguage )
	{
		if( !SpeechToText.Initialize( preferredLanguage ) )
			SpeechText.text = "Couldn't initialize with language: " + preferredLanguage;
	}

	public void StartSpeechToText()
	{
		SpeechToText.RequestPermissionAsync( ( permission ) =>
		{
			if( permission == SpeechToText.Permission.Granted )
			{
				if( SpeechToText.Start( this, preferOfflineRecognition: PreferOfflineRecognition ) )
					SpeechText.text = "";
				else
					SpeechText.text = "Couldn't start speech recognition session!";
			}
			else
				SpeechText.text = "Permission is denied!";
		} );
	}

	public void StopSpeechToText()
	{
		SpeechToText.ForceStop();
	}

	void ISpeechToTextListener.OnReadyForSpeech()
	{
		Debug.Log( "OnReadyForSpeech" );
	}

	void ISpeechToTextListener.OnBeginningOfSpeech()
	{
		Debug.Log( "OnBeginningOfSpeech" );
	}

	void ISpeechToTextListener.OnVoiceLevelChanged( float normalizedVoiceLevel )
	{
		// Note that on Android, voice detection starts with a beep sound, which can trigger this callback. You may want to ignore this callback for ~0.5 s on Android.
		this.normalizedVoiceLevel = normalizedVoiceLevel;
	}

	void ISpeechToTextListener.OnPartialResultReceived( string spokenText )
	{
		Debug.Log( "OnPartialResultReceived: " + spokenText );
		SpeechText.text = spokenText;
	}

	void ISpeechToTextListener.OnResultReceived( string spokenText, int? errorCode )
	{
		Debug.Log( "OnResultReceived: " + spokenText + ( errorCode.HasValue ? ( " --- Error: " + errorCode ) : "" ) );
		SpeechText.text = spokenText;
		normalizedVoiceLevel = 0f;

		// Recommended approach:
		// - If errorCode is 0, session was aborted via SpeechToText.Cancel. Handle the case appropriately.
		// - If errorCode is 9, notify the user that they must grant Microphone permission to the Google app and call SpeechToText.OpenGoogleAppSettings.
		// - If the speech session lasted less than 1 second (likely an error) or a null/empty spokenText is returned, prompt the user to try again (note that if
		//   errorCode is 6, the user hasn't spoken and the session has timed out as expected).
	}
}
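The "Recommended approach" comments in OnResultReceived can be turned into a small decision function so that the UI reacts consistently. This is a hypothetical sketch (SpeechOutcome and SpeechOutcomeClassifier are not part of the plugin). The caller measures sessionSeconds itself, for example with Time.realtimeSinceStartup between Start and the result callback. Treating any other non-null error code as "retry" is an assumption, since the comments do not cover it.

```csharp
using System;

// Hypothetical helper, not part of the plugin: encodes the
// "Recommended approach" comments from the demo's OnResultReceived.
public enum SpeechOutcome
{
    Accepted,               // usable text was recognized
    Cancelled,              // errorCode 0: SpeechToText.Cancel() was called
    GoogleAppMicPermission, // errorCode 9: Google app lacks Microphone permission (Android)
    NoSpeechTimeout,        // errorCode 6: the user did not speak; the session timed out
    Retry                   // too short, empty text, or any other error (assumption)
}

public static class SpeechOutcomeClassifier
{
    public static SpeechOutcome Classify(string spokenText, int? errorCode, double sessionSeconds)
    {
        switch (errorCode)
        {
            case 0: return SpeechOutcome.Cancelled;
            case 9: return SpeechOutcome.GoogleAppMicPermission;
            case 6: return SpeechOutcome.NoSpeechTimeout;
            case null: break;                    // no error: inspect the text below
            default: return SpeechOutcome.Retry; // assumption: other codes mean "try again"
        }

        // The comments suggest that very short sessions or empty text warrant a retry.
        if (sessionSeconds < 1.0 || string.IsNullOrEmpty(spokenText))
            return SpeechOutcome.Retry;

        return SpeechOutcome.Accepted;
    }
}
```

For GoogleAppMicPermission, the app would inform the user and then call SpeechToText.OpenGoogleAppSettings(), as described in the permissions paragraph.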
