Does the Dialogflow Java client library work on Android? (Streaming audio input edition)

Posted at 2019-12-20

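Introduction

This article is a follow-up to "Does the Dialogflow Java client library work on Android? (DetectIntent edition)".

As a quick recap, in that article I was able to call detectIntent on Dialogflow with the Java client library and retrieve the result.

This time I check whether a stream of audio input can be sent to Dialogflow and interpreted in real time.
(Spoiler: it can.)

The URL below is Google's guide to streaming audio input.
https://cloud.google.com/dialogflow/docs/detect-intent-stream?hl=ja#detect-intent-stream-java

Anything already covered in the previous article is omitted here.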

Environment

Language: Kotlin
Device: Pixel 3a (Android 10)

Preparing Dialogflow

See the previous article.

Let's Programming!!!

Adding the libraries

app/build.gradle

    implementation 'com.google.cloud:google-cloud-dialogflow:0.105.0-alpha'
    
    // https://mvnrepository.com/artifact/io.grpc/grpc-okhttp
    // Used when establishing the connection to GCP services
    implementation 'io.grpc:grpc-okhttp:1.25.0'

Adding permissions to the manifest

AndroidManifest.xml

    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.RECORD_AUDIO" />

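Placing credentials.json

Place credentials.json in res/raw.

** Warning **

Shipping credentials on the device is not good practice.
I am only doing it here as a quick experiment; a real app must take proper countermeasures!

layout

Changes from the previous article:

A toggle button for microphone input has been added
Turning the toggle button ON starts speech recognition
The Dialogflow response to the voice input is shown in the Hello World text view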

activity_main.xml


<!-- ... -->
    <ToggleButton
        android:id="@+id/micBtn"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Mic"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/textView" />

    <TextView
        android:id="@+id/textView2"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="マイク入力"
        app:layout_constraintBottom_toTopOf="@+id/micBtn"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent" />
<!-- ... -->

Adding the button event to MainActivity

Note: only the relevant parts are shown.

MainActivity.kt


// ...

class MainActivity : AppCompatActivity(), MessageDialogFragment.Listener {

    val handler = Handler()
    private var mVoiceRecorder : VoiceRecorder? = null

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        val streamingDetectIntent = StreamingDetectIntent(this)

        findViewById<ToggleButton>(R.id.micBtn).setOnCheckedChangeListener { buttonView, isChecked ->
            if(isChecked) {
                // Do not start unless the permission has been granted.
                if (!isPermissionGranted()) {
                    return@setOnCheckedChangeListener
                }

                val mVoiceCallback = object : VoiceRecorder.Callback() {
                    override fun onVoiceStart() {
                        // Voice detected: start streaming.
                        // Fall back to 16000 Hz if the recorder is not available.
                        val sampleRate = mVoiceRecorder?.sampleRate ?: 16000
                        streamingDetectIntent.startStream(sampleRate)
                    }
                    override fun onVoice(data: ByteArray, size: Int) {
                        // Send the audio chunk to the stream.
                        streamingDetectIntent.streaming(data, size)
                    }
                    override fun onVoiceEnd() {
                        // End of voice input (a pause in speech). Show the text returned by Dialogflow on screen.
                        streamingDetectIntent.stopStream {
                            if(it.isEmpty()) return@stopStream
                            handler.post {
                                findViewById<TextView>(R.id.textView).text = it
                            }
                        }
                    }
                }

                // Start voice input
                mVoiceRecorder = VoiceRecorder(mVoiceCallback)
                mVoiceRecorder?.start()
            } else {
                // Stop voice input. (Might this crash if toggled off mid-speech?)
                mVoiceRecorder?.stop()
                mVoiceRecorder = null
            }
        }
    }

// ...

}

Adding a dialog that asks the user for microphone permission

Note: only the relevant parts are shown.

MainActivity.kt


// ...

class MainActivity : AppCompatActivity(), MessageDialogFragment.Listener {

// ...

    override fun onStart() {
        super.onStart()
        requestPermission()
    }

// ...

    // region PERMISSION

    private val FRAGMENT_MESSAGE_DIALOG = "message_dialog"

    private fun showPermissionMessageDialog() {
        MessageDialogFragment
            .newInstance(getString(R.string.dialog__audio_permission_required))
            .show(supportFragmentManager, FRAGMENT_MESSAGE_DIALOG)
    }

    override fun onMessageDialogDismissed() {
        ActivityCompat.requestPermissions(this, arrayOf(Manifest.permission.RECORD_AUDIO),
            REQUEST_RECORD_AUDIO_PERMISSION)
    }

    private val REQUEST_RECORD_AUDIO_PERMISSION = 5000

    private fun requestPermission() {
        if (ActivityCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO) == PackageManager.PERMISSION_GRANTED) {
            return  // Already granted.
        }

        if (ActivityCompat.shouldShowRequestPermissionRationale(this, Manifest.permission.RECORD_AUDIO)) {
            // If the permission needs a rationale, show the user an explanation.
            showPermissionMessageDialog()
            return
        }

        // Request the permission and wait asynchronously for it to be granted or denied.
        ActivityCompat.requestPermissions(this, arrayOf(Manifest.permission.RECORD_AUDIO), REQUEST_RECORD_AUDIO_PERMISSION)
    }

    private fun isPermissionGranted(): Boolean {
        return ActivityCompat.checkSelfPermission(this,
            Manifest.permission.RECORD_AUDIO
        ) == PackageManager.PERMISSION_GRANTED
    }

    // endregion
}

Adding the dialog class

fragment/MessageDialogFragment.kt

package jp.wiseplants.googleassistanttest.activity.fragment

import android.app.Dialog
import android.os.Bundle
import androidx.appcompat.app.AlertDialog
import androidx.appcompat.app.AppCompatDialogFragment

/**
 * A simple dialog with a message.
 *
 *
 * The calling [android.app.Activity] needs to implement [Listener].
 */
class MessageDialogFragment : AppCompatDialogFragment() {

    interface Listener {
        /**
         * Called when the dialog is dismissed.
         */
        fun onMessageDialogDismissed()
    }

    override fun onCreateDialog(savedInstanceState: Bundle?): Dialog {
        return AlertDialog.Builder(context!!)
                .setMessage(arguments!!.getString(ARG_MESSAGE))
                .setPositiveButton(android.R.string.ok) { dialog, which -> (activity as Listener).onMessageDialogDismissed() }
                .setOnDismissListener { (activity as Listener).onMessageDialogDismissed() }
                .create()
    }

    companion object {

        private const val ARG_MESSAGE = "message"

        /**
         * Creates a new instance of [MessageDialogFragment].
         *
         * @param message The message to be shown on the dialog.
         * @return A newly created dialog fragment.
         */
        fun newInstance(message: String): MessageDialogFragment {
            val fragment = MessageDialogFragment()
            val args = Bundle()
            args.putString(ARG_MESSAGE, message)
            fragment.arguments = args
            return fragment
        }
    }

}

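Microphone handling on Android

I borrowed this VoiceRecorder.java.

It did not work out of the box in my environment, though, so I made the following change:

// I replaced this
import android.support.annotation.NonNull;
// with this.
import androidx.annotation.NonNull;

See the linked page for details.

I then placed the file at dialogflow/VoiceRecorder.java.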
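
The callbacks used in MainActivity (onVoiceStart, onVoice, onVoiceEnd) imply that the recorder decides for itself when speech starts and stops. One common way to do that is an amplitude threshold on the raw PCM samples. The following is only a minimal sketch of that idea, not code taken from VoiceRecorder.java: it assumes 16-bit little-endian PCM, and both the function name and the default threshold of 1500 are hypothetical.

```kotlin
import kotlin.math.abs

/**
 * Returns true if any 16-bit little-endian PCM sample in the first [size]
 * bytes of [buffer] has an absolute amplitude above [threshold].
 * The default threshold is an assumed value; tune it for the actual device.
 */
fun isHearingVoice(buffer: ByteArray, size: Int, threshold: Int = 1500): Boolean {
    var i = 0
    while (i + 1 < size) {
        // The low byte is treated as unsigned; the high byte carries the sign.
        val low = buffer[i].toInt() and 0xFF
        val high = buffer[i + 1].toInt()
        val sample = (high shl 8) or low
        if (abs(sample) > threshold) return true
        i += 2
    }
    return false
}
```

A recorder built this way would fire the start callback the first time the function returns true and the end callback once it has returned false for some timeout.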

The DetectIntent class from the previous article

Note: the code has changed slightly.

dialogflow/DetectIntent.kt

package jp.hashioka.ryo.dialogflowsample.dialogflow

import android.content.Context
import android.util.Log
import com.google.api.gax.core.FixedCredentialsProvider
import com.google.auth.oauth2.GoogleCredentials
import com.google.cloud.dialogflow.v2.*
import jp.hashioka.ryo.dialogflowsample.R

/**
 * Class that wraps Dialogflow detectIntent calls.
 */
open class DetectIntent (
    context: Context
) {

    companion object {
        private const val TAG = "DetectIntent"
        const val PROJECT_ID = "voice-recognition-trial-261200"
        const val LANGUAGE_CODE = "ja" // TODO: move the Dialogflow language code to a config file when adding multi-language support
        val SCOPE = listOf("https://www.googleapis.com/auth/cloud-platform")

        /**
         * Returns the session ID.
         * TODO: generate the Dialogflow session so that it is unique per client.
         */
        fun getSession() : String {
            return "hogehoge"
        }
    }

    protected val sessionsClient : SessionsClient
    private val contextClient : ContextsClient

    init {
        // Set up the credentials
        val credentials = GoogleCredentials
            .fromStream(context.resources.openRawResource(R.raw.credentials))
            .createScoped(SCOPE)
        sessionsClient = createSessions(credentials)
        contextClient = createContexts(credentials)
    }

    /**
     * Creates the SessionsClient.
     */
    private fun createSessions(credentials: GoogleCredentials): SessionsClient {
        val sessionsSetting =
            SessionsSettings.newBuilder()
                .setCredentialsProvider(FixedCredentialsProvider.create(credentials))
                .build()
        return SessionsClient.create(sessionsSetting)
    }

    /**
     * Creates the ContextsClient.
     */
    private fun createContexts(credentials: GoogleCredentials) : ContextsClient {
        val contextsSettings =
            ContextsSettings.newBuilder()
                .setCredentialsProvider(FixedCredentialsProvider.create(credentials))
                .build()
        return ContextsClient.create(contextsSettings)
    }

    /**
     * Runs detectIntent and returns the result.
     * Only sends the given text.
     */
    fun send(text: String) : String {
        val request = DetectIntentRequest.newBuilder()
            .setQueryInput(
                QueryInput.newBuilder()
                    .setText(
                        TextInput.newBuilder()
                            .setText(text)
                            .setLanguageCode(LANGUAGE_CODE)
                    )
                    .build())
            .setSession(SessionName.format(PROJECT_ID, getSession()))
            .build()

        val res = sessionsClient.detectIntent(request)
        Log.d(TAG, "response result : ${res.queryResult}")
        return res.queryResult.fulfillmentText
    }

    /**
     * Runs detectIntent and returns the result.
     * Contexts can be specified.
     */
    fun send(text: String, contexts: List<String>) : String {
        val queryParametersBuilder = QueryParameters.newBuilder()
        contexts.forEach {
            queryParametersBuilder
                .addContexts(
                    com.google.cloud.dialogflow.v2.Context.newBuilder()
                        .setName(ContextName.format(PROJECT_ID, getSession(), it))
                        .setLifespanCount(5) // TODO: make the context lifespan configurable.
                        .build()
                )
        }

        // Set the text, contexts, and other parameters sent to Dialogflow
        val request = DetectIntentRequest.newBuilder()
            .setQueryParams(queryParametersBuilder.build())
            .setQueryInput(
                QueryInput.newBuilder()
                    .setText(
                        TextInput.newBuilder()
                            .setText(text)
                            .setLanguageCode(LANGUAGE_CODE)
                    )
                    .build())
            .setSession(SessionName.format(PROJECT_ID, getSession()))
            .build()

        val res = sessionsClient.detectIntent(request)
        Log.d(TAG, "response result : ${res.queryResult}")
        return res.queryResult.fulfillmentText
    }

    /**
     * Resets the contexts.
     */
    fun resetContexts() {
        contextClient.deleteAllContexts(SessionName.format(PROJECT_ID, getSession()))
    }
}
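
The TODO on getSession() asks for a session that is unique per client, while the code currently returns a fixed string. A minimal sketch of one way to address it is a random UUID generated once per app process. The object name SessionIdProvider is hypothetical and not part of the original project, and whether the ID should instead be persisted across launches is a design decision this sketch leaves open.

```kotlin
import java.util.UUID

/** Hypothetical holder that hands out one random session ID per app process. */
object SessionIdProvider {
    // Generated lazily on first access and then reused, so every request
    // made during this process shares the same Dialogflow session.
    // A UUID string is 36 characters long, which should fit the documented
    // session ID length limit.
    val sessionId: String by lazy { UUID.randomUUID().toString() }
}
```

With this in place, getSession() could simply return SessionIdProvider.sessionId instead of the hard-coded value.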

Streaming!

This class extends the DetectIntent class above.
It takes the session, project ID, and so on from the parent and contains only the processing needed for streaming.

dialogflow/StreamingDetectIntent.kt

package jp.hashioka.ryo.dialogflowsample.dialogflow

import android.content.Context
import android.util.Log
import com.google.api.gax.rpc.BidiStream
import com.google.cloud.dialogflow.v2.*
import com.google.protobuf.ByteString

/**
 * Runs Dialogflow detectIntent on a stream of audio input.
 */
class StreamingDetectIntent (
    context: Context
) : DetectIntent(context) {

    companion object {
        private const val TAG = "StreamingDetectIntent"
    }

    private var session = SessionName.of(PROJECT_ID, getSession()).toString()

    // region streaming

    // Build the query with the InputAudioConfig
    private var queryInput : QueryInput? = null
    // Create the Bidirectional stream
    private var bidiStream : BidiStream<StreamingDetectIntentRequest, StreamingDetectIntentResponse>? = null

    fun startStream(sampleRate: Int) {

        val inputAudioConfig = InputAudioConfig.newBuilder()
            .setAudioEncoding(AudioEncoding.AUDIO_ENCODING_LINEAR_16)
            .setLanguageCode(LANGUAGE_CODE)
            .setSampleRateHertz(sampleRate)
            .build()
        // Build the query with the InputAudioConfig
        queryInput = QueryInput.newBuilder().setAudioConfig(inputAudioConfig).build()

        // Create the Bidirectional stream
        bidiStream = sessionsClient.streamingDetectIntentCallable().call()

        // The first request must **only** contain the audio configuration:
        bidiStream?.send(
            StreamingDetectIntentRequest.newBuilder()
                .setSession(session)
                .setQueryInput(queryInput)
                .build()
        )
    }

    fun streaming(data: ByteArray, size: Int) {
        bidiStream?.send(
            StreamingDetectIntentRequest.newBuilder()
                .setInputAudio(ByteString.copyFrom(data, 0, size))
                .build()
        )
    }

    fun stopStream(callback:(text:String)->Unit) {
        if(bidiStream == null) return

        // Tell the service you are done sending data
        bidiStream?.closeSend()

        for (response in bidiStream!!) {
            val queryResult = response.queryResult
            Log.d(TAG, "====================")
            Log.d(TAG, "Intent Display Name: ${queryResult.intent.displayName}")
            Log.d(TAG, "Query Text: '${queryResult.queryText}'")
            Log.d(TAG, "Detected Intent: ${queryResult.intent.displayName} (confidence: ${queryResult.intentDetectionConfidence})")
            Log.d(TAG, "Fulfillment Text: '${queryResult.fulfillmentText}'")
            callback(queryResult.fulfillmentText)
        }

        bidiStream = null
        queryInput = null
    }

    // endregion
}
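
Note that stopStream invokes the callback once per response, and responses that only carry interim recognition results have an empty fulfillment text, which is why MainActivity ignores empty strings. As a rough sketch, the filtering could be done in one place so the caller receives at most one value. StreamResponse below is a hypothetical stand-in for the fields of StreamingDetectIntentResponse used here, so that the example runs without the Dialogflow library.

```kotlin
/** Hypothetical stand-in for the parts of StreamingDetectIntentResponse used here. */
data class StreamResponse(
    val transcript: String = "",      // interim recognition result, if any
    val fulfillmentText: String = ""  // set only once an intent has been detected
)

/** Returns the last non-empty fulfillment text, or null if no response carried one. */
fun finalFulfillmentText(responses: Iterable<StreamResponse>): String? =
    responses.map { it.fulfillmentText }.lastOrNull { it.isNotEmpty() }
```

In stopStream, the same idea would mean collecting the fulfillment texts inside the loop and calling the callback once after it, rather than once per response.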

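Conclusion

Sorry for the rather rough write-up.
To make up for it, I have published the code on GitHub.
https://github.com/ryohashioka/DialogflowSampleForAndroid

Feel free to use it as a reference.

If I feel like it, I may write this up more carefully...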
