Trying out speech recognition in Flutter

Posted at 2024-12-20

Introduction

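At work, someone asked whether we could add speech recognition to our Flutter apps, so I decided to investigate.
I start with a basic version and then fix its problems one by one. This is my first time using Flutter and Dart, so please read it with a forgiving eye.
I also have not fully debugged the code, so use it at your own risk.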

Development environment

I installed Flutter on WSL and developed in VS Code. The app runs in Chrome.

WSL version: 2.2.4.0
Flutter 3.19.2 • channel stable • https://github.com/flutter/flutter.git
Framework • revision 7482962148 (8 months ago) • 2024-02-27 16:51:22 -0500
Engine • revision 04817c99c9
Tools • Dart 3.3.0 • DevTools 2.31.1

pubspec.yaml

```yaml
name: flutterapptest
description: "A new Flutter project."
publish_to: 'none'
version: 1.0.0+1

environment:
  sdk: '>=3.3.0 <4.0.0'

dependencies:
  flutter:
    sdk: flutter
  cupertino_icons: ^1.0.6
  http: ^1.2.2
  speech_to_text: ^6.1.1

dev_dependencies:
  flutter_test:
    sdk: flutter
  flutter_lints: ^3.0.0

flutter:
  uses-material-design: true
```

A first implementation

First, I build a simple sample.
Because I wanted to reuse speech recognition in several places, I split the implementation into a page (UI) part and a service part.

speachtotext_page.dart

```dart
import 'package:flutter/material.dart';
import 'speachtotext_service.dart';

class SpeachtoTextPage extends StatefulWidget {
  const SpeachtoTextPage({super.key});

  @override
  State<SpeachtoTextPage> createState() => _SpeachtoTextPageState();
}

class _SpeachtoTextPageState extends State<SpeachtoTextPage> {
  final SpeachtoTextService speechService = SpeachtoTextService();
  String text = 'Converting speech to text';
  bool isListening = false;

  @override
  void initState() {
    super.initState();
    initializeSpeech();
  }

  Future<void> initializeSpeech() async {
    final available = await speechService.initializeSpeech(
      (recognizedText) {
        // Callbacks can still arrive after the page is disposed
        if (!mounted) return;
        setState(() {
          text = recognizedText;
        });
      },
      (listening) {
        if (!mounted) return;
        setState(() {
          isListening = listening;
        });
      },
    );
    if (!mounted) return;

    if (available) {
      await listen();
    } else {
      setState(() {
        text = 'Speech recognition is not available.';
      });
    }
  }

  Future<void> listen() async {
    await speechService.listen((recognizedText) {
      if (!mounted) return;
      setState(() {
        text = recognizedText;
      });
    });
  }

  void stopListening() {
    speechService.stopListening();
    setState(() {
      isListening = false;
    });
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: const Text('Speech recognition'),
      ),
      body: Center(
        child: Padding(
          padding: const EdgeInsets.all(8.0),
          child: Column(
            mainAxisAlignment: MainAxisAlignment.center,
            children: [
              Text(
                'Recognized speech: $text',
                style: const TextStyle(fontSize: 16, color: Colors.black),
              ),
              const SizedBox(height: 10),
            ],
          ),
        ),
      ),
    );
  }

  @override
  void dispose() {
    // Stop the microphone when leaving the page
    speechService.stopListening();
    super.dispose();
  }
}
```

speachtotext_service.dart

```dart
import 'dart:developer';
import 'package:speech_to_text/speech_to_text.dart';

class SpeachtoTextService {
  final SpeechToText speechToText = SpeechToText();
  bool isListening = false;
  bool isSpeechInitialized = false;
  DateTime? activeStartTime;
  DateTime? inactiveStartTime;

  Future<bool> initializeSpeech(
      Function(String) onResult, Function(bool) onListeningChange) async {
    final available = await speechToText.initialize(
      onStatus: (status) {
        log('onStatus: $status', name: 'SpeechService');
        if (status == 'done' || status == 'notListening') {
          // The microphone has stopped
          isListening = false;
          onListeningChange(false);
          logInactiveTime();
        } else if (status == 'listening') {
          // The microphone has become active
          isListening = true;
          onListeningChange(true);
          logActiveTime();
        }
      },
      onError: (error) {
        log('Error: $error', name: 'SpeechService');
      },
    );

    isSpeechInitialized = available;
    return available;
  }

  Future<void> listen(Function(String) onResult) async {
    if (!isSpeechInitialized) return;
    try {
      await speechToText.listen(
        onResult: (result) {
          onResult(result.recognizedWords);
        },
        localeId: 'ja_JP',
      );
      isListening = true;
      activeStartTime = DateTime.now();
    } catch (e) {
      log('Listening failed: $e', name: 'SpeechService');
    }
  }

  void stopListening() {
    if (speechToText.isListening) {
      speechToText.stop();
      isListening = false;
      inactiveStartTime = DateTime.now();
    }
  }

  // Logs how long the mic was off, then starts timing the active period
  void logActiveTime() {
    if (inactiveStartTime != null) {
      final inactiveDuration = DateTime.now().difference(inactiveStartTime!);
      log('Microphone was inactive for ${inactiveDuration.inMilliseconds} ms',
          name: 'SpeechService');
    }
    activeStartTime = DateTime.now();
  }

  // Logs how long the mic was on, then starts timing the inactive period
  void logInactiveTime() {
    if (activeStartTime != null) {
      final activeDuration = DateTime.now().difference(activeStartTime!);
      log('Microphone was active for ${activeDuration.inMilliseconds} ms',
          name: 'SpeechService');
    }
    inactiveStartTime = DateTime.now();
  }
}
```
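The onStatus handling above pairs the 'listening' status with 'notListening'/'done' to time how long the microphone stays on and off. As a rough illustration, the following pure-Dart sketch models that bookkeeping without the plugin or a real microphone. MicActivityTracker and its injected clock are hypothetical names introduced here, not part of speech_to_text; the status strings are the ones the service code checks.

```dart
/// Hypothetical pure-Dart model of the onStatus bookkeeping (no plugin needed).
/// The clock is injected so the timings can be checked deterministically.
class MicActivityTracker {
  MicActivityTracker(this._now);

  final DateTime Function() _now;
  DateTime? _activeStart;
  DateTime? _inactiveStart;

  /// Durations recorded each time the microphone switched state.
  final List<Duration> activePeriods = [];
  final List<Duration> inactivePeriods = [];

  /// Feed the status strings that speech_to_text reports via onStatus.
  void onStatus(String status) {
    final now = _now();
    if (status == 'listening') {
      // Mic became active: close the current inactive period, if any
      final start = _inactiveStart;
      if (start != null) inactivePeriods.add(now.difference(start));
      _inactiveStart = null;
      _activeStart = now;
    } else if (status == 'notListening' || status == 'done') {
      // Mic stopped: close the active period only once, even if both
      // 'notListening' and 'done' are reported for the same stop
      final start = _activeStart;
      if (start != null) {
        activePeriods.add(now.difference(start));
        _activeStart = null;
        _inactiveStart = now;
      }
    }
  }
}
```

One design difference from the service code: if a stop is reported as both 'notListening' and 'done', this sketch records the active period once, whereas logInactiveTime() in the service logs on each of those statuses.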

Running it

Now that the sample is ready, let's run it.

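It keeps recognizing for as long as I am speaking
- There is no point at which the recognized text resets, and no way to correct a misspoken phrase... inconvenient
It stops recognizing after a short while
- Speech recognition cannot be used again without restarting the app... inconvenient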

The recognition itself worked without problems, but the app is awkward to use as it is, so I will fix these points.

Fixing it

First, I read the documentation for the listen method.

listenFor sets the maximum duration that it will listen for, after that it automatically stops the listen for you. The system may impose a shorter maximum listen due to resource limitations or other reasons. The plugin ensures that listening is no longer than this but it may be shorter.

pauseFor sets the maximum duration of a pause in speech with no words detected, after that it automatically stops the listen for you. On some systems, notably Android, there is a system imposed pause of from one to three seconds that cannot be overridden. The plugin ensures that the pause is no longer than the pauseFor value but it may be shorter.

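In short, listenFor sets the maximum length of a listening session, and pauseFor sets how long a silence (no words detected) may last before listening stops.
When either limit is reached, recognition stops automatically, and the system may stop it even earlier. Using these two parameters should address the inconvenient points above.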
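To reason about how the two limits interact, the sketch below predicts when a session would end, given the times at which words are detected. It is a simplified model under stated assumptions (word times are sorted, and the pause is measured from the session start or from the last detected word), not the plugin's implementation; predictedStop is a hypothetical helper. Real devices can stop earlier, as the quoted documentation notes.

```dart
/// Simplified model (an assumption, not speech_to_text's actual code) of when
/// a listen session ends: at listenFor, or pauseFor after the last detected
/// word (or after the session start if nothing is said), whichever is first.
/// wordTimes are offsets from the session start, assumed to be sorted.
Duration predictedStop(
  List<Duration> wordTimes, {
  required Duration listenFor,
  required Duration pauseFor,
}) {
  var lastActivity = Duration.zero; // the session start counts as activity
  for (final t in wordTimes) {
    if (t >= listenFor) break; // words after listenFor are never heard
    if (t - lastActivity > pauseFor) break; // the silence already ended it
    lastActivity = t;
  }
  final pauseStop = lastActivity + pauseFor;
  return pauseStop < listenFor ? pauseStop : listenFor;
}
```

With listenFor at 30 seconds and pauseFor at 3 seconds, for example, words detected at 1 s, 2 s and 5 s give a predicted stop at 8 s, while someone speaking every second without a break is cut off at 30 s.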

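Limit continuous recognition to 30 seconds
Stop recognition after 3 seconds of silence
Add a flag for whether the microphone is active
Add logic that reactivates the microphone when it becomes inactive

Here is the code with these four changes applied.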

speachtotext_service.dart

```dart
import 'dart:developer';
import 'package:speech_to_text/speech_to_text.dart';

class SpeachtoTextService {
  final SpeechToText speechToText = SpeechToText();
  bool isListening = false;
  bool isSpeechInitialized = false;
  // Whether listening should restart automatically when it stops
  bool keepListening = false;
  // Set while a restart is pending, so 'notListening' and 'done' do not both restart
  bool _restartScheduled = false;
  DateTime? activeStartTime;
  DateTime? inactiveStartTime;

  Future<bool> initializeSpeech(
      Function(String) onResult, Function(bool) onListeningChange) async {
    final available = await speechToText.initialize(
      onStatus: (status) {
        log('onStatus: $status', name: 'SpeechService');
        if (status == 'done' || status == 'notListening') {
          // The microphone has stopped
          isListening = false;
          onListeningChange(false);
          logInactiveTime();
          restartListening(onResult); // restart listening automatically
        } else if (status == 'listening') {
          // The microphone has become active
          isListening = true;
          onListeningChange(true);
          logActiveTime();
        }
      },
      onError: (error) {
        log('Error: $error', name: 'SpeechService');
      },
    );

    isSpeechInitialized = available;
    return available;
  }

  Future<void> listen(Function(String) onResult) async {
    if (!isSpeechInitialized) return;
    keepListening = true;
    try {
      await speechToText.listen(
        onResult: (result) {
          onResult(result.recognizedWords);
        },
        localeId: 'ja_JP',
        listenFor: const Duration(seconds: 30), // listen for at most 30 seconds in a row
        pauseFor: const Duration(seconds: 3), // stop after 3 seconds of silence
      );
      isListening = true;
      activeStartTime = DateTime.now();
    } catch (e) {
      log('Listening failed: $e', name: 'SpeechService');
    }
  }

  void stopListening() {
    // Disable auto-restart first, so the status callback does not reopen the mic
    keepListening = false;
    if (speechToText.isListening) {
      speechToText.stop();
      isListening = false;
      inactiveStartTime = DateTime.now();
    }
  }

  // Restarts listening shortly after it stops, unless stopListening() was called
  void restartListening(Function(String) onResult) {
    if (!keepListening || _restartScheduled) return;
    _restartScheduled = true;
    Future.delayed(const Duration(milliseconds: 1), () {
      _restartScheduled = false;
      if (keepListening && !speechToText.isListening) {
        listen(onResult); // start listening again after 1 ms
      }
    });
  }

  // Logs how long the mic was off, then starts timing the active period
  void logActiveTime() {
    if (inactiveStartTime != null) {
      final inactiveDuration = DateTime.now().difference(inactiveStartTime!);
      log('Microphone was inactive for ${inactiveDuration.inMilliseconds} ms',
          name: 'SpeechService');
    }
    activeStartTime = DateTime.now();
  }

  // Logs how long the mic was on, then starts timing the inactive period
  void logInactiveTime() {
    if (activeStartTime != null) {
      final activeDuration = DateTime.now().difference(activeStartTime!);
      log('Microphone was active for ${activeDuration.inMilliseconds} ms',
          name: 'SpeechService');
    }
    inactiveStartTime = DateTime.now();
  }
}
```
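Because the restart is triggered from the status callback, it can fire for both 'notListening' and 'done', and also after an intentional stopListening() call (for example from the page's dispose()). The following pure-Dart sketch shows one way to guard against a double restart and a restart after stopping. RestartGuard, Scheduler and the injected start callback are hypothetical names standing in for the service's listen(); the scheduler is injectable only so the logic can be checked without real timers.

```dart
import 'dart:async';

/// Schedules callback after delay (a Timer by default).
typedef Scheduler = void Function(Duration delay, void Function() callback);

/// Hypothetical auto-restart guard; not part of speech_to_text.
class RestartGuard {
  RestartGuard(
    this.start, {
    Scheduler? schedule,
    this.delay = const Duration(milliseconds: 1),
  }) : _schedule =
            schedule ?? ((Duration d, void Function() cb) => Timer(d, cb));

  final void Function() start; // stands in for listen()
  final Duration delay;
  final Scheduler _schedule;
  bool keepListening = true;
  bool _scheduled = false;

  /// Call when the status is 'notListening' or 'done'.
  void onStopped() {
    // Skip if the user stopped on purpose or a restart is already pending
    if (!keepListening || _scheduled) return;
    _scheduled = true;
    _schedule(delay, () {
      _scheduled = false;
      if (keepListening) start(); // re-check: stop() may have been called meanwhile
    });
  }

  /// Call from stopListening() or dispose() to disable auto-restart.
  void stop() => keepListening = false;
}
```

In the service, onStopped() would correspond to the call in the status callback and stop() to stopListening(); checking keepListening again inside the scheduled callback covers the case where the page closes during the short delay.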

Running it again

Let's run the revised version right away.

I confirmed that recognition stops 3 seconds after I finish speaking and then starts again.

Conclusion

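Implementing speech recognition in Flutter showed me that tuning when recognition resets and restarting it automatically are key points that greatly affect usability. Building on this, I hope to add speech recognition to our existing apps.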
