[Swift 4] Offline speech recognition on iOS [SFSpeechRecognizer]

Last updated at 2019-07-19. Posted at 2018-03-31.

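I had a project that needed speech recognition to work offline, so I did some research.

There are several ways to do speech recognition on iOS. This article covers SFSpeechRecognizer, which Apple provides officially.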

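What is SFSpeechRecognizer?

As mentioned above, SFSpeechRecognizer is the entry point to Apple's official Speech framework for speech recognition on iOS.
Its main characteristics are roughly as follows.

Reference: 【iOS 10】Speechフレームワークで音声認識 ([iOS 10] Speech recognition with the Speech framework)
- An API built into iOS
- The same engine that Siri uses, so accuracy is high
- Relatively heavy in terms of battery and network usage
- Basically requires an online connection
- Usage limits
  - Up to 1 minute of continuous recognition
  - Up to 1000 requests per hour per device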

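Using it offline

Here is what I found.

- Japanese cannot be used offline
- English can be used offline (Chinese seems to work too)
- It seems to be available on iPhones with an A9 processor or later
  - That is, iPhone 6s, iPhone 6s Plus, iPhone SE, iPad (5th generation), and later models
  - iPhone 7, 8, and so on are of course fine too
  - Reference: Which ios devices support offline speech recognition
- Accuracy seems to drop when offline

It is a pity that Japanese is not supported.

For now, I implemented the following behavior:
"Recognize Japanese when online, and English when offline."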
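The rule above can be sketched as a small pure function. The name `preferredLocaleIdentifier(isOnline:)` and the `isOnline` flag are illustrative assumptions, not part of the Speech framework, and how connectivity is detected is left open here.

```swift
import Foundation

/// Hypothetical helper (not part of the Speech framework):
/// picks the locale identifier to pass to SFSpeechRecognizer(locale:).
func preferredLocaleIdentifier(isOnline: Bool) -> String {
    // Japanese needs a network connection; English can fall back to offline recognition.
    return isOnline ? "ja_JP" : "en_US"
}

// Example: build the Locale for the offline case.
let offlineLocale = Locale(identifier: preferredLocaleIdentifier(isOnline: false))
print(offlineLocale.identifier)
```

Keeping the decision separate from the UI code makes it easy to change the fallback language later without touching the view controller.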

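Implementation

First, create a basic view controller that uses SFSpeechRecognizer.
For the delegate methods and other implementation details, see the following article.

[iOS 10] SFSpeechRecognizerで音声認識を試してみた ([iOS 10] Trying out speech recognition with SFSpeechRecognizer)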

import UIKit
import Speech

class SpeechViewController: UIViewController, SFSpeechRecognizerDelegate {
    private var speechRecognizer: SFSpeechRecognizer?
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private var audioEngine = AVAudioEngine()

    ...(omitted)...
}

Next, prepare a method that shows the alert for switching the recognition language.

class SpeechViewController: UIViewController, SFSpeechRecognizerDelegate {

    ...(omitted)...

    // Shows an alert with an OK button and, if cancel_handler is given, a Cancel button.
    static func showSimpleAlert(title: String, message: String = "", ok_handler: @escaping () -> Void, cancel_handler: (() -> Void)? = nil, at: UIViewController) {
        let alert = UIAlertController(title: title, message: message, preferredStyle: .alert)
        let defaultAction = UIAlertAction(title: "OK", style: .default, handler: { _ in
            // Runs when the OK button is tapped
            print("OK")
            ok_handler()
        })
        alert.addAction(defaultAction)

        if let cancel_handler = cancel_handler {
            let cancelAction = UIAlertAction(title: "Cancel", style: .cancel, handler: { _ in
                // Runs when the Cancel button is tapped
                print("Cancel")
                cancel_handler()
            })
            alert.addAction(cancelAction)
        }

        at.present(alert, animated: true, completion: nil)
    }

    ...(omitted)...
}

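Finally, write the following flow in viewWillAppear.

1. Create the SFSpeechRecognizer (Japanese)
2. Ask whether the SFSpeechRecognizer is available
3. If it is not available, show an alert offering to switch to English
4. Switch to English when the OK button is tapped
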
class SpeechViewController: UIViewController, SFSpeechRecognizerDelegate {

    ...(omitted)...

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)

        // 1. Create the SFSpeechRecognizer (Japanese)
        speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "ja_JP"))
        speechRecognizer?.delegate = self

        // 2. Ask whether the SFSpeechRecognizer is available
        if speechRecognizer?.isAvailable == true {
            print("speechRecognizer is Available")
        } else {
            print("speechRecognizer is not Available")
            // 3. If it is not available, show an alert offering to switch to English
            //    (showSimpleAlert is a static method, so call it on the type)
            SpeechViewController.showSimpleAlert(title: "Speech recognition is unavailable.",
                message: "The device may be offline. Would you like to try English speech recognition instead?",
                ok_handler: {
                    // 4. Switch to English when the OK button is tapped
                    self.speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en_US"))
                    self.speechRecognizer?.delegate = self
                }, cancel_handler: {
                    // Behavior when the Cancel button is tapped
                    self.navigationController?.popViewController(animated: true)
                }, at: self)
        }

        ...(omitted)...
    }

}

Full code

Putting it all together:

import UIKit
import Speech

class SpeechViewController: UIViewController, SFSpeechRecognizerDelegate {
    private var speechRecognizer: SFSpeechRecognizer?
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private var audioEngine = AVAudioEngine()

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)

        // 1. Create the SFSpeechRecognizer (Japanese)
        speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "ja_JP"))
        speechRecognizer?.delegate = self

        // 2. Ask whether the SFSpeechRecognizer is available
        if speechRecognizer?.isAvailable == true {
            print("speechRecognizer is Available")
        } else {
            print("speechRecognizer is not Available")
            // 3. If it is not available, show an alert offering to switch to English
            //    (showSimpleAlert is a static method, so call it on the type)
            SpeechViewController.showSimpleAlert(title: "Speech recognition is unavailable.",
                message: "The device may be offline. Would you like to try English speech recognition instead?",
                ok_handler: {
                    // 4. Switch to English when the OK button is tapped
                    self.speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en_US"))
                    self.speechRecognizer?.delegate = self
                }, cancel_handler: {
                    // Behavior when the Cancel button is tapped
                    self.navigationController?.popViewController(animated: true)
                }, at: self)
        }
    }

    // Shows an alert with an OK button and, if cancel_handler is given, a Cancel button.
    static func showSimpleAlert(title: String, message: String = "", ok_handler: @escaping () -> Void, cancel_handler: (() -> Void)? = nil, at: UIViewController) {
        let alert = UIAlertController(title: title, message: message, preferredStyle: .alert)
        let defaultAction = UIAlertAction(title: "OK", style: .default, handler: { _ in
            // Runs when the OK button is tapped
            print("OK")
            ok_handler()
        })
        alert.addAction(defaultAction)

        if let cancel_handler = cancel_handler {
            let cancelAction = UIAlertAction(title: "Cancel", style: .cancel, handler: { _ in
                // Runs when the Cancel button is tapped
                print("Cancel")
                cancel_handler()
            })
            alert.addAction(cancelAction)
        }

        at.present(alert, animated: true, completion: nil)
    }
}

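Caveats

Step "2. Ask whether the SFSpeechRecognizer is available" uses speechRecognizer!.isAvailable for the check.
However, when I actually tested it, it appears to return not "whether speech recognition is possible in that language" but **"whether the device is online"**.

As a result, after switching to English while offline, speechRecognizer!.isAvailable always returns false, so it apparently cannot tell you whether recognition will actually work. After switching, the only way to find out is to have the user try it.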
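On iOS 13 and later, which is newer than the environment tested in this article, the Speech framework exposes `supportsOnDeviceRecognition` on SFSpeechRecognizer and `requiresOnDeviceRecognition` on recognition requests. A minimal sketch of how they might be used to check offline capability up front is shown below. The helper names `makeOnDeviceRecognizer` and `makeOnDeviceRequest` are hypothetical, and I have not verified how this behaves for each locale and device.

```swift
import Speech

/// Hypothetical helper: returns a recognizer for the given locale only if
/// the framework reports that it can run on-device (iOS 13+ API).
@available(iOS 13.0, macOS 10.15, *)
func makeOnDeviceRecognizer(localeIdentifier: String) -> SFSpeechRecognizer? {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: localeIdentifier)),
          recognizer.supportsOnDeviceRecognition else {
        // Unsupported locale, or on-device recognition is not available for it
        return nil
    }
    return recognizer
}

/// Hypothetical helper: builds a request that must not be sent to the server.
@available(iOS 13.0, macOS 10.15, *)
func makeOnDeviceRequest() -> SFSpeechAudioBufferRecognitionRequest {
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.requiresOnDeviceRecognition = true
    return request
}
```

With something like this, the English fallback could be offered only when `makeOnDeviceRecognizer(localeIdentifier: "en_US")` returns a recognizer, instead of relying on isAvailable alone.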

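Can Japanese work offline?

At the moment there do not seem to be any libraries that handle Japanese offline. Perhaps the only option is to build your own speech recognition model and embed it with Core ML.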
