A minimal setup for speech recognition on iOS

Last updated at 2022-07-15. Posted at 2021-11-11.

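#Steps for speech recognition on iPhone

This is a minimal setup for recognizing what the user says.
Pasting the code from this article in order takes you from capturing audio to getting it as text.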

[Reference image: a sample app that uses speech recognition (RealityKit-Sampler)]

#####1. Import AVFoundation and Speech

AVFoundation is used to capture audio, and Speech is used for the recognition task.

import Speech 
import AVFoundation

// Properties needed in the steps below

var speechRecognizer: SFSpeechRecognizer?
var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
var recognitionTask: SFSpeechRecognitionTask?
var audioEngine: AVAudioEngine?

#####2. Initialize SFSpeechRecognizer and AVAudioEngine

// Initialize with the device's locale. If that language is not supported, fall back to English.
speechRecognizer = SFSpeechRecognizer(locale: Locale.current) ?? SFSpeechRecognizer(locale: Locale(identifier: "en_US"))
audioEngine = AVAudioEngine()

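#####3. Get permission for speech recognition

Add the following keys to Info:
・Privacy - Speech Recognition Usage Description (NSSpeechRecognitionUsageDescription)
・Privacy - Microphone Usage Description (NSMicrophoneUsageDescription)
(the app crashes if they are missing)

Request authorization for speech recognition.

// Request authorization and check the resulting status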

SFSpeechRecognizer.requestAuthorization { authStatus in
    // Update the UI according to the authorization status
    OperationQueue.main.addOperation {
        switch authStatus {
        case .authorized:
            print("authorized")
            // e.g. show a "Please speak" alert
        case .denied:
            print("denied")
            // Handle the case where the user refused
        case .restricted:
            print("restricted")
            // e.g. show an alert explaining the restriction
        case .notDetermined:
            print("notDetermined")
            // e.g. show an alert asking for permission
        @unknown default:
            break
        }
    }
}

#####4. Configure the audio session


let audioSession = AVAudioSession.sharedInstance()
do {
    try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
    try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
} catch let error {
    print(error)
}
// Declared outside the do block so that the later steps can refer to it
let inputNode = audioEngine!.inputNode

#####5. Create the speech recognition request

recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
guard let recognitionRequest = recognitionRequest else { fatalError("Unable to create a SFSpeechAudioBufferRecognitionRequest object") }
recognitionRequest.shouldReportPartialResults = true // Whether to return partial results while the user is speaking

// Setting requiresOnDeviceRecognition to true keeps the audio data from being sent over the network,
// but accuracy goes down
recognitionRequest.requiresOnDeviceRecognition = false
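
On-device recognition is not available for every language or device, so one option is to ask the recognizer before turning the flag on. The following is a minimal sketch and not part of the original steps: the function name `makeRequest(for:preferOnDevice:)` is hypothetical, and it assumes iOS 13 or later, where `supportsOnDeviceRecognition` and `requiresOnDeviceRecognition` are available.

```swift
import Speech

// Hypothetical helper: builds a request and turns on on-device recognition
// only when the caller wants it and the recognizer reports support for it.
func makeRequest(for recognizer: SFSpeechRecognizer,
                 preferOnDevice: Bool) -> SFSpeechAudioBufferRecognitionRequest {
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.shouldReportPartialResults = true
    if #available(iOS 13.0, macOS 10.15, *) {
        // Falls back to server-based recognition when on-device is unsupported
        request.requiresOnDeviceRecognition =
            preferOnDevice && recognizer.supportsOnDeviceRecognition
    }
    return request
}
```

With this shape, a caller that prefers privacy can pass `preferOnDevice: true` and still get a working request on devices that lack on-device support.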

#####6. Create a recognition task from the request

The handling of the recognition result goes inside the completion handler here.

// Cancel any existing task first
self.recognitionTask?.cancel()
self.recognitionTask = nil

self.recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in

    // Handle the recognition result

    var isFinal = false

    if let result = result {
        isFinal = result.isFinal
        // Print the recognized text
        print("RecognizedText: \(result.bestTranscription.formattedString)")
    }

    if error != nil || isFinal {
        // On completion or on error, stop capturing audio and stop recognition
        self.audioEngine?.stop()
        inputNode.removeTap(onBus: 0)

        self.recognitionRequest = nil
        self.recognitionTask = nil
    }
}

#####7. Feed microphone audio into the recognition request


let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
    // Called each time audio is captured
    self.recognitionRequest?.append(buffer) // Append the captured audio to the recognition request
}

#####8. Start capturing audio

audioEngine?.prepare()
try audioEngine?.start()

RecognizedText: Hello

That is the minimal procedure for speech recognition 🐥
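
For reference, steps 2 and 4 to 8 can be put together in one class roughly like this. It is a minimal sketch rather than a definitive implementation: the class name `SpeechController`, its `start()` method and its `isRunning` property are hypothetical, and it assumes the authorization from step 3 has already been granted.

```swift
import Speech
import AVFoundation

// Hypothetical wrapper around the steps of this article
final class SpeechController {
    // Step 2: device locale first, English as the fallback
    private let speechRecognizer = SFSpeechRecognizer(locale: Locale.current)
        ?? SFSpeechRecognizer(locale: Locale(identifier: "en_US"))
    private let audioEngine = AVAudioEngine()
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?

    var isRunning: Bool { audioEngine.isRunning }

    func start() throws {
        // Cancel any task left over from a previous run
        recognitionTask?.cancel()
        recognitionTask = nil

        // Step 4: audio session
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        let inputNode = audioEngine.inputNode

        // Step 5: request
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        recognitionRequest = request

        // Step 6: task
        recognitionTask = speechRecognizer?.recognitionTask(with: request) { [weak self] result, error in
            var isFinal = false
            if let result = result {
                isFinal = result.isFinal
                print("RecognizedText: \(result.bestTranscription.formattedString)")
            }
            if error != nil || isFinal {
                // Stop capturing audio and release the request and task
                self?.audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                self?.recognitionRequest = nil
                self?.recognitionTask = nil
            }
        }

        // Step 7: pass microphone buffers to the request
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { [weak self] buffer, _ in
            self?.recognitionRequest?.append(buffer)
        }

        // Step 8: start capturing
        audioEngine.prepare()
        try audioEngine.start()
    }
}
```

Keeping `inputNode` as a local constant inside `start()` lets both the completion handler and the tap closure capture it, which sidesteps the scoping problem that arises when the snippets are pasted into separate blocks.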

#How often recognition results arrive

SFSpeechAudioBufferRecognitionRequest.shouldReportPartialResults: Bool

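This property changes how the results have to be handled.

When it is set to true, partial results come back many times during the recognition task.
Each partial result contains the full text from the start of the recognition task up to the moment that result was produced.

Printed partial results: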
Text Hello
Text Hello
Text Hello
Text Hello
Text Hello I am a
Text Hello I am
Text Hello I am doing
Text Hello I am doing
Text Hello I am doing
Text Hello I am doing the
Text Hello I am doing that
Text Hello I am doing that
Text Hello I am doing the recognition
Text Hello I am doing the recognition task
Text Hello I am doing the recognition task
Text Hello I am doing the recognition task
Text Hello I am doing the recognition task by
Text Hello I am doing the recognition task by
Text Hello I am doing the recognition task by
Text Hello I am doing the recognition task by speech
Text Hello I am doing the recognition task by speech
Text Hello I am doing the recognition task by speech
Text Hello I am doing the recognition task by speech
Text Hello I am doing the recognition task by speech speech
Text Hello I am doing the recognition task by speech frame walk
Text Hello I am doing the recognition task by speech framework
Text Hello I am doing the recognition task by speech framework
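
Because each partial result repeats the whole text so far, and identical results can arrive several times in a row, an app that only cares about what changed has to compare each result with the previous one. The following is a minimal sketch in plain Swift; the function name `changedWords(previous:current:)` is hypothetical and is not part of the Speech framework.

```swift
// Hypothetical helper: returns the words of `current` that come after
// the leading words it shares with `previous`.
// The result is empty when nothing changed or when words were only removed.
func changedWords(previous: String, current: String) -> [String] {
    let oldWords = previous.split(separator: " ")
    let newWords = current.split(separator: " ")
    // Count how many leading words are identical in both results
    var shared = 0
    while shared < oldWords.count, shared < newWords.count,
          oldWords[shared] == newWords[shared] {
        shared += 1
    }
    return newWords[shared...].map { String($0) }
}
```

For two consecutive results from the log above, `changedWords(previous: "Hello I am a", current: "Hello I am doing")` returns `["doing"]`, and two identical results give an empty array, so repeated results can be skipped.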

When it is set to false, the result comes back in one piece.
In that case, no result comes back unless you explicitly call the following.

SFSpeechRecognitionTask.finish()

Text Hello I am doing the recognition task by speech framework
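
One place to make that call is a stop action such as a button handler. A minimal sketch, assuming the request and task created in the earlier steps are passed in; the function name `endUtterance(request:task:)` is hypothetical. `endAudio()` marks the end of the audio input on the request, and `finish()` asks the task to finish processing the audio it has already received.

```swift
import Speech

// Hypothetical helper: call this when the user has finished speaking.
func endUtterance(request: SFSpeechAudioBufferRecognitionRequest?,
                  task: SFSpeechRecognitionTask?) {
    request?.endAudio() // no more audio will be appended
    task?.finish()      // deliver the result for the audio received so far
}
```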

Choose between the two according to your use case.

#The sample from the top of this article that combines speech recognition and AR is here.
RealityKit-Sampler

🐣

I am a freelance engineer.
For work inquiries, contact me at
rockyshikoku@gmail.com

I build apps that use Core ML.
I post information about machine learning.

Twitter
Medium
