[Swift 5] Notes on using AVSpeechSynthesizer, the standard speech synthesis API available on iOS

Posted at 2020-06-30

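Overview

I recently had a chance to use AVSpeechSynthesizer, the speech synthesis API that ships with iOS, so I am writing down how to use it for future reference.

Official AVSpeechSynthesizer documentation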
https://developer.apple.com/documentation/avfoundation/avspeechsynthesizer

AVSpeechSynthesizer is an object that produces synthesized speech from text utterances and plays it back.
It also provides ways to control speech that is already in progress.

Usage

import AVFoundation

Importing AVFoundation is all the setup that is needed.

let synthesizer = AVSpeechSynthesizer()

Create an instance of AVSpeechSynthesizer.

Speaking

let utterance = AVSpeechUtterance(string: "読み上げる文字列") // the text to read aloud
let voice = AVSpeechSynthesisVoice(language: "ja-JP")
utterance.voice = voice
synthesizer.speak(utterance)

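To have text read aloud, configure an AVSpeechUtterance.
Pass the string you want read aloud to the AVSpeechUtterance initializer, and specify the language by assigning an AVSpeechSynthesisVoice to its voice property.
Once that is set up, passing the AVSpeechUtterance to the speak method of AVSpeechSynthesizer is all it takes to start speaking.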
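As a minimal sketch of the steps above, the class below gathers them in one place. `SpeechReader`, `makeUtterance(_:language:)` and `read(_:language:)` are hypothetical names that do not appear in the original. The synthesizer is held as a stored property on the assumption that it has to stay alive for as long as it is speaking, and the voice is unwrapped because `AVSpeechSynthesisVoice(language:)` is a failable initializer.

```swift
import AVFoundation

/// Hypothetical helper (not part of AVFoundation) that wraps the steps above.
final class SpeechReader {
    // Stored property: keeps the synthesizer alive while it is speaking.
    private let synthesizer = AVSpeechSynthesizer()

    /// Builds an utterance for `text` using a BCP 47 language code such as "ja-JP".
    func makeUtterance(_ text: String, language: String) -> AVSpeechUtterance {
        let utterance = AVSpeechUtterance(string: text)
        // If no voice exists for the language, voice stays nil and the
        // system default voice is used instead.
        if let voice = AVSpeechSynthesisVoice(language: language) {
            utterance.voice = voice
        }
        return utterance
    }

    /// Queues `text` for speaking.
    func read(_ text: String, language: String = "ja-JP") {
        synthesizer.speak(makeUtterance(text, language: language))
    }
}
```

With this in place, a view controller could hold a `SpeechReader` as a property and call `read("読み上げる文字列")`.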

Pausing

if synthesizer.isSpeaking {
    synthesizer.pauseSpeaking(at: .word)
}

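While speech is in progress, you can pause it with the pauseSpeaking method.
The argument is an AVSpeechBoundary value (immediate or word).
immediate pauses speech right away, while word pauses after the word currently being spoken.
You can check whether speech is currently in progress with the isSpeaking property of AVSpeechSynthesizer.

Note: with Japanese text, the point at which word pauses felt slightly unnatural to me.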
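Pausing is usually paired with resuming, which the snippet above does not show. The sketch below uses the `isPaused` property and the `continueSpeaking()` method of AVSpeechSynthesizer for that. `togglePause` is a hypothetical helper name. It checks `isPaused` first on the assumption that `isSpeaking` can still report true while the synthesizer is paused.

```swift
import AVFoundation

/// Hypothetical helper: resumes if paused, pauses at a word boundary if speaking.
/// Returns true when the synthesizer reports that the state changed.
@discardableResult
func togglePause(_ synthesizer: AVSpeechSynthesizer) -> Bool {
    if synthesizer.isPaused {
        return synthesizer.continueSpeaking()
    }
    if synthesizer.isSpeaking {
        return synthesizer.pauseSpeaking(at: .word)
    }
    return false // nothing is being spoken, so there is nothing to toggle
}
```

Calling `togglePause(synthesizer)` from a single button action would then cover both the pause and the resume case.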

Stopping speech

synthesizer.stopSpeaking(at: .immediate)

The stopSpeaking method stops speech.
As with pausing, you can pass an AVSpeechBoundary value (immediate or word).

Setting the speech rate

utterance.rate = 0.8

Setting the rate property of AVSpeechUtterance controls how fast the text is spoken.
Constants for the minimum and maximum rates are also provided.

AVSpeechUtteranceMinimumSpeechRate
AVSpeechUtteranceMaximumSpeechRate
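One way these constants could be used is to clamp a value that comes from user input, for example a slider, before assigning it. This is only a sketch: `clampedSpeechRate` is a hypothetical helper, and `AVSpeechUtteranceDefaultSpeechRate` is a third constant from AVFoundation that the text above does not mention.

```swift
import AVFoundation

/// Hypothetical helper: clamps a requested rate into the range
/// AVSpeechUtteranceMinimumSpeechRate...AVSpeechUtteranceMaximumSpeechRate.
func clampedSpeechRate(_ requested: Float) -> Float {
    return min(max(requested, AVSpeechUtteranceMinimumSpeechRate),
               AVSpeechUtteranceMaximumSpeechRate)
}

let rateDemoUtterance = AVSpeechUtterance(string: "Hello")
rateDemoUtterance.rate = clampedSpeechRate(1.5)                 // limited to the maximum rate
rateDemoUtterance.rate = AVSpeechUtteranceDefaultSpeechRate     // back to the default rate
```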

Speech pitch

utterance.pitchMultiplier = 1.0

As with the rate, the pitch (how high the voice sounds) can be set through an AVSpeechUtterance property.
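Unlike the rate, pitchMultiplier has no exported minimum and maximum constants as far as I know. The sketch below therefore writes out the range 0.5 to 2.0 given in Apple's documentation as a local constant. `pitchRange` and `pitchedUtterance` are hypothetical names introduced here for illustration.

```swift
import AVFoundation

// Assumption: the allowed pitch range from Apple's documentation, written out
// by hand because AVFoundation does not export constants for it.
let pitchRange: ClosedRange<Float> = 0.5...2.0

/// Hypothetical helper: builds an utterance whose pitch is kept inside `pitchRange`.
func pitchedUtterance(_ text: String, pitch: Float) -> AVSpeechUtterance {
    let utterance = AVSpeechUtterance(string: text)
    utterance.pitchMultiplier = min(max(pitch, pitchRange.lowerBound), pitchRange.upperBound)
    return utterance
}
```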

Delegate

extension ViewController: AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didStart utterance: AVSpeechUtterance) {
        // Speech started
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        // Speech finished
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           willSpeakRangeOfSpeechString characterRange: NSRange,
                           utterance: AVSpeechUtterance) {
        // The part of the utterance that is about to be spoken
    }
}
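For these callbacks to arrive, the synthesizer's delegate property has to point at the conforming object, a step the snippet above leaves out. The sketch below shows that wiring in a standalone class so it can be read in isolation. `SpeechObserver` and `spokenFragments` are hypothetical names; in the article's setup the same assignment would presumably be `synthesizer.delegate = self` inside ViewController.

```swift
import AVFoundation

/// Hypothetical observer class; the article's ViewController would play the same role.
final class SpeechObserver: NSObject, AVSpeechSynthesizerDelegate {
    let synthesizer = AVSpeechSynthesizer()
    /// Fragments reported by willSpeakRangeOfSpeechString, in order.
    private(set) var spokenFragments: [String] = []

    override init() {
        super.init()
        // Without this assignment none of the delegate methods are called.
        synthesizer.delegate = self
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           willSpeakRangeOfSpeechString characterRange: NSRange,
                           utterance: AVSpeechUtterance) {
        // characterRange is expressed in UTF-16 units, so slice through NSString.
        let fragment = (utterance.speechString as NSString).substring(with: characterRange)
        spokenFragments.append(fragment)
    }
}
```

Because the delegate reference is weak, the observer itself also needs to be kept alive, for example as a property of whatever owns it.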

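Implementing AVSpeechSynthesizerDelegate lets you react to speech events.
The didStart and didFinish methods report when speech starts and finishes, and willSpeakRangeOfSpeechString gives you the portion of the string that is about to be spoken.
You can use this to, for example, change the color of the part of the text that has been read so far.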

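// The text read so far, from the start of the utterance through the current range
var alreadyReadUtteranceString = ""

// ~~~~~ omitted

func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                       willSpeakRangeOfSpeechString characterRange: NSRange,
                       utterance: AVSpeechUtterance) {
    // Take everything up to the end of the reported range, so that characters
    // between two reported ranges (spaces, punctuation) are not skipped.
    let readLength = characterRange.location + characterRange.length
    alreadyReadUtteranceString = (utterance.speechString as NSString).substring(to: readLength)
    changeTextColorPartially()
}

// ~~~~~ omitted

func changeTextColorPartially() {
    // Assumes label.text holds the same string that was passed to the utterance.
    guard let text = label.text else { return }
    let attrText = NSMutableAttributedString(string: text)
    // NSRange counts UTF-16 units, so use the NSString length rather than String.count.
    let readLength = min((alreadyReadUtteranceString as NSString).length, attrText.length)
    attrText.addAttribute(
        .foregroundColor,
        value: UIColor.red,
        range: NSRange(location: 0, length: readLength)
    )
    label.attributedText = attrText
}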
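A detail worth checking in highlighting code of this kind is the unit of the range. NSRange and NSAttributedString count UTF-16 code units, while String.count counts Characters, and the two differ for emoji and some other characters. The sketch below isolates the range calculation in a function that needs only Foundation. `readSoFarRange(upTo:in:)` is a hypothetical helper, not something from the original.

```swift
import Foundation

/// Hypothetical helper: the range from the start of `text` through the end of
/// `characterRange`, clamped to the UTF-16 length of `text`.
func readSoFarRange(upTo characterRange: NSRange, in text: String) -> NSRange {
    let totalLength = (text as NSString).length
    let end = min(characterRange.location + characterRange.length, totalLength)
    return NSRange(location: 0, length: max(end, 0))
}

let sampleText = "Hello 😀 world"
// Pretend the synthesizer reported the emoji, which occupies two UTF-16 units.
let sampleRange = readSoFarRange(upTo: NSRange(location: 6, length: 2), in: sampleText)
let sampleRead = (sampleText as NSString).substring(with: sampleRange)
// sampleRange.length is 8, while sampleRead ("Hello 😀") has a String.count of 7.
```

The resulting range can be passed directly to `addAttribute(_:value:range:)` on an NSMutableAttributedString built from the same text.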

That concludes my notes on using AVSpeechSynthesizer.
