Writing speech synthesized by AVSpeechSynthesizer to a file

Posted at 2023-04-30 (last updated at 2023-04-30)

Introduction

AVSpeechSynthesizer makes it easy to build text-to-speech features on iOS and macOS.
This is a memo on how to save that synthesized speech to a WAV file.

What we want in the end

Save the synthesized speech as a WAV file at a specified path.

let file = URL(filePath: "/path/to/output.wav")
try await SpeechWriter().write(text: "こんにちは、世界。", to: file)

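Implementation

An extension on AVSpeechSynthesizer

With AVSpeechSynthesizer.write(_:toBufferCallback:), the generated AVAudioBuffer objects are delivered one after another through a callback.
That is awkward to work with as-is, so we convert it into an AsyncStream.
When synthesis completes, the callback receives an empty AVAudioBuffer, so the stream is finished when the data size is 0.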

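import AVFoundation

extension AVSpeechSynthesizer {
    /// Wraps write(_:toBufferCallback:) so the buffers can be consumed with for-await.
    func write(_ utterance: AVSpeechUtterance) -> AsyncStream<AVAudioBuffer> {
        AsyncStream(AVAudioBuffer.self) { continuation in
            write(utterance) { (buffer: AVAudioBuffer) in
                if buffer.audioBufferList.pointee.mBuffers.mDataByteSize > 0 {
                    // Non-empty buffer: pass it on to the consumer.
                    continuation.yield(buffer)
                } else {
                    // Empty buffer: synthesis is complete, so end the stream.
                    continuation.finish()
                }
            }
        }
    }
}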
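
The callback-to-stream conversion can also be tried in isolation, without AVFoundation. The following is a minimal sketch, assuming a hypothetical callback-based producer `produceChunks` (not part of any Apple API) that, like the synthesizer above, signals completion by delivering an empty chunk. `chunkStream` and `collect` are likewise names made up for this illustration.

```swift
// Hypothetical stand-in for a callback-based API: it calls `callback`
// once per chunk and then once more with an empty chunk to signal the end.
func produceChunks(_ chunks: [[Int]], callback: @escaping ([Int]) -> Void) {
    for chunk in chunks {
        callback(chunk)
    }
    callback([])
}

// Wraps the callback API in an AsyncStream, finishing on the empty chunk.
func chunkStream(_ chunks: [[Int]]) -> AsyncStream<[Int]> {
    AsyncStream([Int].self) { continuation in
        produceChunks(chunks) { chunk in
            if chunk.isEmpty {
                continuation.finish()
            } else {
                continuation.yield(chunk)
            }
        }
    }
}

// Consumes the stream with for-await and gathers everything it yields.
func collect(_ chunks: [[Int]]) async -> [[Int]] {
    var received: [[Int]] = []
    for await chunk in chunkStream(chunks) {
        received.append(chunk)
    }
    return received
}
```

The empty sentinel chunk itself is never yielded, so the consumer only sees real data and the for-await loop ends on its own.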

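Getting the AVAudioPCMBuffer sequence

Using the extension above, we get an AsyncStream of AVAudioBuffer containing the given string read aloud in a Japanese voice.
AVAudioBuffer itself is inconvenient when writing to a file, so each buffer is cast to AVAudioPCMBuffer.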

let utterance = AVSpeechUtterance(string: text)
utterance.voice = AVSpeechSynthesisVoice(language: "ja-JP")
let synthesizer = AVSpeechSynthesizer()
let buffers = synthesizer.write(utterance)
    .compactMap({ $0 as? AVAudioPCMBuffer })
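
As an illustration of what the compactMap cast does, here is a small sketch on plain Swift types. The `Chunk` protocol, the `PCMChunk` and `OtherChunk` types, and the `pcmFrameCounts` function are hypothetical stand-ins for AVAudioBuffer and its subclasses, not real API. Elements that fail the cast are dropped from the resulting sequence.

```swift
// Hypothetical stand-ins for AVAudioBuffer and its subclasses.
protocol Chunk: Sendable {}
struct PCMChunk: Chunk { let frameCount: Int }
struct OtherChunk: Chunk {}

// Keeps only the elements that can be cast to PCMChunk, in the same way
// the compactMap over AVAudioBuffer keeps only AVAudioPCMBuffer values.
func pcmFrameCounts(in chunks: [any Chunk]) async -> [Int] {
    let stream = AsyncStream<any Chunk> { continuation in
        for chunk in chunks {
            continuation.yield(chunk)
        }
        continuation.finish()
    }
    let pcmOnly = stream.compactMap { $0 as? PCMChunk }

    var counts: [Int] = []
    for await chunk in pcmOnly {
        counts.append(chunk.frameCount)
    }
    return counts
}
```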

Writing to a file with AVAudioFile

We create an AVAudioFile using the AVAudioFormat of the first AVAudioPCMBuffer received.
The remaining buffers are then written in a loop.

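// Wait for the first buffer; its format determines the output file's settings.
// The force unwrap assumes that at least one buffer arrives.
let first: AVAudioPCMBuffer = await buffers.first(where: { _ in true })!
let output = try AVAudioFile(forWriting: outputURL,
                             settings: first.format.settings,
                             commonFormat: first.format.commonFormat,
                             interleaved: first.format.isInterleaved)
try output.write(from: first)

// Write the remaining buffers as they arrive.
for await buffer in buffers {
    try output.write(from: buffer)
}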
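
The snippet above reads the first element with first(where:) and then iterates the same sequence again for the rest. An alternative sketch, shown here on a plain AsyncStream<Int> rather than on audio buffers, pulls both the head and the rest from one explicit iterator, which also avoids the force unwrap. The names `headAndRest` and `makeStream` are made up for this illustration.

```swift
// Splits a stream into its first element and everything after it,
// using a single iterator for both steps.
func headAndRest(of stream: AsyncStream<Int>) async -> (head: Int?, rest: [Int]) {
    var iterator = stream.makeAsyncIterator()

    // An empty stream has no head; handle it instead of force-unwrapping.
    guard let head = await iterator.next() else {
        return (nil, [])
    }

    var rest: [Int] = []
    while let value = await iterator.next() {
        rest.append(value)
    }
    return (head, rest)
}

// Small helper that builds an already-finished stream from an array.
func makeStream(_ values: [Int]) -> AsyncStream<Int> {
    AsyncStream(Int.self) { continuation in
        for value in values {
            continuation.yield(value)
        }
        continuation.finish()
    }
}
```

Applied to the audio case, the head would be the buffer whose AVAudioFormat is used to create the AVAudioFile, and the loop body would write each remaining buffer.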

Full code

import AVFoundation

let file = URL(filePath: "/path/to/output.wav")
try await SpeechWriter().write(text: "こんにちは、世界。", to: file)

class SpeechWriter {
    func write(text: String, to outputURL: URL) async throws {
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: "ja-JP")
        let synthesizer = AVSpeechSynthesizer()
        let buffers = synthesizer.write(utterance)
            .compactMap({ $0 as? AVAudioPCMBuffer })

        let first: AVAudioPCMBuffer = await buffers.first(where: { _ in true })!
        let output = try AVAudioFile(forWriting: outputURL,
                                     settings: first.format.settings,
                                     commonFormat: first.format.commonFormat,
                                     interleaved: first.format.isInterleaved)
        try output.write(from: first)

        for await buffer in buffers {
            try output.write(from: buffer)
        }
    }
}

extension AVSpeechSynthesizer {
    func write(_ utterance: AVSpeechUtterance) -> AsyncStream<AVAudioBuffer> {
        AsyncStream(AVAudioBuffer.self) { continuation in
            write(utterance) { (buffer: AVAudioBuffer) in
                if buffer.audioBufferList.pointee.mBuffers.mDataByteSize > 0 {
                    continuation.yield(buffer)
                } else {
                    continuation.finish()
                }
            }
        }
    }
}

Summary

Using async/await made the implementation simple.
When you want to use the first event of a stream to set something up, as in this case, AsyncStream turned out to be a very good fit.
