
Encoding to and decoding from H.264 in Swift with VideoToolbox

Posted at 2021-12-28

Introduction

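This article shows how to use VideoToolbox to encode image data such as an MTLTexture to H.264 and decode it again.
Converting to H.264 reduces the size of the data.
- For example, for an 882x468 video, each frame shrank from about 1.7 MB to roughly 10 KB.
In the end, each frame could be represented as three byte arrays and one Int.
In the form shown below ↓ and at this size, the data should be small enough to exchange over the internet.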

struct EncodedFrameEntity {
    let sps: [UInt8]
    let pps: [UInt8]
    let body: [Int8]
    let microSec: Int
}
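
As a rough sanity check of the sizes quoted above, the sketch below assumes 4 bytes per pixel (32-bit BGRA). The article does not state the pixel format, so that value and the helper name are assumptions.

```swift
// Assumption: 4 bytes per pixel (e.g. 32-bit BGRA). The pixel format is not stated in the article.
func rawFrameBytes(width: Int, height: Int, bytesPerPixel: Int = 4) -> Int {
    width * height * bytesPerPixel
}

let rawBytes = rawFrameBytes(width: 882, height: 468)   // 1_651_104 bytes, about 1.7 MB
let encodedBytes = 10_000                                // the "~10KB" figure from the text
let compressionRatio = Double(rawBytes) / Double(encodedBytes)
print("raw: \(rawBytes) bytes, ratio: about \(Int(compressionRatio))x")
```

Under that assumption the uncompressed frame is on the order of 165 times larger than the encoded one.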

The source code is here ↓

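As shown below ↓, I confirmed that encoding and decoding work in real time.
- The bitrate in this video was lowered on purpose.
The following sections walk through the actual encoding and decoding steps.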

Dropping the bitrate way down is nice because you can really tell the video has been through an encode pic.twitter.com/txEW610nAo
— ふじき (@fzkqi) December 26, 2021

Encoding and decoding H.264 in Swift with VideoToolbox

[Preparation] Displaying the SCNView rendering result with AVSampleBufferDisplayLayer

See here for the details.

The rendered MTLTexture is received, converted to a CMSampleBuffer, and enqueued on the AVSampleBufferDisplayLayer.
Within a single app, passing the MTLTexture directly is enough, but at that size it is hard to hand the data over via the internet.
So VideoToolbox is used to encode it to H.264 and reduce the size.

textureStream
    .compactMap { [weak self] (texture: MTLTexture) -> CMSampleBuffer? in
        return self?.convert(from: texture)
    }
    .sink { [weak self] (sampleBuffer: CMSampleBuffer) in
        self?.sampleBufferDisplayLayer.enqueue(sampleBuffer)
    }
    .store(in: &cancellables)

Creating an H.264-encoded CMSampleBuffer

With VideoToolbox, a frame is encoded to H.264 and output as a CMSampleBuffer.
First, this CMSampleBuffer is obtained and displayed.
After that, the obtained CMSampleBuffer is serialized.

Implementing an Encoder that returns a CMSampleBuffer

Let's implement the Encoder.
Its goal is to take a CVImageBuffer, encode it to H.264, and publish the result on encodedSampleBuffer.

class Encoder {
    public let encodedSampleBuffer: AnyPublisher<CMSampleBuffer, Never>
    private let encodedSampleBufferSubject = PassthroughSubject<CMSampleBuffer, Never>()
    init() {
        encodedSampleBuffer = encodedSampleBufferSubject.eraseToAnyPublisher()
    }
    public func encode(imageBuffer: CVImageBuffer, presentationTimeStamp: CMTime, duration: CMTime) {
    }
}

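Surprisingly, AVSampleBufferDisplayLayer displayed an H.264-encoded CMSampleBuffer when I simply passed one to it, so as a first step only the Encoder is inserted in between.
- AVSampleBufferDisplayLayer is impressive.
The MTLTexture is simply converted to a CVImageBuffer first.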

textureStream
    .compactMap { [weak self] (texture: MTLTexture) -> CVImageBuffer? in
        self?.convert(from: texture)
    }
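    .sink { [weak self] (imageBuffer: CVImageBuffer) in
        // Unwrap self once instead of force-unwrapping it inside the call.
        guard let self = self else { return }
        self.encoder.encode(imageBuffer: imageBuffer,
                            presentationTimeStamp: self.currentCmTime,
                            duration: CMTime(value: 16_000_000, timescale: 1_000_000_000))
    }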
    .store(in: &cancellables)

encoder
    .encodedSampleBuffer
    .sink { [weak self] encoded in
        self?.sampleBufferDisplayLayer.enqueue(encoded)
    }
    .store(in: &cancellables)
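
The duration passed to encode above is a CMTime, which represents value / timescale seconds. The plain-arithmetic sketch below (the helper name is hypothetical and no CoreMedia types are used) shows what 16_000_000 / 1_000_000_000 works out to and which frame rate it implies.

```swift
// CMTime stores a rational number: seconds = value / timescale.
func seconds(value: Int64, timescale: Int32) -> Double {
    Double(value) / Double(timescale)
}

let frameDuration = seconds(value: 16_000_000, timescale: 1_000_000_000)  // 0.016 s
let impliedFramesPerSecond = 1.0 / frameDuration                          // about 62.5
// For an exact 60 fps duration, value: 1 with timescale: 60 would be one option.
let sixtyFpsDuration = seconds(value: 1, timescale: 60)
print(frameDuration, impliedFramesPerSecond, sixtyFpsDuration)
```

So the 16 ms duration used here is slightly shorter than one frame at 60 fps, which is probably close enough for a display-only experiment.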

Here ↓ is the implemented Encoder.
When encode is called, it runs setup on the first call only, which initializes the VTCompressionSession.
When creating the VTCompressionSession, you specify a callback that receives the encoded CMSampleBuffer.
When outputCallback is called, it just pushes the encoded sampleBuffer into encodedSampleBufferSubject.

class Encoder {
    public let encodedSampleBuffer: AnyPublisher<CMSampleBuffer, Never>
    private let encodedSampleBufferSubject = PassthroughSubject<CMSampleBuffer, Never>()
    init() {
        encodedSampleBuffer = encodedSampleBufferSubject.eraseToAnyPublisher()
    }

    private var session: VTCompressionSession?
    public func encode(imageBuffer: CVImageBuffer, presentationTimeStamp: CMTime, duration: CMTime) {
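        if session == nil {
            self.setup(width: Int32(CVPixelBufferGetWidth(imageBuffer)), height: Int32(CVPixelBufferGetHeight(imageBuffer)))
        }
        // Bail out instead of crashing if the session could not be created.
        guard let session = session else { return }
        var infoFlagsOut: VTEncodeInfoFlags = []
        let status = VTCompressionSessionEncodeFrame(session,
                                                     imageBuffer: imageBuffer,
                                                     presentationTimeStamp: presentationTimeStamp,
                                                     duration: duration,
                                                     frameProperties: nil,
                                                     sourceFrameRefcon: nil,
                                                     infoFlagsOut: &infoFlagsOut)
        if status != noErr {
            print("VTCompressionSessionEncodeFrame failed: \(status)")
        }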
    }

    private func setup(width: Int32, height: Int32) {
        _ = VTCompressionSessionCreate(allocator: kCFAllocatorDefault,
                                       width: width,
                                       height: height,
                                       codecType: kCMVideoCodecType_H264,
                                       encoderSpecification: nil,
                                       imageBufferAttributes: nil,
                                       compressedDataAllocator: nil,
                                       outputCallback: outputCallback,
                                       refcon: Unmanaged.passUnretained(self).toOpaque(),
                                       compressionSessionOut: &session)
        // set low quality
//        VTSessionSetProperty(session!, key: kVTCompressionPropertyKey_AverageBitRate, value: width * height as CFTypeRef)
    }

    private let outputCallback: VTCompressionOutputCallback = { (outputCallbackRefCon: UnsafeMutableRawPointer?,
                                                                 sourceFrameRefCon: UnsafeMutableRawPointer?,
                                                                 status: OSStatus,
                                                                 infoFlags: VTEncodeInfoFlags,
                                                                 sampleBuffer: CMSampleBuffer?) in
        // Ignore failed encodes and missing buffers.
        guard status == noErr,
              let outputCallbackRefCon = outputCallbackRefCon,
              let sampleBuffer = sampleBuffer else { return }
        let refcon = Unmanaged<Encoder>.fromOpaque(outputCallbackRefCon).takeUnretainedValue()
        refcon.encodedSampleBufferSubject.send(sampleBuffer)
    }
}
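
The commented-out line in setup hints at how the bitrate was lowered for the demo video. As a hedged sketch, the helper below sets a few VideoToolbox session properties right after the session is created. The property keys exist in VideoToolbox, but the helper names, the choice of properties, and the idea of disabling frame reordering (so that a single timestamp per frame is enough) are assumptions, not the article's implementation.

```swift
#if canImport(VideoToolbox)
import Foundation
import VideoToolbox
#endif

// Mirrors the commented-out line in setup: width * height is used as the target bits per second.
func lowQualityBitRate(width: Int32, height: Int32) -> Int {
    Int(width) * Int(height)
}

#if canImport(VideoToolbox)
// Hypothetical helper (not part of the article's Encoder); call it right after VTCompressionSessionCreate.
func configureForStreaming(_ session: VTCompressionSession, width: Int32, height: Int32) -> Bool {
    let bitRate = NSNumber(value: lowQualityBitRate(width: width, height: height))
    let results = [
        // Hint that frames arrive live, so low latency is preferred.
        VTSessionSetProperty(session, key: kVTCompressionPropertyKey_RealTime, value: kCFBooleanTrue),
        // Disable frame reordering (no B-frames), so output order matches presentation order.
        VTSessionSetProperty(session, key: kVTCompressionPropertyKey_AllowFrameReordering, value: kCFBooleanFalse),
        // Target average bitrate in bits per second.
        VTSessionSetProperty(session, key: kVTCompressionPropertyKey_AverageBitRate, value: bitRate),
    ]
    // True only if every property was accepted.
    return results.allSatisfy { $0 == noErr }
}
#endif
```

For the 882x468 example this gives a target of 412,776 bits per second, which is low enough to make compression artifacts clearly visible.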

Decoding the H.264-encoded CMSampleBuffer

The encoded CMSampleBuffer could be displayed directly by AVSampleBufferDisplayLayer.
However, AVSampleBufferDisplayLayer cannot always be used, so let's decode manually.

Decoding the encoded CMSampleBuffer

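The encoded CMSampleBuffer is decoded to obtain a CVImageBuffer, which is then converted to a CMSampleBuffer so that AVSampleBufferDisplayLayer can display it.
The basic strategy is the same as for the Encoder: decode the H.264-encoded CMSampleBuffer and publish the result on decodedSampleBuffer.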

class Decoder {
    public let decodedSampleBuffer: AnyPublisher<CMSampleBuffer, Never>
    private let decodedSampleBufferSubject = PassthroughSubject<CMSampleBuffer, Never>()
    public init() {
        decodedSampleBuffer = decodedSampleBufferSubject.eraseToAnyPublisher()
    }
    public func decode(sampleBuffer: CMSampleBuffer) {
    }
}

Here is the implemented Decoder.
It is basically the same as the Encoder.
VTDecompressionSessionCreate creates a VTDecompressionSession, and VTDecompressionSessionDecodeFrame feeds the CMSampleBuffer into it.
Decoding yields a CVImageBuffer and a timestamp, which are wrapped in a CMSampleBuffer.

class Decoder {
    public let decodedSampleBuffer: AnyPublisher<CMSampleBuffer, Never>
    private let decodedSampleBufferSubject = PassthroughSubject<CMSampleBuffer, Never>()

    public init() {
        decodedSampleBuffer = decodedSampleBufferSubject.eraseToAnyPublisher()
    }

    private var session: VTDecompressionSession?

    public func decode(sampleBuffer: CMSampleBuffer) {
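        if session == nil {
            // Without a format description (SPS/PPS) the session cannot be created.
            guard let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer) else { return }
            setup(formatDescription: formatDescription)
        }
        guard let session = session else { return }
        var infoFlagsOut: VTDecodeInfoFlags = []
        _ = VTDecompressionSessionDecodeFrame(session,
                                              sampleBuffer: sampleBuffer,
                                              flags: [],
                                              frameRefcon: nil,
                                              infoFlagsOut: &infoFlagsOut)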
    }

    private func setup(formatDescription: CMFormatDescription) {
        var outputCallback = VTDecompressionOutputCallbackRecord(decompressionOutputCallback: decompressionOutputCallback,
                                                                 decompressionOutputRefCon: Unmanaged.passUnretained(self).toOpaque())
       _ = VTDecompressionSessionCreate(allocator: kCFAllocatorDefault,
                                        formatDescription: formatDescription,
                                        decoderSpecification: nil,
                                        imageBufferAttributes: nil,
                                        outputCallback: &outputCallback,
                                        decompressionSessionOut: &session)
    }

    private var decompressionOutputCallback: VTDecompressionOutputCallback = { (decompressionOutputRefCon: UnsafeMutableRawPointer?,
                                                                                sourceFrameRefCon: UnsafeMutableRawPointer?,
                                                                                status: OSStatus,
                                                                                infoFlags: VTDecodeInfoFlags,
                                                                                imageBuffer: CVImageBuffer?,
                                                                                presentationTimeStamp: CMTime,
                                                                                presentationDuration: CMTime) in
        // Skip frames that failed to decode instead of force-unwrapping a nil image buffer.
        guard status == noErr,
              let decompressionOutputRefCon = decompressionOutputRefCon,
              let imageBuffer = imageBuffer else { return }
        let refcon = Unmanaged<Decoder>.fromOpaque(decompressionOutputRefCon).takeUnretainedValue()
        var formatDescriptionOut: CMVideoFormatDescription?
        _ = CMVideoFormatDescriptionCreateForImageBuffer(allocator: kCFAllocatorDefault,
                                                         imageBuffer: imageBuffer,
                                                         formatDescriptionOut: &formatDescriptionOut)
        guard let formatDescription = formatDescriptionOut else { return }
        var sampleTiming = CMSampleTimingInfo(duration: presentationDuration,
                                              presentationTimeStamp: presentationTimeStamp,
                                              decodeTimeStamp: .invalid)
        var sampleBufferOut: CMSampleBuffer?
        _ = CMSampleBufferCreateForImageBuffer(allocator: kCFAllocatorDefault,
                                               imageBuffer: imageBuffer,
                                               dataReady: true,
                                               makeDataReadyCallback: nil,
                                               refcon: nil,
                                               formatDescription: formatDescription,
                                               sampleTiming: &sampleTiming,
                                               sampleBufferOut: &sampleBufferOut)
        guard let sampleBuffer = sampleBufferOut else { return }
        refcon.decodedSampleBufferSubject.send(sampleBuffer)
    }
}

Serializing and deserializing the H.264-encoded CMSampleBuffer

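In the implementation so far, the H.264-encoded CMSampleBuffer was passed from the encoder to the decoder as is.
A CMSampleBuffer can be awkward to work with in some cases, so it is converted to the following model.
Once converted this far, it should be possible to exchange the data over the internet in protobuf or any format you like.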

struct EncodedFrameEntity {
    let sps: [UInt8]
    let pps: [UInt8]
    let body: [Int8]
    let microSec: Int
}
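
To illustrate the "any format you like" point, here is a minimal sketch that flattens the model into a single byte array and reads it back. The wire layout (length-prefixed fields followed by a 64-bit timestamp, all big-endian) and the function names are hypothetical choices for illustration, not something the article defines.

```swift
struct EncodedFrameEntity: Equatable {
    let sps: [UInt8]
    let pps: [UInt8]
    let body: [Int8]
    let microSec: Int
}

// Big-endian bytes of any fixed-width integer.
func bigEndianBytes<T: FixedWidthInteger>(_ value: T) -> [UInt8] {
    withUnsafeBytes(of: value.bigEndian) { Array($0) }
}

// Hypothetical layout: [len][sps][len][pps][len][body][microSec], lengths as UInt32, microSec as Int64.
func serialize(_ frame: EncodedFrameEntity) -> [UInt8] {
    var out: [UInt8] = []
    let body = frame.body.map { UInt8(bitPattern: $0) }
    for field in [frame.sps, frame.pps, body] {
        out += bigEndianBytes(UInt32(field.count))
        out += field
    }
    out += bigEndianBytes(Int64(frame.microSec))
    return out
}

func deserialize(_ bytes: [UInt8]) -> EncodedFrameEntity? {
    var offset = 0
    // Reads a big-endian integer and advances the cursor; nil if the input is too short.
    func readInteger<T: FixedWidthInteger>(_ type: T.Type) -> T? {
        let size = MemoryLayout<T>.size
        guard offset + size <= bytes.count else { return nil }
        var value: T = 0
        for byte in bytes[offset..<offset + size] {
            value = (value << 8) | T(truncatingIfNeeded: byte)
        }
        offset += size
        return value
    }
    // Reads one length-prefixed field.
    func readField() -> [UInt8]? {
        guard let length = readInteger(UInt32.self),
              offset + Int(length) <= bytes.count else { return nil }
        let field = Array(bytes[offset..<offset + Int(length)])
        offset += Int(length)
        return field
    }
    guard let sps = readField(), let pps = readField(), let body = readField(),
          let microSec = readInteger(Int64.self) else { return nil }
    return EncodedFrameEntity(sps: sps, pps: pps,
                              body: body.map { Int8(bitPattern: $0) },
                              microSec: Int(microSec))
}

// Round trip with made-up bytes (not real encoder output).
let frame = EncodedFrameEntity(sps: [0x67, 0x42], pps: [0x68],
                               body: [0, 0, 0, 2, 0x65, -1], microSec: 16_000)
let restored = deserialize(serialize(frame))
print(restored == frame)
```

A schema-based format such as protobuf would replace the hand-written layout, but the fields to carry stay the same.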

Serializing the H.264-encoded CMSampleBuffer

The required data is extracted from the CMSampleBuffer using the CMSampleBufferGet... family of APIs.
The SPS and PPS are obtained from the CMFormatDescription.
There is an API named exactly for this, CMVideoFormatDescriptionGetH264ParameterSetAtIndex.

private func convert(from sampleBuffer: CMSampleBuffer) -> EncodedFrameEntity? {
    // Index 0 is the SPS and index 1 is the PPS; return nil instead of crashing if anything is missing.
    guard let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer),
          let sps = getH264Parameter(formatDescription: formatDescription, index: 0),
          let pps = getH264Parameter(formatDescription: formatDescription, index: 1),
          let body = getData(sampleBuffer: sampleBuffer) else { return nil }
    let microSec = getTimeStampMicroSec(sampleBuffer: sampleBuffer)
    return EncodedFrameEntity(sps: sps, pps: pps, body: body, microSec: microSec)
}

private func getH264Parameter(formatDescription: CMFormatDescription, index: Int) -> [UInt8]? {
    var ptrOut: UnsafePointer<UInt8>?
    var size: Int = 0
    var count: Int = 0
    var nal: Int32 = 0
    let status = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(formatDescription,
                                                                    parameterSetIndex: index,
                                                                    parameterSetPointerOut: &ptrOut,
                                                                    parameterSetSizeOut: &size,
                                                                    parameterSetCountOut: &count,
                                                                    nalUnitHeaderLengthOut: &nal)
    // Copy the bytes out; the pointer is owned by the format description.
    guard status == noErr, let ptr = ptrOut else { return nil }
    return Array(UnsafeBufferPointer(start: ptr, count: size))
}

private func getData(sampleBuffer: CMSampleBuffer) -> [Int8]? {
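    guard let blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) else { return nil }
    var lengthAtOffset: Int = 0
    var totalLength: Int = 0
    var ptrOut: UnsafeMutablePointer<Int8>? = nil
    let status = CMBlockBufferGetDataPointer(blockBuffer,
                                             atOffset: 0,
                                             lengthAtOffsetOut: &lengthAtOffset,
                                             totalLengthOut: &totalLength,
                                             dataPointerOut: &ptrOut)
    // The pointer only covers lengthAtOffset bytes, so require a contiguous buffer before copying totalLength bytes.
    guard status == noErr, lengthAtOffset == totalLength, let ptr = ptrOut else { return nil }
    return Array(UnsafeBufferPointer(start: ptr, count: totalLength))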
}

private func getTimeStampMicroSec(sampleBuffer: CMSampleBuffer) -> Int {
    let time = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
    let microSec: Int = Int(time.seconds * 1_000_000)
    return microSec
}
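
It may help to know what the body bytes contain. VideoToolbox normally emits H.264 in the length-prefixed (AVCC) layout rather than with Annex B start codes: each NAL unit is preceded by a big-endian length field. The sketch below splits such a body into NAL units and reads each unit's type. The function names are hypothetical, and the 4-byte header length is an assumption that matches the nalUnitHeaderLength: 4 the article passes when rebuilding the format description.

```swift
// Splits an AVCC-style body into NAL units. Each unit is preceded by a big-endian length field.
// Assumption: headerLength is 4 bytes.
func splitNALUnits(body: [Int8], headerLength: Int = 4) -> [[UInt8]] {
    let bytes = body.map { UInt8(bitPattern: $0) }
    var units: [[UInt8]] = []
    var offset = 0
    while offset + headerLength <= bytes.count {
        // Decode the big-endian length prefix.
        var length = 0
        for byte in bytes[offset..<offset + headerLength] {
            length = (length << 8) | Int(byte)
        }
        offset += headerLength
        // Stop on an empty or truncated unit.
        guard length > 0, offset + length <= bytes.count else { break }
        units.append(Array(bytes[offset..<offset + length]))
        offset += length
    }
    return units
}

// The low 5 bits of the first byte are the H.264 nal_unit_type (5 = IDR slice, 1 = non-IDR slice).
func nalUnitType(_ unit: [UInt8]) -> UInt8? {
    unit.first.map { $0 & 0x1F }
}

// Made-up sample: a 2-byte IDR unit followed by a 1-byte non-IDR unit.
let sampleBytes: [UInt8] = [0, 0, 0, 2, 0x65, 0xAA, 0, 0, 0, 1, 0x41]
let sampleUnits = splitNALUnits(body: sampleBytes.map { Int8(bitPattern: $0) })
let sampleTypes = sampleUnits.compactMap(nalUnitType)
print(sampleTypes)
```

Checking for type 5 is one way a receiver could tell whether a frame is a keyframe it can start decoding from.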

Converting the serialized data back to a CMSampleBuffer

CMVideoFormatDescriptionCreateFromH264ParameterSets is available, so the CMFormatDescription could be recreated.
I got through various spots with [UInt8].withUnsafeBytes tricks.
The CMSampleBuffer is then assembled in a straightforward way.

func convert(from frameEntity: EncodedFrameEntity) -> CMSampleBuffer {
    var sampleBuffer: CMSampleBuffer?
    let blockBuffer = makeBlockBuffer(frameEntity: frameEntity)!
    let formatDescription = makeFormatDescription(frameEntity: frameEntity)
    let presentationTimeStamp = CMTime(value: CMTimeValue(frameEntity.microSec), timescale: 1_000_000)
    var timingInfo = CMSampleTimingInfo(duration: .invalid, presentationTimeStamp: presentationTimeStamp, decodeTimeStamp: .invalid)
    var sampleSizeArray: [Int] = [frameEntity.body.count]
    _ = CMSampleBufferCreate(allocator: kCFAllocatorDefault,
                             dataBuffer: blockBuffer,
                             dataReady: true,
                             makeDataReadyCallback: nil,
                             refcon: nil,
                             formatDescription: formatDescription,
                             sampleCount: 1,
                             sampleTimingEntryCount: 1,
                             sampleTimingArray: &timingInfo,
                             sampleSizeEntryCount: 1,
                             sampleSizeArray: &sampleSizeArray,
                             sampleBufferOut: &sampleBuffer)
    return sampleBuffer!
}

private func makeBlockBuffer(frameEntity: EncodedFrameEntity) -> CMBlockBuffer? {
    var blockBuffer: CMBlockBuffer?
    _ = CMBlockBufferCreateWithMemoryBlock(allocator: kCFAllocatorDefault,
                                           memoryBlock: nil,
                                           blockLength: frameEntity.body.count,
                                           blockAllocator: nil,
                                           customBlockSource: nil,
                                           offsetToData: 0,
                                           dataLength: frameEntity.body.count,
                                           flags: 0,
                                           blockBufferOut: &blockBuffer)
    frameEntity.body.withUnsafeBytes { (ptr: UnsafeRawBufferPointer) in
        _ = CMBlockBufferReplaceDataBytes(with: ptr.baseAddress!,
                                          blockBuffer: blockBuffer!,
                                          offsetIntoDestination: 0,
                                          dataLength: frameEntity.body.count)
    }
    return blockBuffer!
}

private func makeFormatDescription(frameEntity: EncodedFrameEntity) -> CMFormatDescription? {
    var formatDescription: CMVideoFormatDescription? = nil
    let parameters: [[UInt8]] = [
        frameEntity.sps,
        frameEntity.pps
    ]
    var parameterSetPointers: [UnsafePointer<UInt8>] = parameters.map { (arr: [UInt8]) in
        let res = UnsafeMutablePointer<UInt8>.allocate(capacity: arr.count)
        arr.withUnsafeBytes { (src: UnsafeRawBufferPointer) in
            _ = memcpy(res, src.baseAddress!, arr.count)
        }
        return UnsafePointer<UInt8>(res)
    }
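    // Free the temporary copies on exit; the format description keeps its own copy of the parameter sets.
    defer { parameterSetPointers.forEach { $0.deallocate() } }
    let parameterSetSizes: [Int] = parameters.map { $0.count }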
    _ = CMVideoFormatDescriptionCreateFromH264ParameterSets(allocator: kCFAllocatorDefault,
                                                            parameterSetCount: parameters.count,
                                                            parameterSetPointers: &parameterSetPointers,
                                                            parameterSetSizes: parameterSetSizes,
                                                            nalUnitHeaderLength: 4,
                                                            formatDescriptionOut: &formatDescription)
    return formatDescription!
}
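
The timestamp survives the round trip as microseconds: seconds × 1,000,000 on the way out, then CMTime(value: microSec, timescale: 1_000_000) on the way back. The sketch below does the same conversion with integer arithmetic only, which avoids floating-point rounding. The helper is hypothetical and is not what the article's code does; on Apple platforms, CoreMedia's CMTimeConvertScale may be a more direct option.

```swift
// Integer-only conversion of a (value, timescale) pair to microseconds.
// Hypothetical helper; the article uses Int(time.seconds * 1_000_000) instead.
func microseconds(value: Int64, timescale: Int32) -> Int {
    // Multiply before dividing to keep precision; assumes value * 1_000_000 fits in Int64.
    Int(value * 1_000_000 / Int64(timescale))
}

let sent = microseconds(value: 16_000_000, timescale: 1_000_000_000)  // 16_000
// With timescale 1_000_000 the value already is in microseconds, so it comes back unchanged.
let received = microseconds(value: Int64(sent), timescale: 1_000_000)
print(sent, received)
```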

Conclusion

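This turned out to be a very long article.
The WWDC14 session Direct Access to Video Encoding and Decoding was extremely helpful.
Getting as far as using the encoded CMSampleBuffer went smoothly, but converting to EncodedFrameEntity was very difficult.
- When it failed, I had no idea where the mistake was.
The implementation itself got quite long, but I think H.264 encoding and decoding ended up being possible with a relatively simple implementation.
Now that Combine is available, I'm happy that steps such as converting in sequence, feeding input, and receiving output are easier to express as streams.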
