
How I Became a Full-Blown AVFoundation Geek After Using It Heavily

Posted at 2021-03-17 (last updated at 2021-03-17)

Hatada here.
On a recent project I had to become a full-on AVFoundation geek, so here is a summary of what I learned from reading Apple's AVFoundation documentation.

About capture

The capture subsystem of AVFoundation is for capturing video, photos, and audio. It can be used in the following cases:

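When you implement your own camera feature in an app
When you want to give users more direct control over focus, exposure, stabilization, and so on while shooting video or photos
When you want to specify the storage format of captured photos or depth maps (per-pixel distance information), or customize video metadata
When you want to stream directly from a capture or recording device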

Note that if you only want users to shoot with the system camera UI, UIImagePickerController is enough.
The main parts of capture are the session, the inputs, and the outputs; a session connects one or more inputs to one or more outputs.

AVCaptureSession

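AVCaptureSession is the class that manages the flow of data from inputs to outputs during capture.
After instantiating AVCaptureSession and adding suitable inputs and outputs, you start capture with startRunning() and stop it with stopRunning().
Quality and similar settings can be configured through the sessionPreset property, but finer settings such as advanced frame-rate control have to be set directly on AVCaptureDevice.
There is probably room to practice by implementing various patterns with the documentation as a reference.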
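
As a rough illustration of the flow described above, the following sketch builds a session with the default camera as input and a movie file output, and sets a fixed frame rate directly on the device. The names `CaptureSetupError`, `makeMovieCaptureSession`, and `setFrameRate` are hypothetical and exist only for this example; it also assumes camera permission has already been granted.

```swift
import AVFoundation

// Hypothetical error type for this sketch.
enum CaptureSetupError: Error {
    case noCamera
    case cannotAddInput
    case cannotAddOutput
}

/// Builds a session with the default video camera as input and a movie file output.
func makeMovieCaptureSession() throws -> (session: AVCaptureSession, output: AVCaptureMovieFileOutput) {
    let session = AVCaptureSession()
    session.beginConfiguration()
    defer { session.commitConfiguration() }

    // Coarse quality setting through the preset.
    session.sessionPreset = .high

    guard let camera = AVCaptureDevice.default(for: .video) else {
        throw CaptureSetupError.noCamera
    }
    let input = try AVCaptureDeviceInput(device: camera)
    guard session.canAddInput(input) else { throw CaptureSetupError.cannotAddInput }
    session.addInput(input)

    let output = AVCaptureMovieFileOutput()
    guard session.canAddOutput(output) else { throw CaptureSetupError.cannotAddOutput }
    session.addOutput(output)

    return (session, output)
}

/// Fixes the frame rate on the device itself, which the preset cannot do.
/// Returns false when the active format does not support the requested rate.
@discardableResult
func setFrameRate(_ fps: Int32, on device: AVCaptureDevice) throws -> Bool {
    let isSupported = device.activeFormat.videoSupportedFrameRateRanges.contains {
        $0.minFrameRate <= Double(fps) && Double(fps) <= $0.maxFrameRate
    }
    guard isSupported else { return false }

    try device.lockForConfiguration()
    defer { device.unlockForConfiguration() }
    let frameDuration = CMTime(value: 1, timescale: fps)
    device.activeVideoMinFrameDuration = frameDuration
    device.activeVideoMaxFrameDuration = frameDuration
    return true
}
```

startRunning() blocks until the session has started, so it is usually called from a background queue rather than the main thread.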

AVCaptureConnection

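AVCaptureConnection is the class that links a specific input to a specific output within a session.
If you use addInput(_:) and addOutput(_:) on AVCaptureSession, connections between inputs and outputs are formed automatically.
However, you need to work with this class when there are inputs and outputs you do not want connected, when you want to enable or disable individual connections, or when you want to read or change the state of the video.
Setting videoOrientation specifies the orientation of the video.
To build a connection yourself, instantiate this class and add it to the AVCaptureSession with addConnection(_:); it then takes effect in the session.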
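
The manual wiring described above might look like the sketch below. It adds the input and output without implicit connections and then connects only the video ports. The function name `connectVideoOnly` is hypothetical. Also note that newer SDKs mark videoOrientation as deprecated in favor of a rotation-angle API, so treat that part as an assumption tied to the SDK this article was written against.

```swift
import AVFoundation

/// Adds `input` and `output` without implicit connections, then wires only the video ports.
/// Returns the connection, or nil if the session rejects any step.
func connectVideoOnly(input: AVCaptureDeviceInput,
                      output: AVCaptureOutput,
                      in session: AVCaptureSession) -> AVCaptureConnection? {
    session.beginConfiguration()
    defer { session.commitConfiguration() }

    guard session.canAddInput(input), session.canAddOutput(output) else { return nil }
    session.addInputWithNoConnections(input)
    session.addOutputWithNoConnections(output)

    // Pick only the video ports, so audio is deliberately left unconnected.
    let videoPorts = input.ports.filter { $0.mediaType == .video }
    guard !videoPorts.isEmpty else { return nil }

    let connection = AVCaptureConnection(inputPorts: videoPorts, output: output)
    guard session.canAddConnection(connection) else { return nil }
    session.addConnection(connection)

    // Orientation is a per-connection setting.
    if connection.isVideoOrientationSupported {
        connection.videoOrientation = .portrait
    }
    return connection
}
```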

AVCaptureMovieFileOutput

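AVCaptureMovieFileOutput is the class for recording captured video and audio as a QuickTime movie (MOV) file.
It only supports saving in the MOV format and cannot record in other file formats such as MPEG-4.
Settings can be added with setOutputSettings(_:for:), which takes a [String : Any]? dictionary and an AVCaptureConnection. The dictionary can contain entries such as [AVVideoCodecKey: AVVideoCodecType.h264]. See the documentation for details.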
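
A minimal sketch of that setting, assuming the output has already been added to a session (otherwise it has no video connection). `preferH264` and `RecordingDelegate` are hypothetical names for this example.

```swift
import AVFoundation

/// Requests H.264 on the output's video connection when that codec is available.
/// Returns false if there is no video connection yet or the codec is not offered.
@discardableResult
func preferH264(on movieOutput: AVCaptureMovieFileOutput) -> Bool {
    guard let connection = movieOutput.connection(with: .video),
          movieOutput.availableVideoCodecTypes.contains(.h264) else {
        return false
    }
    movieOutput.setOutputSettings([AVVideoCodecKey: AVVideoCodecType.h264], for: connection)
    return true
}

/// Receives the result once recording to the MOV file has finished.
final class RecordingDelegate: NSObject, AVCaptureFileOutputRecordingDelegate {
    func fileOutput(_ output: AVCaptureFileOutput,
                    didFinishRecordingTo outputFileURL: URL,
                    from connections: [AVCaptureConnection],
                    error: Error?) {
        if let error = error {
            print("recording failed:", error)
        } else {
            print("recorded to", outputFileURL.path)
        }
    }
}
```

Recording would then be started with movieOutput.startRecording(to:recordingDelegate:), passing a file URL with a .mov extension and an instance of the delegate that is kept alive for the duration of the recording.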

About video playback

AVAsset

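AVAsset is the class that models digital media such as video tracks and audio tracks, and it is initialized with a URL that points to where the media lives.
(A track is, originally, a band-shaped region on recording media such as magnetic tape or a disc where video or audio signals are recorded. Even for digital signals, where the data is actually discontinuous, it is convenient to picture each signal as its own band.)
However, AVAsset is an abstract class, and instantiating it actually returns an instance of its subclass AVURLAsset. The direct subclasses of AVAsset that you usually deal with are AVURLAsset and AVComposition.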

let url: URL = <#Local or remote asset URL#>
let asset = AVAsset(url: url)
print(type(of: asset)) // AVURLAsset

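Instead of going through AVAsset, you can instantiate AVURLAsset directly. When you do, you can pass detailed options as a dictionary, for example options for live streaming or for disallowing access over cellular data.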

let url: URL = <#Remote asset URL#>
let options = [AVURLAssetAllowsCellularAccessKey: false]
let asset = AVURLAsset(url: url, options: options)

To play media, create an AVPlayerItem from the AVURLAsset, configure settings such as a playback time limit on it, and then create an AVPlayer from that item.

AVPlayerItem

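AVPlayerItem holds a reference to an AVAsset instance; in other words, its role is to give access to the object that manages the media you want to play.
If you pass an array of asset keys at initialization, the item automatically loads those asset properties so they are ready to use.
AVPlayerItem properties can be changed, and even read-only properties are updated automatically by AVPlayer. Among them, the status property is especially important: playback is possible once its value becomes AVPlayerItem.Status.readyToPlay, so you should observe it.
A good approach is to call addObserver(_:forKeyPath:options:context:) before handing the playerItem to the player, and to receive changes in observeValue(forKeyPath:of:change:context:).
If the status property is failed, the error property of AVPlayerItem should contain the details.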

func prepareToPlay() {
    let url = <#Asset URL#>
    // Create asset to be played
    asset = AVAsset(url: url)
    
    let assetKeys = [
        "playable",
        "hasProtectedContent"
    ]
    // Create a new AVPlayerItem with the asset and an
    // array of asset keys to be automatically loaded
    playerItem = AVPlayerItem(asset: asset,
                              automaticallyLoadedAssetKeys: assetKeys)
    
    // Register as an observer of the player item's status property
    playerItem.addObserver(self,
                           forKeyPath: #keyPath(AVPlayerItem.status),
                           options: [.old, .new],
                           context: &playerItemContext)
    
    // Associate the player item with the player
    player = AVPlayer(playerItem: playerItem)
}
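
The function above only registers the observer. A self-contained sketch of the receiving side could look like the following. The class name `PlayerController` and the `onReady` / `onFailure` callbacks are hypothetical, introduced only to keep the example complete.

```swift
import AVFoundation

final class PlayerController: NSObject {
    private var playerItemContext = 0
    private var playerItem: AVPlayerItem?
    private(set) var player: AVPlayer?

    // Hypothetical callbacks for this sketch.
    var onReady: (() -> Void)?
    var onFailure: ((Error?) -> Void)?

    func prepareToPlay(url: URL) {
        stopObserving()
        let asset = AVURLAsset(url: url)
        let item = AVPlayerItem(asset: asset,
                                automaticallyLoadedAssetKeys: ["playable", "hasProtectedContent"])
        // Register before the item is handed to the player.
        item.addObserver(self,
                         forKeyPath: #keyPath(AVPlayerItem.status),
                         options: [.old, .new],
                         context: &playerItemContext)
        playerItem = item
        player = AVPlayer(playerItem: item)
    }

    override func observeValue(forKeyPath keyPath: String?,
                               of object: Any?,
                               change: [NSKeyValueChangeKey: Any]?,
                               context: UnsafeMutableRawPointer?) {
        // Only handle notifications registered with our own context.
        guard context == &playerItemContext else {
            super.observeValue(forKeyPath: keyPath, of: object, change: change, context: context)
            return
        }
        guard keyPath == #keyPath(AVPlayerItem.status) else { return }

        let status: AVPlayerItem.Status
        if let number = change?[.newKey] as? NSNumber {
            status = AVPlayerItem.Status(rawValue: number.intValue) ?? .unknown
        } else {
            status = .unknown
        }

        switch status {
        case .readyToPlay:
            onReady?()                      // safe to call player.play() from here
        case .failed:
            onFailure?(playerItem?.error)   // details are in the item's error property
        case .unknown:
            break
        @unknown default:
            break
        }
    }

    private func stopObserving() {
        playerItem?.removeObserver(self,
                                   forKeyPath: #keyPath(AVPlayerItem.status),
                                   context: &playerItemContext)
        playerItem = nil
    }

    deinit { stopObserving() }
}
```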

AVPlayer

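AVPlayer is the class for playing media files, and it can handle only one media asset at a time.
To play different assets back to back, either observe the end-of-playback notification (AVPlayerItemDidPlayToEndTime) through NotificationCenter and swap the item with replaceCurrentItem(with:), or consider using AVQueuePlayer instead.
The timeControlStatus property tells you whether the player is playing, paused, or waiting to play.
AVAsset only holds static information such as the creation date and track durations, so playback goes through AVPlayerItem, which can hold dynamic state.
AVPlayer is also a dynamic object, and tracking its changing state requires combining the following two approaches: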

Use KVO (key-value observing) to track changes to properties such as currentItem and rate.
KVO is not suited to continuously changing values such as the playback time, so use addPeriodicTimeObserver(forInterval:queue:using:) or addBoundaryTimeObserver(forTimes:queue:using:) to track those.

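With the latter, time is observed periodically or at boundaries, and a callback fires at each interval or each time a boundary is crossed, which lets you update the interface, for example.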
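
A small sketch of the two time-observer styles, wrapped in a hypothetical helper class `PlaybackTimeReporter` so that the observer tokens are removed again. The half-second interval and the timescale of 600 are arbitrary choices for illustration.

```swift
import AVFoundation

final class PlaybackTimeReporter {
    private let player: AVPlayer
    private var tokens: [Any] = []

    init(player: AVPlayer) {
        self.player = player
    }

    /// Reports the current playback time (in seconds) twice per second on the main queue.
    func startPeriodic(onTick: @escaping (Double) -> Void) {
        let interval = CMTime(seconds: 0.5, preferredTimescale: 600)
        let token = player.addPeriodicTimeObserver(forInterval: interval, queue: .main) { time in
            onTick(time.seconds)
        }
        tokens.append(token)
    }

    /// Fires each time playback crosses one of the given positions (in seconds).
    func startBoundaries(at seconds: [Double], onCross: @escaping () -> Void) {
        guard !seconds.isEmpty else { return }
        let times = seconds.map { NSValue(time: CMTime(seconds: $0, preferredTimescale: 600)) }
        let token = player.addBoundaryTimeObserver(forTimes: times, queue: .main, using: onCross)
        tokens.append(token)
    }

    /// Removes every observer that this reporter registered.
    func stop() {
        tokens.forEach { player.removeTimeObserver($0) }
        tokens.removeAll()
    }

    deinit { stop() }
}
```

The tokens returned by the two add methods have to be kept and later passed to removeTimeObserver(_:), which is why the helper stores them.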

About video editing

AVComposition

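AVComposition is a class for combining data from multiple sources, and it is a subclass of AVAsset.
An AVComposition is a collection of tracks, each representing a medium such as audio or video (represented by AVCompositionTrack, which inherits from AVAssetTrack). So it really is a close friend of AVAsset.
Each track in turn consists of segments (AVCompositionTrackSegment), which use a URL, a track identifier, a time mapping, and so on to describe the media data stored in a container.
As you might guess from it being a subclass of AVAsset, an AVComposition can be passed to AVPlayer and the like for playback.
Also, any AV data that can be represented as a file can be combined, regardless of its container format.
The time mapping manages the length of the media: if the time ranges of the source and the target (the composition) are equal, the media is kept as is, and if they differ, it is stretched or compressed uniformly to fit the target range.
These tracks and segments are easy to access and export, and by creating a new AVMutableComposition and using AVMutableCompositionTrack you can also rebuild a composition or create a new one.
With AVMutableCompositionTrack and AVMutableComposition you can perform operations such as inserting, removing, and scaling media at a high level.
For video editing and creating a new composition in particular, see the table below.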

Name	Class	Description
Composition	`AVMutableComposition`	A single video work / project
Track	`AVMutableCompositionTrack`	A video or audio track that makes up the composition
Asset	`AVAsset`	Source material whose tracks are placed on the composition's tracks

Let me write a bit of source code.

// create an empty composition
let mutableComposition = AVMutableComposition()

// add an empty video track to the composition
guard let compositionVideoTrack = mutableComposition.addMutableTrack(withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid) else {
    print("failed to add video track")
    return
}

// add an empty audio track to the composition
guard let compositionAudioTrack = mutableComposition.addMutableTrack(withMediaType: .audio, preferredTrackID: kCMPersistentTrackID_Invalid) else {
    print("failed to add audio track")
    return
}

// add a video track of asset to the composition video track
let videoAsset = AVURLAsset(url: someURL)
guard let sourceVideoTrack = videoAsset.tracks(withMediaType: .video).first else { return }
do {
    try compositionVideoTrack.insertTimeRange(sourceVideoTrack.timeRange, of: sourceVideoTrack, at: .zero)
} catch {
    print(error)
}

// add an audio track of asset to the composition audio track
let audioAsset = AVURLAsset(url: someURL)
guard let sourceAudioTrack = audioAsset.tracks(withMediaType: .audio).first else { return }
do {
    try compositionAudioTrack.insertTimeRange(sourceAudioTrack.timeRange, of: sourceAudioTrack, at: .zero)
} catch {
    print(error)
}

// create export session
guard let session = AVAssetExportSession(asset: mutableComposition, presetName: AVAssetExportPresetPassthrough) else {
    print("failed to prepare session")
    return
}

// set up output file
session.outputURL = urlToWriteTo
session.outputFileType = .mp4

// export the composition
session.exportAsynchronously {
    switch session.status {
    case .completed:
        print("completed")
    case .failed:
        print("export error: \(session.error?.localizedDescription ?? "unknown error")")
    default:
        break
    }
}

If you change the contents of the do block in the code that places the asset's track on the composition track as follows, you can change the time range that is inserted.

let firstFiveSeconds = CMTimeRange(start: .zero, end: CMTime(seconds: 5.0, preferredTimescale: sourceVideoTrack.naturalTimeScale))
try compositionVideoTrack.insertTimeRange(firstFiveSeconds, of: sourceVideoTrack, at: .zero)
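
Besides inserting, the removing and scaling operations mentioned earlier work on time ranges in the same way. The following is a minimal sketch under the assumption that the composition already contains media; the function names `slowDownOpening` and `trim` are hypothetical.

```swift
import AVFoundation

/// Stretches the first `seconds` of the composition to twice their length (half speed).
func slowDownOpening(of composition: AVMutableComposition, seconds: Double) {
    let opening = CMTimeRange(start: .zero,
                              duration: CMTime(seconds: seconds, preferredTimescale: 600))
    let doubled = CMTimeMultiply(opening.duration, multiplier: 2)
    // Changes the time mapping of the affected segments in every track.
    composition.scaleTimeRange(opening, toDuration: doubled)
}

/// Cuts off everything after `seconds`; does nothing if the composition is already shorter.
func trim(_ composition: AVMutableComposition, to seconds: Double) {
    let limit = CMTime(seconds: seconds, preferredTimescale: 600)
    guard composition.duration > limit else { return }
    composition.removeTimeRange(CMTimeRange(start: limit, end: composition.duration))
}
```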

AVMutableVideoCompositionInstruction, AVMutableVideoCompositionLayerInstruction

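To actually configure operations on a track (opacity, movement, cropping), you use these classes.
You also set the start time and duration of those operations at the same time.
You can also read the orientation in which the video was shot and rotate the video track to match. (See the layerInstruction.setTransform(sourceVideoTrack.preferredTransform, at: .zero) line in the source code below.)
The overview above corresponds to using one AVMutableVideoCompositionLayerInstruction per AVMutableCompositionTrack.
However, you can also build multiple AVMutableVideoCompositionLayerInstruction objects for a single AVMutableCompositionTrack, which makes more complex edits efficient as well.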
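
As an illustration of opacity and cropping with their own time ranges, here is a sketch that builds one instruction which applies the track's orientation, optionally crops it, and fades it out over the final second. The function name `makeFadeOutInstruction` and the one-second fade are assumptions made for the example.

```swift
import AVFoundation
import CoreGraphics

/// Builds an instruction for `timeRange` that keeps the track's orientation,
/// optionally crops it, and fades it out during the last second.
func makeFadeOutInstruction(for track: AVAssetTrack,
                            timeRange: CMTimeRange,
                            cropRect: CGRect? = nil) -> AVMutableVideoCompositionInstruction {
    let layerInstruction = AVMutableVideoCompositionLayerInstruction(assetTrack: track)

    // Rotation: apply the transform recorded when the video was shot.
    layerInstruction.setTransform(track.preferredTransform, at: timeRange.start)

    // Cropping: takes effect from the start of the range.
    if let cropRect = cropRect {
        layerInstruction.setCropRectangle(cropRect, at: timeRange.start)
    }

    // Opacity: ramp from opaque to transparent over the final second (or less for short ranges).
    let fadeDuration = CMTimeMinimum(CMTime(seconds: 1, preferredTimescale: 600), timeRange.duration)
    let fadeRange = CMTimeRange(start: CMTimeSubtract(timeRange.end, fadeDuration),
                                duration: fadeDuration)
    layerInstruction.setOpacityRamp(fromStartOpacity: 1.0, toEndOpacity: 0.0, timeRange: fadeRange)

    let instruction = AVMutableVideoCompositionInstruction()
    instruction.timeRange = timeRange
    instruction.layerInstructions = [layerInstruction]
    return instruction
}
```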

AVMutableVideoComposition

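The name resembles AVMutableComposition, but this class is where you set additional information for an AVMutableComposition.
Specifically, the frame duration, the render size, and the track operations mentioned earlier (AVMutableVideoCompositionInstruction).
Passing this AVMutableVideoComposition together with the AVMutableComposition to AVPlayer (via AVPlayerItem) or AVAssetExportSession applies the additional information to the video.
An explanation alone probably still leaves the usage unclear, so here is some more source code.
It passes the videoComposition instance to session, the export session created in the earlier source code, before running the export.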

// create mutable video composition which mainly decides frame duration, render size and instruction
let videoComposition: AVMutableVideoComposition = AVMutableVideoComposition()

// set up frame duration and render size
videoComposition.frameDuration = CMTimeMake(value: 1, timescale: 30)
videoComposition.renderSize = sourceVideoTrack.naturalSize

// create mutable video composition instruction
let instruction: AVMutableVideoCompositionInstruction = AVMutableVideoCompositionInstruction()

// set time range in which this instruction is active
instruction.timeRange = CMTimeRangeMake(start: CMTime.zero, duration: mutableComposition.duration)

// create and set up layer instruction
let layerInstruction: AVMutableVideoCompositionLayerInstruction = AVMutableVideoCompositionLayerInstruction(assetTrack: compositionVideoTrack)
layerInstruction.setTransform(sourceVideoTrack.preferredTransform, at: .zero) // rotate video here
instruction.layerInstructions = [layerInstruction]
videoComposition.instructions = [instruction]

// set mutable video composition in export session
session.videoComposition = videoComposition

// export below...
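
For the playback side, a sketch of previewing the same composition without exporting it might look like this. The function name `makePreviewPlayer` is hypothetical. One caveat, stated as my understanding rather than something from the original article: a video composition that has an animationTool set is meant for offline rendering such as export, so a preview like this should use a video composition without one.

```swift
import AVFoundation

/// Creates a player that previews `composition` with `videoComposition` applied.
/// Assumes `videoComposition` has no animationTool set.
func makePreviewPlayer(composition: AVComposition,
                       videoComposition: AVVideoComposition) -> AVPlayer {
    let item = AVPlayerItem(asset: composition)
    // The video composition is attached to the player item, not to the player itself.
    item.videoComposition = videoComposition
    return AVPlayer(playerItem: item)
}
```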

Inserting a background

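With AVMutableVideoComposition you can also overlay animations or subtitles on a video, or layer a video with an image or with another video.
This works by building a layer hierarchy for the video with CALayer.
The following places a background image (a still image) on top of the video.
Write it in the source code above, before the video composition is passed to session.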

// get video size
let videoSize: CGSize = videoTrack.naturalSize
        
// create parent layer
let parentLayer: CALayer = CALayer()
parentLayer.frame = CGRect(x: 0, y: 0, width: videoSize.width, height: videoSize.height)

// create video layer which will be associated with composition video track
let videoLayer: CALayer = CALayer()
videoLayer.frame = CGRect(x: 0, y: 0, width: videoSize.width, height: videoSize.height)

// add video layer to parent layer and attach video contents to video layer
parentLayer.addSublayer(videoLayer)
videoComposition.animationTool = AVVideoCompositionCoreAnimationTool(postProcessingAsVideoLayer: videoLayer, in: parentLayer)

// if original video selected, `backgroundImage` is nil and background layer shouldn't be created or added to parent layer
if let backgroundImage = backgroundImage {
    // create background layer
    let backgroundLayer: CALayer = CALayer()
    backgroundLayer.frame = CGRect(x: 0, y: 0, width: videoSize.width, height: videoSize.height)
    backgroundLayer.opacity = 1.0
    backgroundLayer.masksToBounds = true
    backgroundLayer.backgroundColor = UIColor.clear.cgColor
    backgroundLayer.contentsGravity = CALayerContentsGravity.resizeAspectFill
    
    // add contents to background layer
    backgroundLayer.contents = backgroundImage.cgImage
    
    // add background layer over video layer
    parentLayer.addSublayer(backgroundLayer)
    
    // make video layer clear to lighten video
    videoLayer.opacity = 0
}

Inserting custom subtitles

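The selling point of the product under development is its lyric-overlay feature, so I will record how to add lyrics as well.
This is code from the actual product.
Calling mergeMovie() saves a video with lyrics to the app's tmp directory and to the photo library.
The lyricData property is carried over from the previous screen. Each of its elements is a Dictionary<String, Any>, but in practice an element has the form ["id": Int, "lyric": String, "start_at": CMTime, "end_at": CMTime].
getFromTmp(file:) is a method I defined myself, so the code will not run if you copy and paste it as is.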

Source code

private func mergeMovie() {
    // create the source asset from the movie URL
    let asset = AVURLAsset(url: movieURL)
    print(movieURL.path)
    
    // extract video track from asset
    guard let videoTrack = asset.tracks(withMediaType: .video).first else { return print("video track not found") }
    
    // extract audio track from asset
    guard let audioTrack = asset.tracks(withMediaType: .audio).first else { return print("audio track not found") }
    
    // create empty base composition
    let mutableComposition: AVMutableComposition = AVMutableComposition()
    
    // create empty composition video and audio tracks
    let compositionVideoTrack: AVMutableCompositionTrack! = mutableComposition.addMutableTrack(withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid)
    let compositionAudioTrack: AVMutableCompositionTrack! = mutableComposition.addMutableTrack(withMediaType: .audio, preferredTrackID: kCMPersistentTrackID_Invalid)
    
    // insert source video track to the composition video track
    do {
        try compositionVideoTrack.insertTimeRange(CMTimeRangeMake(start: CMTime.zero, duration: asset.duration), of: videoTrack, at: CMTime.zero)
    } catch {
        print("insert video track error:", error)
    }
    
    // insert source audio track to the composition audio track
    do {
        try compositionAudioTrack.insertTimeRange(CMTimeRangeMake(start: CMTime.zero, duration: asset.duration), of: audioTrack, at: CMTime.zero)
    } catch {
        print("insert audio track error:", error)
    }
    // set up the rotation (orientation) of the video
    let preferredTransform = videoTrack.preferredTransform
    //        compositionVideoTrack.preferredTransform = preferredTransform // not effective
    
    // create export session with base composition
    _assetExportSession = AVAssetExportSession(asset: mutableComposition, presetName: AVAssetExportPresetMediumQuality)
    
    // create mutable video composition which mainly decides duration, render size and instruction
    let videoComposition: AVMutableVideoComposition = AVMutableVideoComposition()
    videoComposition.renderSize = videoTrack.naturalSize
    videoComposition.frameDuration = CMTimeMake(value: 1, timescale: 30)
    
    let instruction: AVMutableVideoCompositionInstruction = AVMutableVideoCompositionInstruction()
    instruction.timeRange = CMTimeRangeMake(start: CMTime.zero, duration: mutableComposition.duration)
    let layerInstruction: AVMutableVideoCompositionLayerInstruction = AVMutableVideoCompositionLayerInstruction(assetTrack: compositionVideoTrack)
    layerInstruction.setTransform(preferredTransform, at: .zero) // required!
    instruction.layerInstructions = [layerInstruction]
    videoComposition.instructions = [instruction]
    
    let videoSize: CGSize = mutableComposition.naturalSize
    
    // create lyric layer
    let lyricLayer = self.makeLyricLayer(for: mutableComposition)
    
    // create parent layer
    let parentLayer: CALayer = CALayer()
    parentLayer.frame = CGRect(x: 0, y: 0, width: videoSize.width, height: videoSize.height)
    
    let videoLayer: CALayer = CALayer()
    videoLayer.frame = CGRect(x: 0, y: 0, width: videoSize.width, height: videoSize.height)
    
    parentLayer.addSublayer(videoLayer)
    parentLayer.addSublayer(lyricLayer)
    videoComposition.animationTool = AVVideoCompositionCoreAnimationTool(postProcessingAsVideoLayer: videoLayer, in: parentLayer)
    
    // set mutable video composition in export session
    _assetExportSession?.videoComposition = videoComposition
    
    // set up export session
    exportURL = FileManager.getFromTmp(file: "completion_movie.mov")
    _assetExportSession?.outputFileType = AVFileType.mov
    _assetExportSession?.outputURL = exportURL
    _assetExportSession?.shouldOptimizeForNetworkUse = true
    
    // export
    _assetExportSession?.exportAsynchronously(completionHandler: {() -> Void in
        if self._assetExportSession?.status == AVAssetExportSession.Status.failed {
            print("failed:", self._assetExportSession?.error ?? "error")
        }
        if self._assetExportSession?.status == AVAssetExportSession.Status.completed {
            // save to photo library
            PHPhotoLibrary.shared().performChanges({
                PHAssetChangeRequest.creationRequestForAssetFromVideo(atFileURL: self.exportURL)
            })
            print("saved to \(self._assetExportSession!.outputURL!.path)")
        }
    })
}

private func makeLyricLayer(for mutableComposition: AVMutableComposition) -> CALayer {
    let videoSize = mutableComposition.naturalSize
    
    // create parent layer
    let lyricLayer: CALayer = CALayer()
    lyricLayer.frame = CGRect(x: 0, y: 0, width: videoSize.width, height: videoSize.height)
    lyricLayer.opacity = 1.0
    lyricLayer.backgroundColor = UIColor.clear.cgColor
    lyricLayer.masksToBounds = true
    
    // prepare animation
    let frameAnimation: CAKeyframeAnimation = CAKeyframeAnimation(keyPath: "contents")
    frameAnimation.beginTime = AVCoreAnimationBeginTimeAtZero // attention! apple recommends
    frameAnimation.duration = CMTimeGetSeconds(mutableComposition.duration)
    frameAnimation.repeatCount = 1
    frameAnimation.autoreverses = false
    frameAnimation.isRemovedOnCompletion = false // apple recommends
    frameAnimation.fillMode = CAMediaTimingFillMode.forwards
    frameAnimation.calculationMode = CAAnimationCalculationMode.discrete
    
    // set up key times
    var imageKeyTimes: Array<NSNumber> = []
    let frameCount = Int(frameAnimation.duration * 30) // duration [s] * frame rate [/s] = total frame count
    for i in 0 ... frameCount {
        imageKeyTimes.append((Double(i)/Double(frameCount)) as NSNumber)
    }
    frameAnimation.keyTimes = imageKeyTimes
    
    // set up values
    var imageValues: Array<CGImage> = []
    for currentFrame in 0 ... frameCount {
        // begin rendering setting
        UIGraphicsBeginImageContext(videoSize)
        // get position of current frame in total video length
        let ratio = Double(currentFrame) / Double(frameCount)
        
        for lyricDatum in lyricData {
            let startTime = lyricDatum["start_at"] as! CMTime // time at which the lyric appears
            let endTime = lyricDatum["end_at"] as! CMTime // time at which lyric disappears
            // if current frame is between start time and end time, render lyric label
            if CMTimeGetSeconds(startTime) / CMTimeGetSeconds(mutableComposition.duration) < ratio && ratio < CMTimeGetSeconds(endTime) / CMTimeGetSeconds(mutableComposition.duration) {
                let lyric = lyricDatum["lyric"] as! String
                let attributedString = NSMutableAttributedString(string: lyric, attributes: [NSAttributedString.Key.font: UIFont.systemFont(ofSize: 36, weight: UIFont.Weight(rawValue: 1)), NSAttributedString.Key.foregroundColor: UIColor.white])
                let label = UILabel()
                label.frame.size = videoSize
                label.textAlignment = .center
                label.numberOfLines = 2
                label.clipsToBounds = true
                label.allowsDefaultTighteningForTruncation = true
                label.attributedText = attributedString
                label.drawText(in: CGRect(x: lyricLayer.bounds.origin.x, y: lyricLayer.bounds.maxY * 2 / 3, width: videoSize.width, height: videoSize.height / 3))
            }
            // else render nothing
        }
        
        let lyricImage = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()
        
        guard let cgImage = lyricImage?.cgImage else { continue }
        imageValues.append(cgImage)
    }
    frameAnimation.values = imageValues
    
    // add animation to lyric layer
    lyricLayer.add(frameAnimation, forKey: nil)
    
    return lyricLayer
}

This code turned out to be too heavy. Unless you are inserting a smoothly moving animation, the code below is better if all you need is subtitles.

Source code (addendum)

private func makeLyricLayer(for mutableComposition: AVMutableComposition) -> CALayer {
    let videoSize = mutableComposition.naturalSize
    
    // create parent layer
    let lyricLayer: CALayer = CALayer()
    lyricLayer.frame = CGRect(x: 0, y: 0, width: videoSize.width, height: videoSize.height)
    lyricLayer.opacity = 1.0
    lyricLayer.backgroundColor = UIColor.clear.cgColor
    lyricLayer.masksToBounds = true
    
    // prepare animation
    let frameAnimation: CAKeyframeAnimation = CAKeyframeAnimation(keyPath: "contents")
    frameAnimation.beginTime = AVCoreAnimationBeginTimeAtZero // attention! apple recommends
    frameAnimation.duration = CMTimeGetSeconds(mutableComposition.duration)
    frameAnimation.repeatCount = 1
    frameAnimation.autoreverses = false
    frameAnimation.isRemovedOnCompletion = false // apple recommends
    frameAnimation.fillMode = CAMediaTimingFillMode.forwards
    frameAnimation.calculationMode = CAAnimationCalculationMode.discrete
    
    // set up key times
    var imageKeyTimes: Array<NSNumber> = []
    
    // set up values
    var imageValues: Array<CGImage> = []
    
    // create transparent image
    UIGraphicsBeginImageContext(videoSize)
    let emptyImage = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext() // end the context before any early return
    guard let cgEmptyImage = emptyImage?.cgImage else { return lyricLayer }
    
    lyricData.sort() { d0, d1 in
        let startTime0 = d0["start_at"] as! CMTime
        let startTime1 = d1["start_at"] as! CMTime
        return startTime0 < startTime1
    }
    
    for (index, lyricDatum) in lyricData.enumerated() {
        let startTime = lyricDatum["start_at"] as! CMTime // time at which the lyric appears
        let endTime = lyricDatum["end_at"] as! CMTime // time at which lyric disappears
        
        if startTime == endTime { continue }
        
        if index == 0, startTime != .zero {
            imageKeyTimes.append(0)
            imageValues.append(cgEmptyImage)
        }
        
        // add key times at which this lyric appears and disappears
        imageKeyTimes.append(NSNumber(value: CMTimeGetSeconds(startTime) / CMTimeGetSeconds(mutableComposition.duration)))
        imageKeyTimes.append(NSNumber(value: CMTimeGetSeconds(endTime) / CMTimeGetSeconds(mutableComposition.duration)))
        
        // begin rendering
        UIGraphicsBeginImageContext(videoSize)
                    
        let lyric = lyricDatum["lyric"] as! String
        let attributedString = NSMutableAttributedString(string: lyric, attributes: [NSAttributedString.Key.font: UIFont.systemFont(ofSize: 24), NSAttributedString.Key.foregroundColor: UIColor.white])
        let label = UILabel()
        label.frame.size = videoSize
        label.textAlignment = .center
        label.numberOfLines = 2
        label.clipsToBounds = true
        label.allowsDefaultTighteningForTruncation = true
        label.attributedText = attributedString
        label.drawText(in: CGRect(x: lyricLayer.bounds.origin.x, y: lyricLayer.bounds.maxY * 2 / 3, width: videoSize.width, height: videoSize.height / 3))
        
        guard let cgLyricImage = UIGraphicsGetImageFromCurrentImageContext()?.cgImage else {
            UIGraphicsEndImageContext()
            imageValues.append(cgEmptyImage)
            imageValues.append(cgEmptyImage)
            continue
        }
        
        // end rendering
        UIGraphicsEndImageContext()
        
        imageValues.append(cgLyricImage)
        imageValues.append(cgEmptyImage)
    }
    
    // set the last key time and the last value
    imageKeyTimes.append(1)
    imageValues.append(cgEmptyImage)
    
    // set key times
    frameAnimation.keyTimes = imageKeyTimes
    print(imageKeyTimes)
    
    // set values
    frameAnimation.values = imageValues
    print(imageValues)
    
    // add animation to lyric layer
    lyricLayer.add(frameAnimation, forKey: nil)
    
    return lyricLayer
}
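
The key-time bookkeeping in the function above can be separated from the image rendering, which makes it easier to check on its own. The following sketch does that with plain seconds instead of CMTime. The types `LyricInterval` and `LyricKeyframe` and the function `makeLyricKeyframes` are hypothetical stand-ins, not part of the product code, and unlike the original the sketch skips the closing blank frame when the last lyric already ends exactly at the end of the video.

```swift
import Foundation

/// Hypothetical stand-in for one entry of `lyricData`, with times in seconds.
struct LyricInterval {
    let text: String
    let start: Double
    let end: Double
}

/// One keyframe: a normalized time in 0...1 and the lyric shown from that time on (nil = blank).
struct LyricKeyframe: Equatable {
    let keyTime: Double
    let text: String?
}

/// Produces two keyframes per lyric (appear, disappear) instead of one per video frame.
func makeLyricKeyframes(for intervals: [LyricInterval], duration: Double) -> [LyricKeyframe] {
    guard duration > 0 else { return [] }

    // Drop empty intervals and order the rest by start time.
    let sorted = intervals
        .filter { $0.end > $0.start }
        .sorted { $0.start < $1.start }

    var keyframes: [LyricKeyframe] = []

    // Begin with a blank frame unless the first lyric starts at time zero.
    if sorted.first?.start != 0 {
        keyframes.append(LyricKeyframe(keyTime: 0, text: nil))
    }

    for interval in sorted {
        keyframes.append(LyricKeyframe(keyTime: interval.start / duration, text: interval.text))
        keyframes.append(LyricKeyframe(keyTime: interval.end / duration, text: nil))
    }

    // Close with a blank frame at the end unless one is already there.
    if keyframes.last?.keyTime != 1 {
        keyframes.append(LyricKeyframe(keyTime: 1, text: nil))
    }
    return keyframes
}
```

Each keyframe with a text would then be rendered to one image and each blank keyframe would reuse the single transparent image, so the number of rendered images grows with the number of lyrics rather than with the number of video frames.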
