
Video editing with AVFoundation (a simple fade)

Last updated at 2019-11-13, posted at 2019-09-16

In this post I apply a fade to a video using AVVideoComposition, AVVideoCompositionInstruction, and AVVideoCompositionLayerInstruction.

Concepts

A very brief overview. The Mutable variants work the same way, so they are omitted.

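Class / protocol	Description
AVComposition	The object that bundles everything to be output, video and audio alike
AVVideoComposition	The object that describes the video part of the output composition
AVVideoCompositionInstruction	Tells the video composition what to render over a given range (from second X to second Y). How the output video tracks are layered is specified as an array of AVVideoCompositionLayerInstruction.
AVVideoCompositionLayerInstruction	Specifies, for each video track in the video composition, how its position and opacity (fade) change over time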

For details on Composition, see my previous post.

Building the VideoComposition

Load the source video file

let asset = AVURLAsset(url: url)

First, load the source video.

Create the output Composition

let composition = AVMutableComposition()
guard let videoTrack = composition.addMutableTrack(withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid) else {
    debugPrint("Failed to add video track")
    return
}

Create the output Composition and add one video track to it. To keep things simple, the audio track is omitted here.

Take the first 5 seconds of the source video and add them to the output video track

// Use the first video track of the source asset (bail out if there is none)
guard let srcVideoTrack = asset.tracks(withMediaType: .video).first else {
    debugPrint("Source asset has no video track")
    return
}
let first5Seconds = CMTimeRange(start: .zero, end: CMTime(seconds: 5.0, preferredTimescale: srcVideoTrack.naturalTimeScale))
do {
    // Insert the first 5 seconds of the source video at time zero
    try videoTrack.insertTimeRange(first5Seconds, of: srcVideoTrack, at: .zero)
} catch let error {
    debugPrint(error)
}

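This takes the first 5 seconds of the source asset's video and adds them to the output video track. Note: if the source video is split across multiple tracks this may cause problems, but that case is ignored here for simplicity.

Create a LayerInstruction that fades the first second of the output video track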
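If the source clip is shorter than 5 seconds, the requested range would extend past the end of the track. The following is a minimal sketch of clamping the range to the available duration; `leadingRange` is a hypothetical helper, not part of AVFoundation, and the 600 timescale is only an example value.

```swift
import CoreMedia

/// Returns a range starting at zero that is at most `seconds` long,
/// but never longer than `duration`.
/// (Hypothetical helper for illustration, not an AVFoundation API.)
func leadingRange(seconds: Double, within duration: CMTime, timescale: CMTimeScale) -> CMTimeRange {
    let requested = CMTime(seconds: seconds, preferredTimescale: timescale)
    // CMTimeMinimum picks the shorter of the two durations.
    let length = CMTimeMinimum(requested, duration)
    return CMTimeRange(start: .zero, duration: length)
}

// Example: a 3-second clip, asking for the first 5 seconds.
let shortClip = CMTime(seconds: 3.0, preferredTimescale: 600)
let clampedRange = leadingRange(seconds: 5.0, within: shortClip, timescale: 600)
print(clampedRange.duration.seconds)
```

In the walkthrough this could be called with `srcVideoTrack.timeRange.duration` and `srcVideoTrack.naturalTimeScale`, and the result passed to `insertTimeRange` in place of `first5Seconds`.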

// layerInstruction for the output video track
let layerInstruction = AVMutableVideoCompositionLayerInstruction(assetTrack: videoTrack)

// Ramp the opacity from 0.0 to 1.0 between 0.0 s and 1.0 s (fade in)
layerInstruction.setOpacityRamp(fromStartOpacity: 0.0, toEndOpacity: 1.0, timeRange: CMTimeRange(start: .zero, end: CMTime(seconds: 1.0, preferredTimescale: srcVideoTrack.naturalTimeScale)))

This creates the layerInstruction applied to the output video track. Note that the assetTrack passed to AVMutableVideoCompositionLayerInstruction must be the output video track, not the video track of the source asset.

An AVVideoCompositionLayerInstruction can describe transforms, crops, opacity changes, and so on. Here it fades the opacity from 0.0 to 1.0 between 0.0 and 1.0 seconds.

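It would be more efficient to use the output video's timescale for preferredTimescale, but that makes the code a little harder to read, so the source video's TimeScale is used here. As an aside, Apple's APIs mix the spellings Timescale and TimeScale.

Fixing the orientation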
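The same opacity ramp can be mirrored at the end of the track for a fade out. The sketch below only computes the time range for the last N seconds of a track; `trailingRange` is a hypothetical helper, and the 5-second track is an example matching the walkthrough.

```swift
import CoreMedia

/// Returns the range covering the last `seconds` of `trackRange`,
/// clamped so it never starts before the track does.
/// (Hypothetical helper for illustration, not an AVFoundation API.)
func trailingRange(seconds: Double, of trackRange: CMTimeRange) -> CMTimeRange {
    let requested = CMTime(seconds: seconds, preferredTimescale: trackRange.duration.timescale)
    let length = CMTimeMinimum(requested, trackRange.duration)
    // Start = end of the track minus the fade length.
    let start = CMTimeSubtract(trackRange.end, length)
    return CMTimeRange(start: start, duration: length)
}

// Example: a 5-second output track with a 1-second fade out.
let exampleTrackRange = CMTimeRange(start: .zero,
                                    duration: CMTime(seconds: 5.0, preferredTimescale: 600))
let fadeOutRange = trailingRange(seconds: 1.0, of: exampleTrackRange)
print(fadeOutRange.start.seconds, fadeOutRange.end.seconds)
```

With a real track, the result would be passed as `layerInstruction.setOpacityRamp(fromStartOpacity: 1.0, toEndOpacity: 0.0, timeRange: fadeOutRange)`, using `videoTrack.timeRange` as the input.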

Some files, such as videos shot with an iPhone held in portrait, can end up rotated. The naturalSize of a video shot in portrait on an iPhone looks like this:

debugPrint(srcVideoTrack.naturalSize) // (1280, 720)

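This is because the video itself is stored as landscape footage, and a coordinate transform puts it into the proper orientation at playback time. If you just cut the video out of the file and copy it manually with AVVideoComposition, this transform step is skipped, so the video plays back in the wrong orientation.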

layerInstruction.setTransform(srcVideoTrack.preferredTransform, at: .zero)

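There are several possible fixes, but the source video's transform is available from the preferredTransform property, so passing it to layerInstruction.setTransform composes the video with the correct orientation applied.

In that case, also pay attention to the output size setting described later.

Create the Instruction that describes the overall flow of the output video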

// instruction for the VideoComposition
let instruction = AVMutableVideoCompositionInstruction()
instruction.layerInstructions = [layerInstruction]

// Make the instruction cover the entire output video track
instruction.timeRange = videoTrack.timeRange

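Next, prepare the instruction applied to the output VideoComposition. An instruction's layerInstructions can hold multiple AVVideoCompositionLayerInstruction objects. To crossfade several video tracks or show them overlaid (PiP), create a layerInstruction for each track and list them together. Since there is only one track this time, only the layerInstruction created above is specified.

Set the instruction's timeRange to the same range as the output video track so that the effect covers the whole track. Note that if the instruction's range and the track's playback range do not match, the instruction is ignored.

Addendum: a caveat about the instruction's timeRange

The description of the timeRange property of AVMutableVideoCompositionInstruction contains the following sentence.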

If the time range is invalid, the video compositor will ignore it.

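I could not find details in the documentation on what counts as invalid, but from my testing the time range seems to be judged invalid unless the total of all instructions' timeRanges exactly matches the length of the video track. So when specifying multiple instructions, make sure the ranges they cover add up to exactly the full playback range of the track.

Create the VideoComposition that describes the overall structure of the output video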
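Based on that observation, a small sanity check before handing instructions to the compositor may help catch mismatches early. This is a minimal sketch under the assumption that the instructions are listed in playback order; `coversExactly` is a hypothetical helper, and the ranges are example values.

```swift
import CoreMedia

/// Returns true when `ranges`, taken in order, cover `trackRange`
/// with no gaps and no overlaps.
/// (Hypothetical helper for illustration, not an AVFoundation API.)
func coversExactly(_ ranges: [CMTimeRange], trackRange: CMTimeRange) -> Bool {
    var cursor = trackRange.start
    for range in ranges {
        // Each range must begin exactly where the previous one ended.
        guard range.start == cursor else { return false }
        cursor = range.end
    }
    // The last range must end exactly at the end of the track.
    return cursor == trackRange.end
}

func seconds(_ value: Double) -> CMTime {
    CMTime(seconds: value, preferredTimescale: 600)
}

let wholeTrack = CMTimeRange(start: .zero, end: seconds(5))
let contiguous = [CMTimeRange(start: .zero, end: seconds(1)),
                  CMTimeRange(start: seconds(1), end: seconds(5))]
let withGap = [CMTimeRange(start: .zero, end: seconds(1)),
               CMTimeRange(start: seconds(2), end: seconds(5))]
print(coversExactly(contiguous, trackRange: wholeTrack))
print(coversExactly(withGap, trackRange: wholeTrack))
```

In the walkthrough the input would be `videoComposition.instructions.map { $0.timeRange }` checked against `videoTrack.timeRange`.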

// The VideoComposition itself: frame rate and the instructions to apply
let videoComposition = AVMutableVideoComposition()
videoComposition.frameDuration = CMTime(value: 1, timescale: 30)
videoComposition.instructions = [instruction]

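Finally, create the VideoComposition that describes the overall video. frameDuration, the length of one frame (the reciprocal of the frame rate), is fixed at 1/30 second here, i.e. 30 fps.

The instructions applied to the whole video are set to the instruction created above. If you want to switch instructions depending on the timing, you can also specify multiple instructions.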
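Instead of hard-coding 1/30, the frame duration could be derived from the source track's frame rate. The sketch below assumes the rate comes from something like `srcVideoTrack.nominalFrameRate` and rounds it to a whole number of frames per second, which is an approximation for rates such as 29.97; `frameDuration(forNominalFrameRate:fallback:)` is a hypothetical helper.

```swift
import CoreMedia

/// Converts a nominal frame rate into a per-frame duration,
/// falling back to `fallback` fps when the rate is missing or implausible.
/// (Hypothetical helper; rounding to whole fps is a simplification.)
func frameDuration(forNominalFrameRate fps: Float, fallback: Int32 = 30) -> CMTime {
    // Reject NaN, infinity, zero, and absurdly large values before converting.
    guard fps.isFinite, fps >= 1, fps < 1000 else {
        return CMTime(value: 1, timescale: fallback)
    }
    return CMTime(value: 1, timescale: Int32(fps.rounded()))
}

let ntsc = frameDuration(forNominalFrameRate: 29.97)
let unknown = frameDuration(forNominalFrameRate: 0)
print(ntsc.timescale, unknown.timescale)
```

The result would be assigned as `videoComposition.frameDuration = frameDuration(forNominalFrameRate: srcVideoTrack.nominalFrameRate)`.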

let srcVideoTrackTransformedSize = srcVideoTrack.naturalSize.applying(srcVideoTrack.preferredTransform)
let srcVideoTrackRenderSize = CGSize(width: abs(srcVideoTrackTransformedSize.width), height: abs(srcVideoTrackTransformedSize.height))

videoComposition.renderSize = srcVideoTrackRenderSize

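renderSize, which determines the output resolution, is set to the source resolution with the preferredTransform applied. As mentioned above, the naturalSize of a video shot in portrait on an iPhone holds the landscape size before the transform. When you apply the transform with layerInstruction.setTransform, renderSize has to be the post-transform size, otherwise it will not match the size at which videoTrack is drawn.

The size obtained from srcVideoTrack.naturalSize.applying(srcVideoTrack.preferredTransform) can have a negative width or height, so each is passed through abs to get the final output size.

Check the result in a player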
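To see where the negative values come from, here is a worked example with a 90-degree rotation. The transform values below are an assumption about what a portrait recording typically carries, not values read from a real file; `renderSize(naturalSize:preferredTransform:)` is a hypothetical helper wrapping the two lines above.

```swift
import CoreGraphics

/// Applies the track transform to the stored size and removes the sign,
/// giving the size the video occupies once it is correctly oriented.
/// (Hypothetical helper for illustration.)
func renderSize(naturalSize: CGSize, preferredTransform: CGAffineTransform) -> CGSize {
    let transformed = naturalSize.applying(preferredTransform)
    return CGSize(width: abs(transformed.width), height: abs(transformed.height))
}

// Assumed example: 1280x720 stored landscape, rotated 90 degrees for portrait.
let rotate90 = CGAffineTransform(a: 0, b: 1, c: -1, d: 0, tx: 720, ty: 0)
let stored = CGSize(width: 1280, height: 720)

// Before abs: width = 0*1280 + (-1)*720 = -720, height = 1*1280 + 0*720 = 1280
let raw = stored.applying(rotate90)
let portrait = renderSize(naturalSize: stored, preferredTransform: rotate90)
print(raw, portrait)
```

The translation part of the transform (tx, ty) does not affect a size, which is why only the rotation terms show up in the arithmetic.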

let playerItem = AVPlayerItem(asset: composition)
playerItem.videoComposition = videoComposition

let player = AVPlayer(playerItem: playerItem)
playerView.player = player

Before writing to a file, make it possible to preview the result in an AVPlayerView.
Setting the VideoComposition created above on the AVPlayerItem's videoComposition lets you check the composed video in the AVPlayerView.

For how to preview it with SwiftUI, see the separate article.

Export to a file

// Example destination (an assumption); it must differ from the source `url`.
let outputURL = FileManager.default.temporaryDirectory.appendingPathComponent("output.mp4")
guard let session = AVAssetExportSession(asset: composition, presetName: AVAssetExportPreset960x540) else {
    debugPrint("Failed to prepare session")
    return
}
session.videoComposition = videoComposition
session.outputURL = outputURL
session.outputFileType = .mp4
session.exportAsynchronously {
    switch session.status {
    case .completed:
        debugPrint("completed")
    case .failed:
        // Avoid force-unwrapping: error may be nil
        debugPrint("error: \(String(describing: session.error))")
    default:
        break
    }
}

When exporting to a file, do not forget to set the output VideoComposition on the AVAssetExportSession's videoComposition.

Next steps

Video editing with AVFoundation (overlaying two videos with a fade)
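To my understanding, the export fails when a file already exists at the destination, so it is convenient to generate a fresh destination for each export. The following is a minimal sketch using the temporary directory; `makeTemporaryOutputURL` is a hypothetical helper, and where the file should finally live is up to the app.

```swift
import Foundation

/// Builds a unique file URL in the temporary directory.
/// (Hypothetical helper for illustration; the location is an assumption.)
func makeTemporaryOutputURL(fileExtension: String = "mp4") -> URL {
    FileManager.default.temporaryDirectory
        .appendingPathComponent(UUID().uuidString)
        .appendingPathExtension(fileExtension)
}

let destinationA = makeTemporaryOutputURL()
let destinationB = makeTemporaryOutputURL()
print(destinationA.lastPathComponent)
```

The returned URL would be assigned to the session's `outputURL` before calling `exportAsynchronously`.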
