
How to create a video with audio in Python

Posted at 2021-07-14

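Creating a video with audio in Python

This article shows how to create a video with audio in Python. I could not find much on this when I searched, so I am summarizing it here.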

Candidate 1: opencv

opencv (cv2) is probably the most commonly used library for working with video and images in Python. However, opencv on its own does not appear to be able to create a video with audio.

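Candidate 2: pyav

Starting from a video that has audio, I was able to create a video with audio using the code below.
It is the official sample, slightly modified so that it produces a video with audio.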

import av
import numpy as np


input_ = av.open("./videos/01_smile.mp4")
output = av.open('./videos/sample02.mp4', 'w')

# Select the first video stream and the first audio stream of the input.
in_stream = input_.streams.get(audio=0, video=0)
# Create the output streams. The frame rate, resolution, sample rate, and
# channel layout are hard-coded here and are assumed to match the input file.
out_stream_video = output.add_stream("h264", rate=30, width=1920, height=1080)
out_stream_audio = output.add_stream("aac", rate=44100, layout="stereo")
for packet in input_.demux(in_stream):

   # We need to skip the "flushing" packets that `demux` generates.
   if packet.dts is None:
       continue

   # Handle the packet according to the type of stream it came from.
   if packet.stream.type == 'video':
       for frame in packet.decode():
           # get PIL image
           image = frame.to_image()
           # to numpy
           image = np.array(image)
           # process the image: set the G channel to its maximum value
           image[:, :, 1] = 255
           # to frame
           new_frame = av.VideoFrame.from_ndarray(image, format='rgb24')
           # encode frame to packet
           for new_packet in out_stream_video.encode(new_frame):
               # mux packet
               output.mux(new_packet)

   elif packet.stream.type == 'audio':
       for audio_frame in packet.decode():
           # to numpy
           arr = audio_frame.to_ndarray()
           # decrease volume
           arr = arr * 0.1
           # to audio frame
           new_audio_frame = av.AudioFrame.from_ndarray(arr, format="fltp")
           new_audio_frame.rate = 44100
           # encode frame to packet
           for new_packet in out_stream_audio.encode(new_audio_frame):
               output.mux(new_packet)


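# Flush the encoders so that any frames they still buffer are written out.
for new_packet in out_stream_video.encode():
   output.mux(new_packet)
for new_packet in out_stream_audio.encode():
   output.mux(new_packet)

input_.close()
output.close()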

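Here is a brief walkthrough. The sample above processes an input video and writes the result to a separate video file.

Two kinds of processing are applied:

Set the G value of the original video's RGB to its maximum (255) for every frame
Reduce the volume to 1/10 (scale the waveform amplitude to 1/10)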
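
The two operations can be tried in isolation on plain numpy arrays, without any video file. The following is a minimal sketch, not part of the original sample: the function names `maximize_green` and `scale_volume` are hypothetical, the arrays are synthetic stand-ins for decoded frames, and it assumes an `(H, W, 3)` uint8 RGB image and planar float audio shaped `(channels, samples)`.

```python
import numpy as np


def maximize_green(image: np.ndarray) -> np.ndarray:
    """Return a copy of an (H, W, 3) uint8 RGB image with G set to 255."""
    result = image.copy()
    result[:, :, 1] = 255  # index 1 is the G channel in RGB order
    return result


def scale_volume(samples: np.ndarray, gain: float = 0.1) -> np.ndarray:
    """Return the samples multiplied by gain, keeping the original dtype."""
    return (samples * gain).astype(samples.dtype)


# Synthetic stand-ins for one decoded video frame and one decoded audio frame.
image = np.zeros((2, 2, 3), dtype=np.uint8)
samples = np.ones((2, 4), dtype=np.float32)  # assumed planar: (channels, samples)

green = maximize_green(image)
quiet = scale_volume(samples)
```

Unlike the in-place assignment in the sample, these helpers return new arrays, which makes it easier to compare the input and the result while experimenting.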

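The concepts that appear in the code can be pictured roughly as follows.

input_ and output are warehouses that store data
A stream is something like a distribution route that carries the warehouse's goods
A packet is a parcel
frame and audio_frame are the contents inside the parcel's box
When the warehouse is filled with boxed parcels, it can be played back as a video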
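
The analogy can also be written down as a toy model. The sketch below uses only the standard library; the classes `Frame`, `Packet`, and `Container` are hypothetical illustrations of the picture above and are not PyAV's actual classes. No real encoding or decoding happens here.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Frame:
    """Contents of a box: one image or one chunk of audio samples."""
    kind: str     # "video" or "audio"
    payload: str  # placeholder standing in for pixels or samples


@dataclass
class Packet:
    """A boxed parcel travelling along one stream."""
    stream_type: str
    frames: List[Frame]

    def decode(self) -> List[Frame]:
        # Opening the box yields the frames packed inside it.
        return list(self.frames)


@dataclass
class Container:
    """The warehouse: holds the parcels of all streams, interleaved."""
    packets: List[Packet] = field(default_factory=list)

    def mux(self, packet: Packet) -> None:
        # Store one parcel in the warehouse.
        self.packets.append(packet)

    def demux(self, *stream_types: str) -> Iterator[Packet]:
        # Hand out, in stored order, the parcels of the requested streams.
        for packet in self.packets:
            if packet.stream_type in stream_types:
                yield packet


source = Container()
source.mux(Packet("video", [Frame("video", "image 0")]))
source.mux(Packet("audio", [Frame("audio", "samples 0")]))
source.mux(Packet("video", [Frame("video", "image 1")]))

target = Container()
for packet in source.demux("video", "audio"):
    # Unpack every frame and repack it into a new parcel for the target.
    target.mux(Packet(packet.stream_type, packet.decode()))

kinds = [p.stream_type for p in target.packets]
```

The point of the model is that video and audio parcels arrive interleaved from one loop, which is why the real code has to branch on the stream type of each packet.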

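With that in mind, the program flow is roughly:

Take the video stream and the audio stream out of the original video together, so they can be read as one sequence
Specify the output video and create a video stream and an audio stream for it
Take packets, which are pieces of the video's contents, out of the input stream one at a time with a for loop
Packets come in two kinds, video and audio, so the processing branches on the kind
If it is video, take the frame (the image data) out of the packet
Convert the image data to a numpy array
Set the G value of RGB to 255
Create a frame from the numpy array
Use the output's video stream to create a packet for the output
Put the created packet into output
If the packet taken from the input is audio, process the audio in the same way
Finish the file operations on the input and output videos

That is the general idea.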

Other candidates

There is apparently a library called moviepy for doing video editing in Python. I would like to try it at some point.
