I could not find sample code like this in the official documentation or elsewhere, so I got it working by trial and error and am sharing the result here. I hope it is useful.
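○ What I wanted to achieve
Read a video file, process only the images, and write the audio out unchanged.
OpenCV can extract the images from a video, but I could not get the audio out of it. (The official documentation suggests it should be possible, but I could not make it work.)
With PyAV I was able to do it.
File format confirmed to work: MPEG4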
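As an illustration of the "process only the images" step, here is a minimal sketch of a per-frame function. It assumes the frame has already been converted to an 8-bit RGB NumPy array, which is what `frame.to_ndarray(format="rgb24")` in the listing produces. The function name `invert_colors` and the color inversion itself are hypothetical stand-ins for whatever processing you actually need.

```python
import numpy as np


def invert_colors(image: np.ndarray) -> np.ndarray:
    """Hypothetical per-frame step: invert an 8-bit RGB image."""
    # An rgb24 frame is a uint8 array shaped (height, width, 3).
    if image.dtype != np.uint8 or image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected a uint8 array shaped (height, width, 3)")
    return 255 - image


# Stand-in for one decoded frame: a tiny 2x2 image.
frame_array = np.zeros((2, 2, 3), dtype=np.uint8)
frame_array[0, 0] = (255, 0, 10)
result = invert_colors(frame_array)
```

Because the result keeps the same shape and dtype as the input, it can be handed straight back to `av.VideoFrame.from_ndarray(..., format="rgb24")`.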
Python: 3.11.2
PyAV: 0.9.4-7
LinuxMint Debian Edition: 6 "Faye"
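To compare your own environment with the versions listed above, a small check like the following may help. It is only a sketch: it assumes PyAV is installed under the distribution name `av` (the name used on PyPI), and the helper `installed_version` is a name introduced here for illustration.

```python
from importlib import metadata
import platform


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


python_version = platform.python_version()
# Assumption: PyAV is registered under the distribution name "av".
pyav_version = installed_version("av")
print(f"Python {python_version}, PyAV {pyav_version}")
```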
movie-V+A.py
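import threading

import av
import cv2  # for the image processing done in process()
# import mymovie1 as mm  # local helper module, not used in this listing


def process(image):
    """Process one frame; the result must be stored in the global 'img'."""
    global img
    # Do something here and assign the result to img
    return

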
in_file = "<Your input movie file path>"
out_file = "<Your output movie file path>"

# Open the input file and pick its first video and audio streams
input_container = av.open(in_file)
input_vstream = input_container.streams.video[0]
input_astream = input_container.streams.audio[0]

# Open the output file and add video and audio streams that mirror the input
output_container = av.open(out_file, 'w')

# Video: reuse the codec name and average frame rate of the input stream
video_codec_name = input_vstream.codec_context.name
fps = input_vstream.average_rate
output_vstream = output_container.add_stream(video_codec_name, str(fps))

# Copy the video parameters one by one. Passing template=input_vstream to
# add_stream() gave an error once mux() was used, so it is avoided here.
output_vstream.width = input_vstream.codec_context.width
output_vstream.height = input_vstream.codec_context.height
output_vstream.pix_fmt = input_vstream.codec_context.pix_fmt

# Audio: reuse the codec name and sample rate of the input stream
audio_codec_name = input_astream.codec_context.name
sample_rate = input_astream.codec_context.sample_rate
output_astream = output_container.add_stream(audio_codec_name, sample_rate)

# Process packets until the end of the file
for packet in input_container.demux(input_astream, input_vstream):
    # Skip the empty flush packets that demux() emits at the end
    if packet.dts is None:
        continue
    # Following the PyAV examples, decode in a for loop even when the
    # packet yields only one frame
    for frame in packet.decode():
        # Handle the frame according to its type (video or audio)
        if isinstance(frame, av.VideoFrame):
            # Convert to a NumPy array so it can be processed with OpenCV
            img = frame.to_ndarray(format="rgb24")
            # Run the time-consuming image processing in a thread and wait
            # for it to finish; process() updates the global variable 'img'.
            # Pass the function and its argument separately: writing
            # target=process(img) would call process() right here instead.
            thread = threading.Thread(target=process, args=(img,))
            thread.start()
            thread.join()
            # Convert back to a PyAV video frame
            processed = av.VideoFrame.from_ndarray(img, format="rgb24")
            # Encode the processed frame and write the packets to the output
            for p in output_vstream.encode(processed):
                output_container.mux(p)
        elif isinstance(frame, av.AudioFrame):
            # Encode the audio frame unchanged and write it to the output
            for p in output_astream.encode(frame):
                output_container.mux(p)

# Flush the frames still buffered in the encoders before closing the output
for packet in output_vstream.encode():
    output_container.mux(packet)
for packet in output_astream.encode():
    output_container.mux(packet)

# Closing the input container somehow caused an error and restarted the
# Python kernel, hence it is commented out.
# input_container.close()

# Close the output file
output_container.close()
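A note on handing a frame to a worker thread as the listing does: `threading.Thread` expects the function itself in `target` and its arguments in `args`. The following self-contained sketch shows that pattern with a stand-in `process` function. The `results` dictionary and the thread name `worker` are hypothetical and only there to make the behavior visible.

```python
import threading

results = {}


def process(image):
    # Stand-in for a slow image step; records which thread ran it.
    results["thread"] = threading.current_thread().name
    results["value"] = image * 2


# Writing target=process(21) would call process() immediately in the main
# thread and give Thread its return value (None) as the target.
worker = threading.Thread(target=process, args=(21,), name="worker")
worker.start()
worker.join()  # joining right away means the caller still waits per frame
```

Since `join()` follows `start()` directly, the main loop waits for every frame, so this arrangement keeps the code structure ready for threading but does not by itself overlap decoding with processing.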