
Python3 + PyAV: Read video and audio from a movie file, process only the images, and write the audio to the output file as-is

Posted at 2023-12-18

I could not find similar sample code in the official documentation or elsewhere, so I got this working by trial and error and am sharing it here. I hope it is useful.

○ What I wanted to achieve
Read a movie file, process only the images, and output the audio as-is.

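With OpenCV I could extract the images from a movie, but not the audio. (The official documentation suggests it is possible, but I could not get it to work.)
With PyAV I was able to do it.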

File format confirmed to work: MPEG4
Python: 3.11.2
PyAV: 0.9.4-7
LinuxMint Debian Edition: 6 "Faye"
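
Since the script was only confirmed with the versions listed above, it may help to check what is installed locally before running it. The following is a minimal sketch using only the standard library. The helper names `installed_version` and `environment_report` are hypothetical, and the distribution names `av`, `opencv-python`, and `numpy` are assumed to be the PyPI names; a distro-packaged install may register them differently.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(dist_names=("av", "opencv-python", "numpy")):
    """Collect the Python version and the versions of the given distributions."""
    # Distribution names are assumptions; adjust them to match your install.
    report = {"python": ".".join(str(n) for n in sys.version_info[:3])}
    for name in dist_names:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version or 'not installed'}")
```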

movie-V+A.py

import av
import threading

def process(image):
    # Placeholder for the image processing step.
    # Write the result back to the global 'img' so the main loop can use it.
    global img
    import cv2
    # Do something here
    return

in_file = "<Your input movie file path>"
out_file = "<Your output movie file path>"

# Open input file and define video and audio stream
input_container = av.open(in_file)
input_vstream = input_container.streams.video[0] 
input_astream = input_container.streams.get(audio=0)[0]

# Open output file and define video and audio streams, copy exactly same parameters from input
output_container = av.open(out_file, 'w')

# Get the codec name from the input video stream.
codec_name = input_vstream.codec_context.name  
# Get the frame rate from the input video stream.
fps = input_vstream.average_rate  
output_vstream = output_container.add_stream(codec_name, str(fps))

# Set parameters for video output stream same as input stream
# When mux() is used, passing template=input_vstream raised an error, so copy each parameter individually

# Set frame width to be the same as the width of the input stream
output_vstream.width = input_vstream.codec_context.width  
output_vstream.height = input_vstream.codec_context.height
# Copy pixel format from input stream to output stream
output_vstream.pix_fmt = input_vstream.codec_context.pix_fmt  
# Same for audio stream
codec_name = input_astream.codec_context.name
fps = input_astream.rate
output_astream = output_container.add_stream(codec_name, fps)

# Start the processing loop and run until the end of the file
for packet in input_container.demux(input_astream, input_vstream):
    if packet.dts is None:
        continue
    # Following the PyAV examples, decode the packet into frames
    # with a for loop, even though it typically runs only once.
    for frame in packet.decode():
        # Branch on the type of frame (video or audio).
        # The video branch updates the global variable 'img'.
        if isinstance(frame, av.VideoFrame):
            # Convert to a NumPy array for processing with OpenCV
            img = frame.to_ndarray(format="rgb24")
            # Run the time-consuming image processing in a thread
            # and wait for it to complete.
            # Pass the function and its argument separately;
            # target=process(img) would call it immediately instead.
            thread = threading.Thread(target=process, args=(img,))
            thread.start()
            thread.join()
            # Convert back to PyAV video frame type
            processed = av.VideoFrame.from_ndarray(img, format="rgb24")
            # Encode the processed frame and mux the packets into the output video stream
            for p in output_vstream.encode(processed):
                output_container.mux(p)

        if isinstance(frame, av.AudioFrame):
            # Encode the audio frame without any change and mux it into the output audio stream
            for p in output_astream.encode(frame):
                output_container.mux(p)

   
# The encoders must be flushed before closing the output container
for packet in output_vstream.encode():
    output_container.mux(packet)
for packet in output_astream.encode():
    output_container.mux(packet)

# Closing the input container somehow caused an error and restarted the Python kernel, hence it is commented out.
# input_container.close()
# Close output file
output_container.close()
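
The thread step in the script depends on how the work is handed to `threading.Thread`: the callable and its arguments are passed separately (`target=process, args=(img,)`), whereas writing `target=process(img)` would call the function immediately on the main thread. The following is a minimal sketch of that pattern. It uses a plain list of integers as a stand-in for the image, and the `results` dictionary and the `process_in_thread` helper are hypothetical names introduced only for illustration.

```python
import threading


def process(image, results):
    """Stand-in for the image processing step: invert 8-bit values.

    Records the name of the thread it ran on so the hand-off is visible.
    """
    results["thread"] = threading.current_thread().name
    results["image"] = [255 - value for value in image]


def process_in_thread(image):
    """Run process() on a worker thread and wait for it to finish."""
    results = {}
    # Pass the function object and its arguments separately.
    worker = threading.Thread(target=process, args=(image, results), name="worker")
    worker.start()
    worker.join()
    return results


if __name__ == "__main__":
    outcome = process_in_thread([0, 128, 255])
    print(outcome["thread"], outcome["image"])
```

Because `join()` is called right after `start()`, the main loop still waits for each frame, so this arrangement adds no parallelism by itself. Passing a result holder as an argument is one way to avoid relying on a global variable.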
