Python3 + PyAV: Read video and audio from a movie file, process only the images, and write the audio to the output file unchanged

I could not find a similar sample in the official documentation or elsewhere, so I got this working through trial and error and am sharing it here. I hope it is useful.

○ Goal
Read a movie file, process only the images, and output the audio unchanged.

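OpenCV can extract the images from a movie, but I could not get the audio out with it. (The official documentation reads as if it should be possible, but I could not make it work.)
With PyAV it worked.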

File format confirmed to work: MPEG4
Python: 3.11.2
PyAV: 0.9.4-7
LinuxMint Debian Edition: 6 "Faye"

movie-V+A.py
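import av
import cv2  # OpenCV, for the image processing inside process()
import threading

def process(image):
    # The result is handed back through the global variable 'img',
    # because a thread target cannot return a value directly.
    global img
    # Do something with 'image' here and assign the result to 'img'.
    img = image
    return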

in_file = "<Your input movie file path>"
out_file = "<Your output movie file path>"

# Open input file and define video and audio stream
input_container = av.open(in_file)
input_vstream = input_container.streams.video[0] 
input_astream = input_container.streams.get(audio=0)[0]

# Open the output file and define video and audio streams with the same parameters as the input
output_container = av.open(out_file, 'w')

# Get the codec name from the input video stream.
codec_name = input_vstream.codec_context.name  
# Get the frame rate from the input video stream.
fps = input_vstream.average_rate  
output_vstream = output_container.add_stream(codec_name, str(fps))

# Set the parameters of the video output stream to match the input stream.
# When mux() is used, template=input_vstream gave me an error, so each parameter is copied individually.

# Set frame width to be the same as the width of the input stream
output_vstream.width = input_vstream.codec_context.width  
output_vstream.height = input_vstream.codec_context.height
# Copy pixel format from input stream to output stream
output_vstream.pix_fmt = input_vstream.codec_context.pix_fmt  
# Same for audio stream
codec_name = input_astream.codec_context.name
fps = input_astream.rate
output_astream = output_container.add_stream(codec_name, fps)

# Start the processing loop and run until the end of the file
for packet in input_container.demux(input_astream, input_vstream):
    if packet.dts is None:
        continue
    # Following the PyAV examples, decoding is done in a for loop,
    # even though it usually runs only once per packet.
    # Decode the packet into frames.
    for frame in packet.decode():
        # Handle the frame according to its type (video or audio).
        # (process() changes the global variable 'img'.)
        if isinstance(frame, av.VideoFrame):
            # For processing with OpenCV, convert to a numpy array
            img = frame.to_ndarray(format="rgb24")
            # Run the time-consuming image processing in a thread
            # and wait for it to complete.
            # Pass the function and its arguments separately; target=process(img)
            # would call process() right here instead of in the thread.
            thread = threading.Thread(target=process, args=(img,))
            thread.start()
            thread.join()
            # Convert back to a PyAV video frame
            processed = av.VideoFrame.from_ndarray(img, format="rgb24")
            # Encode the processed frame and write it to the output video stream
            for p in output_vstream.encode(processed):
                output_container.mux(p)

        if isinstance(frame, av.AudioFrame):
            # Pass the audio frame to the output audio stream without any change
            for p in output_astream.encode(frame):
                output_container.mux(p)

   
#Must flush remaining stream for closing output container
for packet in output_vstream.encode():
    output_container.mux(packet)
for packet in output_astream.encode():
    output_container.mux(packet)

# Closing the input container somehow caused an error and restarted the Python kernel, hence it is commented out.
# input_container.close()
# Close output file
output_container.close()
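
A note on the threading step: `threading.Thread` expects the callable itself as `target`, with its arguments given separately through `args`. Writing `target=f(x)` calls `f` immediately in the main thread and hands its return value to `Thread`. The following is a minimal standard-library sketch of the intended pattern, not part of the script above; `slow_double`, `results`, and `ran_in` are hypothetical names used only for illustration.

```python
import threading

results = {}   # hypothetical container for values produced by the worker
ran_in = {}    # records which thread actually did the work

def slow_double(key, value):
    # Hypothetical stand-in for a time-consuming image-processing step.
    results[key] = value * 2
    ran_in[key] = threading.current_thread().name

# Pass the callable and its arguments separately.
worker = threading.Thread(target=slow_double, args=("frame0", 21), name="worker")
worker.start()
worker.join()  # wait until the work is finished

print(results["frame0"], ran_in["frame0"])
```

Because `join()` is called right after `start()`, the main loop still waits for every frame, so this pattern by itself should not be expected to run faster than calling the function directly.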