
[Video/audio sync] Creating a video with sound by matching the video length to the audio length

Posted at 2019-03-15

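I could not find a write-up on creating video with audio in the material I searched, and my own implementation has now reached a usable state, so I am posting it here.
The logic is straightforward; it should work as long as you take care of the following three points.
Premise: the video is combined with audio captured at the same time.
1. Get the length of the video (see reference ②).
2. Get the length of the audio (see reference ①).
3. Adjust the video fps based on the lengths from 1 and 2, set it, and combine the two (see the previous day's article).
Note that the fps depends on the machine and on the conditions, so the value is only an average guideline.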
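
Step 3 amounts to a single division. A minimal sketch, where matched_fps is a hypothetical helper name and the sample numbers come from the log later in this article (52.0 s of video, assumed here to have been written at the initial setting of 9 fps, against about 54.33 s of audio):

```python
def matched_fps(frame_count: float, audio_seconds: float) -> float:
    """Return the frame rate at which frame_count frames last audio_seconds."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return frame_count / audio_seconds


# Assumption: 52.0 s of video at the initial 9 fps corresponds to 468 frames.
frame_count = 52.0 * 9
audio_seconds = 54.33469387755102  # sound_length_seconds taken from the log

print(matched_fps(frame_count, audio_seconds))  # roughly 8.613
```

Writing the same frames at this rate makes the video last as long as the audio, which is the whole idea of the adjustment.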
[References]
① Audio processing with Python
② Getting the length of a video with Python 3 + OpenCV

The code is available here:

・hirakegoma/camera_input_movie2.py

Walkthrough of the code

This time I added video/audio synchronization to the app from the previous day's article.

# -*- coding: utf-8 -*-
import cv2
import pyaudio
import sys
import time
import wave
import pydub
from pydub import AudioSegment
import moviepy.editor as mp
import datetime

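The initial fps is set to an arbitrary value.
Note: I originally used fps=9; once the procedure below yields the optimal value, change it to that value.

cap = cv2.VideoCapture(0)
fps = 8.61328125 #9 # changed so that the video playback speed matches the audio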

Settings for capturing video and audio

# Frame size of the recorded video (same as the webcam)
size = (640, 480)
s=0
st=time.time()
# Settings for the output video file
fourcc = cv2.VideoWriter_fourcc(*'XVID')
output_file='./dog-cat/output'
dt_now = datetime.datetime.now()
outputfile=output_file+str(st)+'.avi'
video = cv2.VideoWriter(outputfile, fourcc, fps, size)

CHUNK = 1024*5 #1024
FORMAT = pyaudio.paInt16
CHANNELS = 1  # monaural
# Sampling rate; depends on the microphone
RATE = 44100
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            frames_per_buffer=CHUNK,
            output = True)

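Capture for a fixed duration. Currently each segment lasts 5 seconds, controlled by the elapsed-time check shown after this loop body.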

frames = []
while True:
    ret, frame = cap.read()
    input = stream.read(CHUNK)
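    frame1 = cv2.resize(frame, dsize=(1280, 960))  # enlarged copy, used only for full-screen display
    # Write the original frame to the video file
    video.write(frame)
    # Wait for key input; 'q' quits
    if cv2.waitKey(97) & 0xFF == ord('q'):
        break
    # Show the frame on screen
    cv2.imshow('frame', frame1)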
    output = stream.write(input)
    frames.append(input)

After capturing for 5 seconds, the audio is saved as follows.

    if time.time()-st >= 5:
        # Cleanup
        cap.release()
        video.release()
        cv2.destroyAllWindows()
        
        s +=1
        wf = wave.open('out'+str(s)+'.wav', 'wb')
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(p.get_sample_size(FORMAT))  #width=2 ; 16bit
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
        wf.close()

Get the length of the audio.

        base_sound = AudioSegment.from_file('out'+str(s)+'.wav', format="wav")
        base_sound.export('out'+str(s)+'.mp3', format="mp3")  # save as mp3
        length_seconds = base_sound.duration_seconds  # audio length in seconds
        #print('sound_length_seconds={}'.format(length_seconds))
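
As a cross-check that does not depend on pydub, the length of a PCM WAV file can also be read with the standard library's wave module as the number of sample frames divided by the sampling rate. The sketch below is only an illustration: wav_duration_seconds and write_silence are hypothetical helper names, and the demo writes its own silent file to a temporary directory instead of using the article's recordings.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file: sample frames divided by sampling rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def write_silence(path: str, seconds: float, rate: int = 44100) -> None:
    """Write mono 16-bit silence so the example has a file to measure."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16 bit, as with paInt16
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))


demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silence(demo_path, 2.0)
demo_duration = wav_duration_seconds(demo_path)
print(demo_duration)
```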

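Open the recorded video, get its length, and compute the fps at which it matches the audio length obtained above.
Note: set this optimal value as fps = 8.61328125 at the top of the script.
That value is the optimum obtained in this run.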

        cap = cv2.VideoCapture(outputfile)
        video_frame = cap.get(cv2.CAP_PROP_FRAME_COUNT) # number of frames
        video_fps = cap.get(cv2.CAP_PROP_FPS)           # FPS stored in the file
        video_len_sec = video_frame / video_fps         # video length in seconds
        videoFps= video_frame / length_seconds          # fps at which the frames span the audio length
        print('correct_videoFps={}'.format(videoFps))   # print the back-calculated optimal FPS
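
One way to read the value 8.61328125: each pass of the capture loop above reads exactly one audio buffer of CHUNK samples and writes exactly one video frame, so the number of frames per second of recorded audio is RATE / CHUNK. The sketch below only checks that arithmetic with the article's settings. It is an observation about this particular loop, not a general rule, and frames_per_audio_second is a hypothetical helper name.

```python
RATE = 44100      # samples per second, as in the article
CHUNK = 1024 * 5  # samples per stream.read(), as in the article


def frames_per_audio_second(rate: int, chunk: int) -> float:
    """Frames written per second of audio when each loop pass handles one chunk and one frame."""
    return rate / chunk


chunk_seconds = CHUNK / RATE  # duration of one audio buffer, about 0.116 s
loop_fps = frames_per_audio_second(RATE, CHUNK)
print(chunk_seconds, loop_fps)
```

With these settings the result is 8.61328125, the same value used for fps at the top of the script. Changing CHUNK or RATE would therefore change the matching fps as well.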

Combine the video and audio as follows.
The with open(...) block logs the video length and the audio length.

        clip_output = mp.VideoFileClip(outputfile).subclip()
        clip_output.write_videofile(outputfile.replace('.avi', '.mp4'), audio='out'+str(s)+'.mp3')
        dt_now = datetime.datetime.now()
        with open('./dog-cat/file.txt', 'a') as f: # log the video length and the audio length
            f.write(str(dt_now)+'_'+str(s)+' ; '+outputfile+'sound_length_seconds={},video_len_sec={}'.format(length_seconds,video_len_sec)+'\n')
        # Note: st is refreshed only a few lines below, so this name reuses the previous start time
        outputfile=output_file+str(st)+'.avi'
        cap = cv2.VideoCapture(0)
        video = cv2.VideoWriter(outputfile, fourcc, fps, size)
        ret, frame = cap.read()
        frames=[]
        st=time.time()
        
# Cleanup
cap.release()
video.release()
cv2.destroyAllWindows()
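
In the log below, consecutive entries sometimes share one .avi name, because the name is rebuilt from st before st is refreshed. A possible alternative, sketched under the assumption that a per-segment counter is acceptable in the file name; segment_path is a hypothetical helper and not part of the original script:

```python
def segment_path(prefix: str, start_time: float, segment: int, ext: str = ".avi") -> str:
    """Build a file name that stays unique per segment even when start_time repeats."""
    return "{}{}_{:03d}{}".format(prefix, start_time, segment, ext)


# Same start time, different segment numbers: the names no longer collide.
first = segment_path('./dog-cat/output', 1552619045.3998518, 1)
second = segment_path('./dog-cat/output', 1552619045.3998518, 2)
print(first)
print(second)
```

Because the name still ends in .avi, the existing replace('.avi', '.mp4') call would keep working on it.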

Results

The log output before and after the adjustment is as follows.

2019-03-15 12:04:16.509149_1 ; ./dog-cat/output1552619045.3998518.avisound_length_seconds=8.010884353741497,video_len_sec=7.666666666666667
2019-03-15 12:04:28.888100_2 ; ./dog-cat/output1552619045.3998518.avisound_length_seconds=8.823582766439909,video_len_sec=8.444444444444445
2019-03-15 12:04:48.648639_3 ; ./dog-cat/output1552619057.644029.avisound_length_seconds=8.823582766439909,video_len_sec=8.444444444444445
2019-03-15 12:08:27.436513_1 ; ./dog-cat/output1552619243.118006.avisound_length_seconds=54.33469387755102,video_len_sec=52.0
2019-03-15 12:09:32.517772_2 ; ./dog-cat/output1552619243.118006.avisound_length_seconds=55.263492063492066,video_len_sec=52.888888888888886
2019-03-15 12:10:37.613890_3 ; ./dog-cat/output1552619308.5901027.avisound_length_seconds=55.263492063492066,video_len_sec=52.888888888888886
2019-03-15 12:11:42.688466_4 ; ./dog-cat/output1552619373.6652346.avisound_length_seconds=55.263492063492066,video_len_sec=52.888888888888886
2019-03-15 12:12:47.924787_5 ; ./dog-cat/output1552619438.7640424.avisound_length_seconds=55.3795918367347,video_len_sec=53.0
2019-03-15 12:21:32.910901_1 ; ./dog-cat/output1552620028.998037.avisound_length_seconds=54.45079365079365,video_len_sec=52.111111111111114

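The accuracy improves markedly between the entries above and the entries below this point in time.
The short clips below also match with similar accuracy.
Note: video_len_sec still comes out consistently larger, so there seems to be room for a little more accuracy.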

2019-03-15 12:24:43.977388_1 ; ./dog-cat/output1552620219.9809425.avisound_length_seconds=54.45079365079365,video_len_sec=54.53488372093023
2019-03-15 12:25:49.196268_2 ; ./dog-cat/output1552620219.9809425.avisound_length_seconds=55.3795918367347,video_len_sec=55.46511627906977
2019-03-15 14:14:28.595460_1 ; ./dog-cat/output1552626862.552084.avisound_length_seconds=3.3668934240362813,video_len_sec=3.372093023255814
2019-03-15 14:14:35.688208_2 ; ./dog-cat/output1552626862.552084.avisound_length_seconds=4.063492063492063,video_len_sec=4.069767441860465
2019-03-15 14:14:42.767024_3 ; ./dog-cat/output1552626869.7383416.avisound_length_seconds=4.179591836734694,video_len_sec=4.186046511627907
2019-03-15 14:14:49.794501_4 ; ./dog-cat/output1552626876.8432221.avisound_length_seconds=4.179591836734694,video_len_sec=4.186046511627907
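
To put a number on the improvement, the two length fields can be pulled out of a log line and subtracted. A minimal sketch, assuming the line format produced by the f.write(...) call above; LOG_PATTERN and length_gap are hypothetical names, and the two sample lines are copied from the logs in this section.

```python
import re

# Matches the two fields even though the file name runs straight into the first one.
LOG_PATTERN = re.compile(
    r"sound_length_seconds=(?P<sound>[0-9.]+),video_len_sec=(?P<video>[0-9.]+)"
)


def length_gap(line: str) -> float:
    """Return video length minus sound length (seconds) for one log line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError("line does not contain the two length fields")
    return float(match.group("video")) - float(match.group("sound"))


before = ("2019-03-15 12:08:27.436513_1 ; ./dog-cat/output1552619243.118006.avi"
          "sound_length_seconds=54.33469387755102,video_len_sec=52.0")
after = ("2019-03-15 12:24:43.977388_1 ; ./dog-cat/output1552620219.9809425.avi"
         "sound_length_seconds=54.45079365079365,video_len_sec=54.53488372093023")

gap_before = length_gap(before)
gap_after = length_gap(after)
print(gap_before, gap_after)
```

For these two lines the gap shrinks from about -2.33 s to about +0.08 s. The remaining +0.08 s is consistent with the .avi file reporting 8.6 fps instead of 8.61328125 (54.53488... × 8.6 = 469 frames), although that reading is an inference from the logged numbers and was not verified in the article.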

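Summary

・Video and audio can now be captured at the same time and combined into a video with sound
・By adjusting the video fps, the video and the audio could be synchronized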
