
How to create YouTube subtitles (.SRT) from an AviUtl .exo file

Last updated at 2024-03-22, posted at 2024-03-22

Introduction

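After about two years of making videos for YouTube, I have learned one thing.
On YouTube, the videos that the algorithm picks up are the ones that win.
What "winning" means in concrete terms (views, subscribers, or impressions) I can't say,
but there is no doubt that videos favored by the algorithm end up dominating their genre.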

One factor related to this is something like SEO.
In other words, how easily your video shows up when people search.

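YouTube subtitles are said to help with this too, not just with accessibility.
Subtitle text can also be browsed as a list via "Show transcript".
Arguably this makes it even easier to jump to the spot you want than the timestamps uploaders attach.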

After thinking it over, I had decided not to make YouTube subtitles,
but after thinking it over some more, I decided to make them after all.

Overview

What I want to do

Turn the subtitle data I made in AviUtl into a format that can be uploaded to YouTube.

Implementation idea

Convert the .exo file exported from AviUtl into subtitle data.
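The script further down finds the fields it needs by fixed line offsets. Since an .exo file is laid out like an INI file (sections such as `[0]` and `[0.0]` holding `key=value` lines), a minimal sketch of an alternative is to parse it with Python's standard `configparser`. This sketch assumes that each object section `[N]` carries `start`, `end` and `layer`, that a text object's first sub-section `[N.0]` holds a `text=` value that is hex-encoded UTF-16LE padded with NUL characters, and that the frame rate is `rate / scale` from the `[exedit]` header; check these against a file exported from your own AviUtl setup. The function names are hypothetical.

```python
import configparser


def load_exo(exo_source: str) -> configparser.ConfigParser:
    """Parse .exo text as INI (assumption: the file follows INI syntax)."""
    # interpolation=None: '%' in values is taken literally; strict=False tolerates duplicates
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(exo_source)
    return parser


def read_fps(parser: configparser.ConfigParser) -> float:
    """Frame rate from the [exedit] header (assumed: fps = rate / scale)."""
    rate = parser.getint("exedit", "rate", fallback=30)
    scale = parser.getint("exedit", "scale", fallback=1)
    return rate / scale


def extract_text_objects(parser: configparser.ConfigParser, layer: int) -> list[tuple[int, int, str]]:
    """Return (start_frame, end_frame, text) for each text object on `layer`, sorted by start."""
    results = []
    for name in parser.sections():
        # Object sections are bare integers ([0], [1], ...); [0.0], [0.1] are their sub-sections
        if not name.isdigit():
            continue
        obj = parser[name]
        if obj.getint("layer", fallback=-1) != layer:
            continue
        media_name = f"{name}.0"
        if not parser.has_section(media_name) or "text" not in parser[media_name]:
            continue  # not a text object
        raw = bytes.fromhex(parser[media_name]["text"])
        # Decode UTF-16LE and drop the NUL padding that follows the real text
        text = raw.decode("utf-16-le").split("\x00")[0]
        results.append((obj.getint("start"), obj.getint("end"), text))
    return sorted(results)
```

To read a real file, something like `with open(path, encoding="cp932", errors="ignore") as f: parser = load_exo(f.read())` should work; cp932 (Shift_JIS) is an assumption about how AviUtl writes .exo files. Matching the layer as an integer also avoids `layer=2` accidentally matching `layer=21`.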

What is "subtitle data", anyway?

Confusingly, there seem to be many different file formats.
I asked ChatGPT to summarize them, adding whether each one supports position control and color control.

File format	Position control	Color control
SubRip (SRT)	✖	✖
SubViewer	✔ (limited)	✔
MPsub	✖	✖
LRC	✖	✖
Videotron Lambda	✔	✔
SAMI (SMI)	✔	✔
RealText	✔	✔
WebVTT	✔	✔
TTML (Timed Text Markup Language)	✔	✔
DFXP (Distribution Format Exchange Profile)	✔	✔

Sample code

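I had ChatGPT provide sample code.
That said, I ended up going with SRT on impulse,
simply because it was the first format that caught my eye. An SRT file consists of numbered entries like these: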

1
00:00:01,000 --> 00:00:03,000
これは最初の字幕です。

2
00:00:05,000 --> 00:00:07,000
これが次の字幕です。
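AviUtl counts time in frames, while SRT timestamps are written as `HH:MM:SS,mmm`. The script below assumes 30 fps and only computes minutes and seconds itself. A minimal sketch of a frame-to-timestamp conversion that also handles hours and non-integer frame rates such as 29.97 could look like this; the function name is hypothetical, and it assumes the frame index counts from 0 (subtract 1 first if your frame numbers start at 1).

```python
def frame_to_srt_timestamp(frame: int, fps: float = 30.0) -> str:
    """Convert a 0-based frame index to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(frame * 1000 / fps)  # round to the nearest millisecond
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    # SRT separates seconds and milliseconds with a comma
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"


print(frame_to_srt_timestamp(45))              # 00:00:01,500
print(frame_to_srt_timestamp(30, fps=29.97))   # 00:00:01,001
```

The second call shows why the frame rate matters: at 29.97 fps, frame 30 lands one millisecond after the one-second mark.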

Program source code ①.py

# Create an SRT file to add subtitles to my videos on YouTube
# Python code that converts an EXO file into an SRT file that can be uploaded to YouTube
# *********************************** Libraries ***********************************
import sys, os, time
import binascii
import re
import threading
import glob

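def input_with_timeout(prompt, timeout):
    """Read one line from stdin; return 'e' if nothing is entered within `timeout` seconds."""
    print(prompt, end='', flush=True)
    result = ['e']  # default value returned on timeout
    def set_input():
        result[0] = input()
    # Read in a daemon thread so the main thread can stop waiting after `timeout`
    input_thread = threading.Thread(target=set_input)
    input_thread.daemon = True
    input_thread.start()
    input_thread.join(timeout)
    return result[0]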

def Get_data(exo_file, input_value):
    """Return [start_frame, text] pairs for the text objects on the given layer.

    Consecutive identical captions are merged, and a final [end_frame, "Closing"]
    entry is appended so the last caption has an end time.
    """
    # Compared with ==, so layer=2 does not also match layer=21
    target_layer = "layer={}".format(input_value.strip())
    text_lines = []
    time_lines = []
    end_temp_list = []
    # Read the .exo once; only ASCII keys and hex values are used, so undecodable bytes can be ignored
    with open(exo_file, 'r', encoding='utf-8', errors='ignore') as f:
        s_line = f.readlines()
    for i, line in enumerate(s_line):
        # text= sits 23 lines below layer= in the .exo layout this script targets
        if line.startswith("text=") and i >= 23 and s_line[i - 23].strip() == target_layer:
            hex_text = line[len("text="):].strip()
            # The text is stored as hex-encoded UTF-16LE, padded with NUL characters
            decoded = binascii.unhexlify(hex_text).decode("utf-16-le")
            text_lines.append(decoded.split('\x00')[0])
        # start= sits 2 lines above layer= and 25 lines above text=
        elif line.startswith("start=") and i + 25 < len(s_line):
            if s_line[i + 25].startswith("text=") and s_line[i + 2].strip() == target_layer:
                time_lines.append(line[len("start="):].strip())
        # end= (startswith also excludes blend=) sits 1 line above layer= and 24 lines above text=
        elif line.startswith("end=") and i + 24 < len(s_line):
            if s_line[i + 24].startswith("text=") and s_line[i + 1].strip() == target_layer:
                end_temp_list.append(line[len("end="):].strip())
    if not text_lines or not end_temp_list:
        raise ValueError("No text objects found on {}".format(target_layer))
    # Rebuild the time/text list, merging consecutive identical captions
    time_lines_new = []
    for start_time, text in zip(time_lines, text_lines):
        print(text)
        if not time_lines_new or time_lines_new[-1][1] != text:
            time_lines_new.append([start_time, text])
    # Record the final end frame as a sentinel for the last caption's end time
    time_lines_new.append([end_temp_list[-1], "Closing"])
    return time_lines_new



# Convert AviUtl frame numbers into "mm:ss,mmm" strings
def fix_time(result_list):
    frames_per_second = 30  # assumes a 30 fps project; change this to match your project settings
    milliseconds_per_frame = 1000 / frames_per_second
    result = []
    for frame_value, text in result_list:
        frame = int(frame_value)
        # Split the frame count into minutes, seconds and milliseconds
        m = frame // frames_per_second // 60
        s = frame // frames_per_second % 60
        ms = (frame % frames_per_second) * milliseconds_per_frame
        # Output zero-padded strings
        time_str = f"{m:02}:{s:02},{int(ms):03}"
        result.append([time_str, text])
    return result

# Convert the [time, text] list into SRT entries [number, "start --> end", text]
def format_subtitles(subtitle_data):
    formatted_subtitles = []
    # Each caption ends where the next one starts; the final "Closing" entry only
    # supplies the last end time and is not written out itself
    for index in range(len(subtitle_data) - 1):
        current_time, text = subtitle_data[index]
        next_time = subtitle_data[index + 1][0]
        # fix_time() yields zero-padded "mm:ss,mmm", so prepend the hours (assumes videos under 1 hour)
        start_time = f"00:{current_time}"
        end_time = f"00:{next_time}"
        formatted_subtitles.append([index + 1, f"{start_time} --> {end_time}", text])
    return formatted_subtitles

# Create the SRT file (the extension of `file_name` is replaced with .srt)
def create_srtfile(subtitles, file_name):
    srt_file_path = os.path.splitext(file_name)[0] + '.srt'
    # Mode 'w' overwrites any previous output, so no separate cleanup step is needed
    with open(srt_file_path, 'w', encoding='utf-8') as file:
        for subtitle in subtitles:
            # One SRT entry: number, "start --> end", text, then a blank separator line
            file.write('\n'.join(str(item) for item in subtitle) + '\n\n')
    return srt_file_path

def main():
    outputfile_path = 'SRT_File_for_Subtitle_on_YouTube.txt'
    # Push earlier console output out of view, then ask for the subtitle layer
    print('\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n')
    input_value = input_with_timeout('Enter the layer that holds the subtitles (e.g. 2): ', 10).strip()
    print('Subtitle layer recognized as {}'.format(input_value))
    if len(sys.argv) > 1:
        # The .exo file dropped onto the .bat arrives as the first argument
        exo_file = sys.argv[1]
    else:
        print("========================= Running in test mode =========================")
        exo_file = "path to pass when running in test mode"
    output_path = os.path.join(os.path.dirname(exo_file), outputfile_path)
    # Read the .exo file and build the list of start times and caption text
    title_list = Get_data(exo_file, input_value)
    result_list = fix_time(title_list)
    srt_list = format_subtitles(result_list)
    # Write the subtitles out as an SRT file
    create_srtfile(srt_list, output_path)
    print('Process END...')
    # Keep the console window open briefly so the messages can be read
    time.sleep(5)

if __name__ == '__main__':
    main()
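The merging step in `Get_data` (collapsing consecutive identical captions) can be written more compactly with `itertools.groupby`. A minimal sketch, assuming the input is a list of hypothetical `(start_frame, end_frame, text)` tuples already sorted by start frame, such as a parser might produce. Unlike the script above, which ends each caption where the next one starts, this version keeps each run's own end frame, so gaps between captions stay empty on screen.

```python
from itertools import groupby


def merge_consecutive(objects):
    """Merge runs of consecutive objects that share the same text.

    objects: iterable of (start_frame, end_frame, text), sorted by start_frame.
    Each run becomes one cue spanning from its first start to its last end.
    """
    merged = []
    # groupby only groups *adjacent* items, so a caption that reappears later stays separate
    for text, run in groupby(objects, key=lambda obj: obj[2]):
        run = list(run)
        merged.append((run[0][0], run[-1][1], text))
    return merged
```

Because `groupby` only looks at neighbors, the same line shown twice with something else in between still produces two cues, which matches what the script does.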

Program source code ②.bat

python.exe "<path to Program source code ①.py>" %*

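How to use the script

Drag and drop the .exo file exported from AviUtl onto Program source code ②.bat.
Enter the layer number that holds the subtitles (the prompt waits 10 seconds).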

Results

It adds an extra step, but I was able to add subtitles fairly easily.

Next steps

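According to ChatGPT's summary, WebVTT offers the best balance of ease of creation, color control, and position control.
Well... it's true that the current format only allows a single color, so it can't match TV-style subtitles.
I'd like to adapt this code to handle WebVTT.
→ I tried WebVTT, but the colors were not applied properly. There is also a Qiita article on SAMI, so I'll try that next.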
https://qiita.com/wy0727_betch/items/6e3d42e79e172ef637e3
→ YouTube subtitles apparently run deep. Why does a mysterious format that isn't officially documented show up...?
https://qiita.com/yudai1204/items/898c368af8832443a874
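For reference when revisiting WebVTT: the WebVTT specification allows cue text to carry class spans such as `<c.yellow>text</c>` and defines default color class names such as `yellow` and `red`. A minimal sketch of a writer that emits them follows; the cue tuple layout and function names are hypothetical. As noted above, YouTube did not apply the colors in my attempt, so treat this as a starting point for experiments rather than a working recipe.

```python
import html


def ms_to_vtt(ms: int) -> str:
    """Milliseconds -> WebVTT timestamp HH:MM:SS.mmm (a '.' before milliseconds, unlike SRT)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02}.{millis:03}"


def to_webvtt(cues) -> str:
    """cues: list of (start_ms, end_ms, text, color_class_or_None)."""
    lines = ["WEBVTT", ""]  # mandatory header followed by a blank line
    for start, end, text, color in cues:
        lines.append(f"{ms_to_vtt(start)} --> {ms_to_vtt(end)}")
        body = html.escape(text, quote=False)  # '&', '<', '>' would otherwise be read as markup
        lines.append(f"<c.{color}>{body}</c>" if color else body)
        lines.append("")  # a blank line ends the cue
    return "\n".join(lines)


print(to_webvtt([(1000, 3000, "これは最初の字幕です。", "yellow")]))
```

Unlike SRT, WebVTT needs no cue numbers, which is one reason it is easy to generate from the same data.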
