Extracting only the subtitles from a wlmp file made with Windows Movie Maker

Posted at 2014-11-12

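Problem

I want to pull out the subtitles I created in Windows Movie Maker in order
and produce a script-like text file.
The wlmp file that Windows Movie Maker generates is an XML file, so it looked easy to read, but the text in the file is not arranged in time order.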

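Solution

Each subtitle is assigned an extentID in no particular order,
and the extentIDs are listed in time order at the end of the file.
So the script first builds an entry for each subtitle, keyed by its extentID,
and then outputs the entries in the order given by that final extentID list.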
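The two steps above can be sketched on a tiny example. The XML fragment below is hypothetical and heavily simplified: it keeps only the tag and attribute names that the script relies on, and the `Project`, `Extents`, and `ExtentRefs` wrappers, the `SAMPLE` constant, and the `subtitles_in_time_order` helper are assumptions made for illustration. It also uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra packages.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment; a real wlmp file contains much more.
# The TitleClip elements are deliberately NOT in time order.
SAMPLE = """
<Project>
  <Extents>
    <TitleClip extentID="7">
      <BoundPropertyStringSet Name="string">
        <BoundPropertyStringElement Value="Second line of the script" />
      </BoundPropertyStringSet>
    </TitleClip>
    <TitleClip extentID="5">
      <BoundPropertyStringSet Name="string">
        <BoundPropertyStringElement Value="First line of the script" />
      </BoundPropertyStringSet>
    </TitleClip>
  </Extents>
  <ExtentRefs>
    <ExtentRef id="5" />
    <ExtentRef id="7" />
  </ExtentRefs>
</Project>
"""


def subtitles_in_time_order(xml_text):
    root = ET.fromstring(xml_text)

    # Step 1: index every subtitle by its extentID (file order is arbitrary).
    by_id = {}
    for clip in root.iter("TitleClip"):
        values = [
            elem.get("Value", "")
            for strset in clip.iter("BoundPropertyStringSet")
            if strset.get("Name") == "string"
            for elem in strset.iter("BoundPropertyStringElement")
        ]
        by_id[clip.get("extentID")] = "\n".join(v for v in values if v)

    # Step 2: walk the ExtentRef list, which gives the time order,
    # skipping ids that have no subtitle text.
    return [
        by_id[ref.get("id")]
        for ref in root.iter("ExtentRef")
        if by_id.get(ref.get("id"))
    ]


print(subtitles_in_time_order(SAMPLE))
```

The dictionary lookup is what makes the file order irrelevant: the subtitles can appear anywhere, as long as each one can be found by its id when the ordered list is read.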

Code

The license is MIT.

"""
Script Extractor from Windows Movie Maker XML file
This script takes wlmp (Windows Movie Maker file)
as an input, and extracts the text (TitleClip tag).

Author: mzmttks
License: MIT License
"""

# Requires Python 3 and lxml
import sys
import lxml.etree

# the input path is the only command-line argument
if len(sys.argv) < 2:
    err = """ERROR: INPUT_FILE is not given
usage: wlmp2script.py INPUT_FILE
"""
    sys.stderr.write(err)
    sys.exit(1)
ifile = sys.argv[1]

# open wlmp (lxml takes the encoding from the XML declaration)
obj = lxml.etree.parse(ifile).getroot()

# extract subtitles, keyed by extentID
textsets = {}
for titleclip in obj.xpath("//TitleClip"):
    strs = ""
    for strset in titleclip.iterdescendants("BoundPropertyStringSet"):
        if strset.attrib["Name"] != "string":
            continue
        # join the non-empty lines of this subtitle
        strs = "\n".join(s.attrib["Value"] for s
                         in strset.iterchildren("BoundPropertyStringElement")
                         if len(s.attrib["Value"]) > 0)
    textsets[titleclip.attrib["extentID"]] = strs

# extentIDs in time order
titles = [e.attrib["id"] for e in obj.xpath("//ExtentRef")]

# output subtitles, each followed by a blank line
for title in titles:
    if textsets.get(title):
        print(textsets[title])
        print()
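
Since the goal is a script-like text file, the printed output can be redirected to a file, but on Windows a redirected stream may be encoded with the local code page rather than UTF-8. A minimal sketch of writing the file directly with an explicit encoding follows; the `write_script` helper, the `script.txt` file name, and the sample subtitle strings are hypothetical and not part of the original script.

```python
import tempfile
from pathlib import Path


def write_script(subtitles, path):
    """Write each non-empty subtitle followed by a blank line, as UTF-8."""
    text = "".join(s + "\n\n" for s in subtitles if s)
    Path(path).write_text(text, encoding="utf-8")
    return text


# Usage with made-up subtitle text, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "script.txt"
    write_script(["こんにちは", "First line\nSecond line"], out)
    saved = out.read_text(encoding="utf-8")
```

This keeps the same layout as the printed output, one blank line after each subtitle, while making the file encoding independent of the console settings.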