Extracting specific tags from an XML file with Python

Posted at 2024-11-01

What I did

I wanted to extract the values of specific tags from an XML file of about 50,000 lines, so I wrote a script.

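```python
import xml.etree.ElementTree as ET

# Path of the XML file to process
file_path = 'Library.xml'
# Path of the output text file
output_file = 'output.txt'

# Define the namespace URI and the names of the values to extract
namespace_uri = 'http://www.hogehoge.com'
get_tag_first = 'xxx'
get_tag_second = 'yyy'
get_tag_three = 'zzz'

# Open the output file
with open(output_file, 'w', encoding='utf-8') as f_out:
    # Parse the XML incrementally with iterparse instead of loading the whole tree
    context = ET.iterparse(file_path, events=('end',))
    for event, elem in context:
        # ElementTree reports namespaced tags as '{uri}localname', so compare against that form
        if elem.tag == f'{{{namespace_uri}}}abctag':
            # elem.get() reads attribute values; a missing attribute becomes ""
            factorone = elem.get(get_tag_first, "").strip()
            factortwo = elem.get(get_tag_second, "").strip()
            factorthree = elem.get(get_tag_three, "").strip()
            f_out.write(f"{factorone}     {factortwo}     {factorthree}\n")
            # Release the element's contents once its values have been written
            elem.clear()
```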
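
In the script above, `elem.clear()` only runs on matching elements, and the emptied elements stay attached to their parent, so memory still grows a little with file size. For 50,000 lines that is unlikely to matter, but for much larger files a common ElementTree idiom is to keep a handle on the root from the first `start` event and clear it as you go. The following is a minimal sketch of that idiom, assuming the same `abctag` elements carrying `xxx`/`yyy`/`zzz` attributes and assuming the `abctag` elements sit directly under the root (with deeper nesting, the intermediate parent can still accumulate children). The generator name `iter_abctag_rows` is hypothetical.

```python
import xml.etree.ElementTree as ET


def iter_abctag_rows(source, namespace_uri, attr_names=('xxx', 'yyy', 'zzz')):
    """Yield one tuple of stripped attribute values per abctag element.

    `source` is a file path or a binary file object, as accepted by iterparse.
    """
    target = f'{{{namespace_uri}}}abctag'
    context = ET.iterparse(source, events=('start', 'end'))
    # The first event is the 'start' of the root element; keep a handle to it.
    _, root = next(context)
    for event, elem in context:
        if event == 'end' and elem.tag == target:
            # Missing attributes become "" just like in the original script.
            yield tuple(elem.get(name, '').strip() for name in attr_names)
            # Detach everything parsed so far from the root so it can be freed.
            root.clear()
```

To reproduce the original `output.txt`, write `'     '.join(row) + '\n'` for each tuple yielded by `iter_abctag_rows('Library.xml', 'http://www.hogehoge.com')`.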

It would have been better to include a sample XML file, but for now I am recording the script as I wrote it.
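
Since the real `Library.xml` is not shown, here is a hypothetical sample that fits what the script expects: a default namespace of `http://www.hogehoge.com` and `abctag` elements whose values sit in `xxx`/`yyy`/`zzz` attributes. The structure (including the `library` root and `othertag`) is an assumption for illustration. The loop is the same as in the script, but it reads from memory and writes to an `io.StringIO` instead of `output.txt`, so you can see the five-space-separated output format.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the real Library.xml structure is not shown in this post.
SAMPLE_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<library xmlns="http://www.hogehoge.com">
  <abctag xxx="A-001" yyy="Apple" zzz="10"/>
  <abctag xxx="A-002" yyy=" Banana " zzz="20"/>
  <othertag xxx="ignored"/>
  <abctag xxx="A-003" yyy="Cherry"/>
</library>
"""

namespace_uri = 'http://www.hogehoge.com'
out = io.StringIO()  # stands in for output.txt

# Same logic as the script above, reading from an in-memory buffer.
for event, elem in ET.iterparse(io.BytesIO(SAMPLE_XML), events=('end',)):
    if elem.tag == f'{{{namespace_uri}}}abctag':
        values = [elem.get(name, "").strip() for name in ('xxx', 'yyy', 'zzz')]
        out.write("     ".join(values) + "\n")
        elem.clear()

print(out.getvalue(), end="")
```

`othertag` is skipped because its tag does not match, and the missing `zzz` on the third element comes out as an empty column.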

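`xxx`, `yyy`, and `zzz` are the names that hold the values I wanted. Because the script reads them with `elem.get()`, they are looked up as attributes of each `abctag` element.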
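
If in your file `xxx`, `yyy`, and `zzz` are child elements (for example `<xxx>A-001</xxx>`) rather than attributes, `elem.get()` returns nothing and the values have to be read with `findtext()`. Child elements inherit the default namespace, so their names need the `{uri}` prefix too. A sketch under that assumption; the XML layout below is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

NS = 'http://www.hogehoge.com'

# Hypothetical layout where the values are child elements, not attributes.
SAMPLE_XML = b"""<library xmlns="http://www.hogehoge.com">
  <abctag><xxx>A-001</xxx><yyy>Apple</yyy><zzz>10</zzz></abctag>
  <abctag><xxx>A-002</xxx><yyy> Banana </yyy></abctag>
</library>"""

rows = []
for event, elem in ET.iterparse(io.BytesIO(SAMPLE_XML), events=('end',)):
    if elem.tag == f'{{{NS}}}abctag':
        # Children share the default namespace, so qualify each child name.
        row = tuple(
            elem.findtext(f'{{{NS}}}{name}', default='').strip()
            for name in ('xxx', 'yyy', 'zzz')
        )
        rows.append(row)
        # Clear only after the children have been read.
        elem.clear()

print(rows)
```

Children fire their `end` events before their parent, so by the time `abctag` ends, its children are complete and still attached.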
