
[Python] Notes on extracting specific elements from XML files

Posted at 2021-08-18

This memo describes how to extract specific elements from the XML files placed in a given folder.
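Listing the files in the folder is the first step. The following is a minimal sketch of that step using `glob`; the helper name `list_xml_files` and the file names in the demo are made up for illustration. It also shows that `recursive=True` only has an effect when the pattern contains `**`.

```python
import glob
import os
import tempfile


def list_xml_files(folder, include_subfolders=False):
    """Return the sorted paths of the .xml files found in folder."""
    if include_subfolders:
        # '**' matches nested directories only when recursive=True is passed.
        pattern = os.path.join(folder, "**", "*.xml")
        return sorted(glob.glob(pattern, recursive=True))
    pattern = os.path.join(folder, "*.xml")
    return sorted(glob.glob(pattern))


# Demo in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.xml", "b.txt", os.path.join("sub", "c.xml")):
        open(os.path.join(tmp, rel), "w").close()
    top_only = [os.path.relpath(p, tmp) for p in list_xml_files(tmp)]
    with_sub = [os.path.relpath(p, tmp) for p in list_xml_files(tmp, True)]

print(top_only)
print(with_sub)
```

With the flat pattern only the files directly inside the folder are returned; the `**` variant also picks up files in subfolders.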

Test data

The XML below is borrowed as test data.

test1.xml

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
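To get a feel for how `xml.etree.ElementTree` exposes this structure, here is a small sketch that parses a trimmed copy of the test data from a string and collects the `neighbor` attributes of each country. The function name `neighbors_by_country` is hypothetical, and the XML declaration is omitted from the inline string for simplicity.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the test data above (two countries only).
SAMPLE_XML = """<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""


def neighbors_by_country(xml_text):
    """Map each country name to a list of (neighbor name, direction) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for country in root.findall("country"):
        # Attributes are read with .get(); child elements with findall().
        result[country.get("name")] = [
            (n.get("name"), n.get("direction"))
            for n in country.findall("neighbor")
        ]
    return result


neighbors = neighbors_by_country(SAMPLE_XML)
print(neighbors)
```

Element attributes such as `name` live on the element itself, while values such as `rank` are child elements whose content is read through `.text`.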

Code

test.py
- Parses the XML files in the xmls folder located next to this script and saves the result to result/cdata.csv.
- Extracts the country name and rank from the test data above.

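import csv
import glob
import os
import xml.etree.ElementTree as ET

# Collect the XML files.
# Assumption: the XML files are placed in the xmls folder next to this script.
# (recursive=True is dropped because it has no effect without '**' in the pattern.)
xmls = glob.glob('xmls/*.xml')

# List that holds the extracted rows
cdata_list = []

# Parse the files one by one
for xml in xmls:
    # Parse the XML file
    tree = ET.parse(xml)
    root = tree.getroot()
    # Walk the child elements (country)
    for country in root:
        # Inspect the contents of each country element
        name = country.attrib["name"]
        for child in country.iter():
            # Pick out the target element
            if child.tag == 'rank':
                # Output the file name, country name, and rank
                rank = child.text
                print(f'{xml},{name},{rank}')
                cdata_list.append([xml, name, rank])

# Save to a CSV file (create the result folder first if it does not exist)
os.makedirs('result', exist_ok=True)
with open('result/cdata.csv', 'w', newline="") as f:
    writer = csv.writer(f)
    writer.writerows(cdata_list)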
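The inner loop above walks every descendant of each country. When only the direct `rank` child is needed, `findtext` may be a simpler alternative. The sketch below is one possible variant, not the script's own method: the function name `extract_ranks` and the `source` label are hypothetical, and unlike the script it emits a row with an empty rank when a country has no `rank` element.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the test data, plus one made-up country without a rank.
SAMPLE_XML = """<data>
    <country name="Liechtenstein"><rank>1</rank></country>
    <country name="Singapore"><rank>4</rank></country>
    <country name="Nowhere"></country>
</data>"""


def extract_ranks(xml_text, source="(inline)"):
    """Return [source, country name, rank] rows for every country element."""
    root = ET.fromstring(xml_text)
    rows = []
    for country in root.findall("country"):
        name = country.get("name")
        # findtext returns the text of the first direct child named 'rank',
        # or the default when no such child exists.
        rank = country.findtext("rank", default="")
        rows.append([source, name, rank])
    return rows


rows = extract_ranks(SAMPLE_XML, source="test1.xml")
for row in rows:
    print(",".join(row))
```

Because the rows have the same shape as `cdata_list`, they could be passed to `csv.writer(...).writerows` in the same way.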

Verification

Run

python test.py
xmls\test1.xml,Liechtenstein,1
xmls\test1.xml,Singapore,4
xmls\test1.xml,Panama,68

result/cdata.csv

xmls\test1.xml,Liechtenstein,1
xmls\test1.xml,Singapore,4
xmls\test1.xml,Panama,68
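To check the saved rows programmatically rather than by eye, they can be read back with `csv.reader`. The sketch below uses an in-memory buffer (`io.StringIO`) as a stand-in for result/cdata.csv so that it runs without touching the disk; the helper name `round_trip` is hypothetical.

```python
import csv
import io


def round_trip(rows):
    """Write rows with csv.writer, then read them back with csv.reader."""
    # newline="" mirrors the open(..., newline="") call used for the real file.
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    buffer.seek(0)
    return list(csv.reader(buffer))


rows = [
    ["xmls\\test1.xml", "Liechtenstein", "1"],
    ["xmls\\test1.xml", "Singapore", "4"],
    ["xmls\\test1.xml", "Panama", "68"],
]
restored = round_trip(rows)
print(restored == rows)
```

Note that `csv.reader` always returns strings, so a rank such as `68` comes back as `"68"` and needs an explicit `int()` conversion if a number is required.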

