
Listing the elements and attributes in an XML schema file

Posted at 2023-10-14

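This article shows how to list the elements and attributes defined in an XML schema file (*.xsd). I assumed someone had already done this, but searching turned up no suitable tool, so I wrote my own script. If there is a standard tool for this, please let me know.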

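Who this is for

People who have been told to write a specification from the existing source code
People who use a standard format but are stuck because the official site has no consolidated list
People who said "We have the schema file, so I can write the spec!" (that's me)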

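How to read an XML schema file

As far as I could find, there are the following options.

Use the schema file through Excel's XML import and export features
Use a Python library

The Excel approach appears to be meant for handling XML data inside Excel, so it did not seem suited to working with the schema file itself. Here I use a Python library and simply output a list.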

Python library used

I use xmlschema. Installing it and loading a schema work as described in the official documentation.

pip install xmlschema

>>> import xmlschema
>>> my_schema = xmlschema.XMLSchema('tests/test_cases/examples/vehicles/vehicles.xsd')

The library seems to be intended primarily for using the schema information to convert XML content into JSON or dictionaries.

>>> import xmlschema
>>> from pprint import pprint
>>> xs = xmlschema.XMLSchema('tests/test_cases/examples/collection/collection.xsd')
>>> pprint(xs.to_dict('tests/test_cases/examples/collection/collection.xml'))
{'@xsi:schemaLocation': 'http://example.com/ns/collection collection.xsd',
 'object': [{'@available': True,
             '@id': 'b0836217462',
             'author': {'@id': 'PAR',
                        'born': '1841-02-25',
                        'dead': '1919-12-03',
                        'name': 'Pierre-Auguste Renoir',
                        'qualification': 'painter'},
             'estimation': Decimal('10000.00'),
             'position': 1,
             'title': 'The Umbrellas',
             'year': '1886'},
            {'@available': True,
             '@id': 'b0836217463',
             'author': {'@id': 'JM',
                        'born': '1893-04-20',
                        'dead': '1983-12-25',
                        'name': 'Joan Miró',
                        'qualification': 'painter, sculptor and ceramicist'},
             'position': 2,
             'title': None,
             'year': '1925'}]}

Getting the list of elements

Use xmlschema to get the list of elements. As when working with XML through ElementTree, the script explores child elements recursively.

xml_schema_export.py

import xmlschema


def main():
    # Start from the schema's root elements and walk down from each one.
    root_elements = xmlschema.XMLSchema('../sample/jmx.xsd').root_elements
    for ele in root_elements:
        print_children_element(ele, 0)


def print_children_element(element: xmlschema.XsdElement, depth: int) -> None:
    """Print the element name indented by depth, then recurse into its children."""
    indent = '    ' * depth
    # Nodes without a name are printed as 'any'.
    print(indent, element.local_name or 'any', sep='')

    for ele in element.iterchildren():
        print_children_element(ele, depth + 1)


if __name__ == "__main__":
    main()

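There is some clutter for display purposes, but the key points are:

XsdElement.iterchildren() returns the child elements
XsdElement.local_name returns the element name (name includes the namespace information, which makes the output hard to read)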
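
One thing to watch for with this kind of recursive walk: if a schema contains an element that directly or indirectly contains itself, the recursion may never terminate. The following is a minimal sketch of a guarded walk, not part of the original script. It relies only on local_name and iterchildren(), and it assumes that a recursive schema yields the same element object again further down the tree. FakeElement is a hypothetical stand-in so that the example runs without a schema file.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass(eq=False)
class FakeElement:
    """Hypothetical stand-in exposing the two members the walk relies on."""
    local_name: Optional[str]
    children: List["FakeElement"] = field(default_factory=list)

    def iterchildren(self) -> Iterator["FakeElement"]:
        return iter(self.children)


def walk(element, depth: int = 0, _path: tuple = ()) -> Iterator[Tuple[int, str]]:
    """Yield (depth, name) pairs depth-first, stopping at recursive references."""
    name = element.local_name or 'any'
    # An element that is already one of its own ancestors is reported, not expanded.
    if any(element is seen for seen in _path):
        yield depth, name + ' (recursive)'
        return
    yield depth, name
    for child in element.iterchildren():
        yield from walk(child, depth + 1, _path + (element,))


# Small tree whose last child points back to the root.
root = FakeElement('Report')
control = FakeElement('Control', [FakeElement('Title')])
root.children = [control, FakeElement(None), root]

for depth, name in walk(root):
    print('    ' * depth + name)
```

Yielding pairs instead of printing directly also keeps the traversal separate from the formatting, which makes it easier to change the output later.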

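As a sample, let's load the schema file of the Disaster Prevention Information XML Format published by the Japan Meteorological Agency.
Note: the sample is assumed to be stored in the sample folder.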

$ python3 xml_schema_export.py
Report
    Control
        Title
        DateTime
        Status
        EditorialOffice
        PublishingOffice
    Head
        Title
        ReportDateTime
        TargetDateTime
        TargetDTDubious
        TargetDuration
        ValidDateTime
        EventID
        InfoType
        Serial
        InfoKind
        InfoKindVersion
        Headline
            Text
            Information
                Item
                    Kind
                        Name
                        Code
                        Condition
                    LastKind
                        Name
                        Code
                        Condition
                    Areas
                        Area
                            Name
                            Code
                            Circle
                                BasePoint
                                Axes
                                    Axis
                                        Direction
                                        Bearings
                                        Radius
                                        any
                                    LongAxis
                                        Direction
                                        Bearings
                                        Radius
                                        any
                                    ShortAxis
                                        Direction
                                        Bearings
                                        Radius
                                        any
                                any
                            Coordinate
                            Line
                            Polygon
                    any
    any
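
For a specification document it can be convenient to have each element as a full path rather than an indented name. The sketch below is an optional post-processing step that is not part of the original script. It assumes the four-space indentation used in the listing above, and indented_to_paths is a hypothetical helper name.

```python
from typing import Iterator


def indented_to_paths(text: str, indent_width: int = 4) -> Iterator[str]:
    """Turn an indented element listing into slash-separated paths."""
    stack = []  # names of the current line's ancestors, then the line itself
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(' '))) // indent_width
        del stack[depth:]  # drop entries at this depth or deeper
        stack.append(line.strip())
        yield '/'.join(stack)


# First few lines of the listing above, abbreviated.
listing = """\
Report
    Control
        Title
        DateTime
    Head
        Title
"""

for path in indented_to_paths(listing):
    print(path)
```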

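Getting the attributes of elements

Now that we have the list of elements, get the attributes that can be set on each one.

The attributes are available through element.attributes. Only the printing part of print_children_element() changes. To make the output easy to process in Excel, each attribute is preceded by a tab so that it appears in the adjacent column.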

xml_schema_export.py

def print_children_element(element: xmlschema.XsdElement, depth: int) -> None:
    indent = '    ' * depth

    # When attributes follow, stay on the same line so that the first
    # attribute lands in the next tab-separated column.
    print(indent, element.local_name or 'any', sep='', end='\n' if len(element.attributes) == 0 else '')
    for attrib in element.attributes:
        print('\t', attrib, sep='')

    for ele in element.iterchildren():
        print_children_element(ele, depth + 1)

In the output, each attribute appears in the column next to its element name.

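Getting the enumeration values of attributes

In some schemas, the values allowed for an attribute are specified as an enumeration. In that case, element.attributes[key].type.enumeration returns the enumeration values.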
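
To see what such an enumeration looks like in the schema file itself, here is a small sketch that reads it directly with the standard library's ElementTree instead of xmlschema. The schema fragment is made up for illustration and is not taken from the JMA schema. The helper only handles enumerations declared inline on the attribute; named types referenced through type="..." are ignored, which is one reason to leave the real work to xmlschema.

```python
import xml.etree.ElementTree as ET

XS = '{http://www.w3.org/2001/XMLSchema}'

# Hypothetical schema fragment, for illustration only.
XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Item">
    <xs:complexType>
      <xs:attribute name="status">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="normal"/>
            <xs:enumeration value="training"/>
            <xs:enumeration value="test"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="note" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def inline_enumerations(xsd_text):
    """Map each attribute name to the enumeration values declared inline on it."""
    root = ET.fromstring(xsd_text)
    result = {}
    for attr in root.iter(XS + 'attribute'):
        path = f'{XS}simpleType/{XS}restriction/{XS}enumeration'
        result[attr.get('name')] = [e.get('value') for e in attr.findall(path)]
    return result


print(inline_enumerations(XSD))
```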

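Here the values are output as-is, in list form, in the column next to the attribute, and the column is left empty when there is no enumeration. The full source follows.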

xml_schema_export.py

import xmlschema


def main():
    # Start from the schema's root elements and walk down from each one.
    root_elements = xmlschema.XMLSchema('../sample/jmx.xsd').root_elements
    for ele in root_elements:
        print_children_element(ele, 0)


def print_children_element(element: xmlschema.XsdElement, depth: int) -> None:
    """Print the element, its attributes, and their enumeration values, then recurse."""
    indent = '    ' * depth

    # When attributes follow, stay on the same line so that the first
    # attribute lands in the next tab-separated column.
    print(indent, element.local_name or 'any', sep='', end='\n' if len(element.attributes) == 0 else '')
    for key, attrib in element.attributes.items():
        # Print an empty string when the attribute type has no enumeration.
        print('\t', key, '\t', attrib.type.enumeration or '', sep='')

    for ele in element.iterchildren():
        print_children_element(ele, depth + 1)


if __name__ == "__main__":
    main()
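
If the goal is to paste the result into Excel, writing the rows with the csv module may be more robust than assembling tabs by hand. The following is a sketch under assumptions, not the article's script. iter_rows relies only on local_name, attributes (assumed to behave like a dict) and iterchildren(), and fake() builds hypothetical stand-in objects so that the example runs without a schema file.

```python
import csv
import io
from types import SimpleNamespace


def iter_rows(element, depth=0):
    """Yield [indented name, attribute, enumeration] rows for an element tree."""
    name = '    ' * depth + (element.local_name or 'any')
    attributes = element.attributes
    if not attributes:
        yield [name, '', '']
    for i, (key, attrib) in enumerate(attributes.items()):
        values = attrib.type.enumeration or []
        # Repeat the element name only on its first attribute row.
        yield [name if i == 0 else '', key, ', '.join(map(str, values))]
    for child in element.iterchildren():
        yield from iter_rows(child, depth + 1)


def fake(name, attributes=None, children=()):
    """Build a hypothetical stand-in for a schema element."""
    attrs = {k: SimpleNamespace(type=SimpleNamespace(enumeration=v))
             for k, v in (attributes or {}).items()}
    return SimpleNamespace(local_name=name, attributes=attrs,
                           iterchildren=lambda: iter(children))


root = fake('Item', {'status': ['normal', 'test'], 'note': None},
            [fake('Name')])

# Write tab-separated rows into an in-memory buffer; a real file works the same way.
buffer = io.StringIO()
csv.writer(buffer, delimiter='\t', lineterminator='\n').writerows(iter_rows(root))
print(buffer.getvalue(), end='')
```

Every row here has the same three columns, so the element name, attribute name, and enumeration values always line up in the spreadsheet.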

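Summary

This article showed how to read an XML schema file and list its elements and attributes.

Use the xmlschema library.
Get the list of elements recursively with element.iterchildren()
Get the attributes with element.attributes
Get the enumeration values of an attribute with element.attributes[key].type.enumeration

The source code written for this article is published in the repository below.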
