
Various ways to write Pydantic data definitions in JSON

Posted at 2023-05-22

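Introduction

This article introduces two ways to define a schema in JSON and create a Pydantic class from it.

Converting JSON that lists type names and arguments into a Pydantic class
Converting JSON written in JSON Schema format into a Pydantic class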

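What are the advantages and disadvantages?

One advantage of defining the schema in JSON is that the definition can live outside the application. This makes it possible to handle requirements such as the following:

Changing thresholds between day and night, when the number of users differs
Keeping the definition file on the cloud side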
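
As an illustration of the day/night requirement, the application could pick a different schema file depending on the time of day. The sketch below is only an assumption about how that might look: the file names schema_day.json and schema_night.json, the 08:00 to 20:00 window, and the function name pick_schema_path are all hypothetical.

```python
from datetime import datetime, time
from pathlib import Path

# Hypothetical daytime window; adjust to the actual usage pattern
DAY_START = time(8, 0)
DAY_END = time(20, 0)


def pick_schema_path(now: datetime) -> Path:
    """Return the schema file to load for the given point in time."""
    if DAY_START <= now.time() < DAY_END:
        return Path("schema_day.json")
    return Path("schema_night.json")


print(pick_schema_path(datetime(2023, 5, 22, 12, 0)))  # schema_day.json
print(pick_schema_path(datetime(2023, 5, 22, 23, 0)))  # schema_night.json
```

Because the thresholds live in the JSON files, switching them does not require changing the application code.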

The disadvantage is that the IDE cannot see the definition at all, so auto-completion does not work and you get little benefit from your tooling.

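Converting JSON that lists type names and arguments into a Pydantic class

In this approach, each field in the schema defines:

the type name (key: type, a str)
the arguments passed to the Field class (key: attribute, a dict)

As the validation schema, I created the following JSON file.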

schema.json

{
    "user_name": {
        "type": "str",
        "attribute": {"min_length": 1, "max_length": 8}
    },
    "role_number": {
        "type": "int",
        "attribute": {"ge": 1, "le": 9}
    },
    "group_number": {
        "type": "int|None",
        "attribute": {"ge": 1, "le": 4}
    }
}

The function that loads it is defined as follows. The code uses the Pydantic v1 API (for example parse_obj).

import json
from pydantic import BaseModel, create_model, Field


class DataType(BaseModel):
    """Parses one entry of the schema."""
    type: str
    attribute: dict


def json_to_pydantic(schema_data: str):
    """Convert a JSON schema string into a Pydantic model class."""
    fields = {}
    # Parse the JSON string into a dict
    schema: dict = json.loads(schema_data)
    # Build one field definition per key
    for key, value in schema.items():
        define = DataType.parse_obj(value)
        fields[key] = (
            define.type,  # a string such as "str" or "int" is interpreted as a type name
            Field(**define.attribute),  # pass the settings through to Field
        )
    # Create the Pydantic model
    return create_model("Model", **fields)
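
json_to_pydantic relies on Pydantic resolving type-name strings such as "int|None" by itself. As a variation, the sketch below maps type names to Python types through an explicit table, so only the listed types are accepted. TYPE_MAP and build_model are hypothetical names that do not appear in the original code, and treating "...|None" types as optional with a default of None is an assumption.

```python
from typing import Optional

from pydantic import Field, ValidationError, create_model

# Hypothetical whitelist: schema type name -> Python type
TYPE_MAP = {
    "str": str,
    "int": int,
    "int|None": Optional[int],
}


def build_model(schema: dict, model_name: str = "Model"):
    """Build a Pydantic model from a {"type": ..., "attribute": ...} schema."""
    fields = {}
    for name, spec in schema.items():
        type_name = spec["type"]
        if type_name not in TYPE_MAP:
            raise ValueError(f"unsupported type name: {type_name}")
        # Assumption: "...|None" types default to None, all others are
        # required (Ellipsis marks a required field).
        default = None if type_name.endswith("|None") else ...
        fields[name] = (
            TYPE_MAP[type_name],
            Field(default, **spec.get("attribute", {})),
        )
    return create_model(model_name, **fields)


schema = {
    "user_name": {"type": "str", "attribute": {"min_length": 1, "max_length": 8}},
    "role_number": {"type": "int", "attribute": {"ge": 1, "le": 9}},
    "group_number": {"type": "int|None", "attribute": {"ge": 1, "le": 4}},
}
Model = build_model(schema)

ok = Model(user_name="User", role_number=8)
print(ok.group_number)  # None

try:
    Model(user_name="User", role_number=8, group_number=9)
except ValidationError:
    print("group_number=9 was rejected")
```

The explicit table costs a few extra lines, but an unknown type name fails immediately with a clear message instead of surfacing later during validation.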

Let's try json_to_pydantic.

from pydantic import ValidationError

# Read the JSON file and convert it into a Pydantic model class
with open("schema.json") as fp:
    cls = json_to_pydantic(fp.read())

# Validate values with parse_obj: valid case
result = cls.parse_obj({"user_name": "User", "role_number": 8, "group_number": 2})
print(result) # >> Output: user_name='User' role_number=8 group_number=2

# Validate values with parse_obj: valid case
# group_number accepts int|None, so it may be omitted and becomes None
result = cls.parse_obj({"user_name": "User", "role_number": 8})
print(result) # >> Output: user_name='User' role_number=8 group_number=None

# In the error cases parse_obj raises ValidationError, so catch it and print the message

# Error case: role_number accepts only int, so it cannot be omitted
try:
    cls.parse_obj({"user_name": "User", "group_number": 2})
except ValidationError as e:
    print(e) # >> Message includes: field required (type=value_error.missing)

# Error case: group_number accepts int|None, so a non-numeric string is rejected
try:
    cls.parse_obj({"user_name": "User", "role_number": 8, "group_number": "two"})
except ValidationError as e:
    print(e) # >> Message includes: value is not a valid integer (type=type_error.integer)

# Error case: group_number must be between 1 and 4, and 9 exceeds the upper limit
try:
    cls.parse_obj({"user_name": "User", "role_number": 8, "group_number": 9})
except ValidationError as e:
    print(e) # >> Message includes: ensure this value is less than or equal to 4 (type=value_error.number.not_le; limit_value=4)
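
When failures need to be handled by code rather than read from a printed message, ValidationError.errors() returns them as a list of dicts, each with a loc entry naming the field. The following is a minimal sketch. Example is a hypothetical hand-written model standing in for the generated class, with the same constraints as role_number and group_number, and collect_error_fields is a hypothetical helper.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Example(BaseModel):
    """Hand-written stand-in for the model generated from schema.json."""
    role_number: int = Field(..., ge=1, le=9)
    group_number: Optional[int] = Field(None, ge=1, le=4)


def collect_error_fields(data: dict) -> list:
    """Return the names of the fields that failed validation."""
    try:
        Example(**data)
    except ValidationError as exc:
        # Each error dict has a "loc" tuple that points at the failing field
        return [".".join(str(part) for part in err["loc"]) for err in exc.errors()]
    return []


print(collect_error_fields({"role_number": 8, "group_number": 9}))  # ['group_number']
print(collect_error_fields({"group_number": 2}))  # ['role_number']
print(collect_error_fields({"role_number": 8}))  # []
```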

Converting JSON written in JSON Schema format into a Pydantic class

The other approach is to load JSON written in JSON Schema format.

It requires datamodel_code_generator, so install it with pip first.

pip install datamodel-code-generator

Define the schema as follows.

person.json

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    },
    "friends": {
      "type": "array"
    },
    "comment": {
      "type": "null"
    }
  }
}
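
For orientation, a hand-written model roughly equivalent to this schema might look like the sketch below. It is an approximation, not the actual generator output, which may differ in detail (for example in how the untyped array and the null field are annotated). No property is listed as required in the schema, so every field is assumed to be optional.

```python
from typing import Any, List, Optional

from pydantic import BaseModel, Field, ValidationError


class Person(BaseModel):
    """Hand-written approximation of the model described by person.json."""
    firstName: Optional[str] = Field(None, description="The person's first name.")
    lastName: Optional[str] = Field(None, description="The person's last name.")
    # "minimum": 0 in the schema corresponds to ge=0
    age: Optional[int] = Field(
        None,
        ge=0,
        description="Age in years which must be equal to or greater than zero.",
    )
    friends: Optional[List[Any]] = None
    comment: Optional[Any] = None


person = Person(firstName="Jane", lastName="Doe")
print(person.age)  # None

try:
    Person(age=-1)
except ValidationError:
    print("a negative age was rejected")
```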

The step that runs datamodel_code_generator on the JSON Schema is split out into a separate file.

subproc.py

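from datamodel_code_generator import InputFileType, generate
from pathlib import Path
from sys import argv

# Generate model code from the JSON Schema file given as the first argument;
# the caller reads the generated code from standard output
print(generate(Path(argv[1]), input_file_type=InputFileType.JsonSchema))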

The main processing is defined as follows.

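import sys
from subprocess import run


def read_from_json(json_name: str):
    """Run the JSON Schema to Pydantic conversion in a subprocess."""
    # Pass the command as a list so it also runs without a shell (a plain
    # string is not split into arguments on Linux/macOS)
    cmd = [sys.executable, "subproc.py", json_name]
    # The generated code comes back on standard output
    return run(cmd, capture_output=True, text=True).stdout


def global_to_class(class_name: str):
    """Look up the target class in the global variables."""
    if class_name in globals():
        return globals()[class_name]
    return None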

Run it as follows.


exec(read_from_json("person.json"))
cls = global_to_class("Person")
if cls is not None:
    print(cls.parse_obj({"firstName": "Jane", "lastName": "Doe"}))
    # Output: firstName='Jane' lastName='Doe' age=None friends=None

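The flow is as follows: the JSON Schema is converted into class definitions in a subprocess, the generated code is received on standard output and passed to exec, and the target class is then read out of globals, where exec stores the definitions.

Note: run exec at the top level of the module.
If it is run inside a function, the definitions are stored in locals instead of globals, and the code fails with the error "pydantic.errors.ConfigError: field "firstName" not yet prepared so type is still a ForwardRef, you might need to call Person.update_forward_refs()."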
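
One possible way around the top-level restriction is to exec the generated source into a fresh module object instead of the caller's globals or locals. The sketch below shows only the technique, using the standard library. load_generated_module and the module name generated_models are hypothetical, and SAMPLE_SOURCE is a dataclass stand-in for the generator output. The assumption is that Pydantic v1 resolves string annotations through sys.modules[cls.__module__], so registering the module should let the generated models resolve their annotations, but this has not been checked against the real generator output.

```python
import sys
import types


def load_generated_module(source: str, module_name: str = "generated_models"):
    """Execute generated source code inside its own module namespace."""
    module = types.ModuleType(module_name)
    # Register the module so that lookups by module name can find it
    sys.modules[module_name] = module
    # Imports and classes defined by the source end up in module.__dict__
    exec(source, module.__dict__)
    return module


# Stand-in for the generated code (the real output would define Pydantic models)
SAMPLE_SOURCE = '''
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    firstName: Optional[str] = None
    lastName: Optional[str] = None
'''


def demo():
    """Load the source from inside a function, not at module top level."""
    module = load_generated_module(SAMPLE_SOURCE)
    return module.Person(firstName="Jane", lastName="Doe")


person = demo()
print(person.firstName)  # Jane
print(type(person).__module__)  # generated_models
```

Keeping the generated names in their own module also avoids overwriting names in the application's globals.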

Summary: it is not impossible, but...

It is easier to just use Cerberus...
