
Let's parse a zod schema with PEG.js (peggy)

Last updated at 2024-01-01, posted at 2024-01-01

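PEG.js is a parser generator (PEG stands for parsing expression grammar, the notation it uses). What on earth is a parser generator? It is a program that creates programs that parse text. Confusing, isn't it?
With PEG.js you can parse strings that follow a fixed syntax, such as CSV or JSON, and turn them into JavaScript objects.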

The PEG.js npm package no longer seems to be updated, so we will use peggy, its successor library.

Here is what we want to do this time.

export const todoListSchema = z.object({
  id: z.number(),
  task: z.string(),
  description: z.string().nullish(),
  due_date: z.date().nullish(),
  created_at: z.date().nullish(),
  updated_at: z.date().nullish(),
});

We will parse a zod schema like this one into a JavaScript object like the following.

[
   {
      "tableName": "todoListSchema",
      "properties": [
         {
            "name": "id",
            "schema": "z.number()"
         },
         {
            "name": "task",
            "schema": "z.string()"
         },
         {
            "name": "description",
            "schema": "z.string().nullish()"
         },
         {
            "name": "due_date",
            "schema": "z.date().nullish()"
         },
         {
            "name": "created_at",
            "schema": "z.date().nullish()"
         },
         {
            "name": "updated_at",
            "schema": "z.date().nullish()"
         }
      ]
   }
]

Never mind why anyone would want to do this; it is a good PEG exercise, so let's get going.

PEG.js lets you write and try out grammars online.

Let's work there.

First of all, start

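A grammar begins with its entry rule. By default the first rule in the grammar is the entry point, and here we name it start. We will think about its contents later.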

start =

Parsing a character

Let's parse a character.
We recognize exactly one character and return it as is.

start = character
character = c:. {return c;}

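With this grammar, when "あ" is given as input, "あ" comes back.
It can still only parse a single character, so input such as "ああ" or "あ" followed by a line break results in an error.
The character on the left-hand side of = is the rule name.
You can name it whatever you like, so char would be fine too.
Next, the c: part is something like a variable declaration (a label). It binds whatever "." matched to c.
"." stands for any single character.
So "." matches any one character, which is why it matches "あ".
Then {return c;} returns the c that was just bound.
The name does not have to be c; a, b, or anything else works, but this time I used the c of character.
That was long-winded, but all it says is: if some single character is found, return it as is.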
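
To make the behavior concrete, here is a minimal hand-written JavaScript sketch of roughly what this grammar accepts. parseCharacter is a hypothetical name made up for illustration; the parser that peggy actually generates is more elaborate and reports richer errors.

```javascript
// Hypothetical hand-written counterpart of:
//   start = character
//   character = c:. {return c;}
// Assumption of this sketch: one "character" is one UTF-16 code unit.
function parseCharacter(input) {
  if (input.length === 0) {
    throw new Error("Expected any character but end of input found.");
  }
  const c = input[0]; // "." matches any single character; "c:" binds it
  if (input.length > 1) {
    // The start rule has to consume the whole input, so leftovers are an error.
    throw new Error("Expected end of input but " + JSON.stringify(input[1]) + " found.");
  }
  return c; // corresponds to the action {return c;}
}

console.log(parseCharacter("あ")); // あ
```

Calling parseCharacter("ああ") throws, which mirrors the error described above for input longer than one character.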

Parsing a string

Next, let's build something that returns "Hello, World" as is when "Hello, World" is given as input.

start = text
text = text:$(character*) {return text;}
character = c:. {return c;}

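We have defined text, which uses character.
Note that start has changed from character to text.
I think a good approach is to build slightly larger definitions out of small ones and then hand them to start.

text = is the rule declaration, and text: is the variable declaration.
* means zero or more repetitions.
$() returns the entire string matched inside the ().
So character grabs one character, * repeats that over and over, and the input gets parsed one character after another.
Now the input "Hello, World" is parsed as 'Hello, World'.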
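
As a rough illustration of what * and $() do, here is a small JavaScript sketch. The function names parseText and parseCharactersAsArray are hypothetical and exist only for this example.

```javascript
// Hypothetical hand-written counterpart of:
//   text = text:$(character*) {return text;}
//   character = c:. {return c;}
function parseText(input) {
  let pos = 0;
  // "*" repeats the character rule zero or more times.
  while (pos < input.length) {
    pos += 1; // each iteration consumes one character
  }
  // "$( ... )" discards the individual results and returns the matched substring.
  return input.slice(0, pos);
}

// Without "$", character* yields an array with one entry per repetition.
function parseCharactersAsArray(input) {
  return input.split("");
}

console.log(parseText("Hello, World")); // Hello, World
console.log(parseCharactersAsArray("Hi")); // [ 'H', 'i' ]
```

The second function is the reason $() is handy here: it saves you from joining the array back into a string inside the action.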

Parsing key and value

Whether it is a zod schema or a plain Object, let's parse a key:value pair.
The input is:

id: 1

Let's go with that. The goal is to parse it into:

{
  key: "id",
  value: "1"
}

This is the target output.

start = propertyDefinition
propertyDefinition =  key:text ":"  value:text  {return {key, value}}
text = text:$(character*) {return text;}
character = !(":") c:. {return c;}

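: has a special role in this syntax, so we exclude it when matching character.
Writing !(":") does the exclusion.
Next, consider propertyDefinition.
The key:text part is, as we have seen several times by now, a variable declaration.
The input is now split into key + ":" + value.
start is changed to the newly created propertyDefinition.
The parse result is then as follows.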

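{
  key: 'id',
  value: ' 1'
}

Not bad, but the space before 1 ends up in value (and with an input like "id : 1", a space would trail key as well), so let's strip that whitespace.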

Define whitespace.

indentation = " "*

" "* means a half-width space repeated zero or more times, because of the *.
So it copes with two spaces just as well as four.

Make sure character does not include spaces.

character = !(":" / " ") c:. {return c;}

!() is the exclusion syntax, and "/" inside it gives an or condition (the alternatives are tried from left to right).
So a ":" or a " " is not included in character.
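
The two operators can be imitated in a few lines of JavaScript. This is only a sketch under my own assumptions: matchChoice and matchCharacter are hypothetical helpers that work on a string and a position, not anything peggy exposes.

```javascript
// Ordered choice ("a" / "b" / ...): try the alternatives left to right and
// return the first literal that matches at pos, or null if none does.
function matchChoice(input, pos, literals) {
  for (const literal of literals) {
    if (input.startsWith(literal, pos)) return literal;
  }
  return null;
}

// character = !(":" / " ") c:. {return c;}
// "!" succeeds only when its expression fails, and it never consumes input.
function matchCharacter(input, pos) {
  if (matchChoice(input, pos, [":", " "]) !== null) return null; // excluded
  if (pos >= input.length) return null; // "." needs one character
  return input[pos];
}

console.log(matchCharacter("id: 1", 0)); // i
console.log(matchCharacter("id: 1", 2)); // null
console.log(matchCharacter("id: 1", 3)); // null

// Order matters: the first alternative that matches wins.
console.log(JSON.stringify(matchChoice("\r\n", 0, ["\r", "\r\n"]))); // "\r"
```

The last line shows why the order of alternatives deserves attention: a shorter alternative listed first hides a longer one that starts the same way.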

propertyDefinition also needs adjusting.

propertyDefinition = indentation? key:text indentation?  ":" indentation? value:text  {return {key, value}}

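Whitespace may or may not be there.
In such cases, adding ? expresses that something is optional. (Strictly speaking, indentation already matches zero spaces because of its *, so the ? is redundant here, but it is harmless.)
So the rule reads: maybe whitespace + key (string) + maybe whitespace + ":" + maybe whitespace + value (string).
And in the end it returns key and value as an object.
The result is as follows.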

{
  key: 'id',
  value: '1'
}
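
For comparison, here is a hand-written JavaScript sketch that follows the same steps as the adjusted propertyDefinition. parseProperty is a hypothetical function of my own, written only to show the order in which the pieces are consumed.

```javascript
// Hypothetical hand-written counterpart of:
//   propertyDefinition = indentation? key:text indentation? ":" indentation? value:text
//   text = $(character*)
//   character = !(":" / " ") .
//   indentation = " "*
function parseProperty(input) {
  let pos = 0;
  const skipSpaces = () => {
    while (input[pos] === " ") pos += 1; // " "* : zero or more spaces
  };
  const readText = () => {
    const start = pos;
    // character* : stop at ":" or " " or at the end of the input
    while (pos < input.length && input[pos] !== ":" && input[pos] !== " ") pos += 1;
    return input.slice(start, pos);
  };

  skipSpaces();
  const key = readText();
  skipSpaces();
  if (input[pos] !== ":") throw new Error('Expected ":" at position ' + pos);
  pos += 1;
  skipSpaces();
  const value = readText();
  if (pos !== input.length) throw new Error("Unexpected input at position " + pos);
  return { key, value };
}

console.log(parseProperty("id: 1")); // { key: 'id', value: '1' }
console.log(parseProperty("  id   :1")); // { key: 'id', value: '1' }
```

Writing this by hand for every rule gets tedious quickly, which is exactly the work the parser generator takes over.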

Thinking about indent

Some people indent with spaces, and others with tabs.
Let's improve indent so that it handles both.

indentation = [ \t]*

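Here [] shows up for the first time. This is called a character class.
A character class matches any one of the characters inside the [].
For example, [abc] matches a single character that is a, b, or c. With a hyphen, as in [a-z], you can specify a range; in that case it matches any single character from a to z.
\t means a tab. There is a half-width space just before it.
So the rule matches a space or a tab, repeated zero or more times.
Now parsing works fine even with a pile of unnecessary tabs.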
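
If you know regular expressions, the character class will look familiar. The sketch below uses a JavaScript regular expression as a stand-in for the indentation rule; skipIndentation is a hypothetical helper name.

```javascript
// Assumption: a sticky regular expression stands in for the rule
//   indentation = [ \t]*
// The character class is written the same way in both notations.
function skipIndentation(input, pos) {
  const indentation = /[ \t]*/y; // the "y" flag anchors the match at lastIndex
  indentation.lastIndex = pos;
  const match = indentation.exec(input); // always matches, possibly zero characters
  return pos + match[0].length; // position just after the indentation
}

console.log(skipIndentation("\t\t  id: 1", 0)); // 4
console.log(skipIndentation("id: 1", 0)); // 0
```

The second call returns 0 because zero repetitions is a valid match for *.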

Supporting multiple lines

Right now only a single line can be parsed, so let's support multiple lines.

Input

id: 1,
name: "TARO"

The trailing comma gets in the way, so let's strip it.
Define comma.

comma = ","

Allow a comma at the end of propertyDefinition.

propertyDefinition = indentation? key:text indentation?  ":" indentation? value:text comma?  {return {key, value}}

Whenever you define something to skip, always remember to add it to the exclusions in character.

character = !(":" / " " / comma) c:. {return c;}

Supporting multiple lines is just a matter of repeating the definition zero or more times with *.

start = lines
lines = properties:propertyDefinition*

Now the propertyDefinition line can be repeated any number of times.

Full grammar

start = lines
lines = properties:propertyDefinition*
propertyDefinition = indentation? key:text indentation?  ":" indentation? value:text comma?  {return {key, value}}
text = text:$(character*) {return text;}
comma = ","
character = !(":" / " " / comma) c:. {return c;}
indentation = [ \t]*

Input

id: 1,
age: 2

Result

[
  {
    key: 'id',
    value: '1'
  },
  {
    key: '\nage',
    value: '2'
  }
]

Supporting line breaks and quotation marks

The second key has picked up the line break code \n.
Let's handle it with lineBreak.

lineBreak = "\n" / "\r\n" / "\r"
character = !(":" / " " / comma / lineBreak) c:. {return c;}

We defined another thing to skip, so it is added to character.

propertyDefinition = indentation? key:text indentation?  ":" indentation? value:text comma? lineBreak?  {return {key, value}}

A line break may follow the comma, so we write lineBreak?.
The result is now as follows.

[
  {
    key: 'id',
    value: '1'
  },
  {
    key: 'age',
    value: '2'
  }
]

Next, consider input that contains strings.
Suppose the input looks like this.

id: 1,
"name": "Taro",
"job": 'none'

key and value may be wrapped in single or double quotation marks. Let's handle that.

quote = '"' / "'"
character = !(":" / " " / comma / lineBreak / quote) c:. {return c;}

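The single quotation mark is wrapped in double quotation marks,
and the double quotation mark is wrapped in single quotation marks.
Also, since another thing to skip was added, character is updated.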

propertyDefinition = indentation? quote? key:text quote? indentation?  ":" indentation? quote? value:text quote? comma? lineBreak?  {return {key, value}}

key and value, where a quote may appear, are sandwiched between quote? on both sides.

Full grammar

start = lines
lines = properties:propertyDefinition*
propertyDefinition = indentation? quote? key:text quote? indentation?  ":" indentation? quote? value:text quote? comma? lineBreak?  {return {key, value}}
text = text:$(character*) {return text;}
comma = ","
lineBreak = "\n" / "\r\n" / "\r"
quote = '"' / "'"
character = !(":" / " " / comma / lineBreak / quote) c:. {return c;}
indentation = [ \t]*

Input

id: 1,
"name": "Taro",
"job": 'none'

Output

[
  {
    key: 'id',
    value: '1'
  },
  {
    key: 'name',
    value: 'Taro'
  },
  {
    key: 'job',
    value: 'none'
  }
]

Looking good.

Handling export const ……schema = z.object({

Now that key:value pairs can be handled, let's make the opening statement parseable too.

Input

export const aaaSchema = z.object({

Output

aaaSchema

Let's aim for just this.
We skip over export, const, the equals sign, and z.object({.

constKeyword = "const"
exportKeyword = "export"
zodSchemaStart = "z.object({"
zodSchemaHeader = exportKeyword indentation constKeyword indentation name:text indentation "=" indentation zodSchemaStart indentation? lineBreak? {return name}

The parts that define const, export, and z.object({ should be straightforward.
After that we parse in the order export, space, const, space, name, space, "=", space, zodSchemaStart, space, lineBreak.
To parse export const schemaName = z.object({ we simply write it down as it appears.
And since the action is return name, the schema name comes back.
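
Since the header rule is one flat sequence, a regular expression can approximate it. The sketch below is my own approximation, not a translation peggy would produce: headerPattern and parseHeader are hypothetical names, and a regular expression backtracks where a PEG repetition does not, so the two can disagree on unusual input such as a header written without any spaces.

```javascript
// Assumption: this regular expression approximates the rule
//   zodSchemaHeader = exportKeyword indentation constKeyword indentation
//                     name:text indentation "=" indentation zodSchemaStart
//                     indentation? lineBreak? {return name}
// The capture group plays the role of the label "name".
const headerPattern =
  /^export[ \t]*const[ \t]*([^: ,\r\n"']*)[ \t]*=[ \t]*z\.object\(\{[ \t]*(?:\n|\r\n|\r)?/;

function parseHeader(input) {
  const match = headerPattern.exec(input);
  if (match === null) throw new Error("Not a zod schema header");
  return match[1]; // corresponds to the action {return name}
}

console.log(parseHeader("export const aaaSchema = z.object({")); // aaaSchema
```

The character class in the capture group lists the same exclusions as the character rule: colon, space, comma, line breaks, and both quotation marks.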

Let's connect zodSchemaHeader and propertyDefinition.

schemaDefinition = header:zodSchemaHeader lineBreak? properties:propertyDefinition* {return {header, properties}}

This parses the header from before, then a line break, then the repeated key:value pairs. There is no new syntax here.

peggy

start = schemaDefinition
schemaDefinition = header:zodSchemaHeader lineBreak? properties:propertyDefinition* {return {header, properties}}

propertyDefinition = indentation? quote? key:text quote? indentation?  ":" indentation? quote? value:text quote? comma? lineBreak?  {return {key, value}}
text = text:$(character*) {return text;}
comma = ","
lineBreak = "\n" / "\r\n" / "\r"
quote = '"' / "'"
character = !(":" / " " / comma / lineBreak / quote) c:. {return c;}
indentation = [ \t]*

constKeyword = "const"
exportKeyword = "export"
zodSchemaStart = "z.object({"
zodSchemaHeader = exportKeyword indentation constKeyword indentation name:text indentation "=" indentation zodSchemaStart indentation? lineBreak? {return name}

Input

export const aaaSchema = z.object({
id: 1,
name: "TAKEDA"

Output

{
  header: 'aaaSchema',
  properties: [
    {
      key: 'id',
      value: '1'
    },
    {
      key: 'name',
      value: 'TAKEDA'
    }
  ]
}

Things are shaping up nicely, so let's tweak the header and key:value parts a little.

peggy

schemaDefinition = tableName:zodSchemaHeader lineBreak? properties:propertyDefinition* {return {tableName, properties}}

propertyDefinition = indentation? quote? name:text quote? indentation?  ":" indentation? quote? schema:text quote? comma? lineBreak?  {return {name, schema}}

As you can see, only the names used in return have changed.
header → tableName
key → name, value → schema

Parsing the end

Let's parse the closing part of z.object({.

endOfLine = "});"

propertyDefinition = indentation? quote? name:text quote? indentation?  ":" indentation? quote? schema:text quote? comma? lineBreak? endOfLine?  {return {name, schema}}

We put endOfLine? at the very end.

peggy

start = schemaDefinition
schemaDefinition = tableName:zodSchemaHeader lineBreak? properties:propertyDefinition* {return {tableName, properties}}

propertyDefinition = indentation? quote? name:text quote? indentation?  ":" indentation? quote? schema:text quote? comma? lineBreak? endOfLine?  {return {name, schema}}
text = text:$(character*) {return text;}
comma = ","
lineBreak = "\n" / "\r\n" / "\r"
quote = '"' / "'"
character = !(":" / " " / comma / lineBreak / quote) c:. {return c;}
indentation = [ \t]*

endOfLine = "});"
constKeyword = "const"
exportKeyword = "export"
zodSchemaStart = "z.object({"
zodSchemaHeader = exportKeyword indentation constKeyword indentation name:text indentation "=" indentation zodSchemaStart indentation? lineBreak? {return name}

Input

export const aaaSchema = z.object({
  id: 1,
  name: "YAMADA"
});

Output

{
  tableName: 'aaaSchema',
  properties: [
    {
      name: 'id',
      schema: '1'
    },
    {
      name: 'name',
      schema: 'YAMADA'
    }
  ]
}

Supporting multiple schemas

Input

export const aaaSchema = z.object({
  id: 1,
  name: "YAMADA"
});

export const bbbSchema = z.object({
  id: 2,
  name: "YAMADAMAMA"
});

We want to support multiple schemas like these.

Let's make it possible to handle blank lines and lines that are not a schema.

nonSchemaLine = (!schemaDefinition.)+ lineBreak? {return null }

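This matches text that is not a schemaDefinition.
. stands for any single character. However, because this . comes right after !schemaDefinition, it stands for any single character at a position where the schemaDefinition rule does not match.
So it means something like: one or more repetitions of a character at which no schemaDefinition begins.
In the end, everything other than a schemaDefinition is skipped and turned into null.
Now even a function like const add = (num1:number, num2:number):number => num1 + num2; can be skipped.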
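
The skipping loop can be pictured with the JavaScript sketch below. It rests on a simplifying assumption: "a schema starts here" is reduced to a check for the header, whereas the real grammar tries the whole schemaDefinition rule at each position. startsSchema and skipNonSchema are hypothetical names.

```javascript
// Simplified stand-in for the lookahead !schemaDefinition (header check only).
function startsSchema(input, pos) {
  const header = /export[ \t]*const[ \t]*[^: ,\r\n"']*[ \t]*=[ \t]*z\.object\(\{/y;
  header.lastIndex = pos; // the sticky flag makes the match start exactly at pos
  return header.test(input);
}

// Sketch of (!schemaDefinition .)+ :
// keep consuming single characters while no schema starts at pos.
function skipNonSchema(input, pos) {
  const start = pos;
  while (pos < input.length && !startsSchema(input, pos)) {
    pos += 1; // "." consumes one character, line breaks included
  }
  if (pos === start) return null; // "+" requires at least one character
  return pos; // position where the skipped text ends
}

const source = "const add = (a, b) => a + b;\n\nexport const aaaSchema = z.object({";
const next = skipNonSchema(source, 0);
console.log(source.slice(next)); // export const aaaSchema = z.object({
```

Note that "." also consumes line breaks, so despite its name nonSchemaLine can swallow several lines in one go until the next schema begins.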

start = line:(schemaDefinition / nonSchemaLine)* {return line.filter(x => x !== null)}

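Finally, start becomes schemaDefinition or nonSchemaLine, repeated zero or more times. The result is bound to the variable line, the null entries are filtered out of line with ordinary JavaScript, and we are done.

And so, the completed peggy grammar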
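
The filtering step is plain JavaScript, so it can be tried on its own. In this sketch the results array is made-up sample data imitating what the repetition would hand to the action: one entry per match, with null for skipped text.

```javascript
// Assumption: "results" imitates the value bound to the label "line"
// before the action runs. The contents are sample data for illustration.
const results = [
  null,
  { tableName: "aaaSchema", properties: [{ name: "id", schema: "1" }] },
  null,
  { tableName: "bbbSchema", properties: [{ name: "id", schema: "2" }] },
];

// Same expression as the action in the start rule.
const schemas = results.filter((x) => x !== null);

console.log(schemas.map((s) => s.tableName)); // [ 'aaaSchema', 'bbbSchema' ]
```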

start = line:(schemaDefinition / nonSchemaLine)* {return line.filter(x => x !== null)}
nonSchemaLine = (!schemaDefinition.)+ lineBreak? {return null }
schemaDefinition = tableName:zodSchemaHeader lineBreak? properties:propertyDefinition* {return {tableName, properties}}
propertyDefinition = indentation? quote? name:text quote? indentation?  ":" indentation? quote? schema:text quote? comma? lineBreak? endOfLine?  {return {name, schema}}
text = text:$(character*) {return text;}
comma = ","
lineBreak = "\n" / "\r\n" / "\r"
quote = '"' / "'"
character = !(":" / " " / comma / lineBreak / quote) c:. {return c;}
indentation = [ \t]*

endOfLine = "});"
constKeyword = "const"
exportKeyword = "export"
zodSchemaStart = "z.object({"
zodSchemaHeader = exportKeyword indentation constKeyword indentation name:text indentation "=" indentation zodSchemaStart indentation? lineBreak? {return name}

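We built up the processing one step at a time. The rest should be manageable by combining what we learned here. Hopefully you now have a feel for how to push a parser forward in this way.

In fact, quite a few things are still unsupported; comments, for example, are not handled. Adding rules for // and /**/ should take care of that. Probably.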
