論理式パーサをscala-parser-combinatorsで楽々実装

Scala

Last updated at 2018-07-18Posted at 2017-12-02

はじめに

この記事はMicroAd Advent Calendar 2017の2日目の記事です．

Scalaでは構文解析器を生成するためのパーサコンビネータを利用できます．構文解析器とは雑に説明しますと**「文字列からインスタンスを生成」**することです．
scala-parser-combinators(https://github.com/scala/scala-parser-combinators) を使うとeBNF(拡張バッカス・ナウア記法)という記法で定義するだけで，構文解析器を作れます．少ないコード量で記述できることが魅力的です．
そこで，本記事では論理式のパースをテーマにscala-parser-combinatorsを紹介しようと思います．ここで説明する論理式とは例えば以下のような文字列です．

1 and 2: 1かつ2
1 or 2: 1または2
1 and not 2: 1かつ2でない
1 and (2 or 3): 1かつ2または1かつ3

※論理演算子の優先順位は not > and > orです．notが一番優先度が高く，orが一番優先度が低くなります．

完成後の利用イメージ

イメージしやすいよう，先にクライアント側のコードを示します．

val set = Set(1L,2L,3L) // 1,2,3のセットが以下の条件に合致するかを確かめます

val condition1 = ConditionParser.parse("""1 and 2""") // 文字列を解析してConditionインスタンスを生成
condition1.get.apply(set) // condition1とsetを比較して条件判定
> true

val condition2 = ConditionParser.parse("""1 and 4""")
condition2.get.apply(set)
> false

ConditionParserは今回用意する構文解析器です．
parse()で構文解析してParseResult型を返却します．(解析に失敗した場合に条件分岐できるように)
コード簡略化のためgetを使っていますが，本来はmatch式でSuccessとFailureを場合分けします
apply()で条件判定を行います．これはparserとは関係なく，これから定義するConditionクラスの実装によります

まずはeBNFを定義

eBNFは抵抗ある方も多いと思いますが，今回のケースは複雑ではないため抵抗ある方でも解読できるかと思います．
「論理式パーサ」のeBNFは以下のようになりました． (eBNFとは繰り返し処理が書けるBNFのこと)

expr ::= term {"or" term } # exprはtermと`or term`の繰り返しだよ
term ::= not_term { "and" not_term} # termはnot_termと`and not_term`の繰り返しだよ
not_term :: = ["not"] factor # not_termはfactorだよ．`not`を付けても良いよ
factor ::= 整数 | "(" expr ")" # factorは整数か (expr)のどっちかだよ

あとはこのeBNFをScalaで定義するだけです．

構文解析器の実装

ConditionParser.scala

import scala.util.parsing.combinator._

object ConditionParser extends JavaTokenParsers {

  def expr: Parser[Condition] = term~rep("or"~>term) ^^ {
    case f~fs => fs.foldLeft(f) { (f1, f2) => f1 or f2 }
  }

  def term: Parser[Condition] = not_term~rep("and"~>not_term) ^^ {
    case f~fs => fs.foldLeft(f) { (f1, f2) => f1 and f2 }
  }

  def not_term: Parser[Condition] = opt("not")~factor ^^ {
    case Some("not")~f => NotCondition(f)
    case None~f => f
  }

  def factor: Parser[Condition] = (
    wholeNumber ^^ {str => TargetElement(str.toLong)}
    | "("~>expr<~")"
    )

  def parse(input: String): ParseResult[Condition] = parseAll(expr, input)
}

構文解析器を生成するときはJavaTokenParsersを継承します．（整数や文字列といった基本的な表現を定義済みで便利）
Conditionクラスは後述しますが，僕が勝手に作った論理式を示すクラスです．

と，なにやらよくわからない(~とか^^ってナニソレ!)と思いますので，解説しようと思います．
まず，以下のコード部分を説明します．

ConditionParser.scalaの一部

  def expr: Parser[Condition] = term~rep("or"~>term) ^^ {
    case f~fs => fs.foldLeft(f) { (f1, f2) => f1 or f2 }
  }

term~rep("or"~>term)はeBNFのterm {"or" term }に相当します．
eBNFのスペース(区切り)はチルダで記述します．
eBNFの{...}はrep()で記述します．
"or"~>termというのは解析した結果の右側だけ，つまりterm部分だけを後続処理で使うことを意味します．
^^は構文^^解析後の処理といった使い方をします．
case f~fs => fs.foldLeft(f) { (f1, f2) => f1 or f2 }ではfが１つ目のtermを示し，fsが２つ目のtermを示しています．
fsはtermの繰り返しなため，コレクションになっています．fやfsの型が何かというと，def factorを見るとわかります．

def factor: Parser[Condition] = (
  wholeNumber ^^ {str => TargetElement(str.toLong)}
    | "("~>expr<~")"
  )

wholeNumberからConditionを継承したTargetElementというインスタンスを生成しています．つまり，先程のfやfsはTargetElement型となります．(正しくはCondition型)

と，他の部分も以上の説明で理解できるようになるかと思います．（多分・・・）
このようにeBNFをScalaで定義し，解析後の文字列からインスタンスを生成するように構文解析器を実装していきます．
詳細はコップ本などが詳しいです．

Conditionクラスの実装

続いて，パースした論理式を保持するクラスを定義します．

Condition

trait Condition {

  def apply(conditionIds: Set[Long]): Boolean

  def and(other: Condition): Condition = AndCondition(this, other)

  def or(other: Condition): Condition = OrCondition(this, other)

  def not: Condition = NotCondition(this)
}

case class AndCondition(c1: Condition, c2: Condition) extends Condition {

  override def apply(conditionIds: Set[Long]): Boolean = {
    c1(conditionIds) && c2(conditionIds)
  }
}

case class OrCondition(c1: Condition, c2: Condition) extends Condition {

  override def apply(conditionIds: Set[Long]): Boolean = {
    c1(conditionIds) || c2(conditionIds)
  }
}

case class NotCondition(c: Condition) extends Condition {

  override def apply(conditionIds: Set[Long]): Boolean = {
    !c(conditionIds)
  }
}

case class TargetElement(id: Long) extends Condition {
  override def apply(conditionIds: Set[Long]) = {
    conditionIds.contains(id)
  }
}

TargetElementを最小単位として，それらをandやorなどの条件で再帰的に保持するような仕組みです．
applyに比較したいSetを渡すことで，条件判定できるようになっています．
こちらは本題とは逸れるので，この程度の説明で割愛したいと思います．（雑・・）

まとめ

scala-parser-combinatorsの紹介と論理パーサを実装を紹介しました．
論理パーサのコード全体を以下にのgistに置きましたのでもし必要であればご活用下さい．

https://gist.github.com/mattsu6/137029a81f68e3bdb7a4de7ce8d52333

それでは，読んで頂いてありがとうございました．
誤字脱字を始め不正確な内容がございましたら遠慮なくご指摘頂けると幸いです．

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up