Scala: Parsing while tracking line and column numbers with parser combinators

Posted at 2014-06-04

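By default, Scala parser combinators discard position information, such as the line and column at which a parsed token appeared. This article explains how to use the Positional trait to obtain tokens that carry their position.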

For the basics of using parser combinators, see 面倒くさいパーサの実装もDSLで書くだけ！そう、Scalaならね - Qiita.

Key points of the implementation

1. Import the `Positional` trait

Attaching position information to tokens requires the Positional trait, so import it first.

import scala.util.parsing.input.Positional
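
As a minimal sketch of what the trait provides: a Positional value has a `pos` field that starts out as `NoPosition`, and `setPos` records a position only when none has been set yet. The `Token` class, the `PositionalDemo` object, and the sample string are hypothetical names chosen for this illustration.

```scala
import scala.util.parsing.input.{NoPosition, OffsetPosition, Positional}

object PositionalDemo {
  // Hypothetical token type used only for this illustration.
  case class Token(text: String) extends Positional

  val source = "AAGA\nCGAT"
  val token = Token("G")

  // A freshly created Positional value has no position yet.
  val initiallyUnset: Boolean = token.pos == NoPosition

  // setPos records a position only if none has been set so far.
  token.setPos(OffsetPosition(source, 6)) // offset 6 is the 'G' on the second line
  token.setPos(OffsetPosition(source, 0)) // ignored: a position is already set

  def main(args: Array[String]): Unit =
    println(s"$token line: ${token.pos.line}, column: ${token.pos.column}")
}
```

In a real parser you would not normally call `setPos` yourself; the parser combinators do it for you, as described in step 3.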

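2. Define case classes for the tokens

Next, define the token case classes. Create a base sealed trait that extends the Positional trait, and have every case class extend that base trait. Because the trait is sealed, the compiler knows every possible subtype and can warn about pattern matches that miss a case.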

sealed trait Nucleobase extends Positional
case class G() extends Nucleobase
case class A() extends Nucleobase
case class T() extends Nucleobase
case class C() extends Nucleobase
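
The benefit of the sealed trait can be sketched as follows. The `complement` helper and the `SealedTraitDemo` wrapper object are hypothetical and not part of the original article; they only show a pattern match that covers every token type. If one of the four cases were removed, the compiler would report that the match may not be exhaustive.

```scala
import scala.util.parsing.input.Positional

object SealedTraitDemo {
  sealed trait Nucleobase extends Positional
  case class G() extends Nucleobase
  case class A() extends Nucleobase
  case class T() extends Nucleobase
  case class C() extends Nucleobase

  // Hypothetical helper: returns the complementary base.
  // All four subtypes of the sealed trait are handled, so the match is exhaustive.
  def complement(base: Nucleobase): Nucleobase = base match {
    case G() => C()
    case C() => G()
    case A() => T()
    case T() => A()
  }
}
```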

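3. Wrap the parse result with the `positioned` method

A parser that does not need position information is defined in the form def pattern ^^ function, as in the following code. To include position information, each definition must be wrapped in the positioned method.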

object DNAParser extends RegexParsers {
  def g = "G" ^^ { case _ => G() }
  …
}

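The following code shows the same definition wrapped in positioned. positioned takes the Parser returned by pattern ^^ function and returns a parser that records on the resulting token the input position at which the match started.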

object DNAParser extends RegexParsers {
  def g = positioned("G" ^^ { case _ => G() })
  ...
}
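
The difference can be sketched by running the same pattern with and without positioned. The `PositionedDemo` object, the `Base` class, and the parser names `plain` and `tracked` are hypothetical names for this illustration; the input starts with a newline and two spaces, which RegexParsers skips as whitespace by default.

```scala
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.{NoPosition, Positional}

object PositionedDemo extends RegexParsers {
  // Hypothetical token type used only for this illustration.
  case class Base(symbol: String) extends Positional

  // Without positioned: the token is built, but its position is never set.
  def plain: Parser[Base] = "G" ^^ { s => Base(s) }

  // With positioned: the start of the match is recorded on the token.
  def tracked: Parser[Base] = positioned("G" ^^ { s => Base(s) })

  val input = "\n  G"
  val withoutPos: Base = parseAll(plain, input).get
  val withPos: Base = parseAll(tracked, input).get

  def main(args: Array[String]): Unit = {
    println(withoutPos.pos == NoPosition) // true
    println(s"line: ${withPos.pos.line}, column: ${withPos.pos.column}") // line: 2, column: 3
  }
}
```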

Those are the key points of the implementation.

The complete code

The complete code, implemented with these points in mind, is shown below.

import scala.util.parsing.combinator._
import scala.util.parsing.input.Positional

sealed trait Nucleobase extends Positional
case class G() extends Nucleobase
case class A() extends Nucleobase
case class T() extends Nucleobase
case class C() extends Nucleobase

object DNAParser extends RegexParsers {
  def g = positioned("G" ^^ { case _ => G() })
  def a = positioned("A" ^^ { case _ => A() })
  def t = positioned("T" ^^ { case _ => T() })
  def c = positioned("C" ^^ { case _ => C() })
  def expr = (g | a | t | c).*
  def apply(input: String): Either[String, List[Nucleobase]] = parseAll(expr, input) match {
    case Success(nucleobases, _) => Right(nucleobases)
    case NoSuccess(msg, next)    => Left(s"$msg on line ${next.pos.line} on column ${next.pos.column}")
  }
}

DNAParser("AAGA\nCGAT") match {
  case Right(nucleobases) => nucleobases.foreach(n => println(s"$n line: ${n.pos.line}, column: ${n.pos.column}"))
  case Left(errorMessage) => println(errorMessage)
}                                               //> A() line: 1, column: 1
                                                //| A() line: 1, column: 2
                                                //| G() line: 1, column: 3
                                                //| A() line: 1, column: 4
                                                //| C() line: 2, column: 1
                                                //| G() line: 2, column: 2
                                                //| A() line: 2, column: 3
                                                //| T() line: 2, column: 4
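
The run above only exercises the success branch. The failure branch can be sketched in the same way, under a few simplifying assumptions: a single hypothetical `Base` case class and a regex pattern stand in for the four token classes, the object and method names `FailureDemo` and `parseBases` are invented for this example, and the non-success case is split into `Failure` and `Error` instead of using `NoSuccess`. The exact wording of the parser's message depends on the library version, so only the position suffix is relied on here.

```scala
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.Positional

object FailureDemo extends RegexParsers {
  // Hypothetical single token type; the article uses one case class per base.
  case class Base(symbol: String) extends Positional

  def base: Parser[Base] = positioned("[GATC]".r ^^ { s => Base(s) })
  def expr: Parser[List[Base]] = base.*

  // Returns the parsed bases, or a message that ends with the failure position.
  def parseBases(input: String): Either[String, List[Base]] =
    parseAll(expr, input) match {
      case Success(bases, _)  => Right(bases)
      case Failure(msg, next) => Left(s"$msg on line ${next.pos.line} on column ${next.pos.column}")
      case Error(msg, next)   => Left(s"$msg on line ${next.pos.line} on column ${next.pos.column}")
    }

  def main(args: Array[String]): Unit =
    // 'X' on the second line is not a valid base, so the Left branch is taken.
    println(parseBases("AAGA\nCXAT"))
}
```

For the input above, the message should end with "on line 2 on column 2", which is where the unexpected X sits.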

References

Scaladoc 2.10.3 - scala.util.parsing.input.Positional
