More than 5 years have passed since last update.

ScalaのRegexParsersで書いたパーサのチューニング方法 #2

Scala

Last updated at 2014-07-13Posted at 2014-07-13

例

適当なパーサを2つ例に取ります。

Parser.scala

//適当なCSVパーサの定義
//lazy val field = "[a-z]+".r
lazy val A = field ~ ( rep( "," ~ field ) ).? 
lazy val B = ( rep( field ~ "," ) ).? ~ field

AもBも意図するところは同じです。a、a,b、a,b,cのような入力を、CSVの行としてパースします。
このAとBを見て、どちらがより優れたパーサなのかすぐに分かりますか？　答えはAなのですが、なかなかパッと見ただけでは判断し辛いです。

入力文字列へのアクセス回数でパーサを評価する

適当な入力に対する実行時間を計測するのもよいのですが、別の手法として、入力した文字列へのアクセス回数を数えるという方法だと、バックグラウンドのプロセスやCPUクロックの変動などのノイズの影響を受けず、高速にスコアを求められます。

もちろん、実際のパフォーマンスと差異がありますが、Parserのパフォーマンスは、概ねこのスコアに比例します。

ParserTest.scala


var count = 0
class DebugCharSequence(seq: CharSequence) extends java.lang.CharSequence {
  def charAt(n: Int) = { count += 1; seq.charAt(n) }
  def subSequence(s: Int, e: Int) = new DebugCharSequence(seq.subSequence(s, e))
  def length = seq.length
  override def toString = seq.toString
}
val dseq = new DebugCharSequence("a,b,c")
parser.parse(parser.A, dseq)
println("access count: A", count)
count = 0
parser.parse(parser.B, dseq)
println("access count: B", count)

(access count: A,31)
(access count: B,40)

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up