rubyでparsec #Ruby

rubyで超簡素ですが形だけ、Monad,Applicative風仕立てです。
 Gist

type Parser

type Parser a = String -> Result Error (a, String)

parser combinatorの基本は文字列を受け、失敗ならErrorを
成功なら処理した文字列と未処理の文字列に分けOKで包み返す、
関数を作り、組み合わせていくだけです。
Tupleの最初、aにParserの返り値を。二つ目に未処理の文字列を。
（実際のhaskell-parsecはエラーやパーサーの状態を管理するため、複雑です。）

さてruby

parsec.rb

class Parsec
  require "./result.rb"
  attr :parser # Proc(String -> Result Error [a, String])
  def initialize &p
    @parser = p.curry
  end

  def run s
    @parser.call s
  end

後々、演算子オーバーロードしたいのでObjectとして扱っていきます。
Tupleの代わりにArray。curryはおまじない。runは便宜上。
initializeをスペルミスして初期化されなくてハマるなどしました。

Result

data Result e a = OK a | Error e

しれっとrequire "./result.rb"してますが、まだrubyのモジュール分割流儀知らんので単一ファイルです。
MaybeでもEitherでもなくResultなのはRust風。
超簡素なのでMaybeでもよかったね。

result.rb

class Result
  attr_accessor :ok, :error
  def initialize(ok: nil, error: nil)
    if ok
      @ok = ok
    else
      @error = error
    end
  end

  def inspect
    if is_ok?
      "OK(#{@ok.inspect})"
    else
      "Error(#{@error.inspect})"
    end
  end

  def is_ok?
    @ok != nil
  end
end

Functor,Applicative,Monad風

Functor fmap

fmapは関数を取り、parserの結果がokであれば、それに関数を適応する、Parserを作ります。

parsec.rb

  # fmap :: (a -> b) -> Parsec a -> Parsec b
  def fmap &f
    Parsec.new do |s|
      r = run s
      if r.is_ok?
        Result.new ok: [f.call(r.ok[0]), r.ok[1]]
      else
        r
      end
    end
  end

Applicative app

app（haskelではapないし<*>）は関数を結果として返すParserを最初の引数（self）に、二つ目の引数に普通のParserを。二つをrunして二つともokなら、最初のParserから得た関数に、二つ目のParserの結果を適応するParserを作ります。
関数型言語に慣れてない人は「関数を返すParserとは一体？」となるかも。
appはApplicativeのきもなのです。

parsec.rb

  # app :: Parsec (a -> b) -> Parsec a -> Parsec b
  def app a
    Parsec.new do |s|
      r = run s
      if r.is_ok?
        rr = a.run r.ok[1]
        if rr.is_ok?
          Result.new ok: [r.ok[0].call(rr.ok[0]), rr.ok[1]]
        else
          rr
        end
      else
        r
      end
    end
  end

Monad bind

みんな大好きMonad。
bind（>>=）は１つ目の引数（self）にParserを、２つ目の引数に最初のParserの結果を取り、新たなParserを返す関数を取ります。
で、最初のParserがrun(s).is_ok?なら第二引数の関数に結果を適応し、帰ってくるParserを実行。今度は結果をチェックせずそのまま返します。
ってゆうParserを作るわけです。

parserc.rb

  # bind :: Parser a -> (a -> Parser b) -> Parser b
  def bind &f
    Parsec.new do |s|
      r = run s
      if r.is_ok?
        f.call(r.ok[0]).run(r.ok[1])
      else
        r
      end
    end
  end

Applicative, Alternative

Applicative appl appr

Applicative styleを実現するにはまだ、appl(<*), appr(*>)が必要です。
Parserを二つ（一つはself）取り両方実行し、片方の結果、applは左、apprは右、を返すParserを作ります。

parserc.rb

  # appl :: Parser a -> Parser b -> Parser a
  def appl a
    Parsec.new do |s|
      r = run s
      if r.is_ok?
        rr = a.run r.ok[1]
        if rr.is_ok?
          r.ok[1] = rr.ok[1]
          r
        else
          rr
        end
      else
        r
      end
    end
  end

  # appr :: Parser a -> Parser b -> Parser b
  def appr a
    Parsec.new do |s|
      r = run s
      if r.is_ok?
        a.run r.ok[1]
      else
        r
      end
    end
  end

Alternative or, many, some

AlternativeはApplicativeのサブクラスです。
ここではor(<|>),many,someを定義します。

これらは正規表現でいうなら、
or |
many *
some +
という具合です。

parserc.rb

  # or :: Parser a -> Parser a -> Parser a
  def or a 
    Parsec.new do |s|
      r = run(s)
      if r.is_ok?
        r
      else
        a.run(s)
      end
    end
  end

  # many :: Parser a -> Parser [a]
  def many
    self.or(Parsec::pure([])).bind do |x|
      if x == []
        Parsec::pure []
      else
        many.bind do |xs|
          Parsec::pure(xs.unshift x)
        end
      end
    end 
  end

  # some :: Parser a -> Parser [a]
  def some
    bind do |x|
      some.or(Parsec::pure([])).bind do |xs|
        Parsec::pure(xs.unshift x)
      end
    end
  end

pure

pureを忘れてました。
pureは任意の返り値を返すParserを作ります。
文字列に対しては何もしません。
メソッドとしては扱えないので関数？として定義します。
（こういうのなんと呼ぶのでしょうかね？）

parsec.rb

  # pure :: a -> Parser a
  def Parsec::pure a
    Parsec.new do |s|
      Result.new ok: [a, s]
    end
  end

lazy

もう一つ便利関数を。
blockで包むことで遅延評価を可能とします。
最初これ忘れてて、無限ループしてました。

parsec.rb

  def Parsec::lazy &f
    Parsec.new do |s|
      f.call.run s
    end
  end

satisfy, char, str

ようやく文字列を直接扱うParserを作っていきます。
satisfyは関数で、charは一文字だけ、strは文字列を評価していきます。

parserc.rb

  def Parsec::satisfy &f
    Parsec.new do |s|
      if f.call s[0]
        Result.new ok: [s[0], s[1..-1]]
      else
        Result.new error: ("satisty: unexpect #{s[0]}")
      end
    end
  end

  def Parsec::char c
    Parsec::satisfy { |a| c == a }
  end

  def Parsec::str s
    Parsec::char(s[0]).bind do |c|
      Parsec::str s[1..-1]
  end

冗長です。

演算子オーバーロード

これがやりたかっただけの正月だった！
これで一気に雰囲気が漂います。やったね。

ruby演算子

演算子の優先順位

高い
::
[]
+(単項) ! ~
**
-(単項)
* / %
+ -
<< >>
&
| ^
> >= < <=
<=> == === != =~ !~
&&
||
.. ...
?:(条件演算子)
=(+=, -= ... )
not
and or
低い

再定義できる演算子(メソッド)

| ^ & <=> == === =~ > >= < <= << >>
+ - * / % ** ~ +@ -@ [] []= ` ! != !~

parsec.rb

  alias *  app
  alias >> appr
  alias << appl
  alias |  or
end

素晴らしい。aliasも良いし、演算子をメソッドという形で再定義できるrubyすごい。
しかし、haskellはさらに上手。というか頭おかしいレベル。
演算子自由に定義できるし、結合順位や左結合、右結合まで決められる。
頭おかしい。やったね。

Parserを使う。

Monad style

ようやく。
まずは、MonadとAlternativeだけでS式を。
Monad >> == Appplicative *>です。

test_sexpr_monad_parserc.rb

require "./parsec.rb"

P = Parsec

def sexpr
  id | list
end

def id 
  "1234567890qwertyuiopasdfghjklzxcvbnm".
    chars.reduce(P.char("_")) { |p,c| P.char(c) | p }.
    some.bind { |xs| P.pure(xs.reduce { |s,x| s + x }) }
end

def list
  P.char("(") >>
  P.lazy { sexpr }.bind { |e|
    spaces >> P.pure(e)
  }.many.bind { |e|
    P.char(")") >> P.pure(e)
  }
end

def spaces
  (P.char(" ") | P.char("\n")).many
end

p sexpr.run("(abc 0a_1 (1 2) ((3)))") # OK([["abc", "0a_1", ["1", "2"], [["3"]]], ""])
p sexpr.run("a0")                     # OK(["a0", ""])
p sexpr.run("(0 ((1) 2) 3)")          # OK([["0", [["1"], "2"], "3"], ""])

ゴッチャゴチャや。。。
まあ動いてるのでよし。
P = Parsecと書けたので一々Parser::と書かずにすみました。
それでもP.が必要です。無くしたい。

Applicative style

と言っても、*と<<が追加になったくらいです。

test_sexpr_applicative_parsec.rb

require "./parsec.rb"

P = Parsec

def sexpr
  id | list
end

def id 
  f = proc { |ss| ss.reduce(:+) }
  P.pure(f) * P.satisfy { |c| c =~ /\w|\d/ }.some
end

def list
  (P.char("(") >> (P.lazy { sexpr } << spaces).many << P.char(")"))
end

def spaces
  (P.char(" ") | P.char("\n")).many
end

p sexpr.run("(abc 0a_1 (1 2) ((3)))") # OK([["abc", "0a_1", ["1", "2"], [["3"]]], ""])
p sexpr.run("a0")                     # OK(["a0", ""])
p sexpr.run("(0 ((1) 2) 3)")          # OK([["0", [["1"], "2"], "3"], ""])

スッキリー。

サンプル追加、四則演算。

test_4expr_parsec.rb


require "./parsec.rb"

P = Parsec

def expr
  add
end

def ast op
  proc { |a,b| [op,a,b] }.curry
end

def chainl1 a, op
  go = proc { |x|
    op.bind { |f|
      a.bind { |y|
        go.call(f.call(x,y))
      }
    } | P.pure(x)
  }
  a.bind { |x|
    go.call(x)
  }
end

def add
  chainl1(
    P.lazy { mul },
    P.char("+") >> P.pure(ast "+") | P.char("-") >> P.pure(ast "-")
  )
end

def mul
  chainl1(
    P.lazy { num }, 
    P.char("*") >> P.pure(ast "*") | P.char("/") >> P.pure(ast "/")
  )
end

def num
  f = proc { |a| a.reduce(:+) }
  P.pure(f) * ("123456789".chars.reduce(P.char("0")) { |p,n| P.char(n) | p }.some) |
  P.char("(") >> (P.lazy { expr }) << P.char(")")
end

p expr.run("1*2+3-4/5")
# OK([["-", ["+", ["*", "1", "2"], "3"], ["/", "4", "5"]], ""])
p expr.run("1*(2+(3-4))/5")
# OK([["/", ["*", "1", ["+", "2", ["-", "3", "4"]]], "5"], ""])

動きました。

雰囲気だけでも楽しめたかと思います。
ただ超簡素なのでまともにエラー表示できず、デバッグが辛かった。
まともに使っても、非効率で使い物にならないでしょう。
MonadもApplicativeも雰囲気だけです。構文糖的に。ご愛嬌。