Parsecのソースちら見(2) #Haskell

動きから推測すると最もパースが進んだ状態で起こったエラーを最終のエラーとすると思われます。
https://groups.google.com/forum/#!msg/haskell-jp/ZST2IRPXFPs/vS9FpR5_m2oJ

予想通りの内容。

Text-Parsec-Error.html

mergeError :: ParseError -> ParseError -> ParseError
mergeError e1@(ParseError pos1 msgs1) e2@(ParseError pos2 msgs2)
    -- prefer meaningful errors
    | null msgs2 && not (null msgs1) = e1
    | null msgs1 && not (null msgs2) = e2
    | otherwise
    = case pos1 `compare` pos2 of
        -- select the longest match
        EQ -> ParseError pos1 (msgs1 ++ msgs2)
        GT -> e1
        LT -> e2

>>= はこんな感じ。めんどくさそうに見えるけど2つのパーサー走らせてくっつけてるだけ。最初のパーサが入力を消費した(consume)場合は、当然後ろのパーサが入力を消費しない(empty)場合でもconsume扱いとなるので、cokやcerrの継続を通して結果を返さないとならない。後はmergeErrorをするか否かだが、これはなんでこうしているか判断しかねた。予想では、後続のパーサがconsumeであれば先発のパーサより処理が進んでいるので、マージするまでもなく後続のパーサのエラーを返せば計算量を節約できていいってことかなと思ったのだけど、それだとメッセージがnullのエラーをきちんと評価してない気もするんだがいいんだろうか。

Text-Parsec-Prim.html

parserBind :: ParsecT s u m a -> (a -> ParsecT s u m b) -> ParsecT s u m b
{-# INLINE parserBind #-}
parserBind m k
  = ParsecT $ \s cok cerr eok eerr ->
    let
        -- consumed-okay case for m
        mcok x s err =
            let
                 -- if (k x) consumes, those go straigt up
                 pcok = cok
                 pcerr = cerr

                 -- if (k x) doesn't consume input, but is okay,
                 -- we still return in the consumed continuation
                 peok x s err' = cok x s (mergeError err err')

                 -- if (k x) doesn't consume input, but errors,
                 -- we return the error in the 'consumed-error'
                 -- continuation
                 peerr err' = cerr (mergeError err err')
            in  unParser (k x) s pcok pcerr peok peerr                      

        -- empty-ok case for m
        meok x s err =
            let
                -- in these cases, (k x) can return as empty
                pcok = cok
                peok x s err' = eok x s (mergeError err err')
                pcerr = cerr
                peerr err' = eerr (mergeError err err') 
            in  unParser (k x) s pcok pcerr peok peerr
        -- consumed-error case for m
        mcerr = cerr

        -- empty-error case for m
        meerr = eerr

    in unParser m s mcok mcerr meok meerr

<|>はemptyなエラーだけ拾って後続のパーサを走らせる。こちらも後続がemptyな場合だけエラーメッセージをマージするので、後続にtryが着いてるかどうかで優先されるエラーの出力が変わる可能性がある。

Text-Parsec-Prim.html

parserPlus :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a
{-# INLINE parserPlus #-}
parserPlus m n
    = ParsecT $ \s cok cerr eok eerr ->
      let
          meerr err =
              let
                  neok y s' err' = eok y s' (mergeError err err')
                  neerr err' = eerr $ mergeError err err'
              in unParser n s cok cerr neok neerr
      in unParser m s cok cerr eok meerr

<?>はemptyな場合にエラーメッセージを加えている。なんでconsumeだと加えないのかはよくわからず。errorIsUnknownは、まだエラーが発生してない時にメッセージを追加しないようにするために必要。

Text-Parsec-Prim.html

labels :: ParsecT s u m a -> [String] -> ParsecT s u m a
labels p msgs =
    ParsecT $ \s cok cerr eok eerr ->
    let eok' x s' error = eok x s' $ if errorIsUnknown error
                  then error
                  else setExpectErrors error msgs
        eerr' err = eerr $ setExpectErrors err msgs

    in unParser p s cok cerr eok' eerr'

 where
   setExpectErrors err []         = setErrorMessage (Expect "") err
   setExpectErrors err [msg]      = setErrorMessage (Expect msg) err
   setExpectErrors err (msg:msgs)
       = foldr (\msg' err' -> addErrorMessage (Expect msg') err')
         (setErrorMessage (Expect msg) err) msgs

tryは単純明快。cerr、eerrのどちらの継続も状態を受け取らないので、入力を復元したりする必要もない。入力の復元はよく見直すと<|>がやってる(同じsで二回パーサを走らせている)。

Text-Parsec-Prim.html

try :: ParsecT s u m a -> ParsecT s u m a
try p =
    ParsecT $ \s cok _ eok eerr ->
    unParser p s cok eerr eok eerr

一方でlookAheadは、eok、cokの継続に渡ってくる状態がパーサが走った後のものになってしまうので、走らせる前の状態を記録しておいて復元する必要がある。入力は消費しないことになってるので、仮に入力を消費した場合でも呼ぶ継続はeokの方となる。

Text-Parsec-Prim.html

lookAhead :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m a
lookAhead p         = do{ state <- getParserState
                        ; x <- p'
                        ; setParserState state
                        ; return x
                        }
    where
    p' = ParsecT $ \s cok cerr eok eerr ->
         unParser p s eok cerr eok eerr