Reading the Crystal Compiler Source: Lexical Analysis #Crystal

This article is the day 25 entry of Crystal Advent Calendar 2018.

This picks up where the article from the 23rd left off.
This time we will read through the part that does lexical analysis.

Inside compile, there is this code:

private def parse(program, source : Source)
  parser = Parser.new(source.code, program.string_pool)
  parser.filename = source.filename
  parser.wants_doc = wants_doc?
  parser.parse

The Parser it creates is actually implemented in compiler/crystal/syntax/parser.cr.
Here is an excerpt:

module Crystal
  class Parser < Lexer
    def initialize(str, string_pool : StringPool? = nil, @def_vars = [Set(String).new])
      super(str, string_pool)

Since Parser inherits from Lexer, the parser itself appears to perform the lexical analysis as well.
Let's go back and follow the body of parse.
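To get a feel for what "the parser inherits from the lexer" means in practice, here is a minimal sketch. WordLexer and WordParser are hypothetical names I made up for illustration, and splitting on whitespace stands in for real tokenization; this is not how the compiler does it. The point is only that the subclass can call next_token directly, without holding a separate lexer object.

```crystal
# Toy sketch: a "lexer" that yields whitespace-separated words as tokens.
class WordLexer
  getter token : String = ""
  @words : Array(String)

  def initialize(source : String)
    @words = source.split
    @index = 0
  end

  # Advances to the next token; "EOF" marks the end of input.
  def next_token : String
    @token = @index < @words.size ? @words[@index] : "EOF"
    @index += 1
    @token
  end
end

# The parser inherits from the lexer, so next_token and token are its own methods.
class WordParser < WordLexer
  def parse : Array(String)
    result = [] of String
    next_token
    until token == "EOF"
      result << token
      next_token
    end
    result
  end
end

parser = WordParser.new("def foo end")
puts parser.parse.inspect
```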

    def parse
      next_token_skip_statement_end

      expressions = parse_expressions.tap { check :EOF }

      check :EOF

      expressions
    end

next_token_skip_statement_end is defined in the lexer.

Let's look at compiler/crystal/syntax/lexer.cr.

    def next_token_skip_statement_end
      next_token
      skip_statement_end
    end

    def next_token

This is where the lexical analysis starts.

      # Skip comments
      while current_char == '#'
        start = current_pos

        # Check #<loc:...> pragma comment
        # ... (omitted)
      end

When the current character is #, the lexer handles the text as a comment. (The #<loc:...> pragma check also happens inside this handling.)
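The comment-skipping loop can be sketched roughly as follows. This is a simplified toy under my own assumptions, not the compiler's code: CommentLexer is a hypothetical class, it ignores the pragma check entirely, and unlike the real lexer it swallows the newline after each comment so that consecutive comment lines are skipped in one call.

```crystal
# Toy sketch: skip leading `#` comment lines and collect their text.
class CommentLexer
  @chars : Array(Char)

  def initialize(source : String)
    @chars = source.chars
    @pos = 0
  end

  # Returns '\0' once the input is exhausted, like the EOF check in next_token.
  def current_char : Char
    @pos < @chars.size ? @chars[@pos] : '\0'
  end

  def next_char : Char
    @pos += 1
    current_char
  end

  # Skips consecutive comment lines and returns the text of each one.
  def skip_comments : Array(String)
    comments = [] of String
    while current_char == '#'
      start = @pos
      # Consume up to (but not including) the end of the line.
      while current_char != '\n' && current_char != '\0'
        next_char
      end
      comments << @chars[start...@pos].join
      # Toy simplification: step over the newline to reach the next line.
      next_char if current_char == '\n'
    end
    comments
  end
end

lexer = CommentLexer.new("# first\n# second\nfoo")
comments = lexer.skip_comments
puts comments.inspect
puts lexer.current_char
```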

      case current_char
      when '\0'
        @token.type = :EOF
      when ' ', '\t'
        consume_whitespace
        reset_regex_flags = false
      when '\\'
      # ... (omitted)

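The rest is long so I will skip it, but what it does is basically check for the end of input, tab characters, carriage returns, newlines, and reserved words.
For operators made of several characters, such as << or <<=, it first matches the leading < and then inspects the following characters one by one to decide which operator it is.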

An excerpt of those checks looks like this:

      when '<'
        case next_char
        when '='
          case next_char
          when '>'
            next_char :"<=>"
          else
            @token.type = :"<="
          end
        when '<'
          case next_char
          when '='
            next_char :"<<="
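The same nested lookahead can be written as a small standalone function. This is a sketch for illustration only: scan_less_than is a hypothetical helper, it returns the operator as a String instead of setting @token.type, and it covers only the five operators shown here, not everything the real lexer handles after <.

```crystal
# Toy sketch: decide which `<`-operator starts the string
# by looking one character further at each step.
def scan_less_than(source : String) : String
  return "" unless source[0]? == '<'

  second = source[1]?
  third = source[2]?

  case second
  when '='
    # "<=" followed by ">" is the comparison operator "<=>".
    if third == '>'
      "<=>"
    else
      "<="
    end
  when '<'
    # "<<" followed by "=" is the assignment form "<<=".
    if third == '='
      "<<="
    else
      "<<"
    end
  else
    "<"
  end
end

["<", "<=", "<=>", "<<", "<<=", "< 1"].each do |src|
  puts "#{src.inspect} -> #{scan_less_than(src).inspect}"
end
```

Picking the longest match first is what keeps <<= from being read as << followed by =.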

Tracing the code gets complicated, so let's look at the specs instead to see what values actually come out.

spec/compiler/parser/parser_spec.cr

private def it_parses(string, expected_node, file = __FILE__, line = __LINE__)
  it "parses #{string.dump}", file, line do
    parser = Parser.new(string)
    parser.filename = "/foo/bar/baz.cr"
    node = parser.parse

    # If it's an Array, map it all to ASTNode (the array might be of a
    # union that's not exactly ASTNode). Not having to write `[...] of ASTNode`
    # simplifies testing a bit.
    local_expected_node = expected_node
    if local_expected_node.is_a?(Array)
      local_expected_node = local_expected_node.map(&.as(ASTNode))
    end

    node.should eq(Expressions.from(local_expected_node))
  end
end

With the helper above, you pass the code string to parse as the first argument and the expected ASTNode as the second.

First, numeric literals look like this:

    it_parses "1", 1.int32
    it_parses "+1", 1.int32
    it_parses "-1", -1.int32

Next, arithmetic operations:

    it_parses "1 * 2", Call.new(1.int32, "*", 2.int32)
    it_parses "1 * -2", Call.new(1.int32, "*", -2.int32)

A method definition:

    it_parses "def foo\n1\nend", Def.new("foo", body: 1.int32)

Finally, class definitions have the following expected values:

    it_parses "class Foo; end", ClassDef.new("Foo".path)
    it_parses "class Foo\nend", ClassDef.new("Foo".path)
    it_parses "class Foo\ndef foo; end; end", ClassDef.new("Foo".path, [Def.new("foo")] of ASTNode)
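The "1 * 2" expectation can also be poked at outside the spec helper. The sketch below assumes the compiler's syntax sources can be required as "compiler/crystal/syntax" from your Crystal installation, which may depend on the version and on how it was installed, so treat it as something to try rather than a guaranteed recipe.

```crystal
# Sketch: parse a snippet with the compiler's own parser and inspect the node.
# Assumes the compiler sources are on CRYSTAL_PATH.
require "compiler/crystal/syntax"

node = Crystal::Parser.parse("1 * 2")

# For a single expression the parser returns that node directly,
# which is why the spec can compare against Call.new(...).
puts node.class

if node.is_a?(Crystal::Call)
  puts node.name      # the method name of the call
  puts node.obj       # the receiver
  puts node.args.size # number of arguments
end
```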

This is still only scratching the surface, so I will write a follow-up after digging a bit further.
