

Looking into RubyVM::AbstractSyntaxTree


Last updated at 2023-04-14, posted at 2023-04-14

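While going through the Ruby 3.2.0 release notes in search of a topic for our in-house Ruby study group, I came across the following changes to RubyVM::AbstractSyntaxTree. I looked into them and wrote up what I found.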

The error_tolerant option was added to parse, parse_file, and of.

The keep_tokens option was added to parse, parse_file, and of.

This article covers RubyVM::AbstractSyntaxTree, but note that the module is experimental and is not recommended for general use.

About RubyVM::AbstractSyntaxTree

RubyVM::AbstractSyntaxTree is a Ruby parser API introduced in Ruby 2.6. It is intended for purposes such as debugging Ruby itself and static analysis.
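As a rough illustration of what the API returns, the sketch below parses a one-line assignment and looks one level into the tree. The helper name `describe_assignment` is made up for this example; `parse`, `Node#type`, and `Node#children` are the module's own methods.

```ruby
# Minimal sketch (MRI only, experimental API).
# `describe_assignment` is a hypothetical helper, not part of Ruby.

# Returns [root type, body type, assigned variable name] for a one-line assignment.
def describe_assignment(src)
  root = RubyVM::AbstractSyntaxTree.parse(src)
  body = root.children.last # SCOPE children are [tbl, args, body]
  [root.type, body.type, body.children.first]
end

p describe_assignment("x = 1 + 2") # => [:SCOPE, :LASGN, :x]
```

The root is always a SCOPE node, and the single statement in the body is the local variable assignment (LASGN) to `x`.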

Other commonly used Ruby parsers include Ripper (part of the standard library since 1.9) and the parser gem.

Incidentally, rubocop apparently used Ripper originally but now uses the parser gem.¹
syntax_suggest, which was added to the standard library in Ruby 3.2, also uses Ripper internally.
(The RubyKaigi 2021 session on syntax_suggest was interesting, so do take a look.)

Implemented only in MRI

Ruby has several implementations, but RubyVM::AbstractSyntaxTree is implemented only in MRI.

In fact, the RubyVM class itself appears to be MRI-only, and as the reference manual states below, it is not meant to be used by general users.

class RubyVM (Ruby 3.2 Reference Manual)

A class that provides access to Ruby's internal information. It is intended for very limited purposes such as debugging, prototyping, and research. General users should not use it.

module RubyVM::AbstractSyntaxTree (Ruby 3.2 Reference Manual)

This module is experimental and is not a stable API, so it may change without notice.
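Because the module exists only in MRI, code that should also run on other implementations may want to check for it first. The following is one possible guard; the method name `ast_api_available?` is hypothetical.

```ruby
# Sketch of a portability guard. `ast_api_available?` is a made-up helper name.
# `defined?` returns nil instead of raising when RubyVM itself is missing.
def ast_api_available?
  !!defined?(RubyVM::AbstractSyntaxTree)
end

if ast_api_available?
  puts RubyVM::AbstractSyntaxTree.parse("1 + 2").type
else
  puts "RubyVM::AbstractSyntaxTree is not available on #{RUBY_ENGINE}"
end
```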

The issue "Feature #14844: Future of RubyVM::AST? - Ruby master - Ruby Issue Tracking System" discusses topics such as what Ruby should do with this library going forward and how it compares with other parsers such as Ripper.
If you want to dig deeper, read through that issue.

Incidentally, the RubyVM class has the following constants:

# ruby -v => 3.2.1

> RubyVM.constants
=> [:INSTRUCTION_NAMES, :InstructionSequence, :OPTS, :DEFAULT_PARAMS, :MJIT, :AbstractSyntaxTree, :YJIT]
# InstructionSequence (ISeq) is the bytecode instruction sequence

RubyVM::AbstractSyntaxTree defines the following methods:

> RubyVM::AbstractSyntaxTree.methods(false)
=> [:node_id_for_backtrace_location, :parse, :of, :parse_file]
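Of these, `parse_file` takes a path instead of a string. A small sketch, using a temporary file so that it is self-contained (the helper name `parse_file_root_type` is invented for this example):

```ruby
require "tempfile"

# Writes `source` to a temporary .rb file, parses it with parse_file,
# and returns the type of the root node.
# `parse_file_root_type` is a hypothetical helper for illustration.
def parse_file_root_type(source)
  Tempfile.create(["ast_sample", ".rb"]) do |file|
    file.write(source)
    file.flush
    RubyVM::AbstractSyntaxTree.parse_file(file.path).type
  end
end

p parse_file_root_type("puts :hello\n")
```

`of` is left out here because it needs a Proc or Method whose source can be located, which depends on how the script is run.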

Why RubyVM::AbstractSyntaxTree was added

The background was covered in Yuichiro Kaneko's talk at RubyKaigi 2018.

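To implement branch coverage, which landed in Ruby 2.5, the parser was changed so that code locations can be obtained.
Previously only line information was available; now column information is available as well (for both the beginning and the end of a node).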

Output

$ ruby --dump=p -e '"str".upcase'
###########################################################
## Do NOT use this node dump for any purpose other than  ##
## debug and research.  Compatibility is not guaranteed. ##
###########################################################

# @ NODE_SCOPE (id: 2, line: 1, location: (1,0)-(1,12))
# +- nd_tbl: (empty)
# +- nd_args:
# |   (null node)
# +- nd_body:
#     @ NODE_CALL (id: 1, line: 1, location: (1,0)-(1,12))*
#     +- nd_mid: :upcase
#     +- nd_recv:
#     |   @ NODE_STR (id: 0, line: 1, location: (1,0)-(1,5))
#     |   +- nd_lit: "str"
#     +- nd_args:
#         (null node)
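The same location information can be read from Ruby through the node accessors `first_lineno`, `first_column`, `last_lineno`, and `last_column`. The sketch below parses the same snippet as the dump above; `location_of` is a helper name made up for this example.

```ruby
# Returns [first_lineno, first_column, last_lineno, last_column] for a node.
# `location_of` is a hypothetical helper, not part of the API.
def location_of(node)
  [node.first_lineno, node.first_column, node.last_lineno, node.last_column]
end

root = RubyVM::AbstractSyntaxTree.parse('"str".upcase')
call = root.children.last  # CALL node for `"str".upcase`
recv = call.children.first # STR node for the receiver `"str"`

p [call.type, location_of(call)] # => [:CALL, [1, 0, 1, 12]]
p [recv.type, location_of(recv)] # => [:STR, [1, 0, 1, 5]]
```

These values match the `location: (1,0)-(1,12)` and `(1,0)-(1,5)` entries in the node dump.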

token (v3.2)

$  ruby --dump=y -e '1 + 2' | grep 'Shifting'
Shifting token "integer literal" (1.0-1.1: 1)
Shifting token '+' (1.2-1.3: )
Shifting token "integer literal" (1.4-1.5: 2)
Shifting token '\n' (1.5-1.6: )
Shifting token "end-of-input" (1.6-1.6: )

Code locations are also used by error_highlight in Ruby 3.1.

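To test the C API for these code locations, the functionality was implemented in ext/-test-/ast/ast.c. The idea then came up that it might be useful to some people if it were made public, and it was apparently released in 2.6.0-preview2.
In other words, rather than being a feature someone set out to build, it is a by-product, and there does not seem to be a concrete development plan for it.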

[Supplement] How Ruby code is parsed

Before getting into the 3.2 changes, here is some background on how Ruby code is parsed before it runs.

Ruby code is processed and transformed as follows:

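1. Lexical analysis turns the source into tokens
2. Syntax analysis turns the tokens into an AST
3. The AST is compiled into an InstructionSequence (ISeq / bytecode / YARV instruction sequence)
4. (JIT compilation)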
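The first three stages can be observed from Ruby itself. The sketch below is only an approximation of the pipeline: it uses `Ripper.lex` as a stand-in for the lexer step (Ripper is a separate interface, not the exact code path the interpreter takes), and the JIT step is omitted. The helper names are made up for this example.

```ruby
require "ripper"

# 1. Lexical analysis: [event, text] pairs from Ripper.lex.
def lex_events(src)
  Ripper.lex(src).map { |(_pos, event, text, _state)| [event, text] }
end

# 2. Syntax analysis: the root node of the AST.
def ast_root(src)
  RubyVM::AbstractSyntaxTree.parse(src)
end

# 3. Compilation: human-readable YARV instruction listing.
def bytecode(src)
  RubyVM::InstructionSequence.compile(src).disasm
end

src = "1 + 2"
p lex_events(src)
p ast_root(src)
puts bytecode(src)
```

Running it prints the token stream, the root SCOPE node, and the instruction listing for the same source, in that order.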

As for the parser, parse.c is generated from parse.y using Bison.²

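Bison is an LALR parser generator; grammars are written in Yacc (Yet Another Compiler Compiler) format in files with a .y extension. Although parse.y is referred to as the parser, it also contains the hand-written lexer, so it performs lexical analysis as well.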

Changes in Ruby 3.2

The changes are summarized on the blog of the developer, Yuichiro Kaneko.

The error_tolerant option was added to RubyVM::AbstractSyntaxTree's parse, parse_file, and of
- Feature #19013: Error Tolerant Parser - Ruby master - Ruby Issue Tracking System
  - (Judging from the issue, this also seems to be partly experimental)
- [Feature #19013] Error Tolerant Parser by yui-knk · Pull Request #6512 · ruby/ruby
The keep_tokens option was added to RubyVM::AbstractSyntaxTree's parse, parse_file, and of
- Feature #19070: Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods - Ruby master - Ruby Issue Tracking System
- [Feature #19070] Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods by yui-knk · Pull Request #6770 · ruby/ruby

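error_tolerant is an option that makes it possible to obtain a partial AST even when the source contains a syntax error.
While code is being written, the program is basically incomplete and its syntax is invalid most of the time. Being able to obtain an AST even in that state can be put to use for things like code completion in LSP servers.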
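A small sketch of the difference, using the broken snippet from the ast.rb documentation. The helpers `error_tolerant_supported?` and `node_types` are made up for this example; the feature check is there because the keyword only exists on Ruby 3.2 or later.

```ruby
# Feature check: true when parse accepts the error_tolerant keyword (Ruby 3.2+).
def error_tolerant_supported?
  RubyVM::AbstractSyntaxTree.method(:parse).parameters.include?([:key, :error_tolerant])
end

# Collects the type of every node in the tree, depth first.
# Non-node children (symbols, nil, arrays of names) are skipped.
def node_types(node, acc = [])
  return acc unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)

  acc << node.type
  node.children.each { |child| node_types(child, acc) }
  acc
end

src = "x = 1; p(x; y=2" # missing closing parenthesis

# Without the option, parsing raises SyntaxError.
begin
  RubyVM::AbstractSyntaxTree.parse(src)
rescue SyntaxError => e
  puts "strict parse failed: #{e.message.lines.first}"
end

# With the option, the broken expression becomes an ERROR node
# and the statements around it are still parsed.
if error_tolerant_supported?
  root = RubyVM::AbstractSyntaxTree.parse(src, error_tolerant: true)
  p node_types(root)
end
```

A tool such as a language server could then skip the ERROR nodes and still work with the assignments on either side.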

keep_tokens is an option that makes tokens available from a Node. As in the example given in the issue, it lets you get at token information that has been lost by the time the AST is built.

Implementation for Language Server Protocol (LSP) sometimes needs token information. For example both m(1) and m(1, ) has same AST structure other than node locations then it's impossible to check the existence of , from AST. However in later case, it might be better to suggest variables list for the second argument. Token information is important for such case.
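The case from the quote can be tried out roughly as follows. This is a sketch that assumes Ruby 3.2 or later; `keep_tokens_supported?` and `token_texts` are helper names invented for the example, while `Node#all_tokens` is the accessor that keep_tokens populates, with each entry shaped like `[id, type, text, location]`.

```ruby
# Feature check: true when parse accepts the keep_tokens keyword (Ruby 3.2+).
def keep_tokens_supported?
  RubyVM::AbstractSyntaxTree.method(:parse).parameters.include?([:key, :keep_tokens])
end

# Returns the text of every token in the source, in order.
def token_texts(src)
  root = RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
  root.all_tokens.map { |_id, _type, text, _loc| text }
end

if keep_tokens_supported?
  ["m(1)", "m(1, )"].each do |src|
    texts = token_texts(src)
    puts "#{src.inspect}: comma token present? #{texts.include?(',')}"
  end
end
```

The comma shows up only in the token list for `m(1, )`, which is the kind of detail a completion feature would need.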

As its origins suggest, RubyVM::AbstractSyntaxTree parses code using the logic in parse.y.

Source code excerpts

ast.rb

  #  call-seq:
  #     RubyVM::AbstractSyntaxTree.parse(string, keep_script_lines: false, error_tolerant: false, keep_tokens: false) -> RubyVM::AbstractSyntaxTree::Node
  #
  #  Parses the given _string_ into an abstract syntax tree,
  #  returning the root node of that tree.
  #
  #    RubyVM::AbstractSyntaxTree.parse("x = 1 + 2")
  #    # => #<RubyVM::AbstractSyntaxTree::Node:SCOPE@1:0-1:9>
  #
  #  If <tt>keep_script_lines: true</tt> option is provided, the text of the parsed
  #  source is associated with nodes and is available via Node#script_lines.
  #
  #  If <tt>keep_tokens: true</tt> option is provided, Node#tokens are populated.
  #
  #  SyntaxError is raised if the given _string_ is invalid syntax. To overwrite this
  #  behavior, <tt>error_tolerant: true</tt> can be provided. In this case, the parser
  #  will produce a tree where expressions with syntax errors would be represented by
  #  Node with <tt>type=:ERROR</tt>.
  #
  #     root = RubyVM::AbstractSyntaxTree.parse("x = 1; p(x; y=2")
  #     # <internal:ast>:33:in `parse': syntax error, unexpected ';', expecting ')' (SyntaxError)
  #     # x = 1; p(x; y=2
  #     #           ^
  #
  #     root = RubyVM::AbstractSyntaxTree.parse("x = 1; p(x; y=2", error_tolerant: true)
  #     # (SCOPE@1:0-1:15
  #     #  tbl: [:x, :y]
  #     #  args: nil
  #     #  body: (BLOCK@1:0-1:15 (LASGN@1:0-1:5 :x (LIT@1:4-1:5 1)) (ERROR@1:7-1:11) (LASGN@1:12-1:15 :y (LIT@1:14-1:15 2))))
  #     root.children.last.children
  #     # [(LASGN@1:0-1:5 :x (LIT@1:4-1:5 1)),
  #     #  (ERROR@1:7-1:11),
  #     #  (LASGN@1:12-1:15 :y (LIT@1:14-1:15 2))]
  #
  #  Note that parsing continues even after the errored expression.
  #
  def self.parse string, keep_script_lines: false, error_tolerant: false, keep_tokens: false
    Primitive.ast_s_parse string, keep_script_lines, error_tolerant, keep_tokens
  end

ast.c

static VALUE
ast_s_parse(rb_execution_context_t *ec, VALUE module, VALUE str, VALUE keep_script_lines, VALUE error_tolerant, VALUE keep_tokens)
{
    return rb_ast_parse_str(str, keep_script_lines, error_tolerant, keep_tokens);
}

static VALUE
rb_ast_parse_str(VALUE str, VALUE keep_script_lines, VALUE error_tolerant, VALUE keep_tokens)
{
    rb_ast_t *ast = 0;

    StringValue(str);
    VALUE vparser = ast_parse_new();
    if (RTEST(keep_script_lines)) rb_parser_keep_script_lines(vparser);
    if (RTEST(error_tolerant)) rb_parser_error_tolerant(vparser);
    if (RTEST(keep_tokens)) rb_parser_keep_tokens(vparser);
    ast = rb_parser_compile_string_path(vparser, Qnil, str, 1);
    return ast_parse_done(ast);
}

parse.y

VALUE rb_parser_new(void)
{
    struct parser_params *p;
    VALUE parser = TypedData_Make_Struct(0, struct parser_params, &parser_data_type, p);
    parser_initialize(p);
    return parser;
}

void rb_parser_keep_script_lines(VALUE vparser)
{
    struct parser_params *p;

    TypedData_Get_Struct(vparser, struct parser_params, &parser_data_type, p);
    p->keep_script_lines = 1;
}

void rb_parser_error_tolerant(VALUE vparser)
{
    struct parser_params *p;

    TypedData_Get_Struct(vparser, struct parser_params, &parser_data_type, p);
    p->error_tolerant = 1;
    p->end_expect_token_locations = rb_ary_new();
}

What's next

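There does not seem to be a concrete development plan for RubyVM::AbstractSyntaxTree itself, but there are apparently plans to make the Ruby parser itself error tolerant, more portable, and easier to extend.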

There are related sessions at this year's RubyKaigi 2023 as well, so take a look if you are interested.
