I recently came across code like this:
puts 'hello world!'      .gsub('hello', 'good evening')
Come on, that's way too many spaces! Surely that throws an error...
Or so I thought, but it actually ran.
With the code above, the result is:
puts 'hello world!'      .gsub('hello', 'good evening')
# => good evening world!
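To confirm that the extra spaces change nothing about the result, here is a quick side-by-side check against the usual no-space spelling. The variable names spaced and tight are just for illustration.

```ruby
# Six spaces before .gsub, as in the snippet above
spaced = 'hello world!'      .gsub('hello', 'good evening')
# The conventional spelling with no space
tight  = 'hello world!'.gsub('hello', 'good evening')

puts spaced          # => good evening world!
puts spaced == tight # => true
```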
What on earth? This makes no sense. This isn't the Ruby I know. So I ran it through lexical analysis and parsing with Ripper.
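# Ripper is Ruby's standard-library interface to its own parser
require 'ripper'
require 'pp'
code =<<EOF
puts 'hello world!'      .gsub('hello', 'good evenning')
EOF
# Token stream: [[line, column], event, token, ...]
puts '=====lexer====='
pp Ripper.lex(code)
# S-expression (AST) built from the same source
puts '=====parser====='
pp Ripper.sexp(code)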
Here is the result:
=====lexer=====
[[[1, 0], :on_ident, "puts"],
[[1, 4], :on_sp, " "],
[[1, 5], :on_tstring_beg, "'"],
[[1, 6], :on_tstring_content, "hello world!"],
[[1, 18], :on_tstring_end, "'"],
[[1, 19], :on_sp, "      "],
[[1, 25], :on_period, "."],
[[1, 26], :on_ident, "gsub"],
[[1, 30], :on_lparen, "("],
[[1, 31], :on_tstring_beg, "'"],
[[1, 32], :on_tstring_content, "hello"],
[[1, 37], :on_tstring_end, "'"],
[[1, 38], :on_comma, ","],
[[1, 39], :on_sp, " "],
[[1, 40], :on_tstring_beg, "'"],
[[1, 41], :on_tstring_content, "good evenning"],
[[1, 54], :on_tstring_end, "'"],
[[1, 55], :on_rparen, ")"],
[[1, 56], :on_nl, "\n"]]
=====parser=====
[:program,
[[:command,
[:@ident, "puts", [1, 0]],
[:args_add_block,
[[:method_add_arg,
[:call,
[:string_literal,
[:string_content, [:@tstring_content, "hello world!", [1, 6]]]],
:".",
[:@ident, "gsub", [1, 26]]],
[:arg_paren,
[:args_add_block,
[[:string_literal,
[:string_content, [:@tstring_content, "hello", [1, 32]]]],
[:string_literal,
[:string_content, [:@tstring_content, "good evenning", [1, 41]]]]],
false]]]],
false]]]]
I think the key point is the sixth element, the on_sp token at [1, 19].
During lexing, Ruby bundles all the spaces between 'hello world!' and . into a single token called on_sp.
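And in the parse result there is no sign of on_sp having been turned into anything else; it simply disappears.
(In the parser output, 'hello world!' is parsed as :string_content on line 8, and gsub as :@ident (identifier) on line 10.)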
In other words, I suspect Ruby uses spaces during lexing only to separate one token from the next. (Apologies if I've got this wrong.)
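One way to test this guess is to drop every on_sp token from the Ripper.lex output and compare what remains for the spaced and unspaced versions. This is a minimal sketch: the helper name tokens_without_spaces is made up for illustration, and it assumes each Ripper.lex entry starts with [[line, column], event, token] (newer Rubies also append a lexer state, which the block simply ignores).

```ruby
require 'ripper'

# The two spellings: six spaces before .gsub vs. none
spaced = "puts 'hello world!'      .gsub('hello', 'good evening')\n"
tight  = "puts 'hello world!'.gsub('hello', 'good evening')\n"

# Hypothetical helper: drop whitespace tokens and positions,
# keeping only [event, token] pairs
def tokens_without_spaces(src)
  Ripper.lex(src)
    .reject { |_pos, event, _tok| event == :on_sp }
    .map { |_pos, event, tok| [event, tok] }
end

p tokens_without_spaces(spaced) == tokens_without_spaces(tight)
# => true
```

The six spaces survive lexing only as one on_sp token; once that token is filtered out, the two token streams are identical.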
So, with or without the space, the parse result is...
require 'ripper'
require 'pp'
# Same code, this time with no space before .gsub
code =<<EOF
puts 'hello world!'.gsub('hello', 'good evenning')
EOF
puts '=====lexer====='
pp Ripper.lex(code)
puts '=====parser====='
pp Ripper.sexp(code)
=====lexer=====
[[[1, 0], :on_ident, "puts"],
[[1, 4], :on_sp, " "],
[[1, 5], :on_tstring_beg, "'"],
[[1, 6], :on_tstring_content, "hello world!"],
[[1, 18], :on_tstring_end, "'"],
[[1, 19], :on_period, "."],
[[1, 20], :on_ident, "gsub"],
[[1, 24], :on_lparen, "("],
[[1, 25], :on_tstring_beg, "'"],
[[1, 26], :on_tstring_content, "hello"],
[[1, 31], :on_tstring_end, "'"],
[[1, 32], :on_comma, ","],
[[1, 33], :on_sp, " "],
[[1, 34], :on_tstring_beg, "'"],
[[1, 35], :on_tstring_content, "good evenning"],
[[1, 48], :on_tstring_end, "'"],
[[1, 49], :on_rparen, ")"],
[[1, 50], :on_nl, "\n"]]
=====parser=====
[:program,
[[:command,
[:@ident, "puts", [1, 0]],
[:args_add_block,
[[:method_add_arg,
[:call,
[:string_literal,
[:string_content, [:@tstring_content, "hello world!", [1, 6]]]],
:".",
[:@ident, "gsub", [1, 20]]],
[:arg_paren,
[:args_add_block,
[[:string_literal,
[:string_content, [:@tstring_content, "hello", [1, 26]]]],
[:string_literal,
[:string_content, [:@tstring_content, "good evenning", [1, 35]]]]],
false]]]],
false]]]]
The same, apart from the column positions.
[With lots of spaces]
[:program,
[[:command,
[:@ident, "puts", [1, 0]],
[:args_add_block,
[[:method_add_arg,
[:call,
[:string_literal,
[:string_content, [:@tstring_content, "hello world!", [1, 6]]]],
:".",
[:@ident, "gsub", [1, 26]]],
[:arg_paren,
[:args_add_block,
[[:string_literal,
[:string_content, [:@tstring_content, "hello", [1, 32]]]],
[:string_literal,
[:string_content, [:@tstring_content, "good evenning", [1, 41]]]]],
false]]]],
false]]]]
[Without the space]
[:program,
[[:command,
[:@ident, "puts", [1, 0]],
[:args_add_block,
[[:method_add_arg,
[:call,
[:string_literal,
[:string_content, [:@tstring_content, "hello world!", [1, 6]]]],
:".",
[:@ident, "gsub", [1, 20]]],
[:arg_paren,
[:args_add_block,
[[:string_literal,
[:string_content, [:@tstring_content, "hello", [1, 26]]]],
[:string_literal,
[:string_content, [:@tstring_content, "good evenning", [1, 35]]]]],
false]]]],
false]]]]
So that is presumably why the earlier
puts 'hello world!'      .gsub('hello', 'good evening')
ran without any problems.
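Since the two trees differ only in their [line, column] positions, the comparison can also be done mechanically by blanking out the positions first. This is a sketch under an assumption: the hypothetical helper strip_positions treats any two-element array of Integers as a position, which holds for the small programs here but is not a general Ripper rule.

```ruby
require 'ripper'

# Hypothetical helper: replace every [line, column] pair in a Ripper
# S-expression with nil so that only the tree shape and tokens remain
def strip_positions(node)
  return node unless node.is_a?(Array)
  return nil if node.size == 2 && node.all? { |x| x.is_a?(Integer) }

  node.map { |child| strip_positions(child) }
end

spaced = "puts 'hello world!'      .gsub('hello', 'good evening')\n"
tight  = "puts 'hello world!'.gsub('hello', 'good evening')\n"

p Ripper.sexp(spaced) == Ripper.sexp(tight)
# => false (the gsub column differs)
p strip_positions(Ripper.sexp(spaced)) == strip_positions(Ripper.sexp(tight))
# => true
```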
Experiment
I also analyzed a case where a space is actually required.
Analyzing the following code gives:
[1,2].each do |e|
end
=====lexer=====
[[[1, 0], :on_lbracket, "["],
[[1, 1], :on_int, "1"],
[[1, 2], :on_comma, ","],
[[1, 3], :on_int, "2"],
[[1, 4], :on_rbracket, "]"],
[[1, 5], :on_period, "."],
[[1, 6], :on_ident, "each"],
[[1, 10], :on_sp, " "],
[[1, 11], :on_kw, "do"],
[[1, 13], :on_sp, " "],
[[1, 14], :on_op, "|"],
[[1, 15], :on_ident, "e"],
[[1, 16], :on_op, "|"],
[[1, 17], :on_ignored_nl, "\n"],
[[2, 0], :on_kw, "end"],
[[2, 3], :on_nl, "\n"]]
=====parser=====
[:program,
[[:method_add_block,
[:call,
[:array, [[:@int, "1", [1, 1]], [:@int, "2", [1, 3]]]],
:".",
[:@ident, "each", [1, 6]]],
[:do_block,
[:block_var,
[:params, [[:@ident, "e", [1, 15]]], nil, nil, nil, nil, nil, nil],
false],
[[:void_stmt]]]]]]
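The seventh element of the lexer output properly turns each into a token, and it is converted into the AST as well. But if you leave out the space between each and do ([1,2].eachdo |e|)...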
=====lexer=====
[[[1, 0], :on_lbracket, "["],
[[1, 1], :on_int, "1"],
[[1, 2], :on_comma, ","],
[[1, 3], :on_int, "2"],
[[1, 4], :on_rbracket, "]"],
[[1, 5], :on_period, "."],
[[1, 6], :on_ident, "eachdo"],
[[1, 12], :on_sp, " "],
[[1, 13], :on_op, "|"],
[[1, 14], :on_ident, "e"],
[[1, 15], :on_op, "|"],
[[1, 16], :on_ignored_nl, "\n"],
[[2, 0], :on_kw, "end"],
[[2, 3], :on_nl, "\n"]]
=====parser=====
nil
each and do get glued together into eachdo, and parsing appears to fail (Ripper.sexp returns nil).
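The same contrast can be reproduced in a few lines: list the identifier tokens for each version and check whether Ripper.sexp returns a tree or nil (it returns nil when the source has a syntax error). The lambda name idents is made up for this sketch.

```ruby
require 'ripper'

with_space    = "[1,2].each do |e|\nend\n"
without_space = "[1,2].eachdo |e|\nend\n"

# Hypothetical helper: collect the text of every :on_ident token
idents = ->(src) {
  Ripper.lex(src)
    .select { |_pos, event, _tok| event == :on_ident }
    .map { |_pos, _event, tok| tok }
}

p idents.(with_space)             # => ["each", "e"]
p idents.(without_space)          # => ["eachdo", "e"]
p Ripper.sexp(with_space).nil?    # => false
p Ripper.sexp(without_space).nil? # => true
```

Here the space does real work: without it, the lexer reads eachdo as one identifier, so the do keyword that the block syntax needs never appears.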