

Memo: What is the fastest way to serialize a string as an XML CDATA section in Ruby?


My understanding is that when a string containing ]]> is serialized to XML as CDATA, it has to be split at each ]]> into two CDATA sections.
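
As a minimal sketch of that splitting rule, using only core Ruby: each ]]> is rewritten so that ]] closes one section and > opens the next. The helper names to_cdata and from_cdata are made up for this illustration, and from_cdata is only a checking aid, not a general XML parser.

```ruby
CDATA_OPEN  = '<![CDATA['
CDATA_CLOSE = ']]>'

# Wrap text in CDATA. Every "]]>" in the text becomes
# "]]" + section end + section start + ">".
def to_cdata(text)
  escaped = text.gsub(CDATA_CLOSE, "]]#{CDATA_CLOSE}#{CDATA_OPEN}>")
  "#{CDATA_OPEN}#{escaped}#{CDATA_CLOSE}"
end

# Concatenate the contents of consecutive CDATA sections.
# Each section is read up to its first "]]>" (non-greedy match).
def from_cdata(xml)
  xml.scan(/<!\[CDATA\[(.*?)\]\]>/m).flatten.join
end

text = '>a]>b]]>c]]]>d'
xml  = to_cdata(text)
puts xml
puts from_cdata(xml) == text
```

Reading the sections back and joining them should give the original string, which is the property the split is meant to preserve.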


(Judging from the behavior) Oga and REXML appear to put the text into the CDATA section as-is, "without doing anything". Is that acceptable?

> Oga::XML::Cdata.new(text: '>a]>b]]>c]]]>d').to_xml
=> "<![CDATA[>a]>b]]>c]]]>d]]>"
> REXML::Document.new.tap { |doc| doc.add(REXML::CData.new('>a]>b]]>c]]]>d', true)) }.to_s
=> "<![CDATA[>a]>b]]>c]]]>d]]>"
> Nokogiri::XML::CDATA.new(Nokogiri::XML::Document.new, '>a]>b]]>c]]]>d').to_xml
=> "<![CDATA[>a]>b]]]]><![CDATA[>c]]]]]><![CDATA[>d]]>"
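
To see why the unsplit output is a problem, here is a small sketch in plain Ruby. It assumes, as I read the XML 1.0 specification, that a CDATA section ends at its first ]]> and that a literal ]]> may not appear in ordinary character data. The helper first_cdata_section is hypothetical and only looks at the first section.

```ruby
# Output shape produced above by Oga and REXML for '>a]>b]]>c]]]>d'.
NAIVE = '<![CDATA[>a]>b]]>c]]]>d]]>'

# Return the content of the first CDATA section and whatever follows it.
# The section is taken to end at the first "]]>" after the opener.
def first_cdata_section(xml)
  opener = '<![CDATA['
  start = xml.index(opener)
  return nil unless start

  body_start = start + opener.length
  body_end = xml.index(']]>', body_start)
  return nil unless body_end

  { content: xml[body_start...body_end], rest: xml[(body_end + 3)..-1] }
end

result = first_cdata_section(NAIVE)
puts result[:content].inspect       # text a parser would treat as CDATA content
puts result[:rest].inspect          # text left over outside the section
puts result[:rest].include?(']]>')  # leftover still contains a bare "]]>"
```

Under that reading, only part of the original text ends up inside the section and the remainder is stray character data containing ]]>, so a conforming parser would be expected to reject it.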

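I will set Oga and REXML aside and compare Nokogiri with gsub. split & join is out of the question.
The fewer ]]> sequences the string contains, the bigger the advantage for gsub (about 2x); the gap disappears as they increase. Either way, both stay in the range of tens of microseconds per call (roughly 20 to 143 μs in the results below).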
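
For reference, the two string-based approaches can be sketched as stand-alone methods outside the benchmark prelude. The method names cdata_by_gsub and cdata_by_split are made up here. As far as I can tell they produce the same output for non-empty input and differ only for the empty string.

```ruby
# Replace each "]]>" with "]]" + section end + section start + ">".
def cdata_by_gsub(value)
  "<![CDATA[#{value.gsub(']]>', ']]]]><![CDATA[>')}]]>"
end

# Split at the zero-width position between "]]" and ">",
# then wrap every piece in its own CDATA section.
def cdata_by_split(value)
  value.split(/(?<=\]\])(?=>)/).map { |part| "<![CDATA[#{part}]]>" }.join
end

['plain', 'a]]>b', ']]>', 'x]]]>y]]>'].each do |sample|
  same = cdata_by_gsub(sample) == cdata_by_split(sample)
  puts "#{sample.inspect}: same output? #{same}"
end

# Edge case: splitting an empty string yields no pieces at all,
# so the split variant emits no CDATA section.
puts cdata_by_gsub('').inspect
puts cdata_by_split('').inspect
```

The split variant allocates an array plus one string per piece, which is one plausible reason it falls behind as the number of ]]> sequences grows.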



require 'benchmark_driver'

Benchmark.driver do |x|
  x.prelude <<~RUBY
    require 'nokogiri'
    require 'oga'
    require 'rexml/document'

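    # Nokogiri splits "]]>" across several CDATA sections when serializing.
    def bench_nokogiri(value)
      Nokogiri::XML::CDATA.new(Nokogiri::XML::Document.new, value).to_xml
    end

    # Oga emits the text as-is inside a single CDATA section.
    def bench_oga(value)
      Oga::XML::Cdata.new(text: value).to_xml
    end

    # Only builds the document; unlike the irb example, to_s is not called here.
    def bench_rexml(value)
      doc = REXML::Document.new
      doc.add(REXML::CData.new(value, true))
    end

    # Replace each "]]>" with "]]" + section end + section start + ">".
    def bench_gsub(value)
      "<![CDATA[\#{value.gsub(']]>', ']]]]><![CDATA[>')}]]>"
    end

    # Split between "]]" and ">" and wrap every piece in its own section.
    def bench_split_map_join(value)
      value.split(/(?<=\\]\\])(?=>)/).map { |x| "<![CDATA[\#{x}]]>" }.join
    end

    # 1,000 items each; "]]>" occurs 0 (A), 99 (B), 499 (C) and 999 (D) times.
    A = Array.new(1_000, 'あ').join(']]]').freeze
    B = (Array.new(1_000 - 100, 'あ').join(']]]') + ']]]' + Array.new(100, 'あ').join(']]>')).freeze
    C = (Array.new(1_000 - 500, 'あ').join(']]]') + ']]]' + Array.new(500, 'あ').join(']]>')).freeze
    D = Array.new(1_000, 'あ').join(']]>').freeze
  RUBY
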
  x.report 'Nokogiri A', %{ bench_nokogiri(A) }
  x.report 'Nokogiri B', %{ bench_nokogiri(B) }
  x.report 'Nokogiri C', %{ bench_nokogiri(C) }
  x.report 'Nokogiri D', %{ bench_nokogiri(D) }

  x.report 'Oga A', %{ bench_oga(A) }
  x.report 'Oga B', %{ bench_oga(B) }
  x.report 'Oga C', %{ bench_oga(C) }
  x.report 'Oga D', %{ bench_oga(D) }

  x.report 'REXML A', %{ bench_rexml(A) }
  x.report 'REXML B', %{ bench_rexml(B) }
  x.report 'REXML C', %{ bench_rexml(C) }
  x.report 'REXML D', %{ bench_rexml(D) }

  x.report 'String#gsub A', %{ bench_gsub(A) }
  x.report 'String#gsub B', %{ bench_gsub(B) }
  x.report 'String#gsub C', %{ bench_gsub(C) }
  x.report 'String#gsub D', %{ bench_gsub(D) }

  x.report 'String#split A', %{ bench_split_map_join(A) }
  x.report 'String#split B', %{ bench_split_map_join(B) }
  x.report 'String#split C', %{ bench_split_map_join(C) }
  x.report 'String#split D', %{ bench_split_map_join(D) }
end

ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]

Warming up --------------------------------------
          Nokogiri A    23.487k i/s
          Nokogiri B    18.810k i/s
          Nokogiri C    11.669k i/s
          Nokogiri D     7.744k i/s
               Oga A    57.971k i/s
               Oga B    58.786k i/s
               Oga C    58.772k i/s
               Oga D    60.551k i/s
             REXML A    19.553k i/s
             REXML B    19.436k i/s
             REXML C    19.734k i/s
             REXML D    19.391k i/s
       String#gsub A    51.944k i/s
       String#gsub B    30.070k i/s
       String#gsub C    12.351k i/s
       String#gsub D     6.905k i/s
      String#split A    37.510k i/s
      String#split B    12.421k i/s
      String#split C     3.377k i/s
      String#split D     1.510k i/s
Calculating -------------------------------------
          Nokogiri A    21.406k i/s -     70.462k times in 3.291765s (46.72μs/i)
          Nokogiri B    17.742k i/s -     56.428k times in 3.180455s (56.36μs/i)
          Nokogiri C    11.267k i/s -     35.007k times in 3.107085s (88.76μs/i)
          Nokogiri D     7.031k i/s -     23.231k times in 3.304298s (142.24μs/i)
               Oga A    54.490k i/s -    173.911k times in 3.191630s (18.35μs/i)
               Oga B    55.149k i/s -    176.357k times in 3.197855s (18.13μs/i)
               Oga C    54.794k i/s -    176.316k times in 3.217820s (18.25μs/i)
               Oga D    55.835k i/s -    181.653k times in 3.253405s (17.91μs/i)
             REXML A    19.160k i/s -     58.660k times in 3.061507s (52.19μs/i)
             REXML B    19.508k i/s -     58.308k times in 2.988858s (51.26μs/i)
             REXML C    18.517k i/s -     59.202k times in 3.197114s (54.00μs/i)
             REXML D    18.080k i/s -     58.172k times in 3.217389s (55.31μs/i)
       String#gsub A    50.041k i/s -    155.831k times in 3.114064s (19.98μs/i)
       String#gsub B    28.016k i/s -     90.211k times in 3.220032s (35.69μs/i)
       String#gsub C    12.342k i/s -     37.053k times in 3.002231s (81.03μs/i)
       String#gsub D     6.999k i/s -     20.714k times in 2.959439s (142.87μs/i)
      String#split A    36.066k i/s -    112.530k times in 3.120133s (27.73μs/i)
      String#split B    12.002k i/s -     37.262k times in 3.104546s (83.32μs/i)
      String#split C     3.243k i/s -     10.130k times in 3.123976s (308.39μs/i)
      String#split D     1.807k i/s -      4.529k times in 2.506557s (553.45μs/i)

               Oga D:     55834.7 i/s
               Oga B:     55148.5 i/s - 1.01x  slower
               Oga C:     54793.6 i/s - 1.02x  slower
               Oga A:     54489.7 i/s - 1.02x  slower
       String#gsub A:     50041.0 i/s - 1.12x  slower
      String#split A:     36065.8 i/s - 1.55x  slower
       String#gsub B:     28015.6 i/s - 1.99x  slower
          Nokogiri A:     21405.5 i/s - 2.61x  slower
             REXML B:     19508.5 i/s - 2.86x  slower
             REXML A:     19160.5 i/s - 2.91x  slower
             REXML C:     18517.3 i/s - 3.02x  slower
             REXML D:     18080.5 i/s - 3.09x  slower
          Nokogiri B:     17742.1 i/s - 3.15x  slower
       String#gsub C:     12341.8 i/s - 4.52x  slower
      String#split B:     12002.4 i/s - 4.65x  slower
          Nokogiri C:     11266.8 i/s - 4.96x  slower
          Nokogiri D:      7030.5 i/s - 7.94x  slower
       String#gsub D:      6999.3 i/s - 7.98x  slower
      String#split C:      3242.7 i/s - 17.22x  slower
      String#split D:      1806.9 i/s - 30.90x  slower
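
To make the gsub versus Nokogiri comparison easier to read off, the ratios can be computed from the i/s figures in the comparison above. This is only arithmetic on the reported numbers. The ]]> counts are my own derivation from how A to D are built (joining n items yields n - 1 separators), and the name gsub_speedup is hypothetical.

```ruby
# i/s figures copied from the comparison output above.
RESULTS = {
  'A' => { closers: 0,   nokogiri: 21_405.5, gsub: 50_041.0 },
  'B' => { closers: 99,  nokogiri: 17_742.1, gsub: 28_015.6 },
  'C' => { closers: 499, nokogiri: 11_266.8, gsub: 12_341.8 },
  'D' => { closers: 999, nokogiri: 7_030.5,  gsub: 6_999.3 }
}.freeze

# How many times faster gsub ran than Nokogiri for one data set.
def gsub_speedup(row)
  (row[:gsub] / row[:nokogiri]).round(2)
end

RESULTS.each do |name, row|
  puts "#{name} (#{row[:closers]} x \"]]>\"): gsub runs at #{gsub_speedup(row)}x the Nokogiri rate"
end
```

The ratio starts a little above 2x for A and shrinks to about 1x for D, which matches the summary given before the benchmark script.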
