
Converting strings to Time quickly with the strptime gem

Ruby

Last updated at 2015-09-30, posted at 2015-09-30

Background

In Ruby, Time.strptime is the usual way to convert a textual time representation into a Time object. Below is an example from the official reference.

require 'time'

Time.strptime('2001-02-03T04:05:06+09:00', '%Y-%m-%dT%H:%M:%S%z')
# => 2001-02-03 04:05:06 +0900

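This is easy and very convenient, but Time.strptime has one fatal problem: it is slow. For software such as Fluentd, which does nothing but read and parse logs, Time.strptime itself becomes the bottleneck. There are several causes, one being that Time.strptime has to parse the format string on every call.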

Fluentd has worked around this by caching the time string: if the string is the same as a previous one, it skips Time.strptime (see the relevant code).

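Until now this worked fairly well, but once we tried to properly support parsing times at millisecond or nanosecond resolution, the cache stopped being effective (for example, in the "Support nanosecond" feature). While I was wondering what to do, I consulted the Ruby committer sitting in front of me, who wrote a gem for exactly this purpose, named strptime.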
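Why the cache stops helping can be seen with a small sketch. `TwoEntryCache` below is a hypothetical stand-in for a two-slot cache of the kind described above, not Fluentd's actual class; it only adds hit and miss counters. The sample log lines (10 per second over 5 seconds) are made up for illustration.

```ruby
require 'time'

# Hypothetical two-slot cache in front of Time.strptime (illustration only).
class TwoEntryCache
  attr_reader :hits, :misses

  def initialize(time_format)
    @time_format = time_format
    @keys = [nil, nil]
    @values = [nil, nil]
    @hits = 0
    @misses = 0
  end

  def parse(value)
    idx = @keys.index(value)
    if idx
      @hits += 1
      return @values[idx]
    end
    @misses += 1
    time = Time.strptime(value, @time_format).to_i
    # Drop the older entry and keep the newest one in the second slot.
    @keys = [@keys[1], value]
    @values = [@values[1], time]
    time
  end
end

# Second-resolution strings repeat within a second; millisecond ones do not.
sec_lines = (0...5).flat_map do |s|
  Array.new(10) { format('28/Feb/2015:10:00:%02d +0900', s) }
end
msec_lines = (0...5).flat_map do |s|
  (0...10).map { |k| format('28/Feb/2015:10:00:%02d.%03d +0900', s, k * 100) }
end

sec_cache = TwoEntryCache.new('%d/%b/%Y:%H:%M:%S %z')
sec_lines.each { |line| sec_cache.parse(line) }

msec_cache = TwoEntryCache.new('%d/%b/%Y:%H:%M:%S.%N %z')
msec_lines.each { |line| msec_cache.parse(line) }

puts "sec:  hits=#{sec_cache.hits} misses=#{sec_cache.misses}"
puts "msec: hits=#{msec_cache.hits} misses=#{msec_cache.misses}"
```

With this made-up input the second-resolution run hits the cache 45 times out of 50, while the millisecond run never hits it, so every line goes through Time.strptime.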

Usage

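Installation is just gem install strptime. The README also has examples, but the code looks like the following: create a Strptime object with the format you want to parse, then pass the string to parse to its exec method. The result comes back as a Time object, so you can do whatever you like with it afterwards.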

require 'strptime'

parser = Strptime.new('%Y-%m-%dT%H:%M:%S%z')
parser.exec('2015-12-25T12:34:56+09') #=> 2015-12-25 12:34:56 +09:00
parser.execi('2015-12-25T12:34:56+09') #=> 1451014496

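Strptime parses the format ahead of time and builds a dedicated instruction set; at conversion time it only walks through that instruction set. This skips the per-call format parsing that Time.strptime performs, which makes it much faster (a few other optimizations are in there as well).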
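The "compile the format once, then just walk the instructions" idea can be sketched in plain Ruby. This is a toy illustration under heavy assumptions, not the strptime gem's implementation: `ToyCompiledParser` is a hypothetical class that supports only fixed-width %Y %m %d %H %M %S plus literal characters, and always treats the input as UTC.

```ruby
# Toy illustration only: NOT how the strptime gem is implemented.
class ToyCompiledParser
  # directive => [field name, fixed digit width]
  FIELDS = {
    'Y' => [:year, 4], 'm' => [:mon, 2], 'd' => [:day, 2],
    'H' => [:hour, 2], 'M' => [:min, 2], 'S' => [:sec, 2]
  }.freeze

  def initialize(time_format)
    # The format string is parsed exactly once, into a flat instruction list.
    @program = time_format.scan(/%(.)|([^%])/).map do |directive, literal|
      if directive
        field, width = FIELDS.fetch(directive) do
          raise ArgumentError, "unsupported directive: %#{directive}"
        end
        [:number, field, width]
      else
        [:literal, literal]
      end
    end
  end

  def exec(str)
    pos = 0
    parts = { year: 1970, mon: 1, day: 1, hour: 0, min: 0, sec: 0 }
    # Conversion only walks the prebuilt instructions; the format is not re-read.
    @program.each do |op, a, b|
      if op == :number
        chunk = str[pos, b]
        unless chunk && chunk.length == b && chunk.match?(/\A\d+\z/)
          raise ArgumentError, "invalid number at position #{pos}"
        end
        parts[a] = chunk.to_i
        pos += b
      else
        raise ArgumentError, "expected #{a.inspect} at position #{pos}" unless str[pos] == a
        pos += 1
      end
    end
    raise ArgumentError, 'trailing characters' unless pos == str.length
    Time.utc(*parts.values_at(:year, :mon, :day, :hour, :min, :sec))
  end
end

parser = ToyCompiledParser.new('%Y-%m-%dT%H:%M:%S')
t = parser.exec('2015-12-25T03:34:56')
puts t.to_i
```

The input here is the UTC equivalent of 2015-12-25T12:34:56+09, so the printed epoch value is 1451014496, the same instant as in the execi example above.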

Benchmark

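With Strptime, parsing is quite fast even without a cache. Below are a simple benchmark script and the results on my MBP (the Time.strptime version includes the cache). With second-resolution input the cached Time.strptime is still faster, but with millisecond input, where the cache never hits, Strptime is roughly 16 times faster. Parsing 360,000 entries takes only about 0.4 seconds, so it is perfectly usable in Fluentd.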

Results

                           user     system      total        real
sec:Time.strptime      0.110000   0.000000   0.110000 (  0.110357)
sec:Strptime           0.360000   0.010000   0.370000 (  0.360605)
msec:Time.strptime     5.580000   0.080000   5.660000 (  5.707343)
msec:Strptime          0.360000   0.000000   0.360000 (  0.358431)

Script

require 'benchmark'
require 'time'
require 'strptime'

class ParserError < StandardError
end

# Copied from Fluentd's TimeParser
class TimeParserWithTimeFormat
  def initialize(time_format)
    @cache1_key = nil
    @cache1_time = nil
    @cache2_key = nil
    @cache2_time = nil
    @parser = Proc.new { |value| Time.strptime(value, time_format) }
  end

  def parse(value)
    if @cache1_key == value
      return @cache1_time
    elsif @cache2_key == value
      return @cache2_time
    else
      begin
        time = @parser.call(value).to_i
      rescue => e
        raise ParserError, "invalid time format: value = #{value}, error_class = #{e.class.name}, error = #{e.message}"
      end
      @cache1_key = @cache2_key
      @cache1_time = @cache2_time
      @cache2_key = value
      @cache2_time = time
      return time
    end
  end
end

class TimeParserWithStrptime
  def initialize(time_format)
    @strptime = Strptime.new(time_format)
    @parser = @strptime.method(:exec)
  end

  def parse(value)
    begin
      return @parser.call(value).to_i
    rescue => e
      raise ParserError, "invalid time format: value = #{value}, error_class = #{e.class.name}, error = #{e.message}"
    end
  end
end

sec_times = []
60.times { |i|
  60.times { |j|
    100.times {
      sec_times << "28/Feb/2015:10:%02d:%02d +0900" % [i, j]
    }
  }
}

msec_times = []
60.times { |i|
  60.times { |j|
    100.times { |k|
      msec_times << "28/Feb/2015:10:%02d:%02d.%03d +0900" % [i, j, k]
    }
  }
}

sec_time_format = "%d/%b/%Y:%H:%M:%S %z"
msec_time_format = "%d/%b/%Y:%H:%M:%S.%N %z"

Benchmark.bm(20) do |x|
  x.report('sec:Time.strptime') {
    parser = TimeParserWithTimeFormat.new(sec_time_format)
    sec_times.each { |t|
      parser.parse(t)
    }
  }
  x.report('sec:Strptime') {
    parser = TimeParserWithStrptime.new(sec_time_format)
    sec_times.each { |t|
      parser.parse(t)
    }
  }
  x.report('msec:Time.strptime') {
    parser = TimeParserWithTimeFormat.new(msec_time_format)
    msec_times.each { |t|
      parser.parse(t)
    }
  }
  x.report('msec:Strptime') {
    parser = TimeParserWithStrptime.new(msec_time_format)
    msec_times.each { |t|
      parser.parse(t)
    }
  }
end

Summary

So, if you are writing an application that calls Time.strptime quite frequently, using the strptime gem should improve performance. That said, I doubt there are many Ruby applications where Time.strptime is the bottleneck...
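For an application that wants to try the gem without making it a hard dependency, one possible approach is sketched below. It assumes only the Strptime.new and exec calls shown earlier; `build_time_parser` is a hypothetical helper name, and the two code paths may not accept exactly the same set of format directives, so check the formats you actually use.

```ruby
require 'time'

# Use the strptime gem when it is installed, otherwise fall back to Time.strptime.
strptime_gem_loaded =
  begin
    require 'strptime'
    true
  rescue LoadError
    false
  end

# Hypothetical helper: returns a callable that turns a String into a Time.
build_time_parser = lambda do |time_format|
  if strptime_gem_loaded
    # Format is compiled once here; exec is called per value.
    Strptime.new(time_format).method(:exec)
  else
    ->(value) { Time.strptime(value, time_format) }
  end
end

parse = build_time_parser.call('%d/%b/%Y:%H:%M:%S %z')
epoch = parse.call('28/Feb/2015:10:00:00 +0900').to_i
puts epoch
```

Either way the caller gets something that responds to call, so the rest of the code does not need to know which implementation is in use.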
