More than 5 years have passed since last update.

Hash のキーに配列を利用するアプローチ

Ruby

Last updated at 2015-06-13Posted at 2015-06-13

以前[『Ruby 基本のき: オブジェクト周りを遊びながら理解する』(5.4節)] (http://qiita.com/yut_h1979/items/aafece7404a1002c0105#54-hash-%E3%81%AE%E3%82%AD%E3%83%BC%E3%81%AB-mutable-%E3%81%AA-string-%E3%82%AA%E3%83%96%E3%82%B8%E3%82%A7%E3%82%AF%E3%83%88%E3%82%92%E4%BD%BF%E3%81%A3%E3%81%A6%E3%82%82%E5%A4%A7%E4%B8%88%E5%A4%AB%E3%81%AA%E3%81%AE) でこんなことを書きました。

Hash のキーに Array オブジェクトなどを使用した場合はこのようなコピー処置は行われません。
（そんなことをする人はあまりいないと思いますが・・・）

が、最近このアプローチの実装例を見つけて興味深かったので、
使いどころやメリデメなどを自分なりにまとめてみました。

実装例

スプレッドシートを扱う下記ライブラリにおいて、2次元セルの内部データとして
「行番号・列番号の配列をキーとした Hash オブジェクト」が使われています。

実装内容のエッセンスは以下になります。

class Cells
  def initialize
    @cells = {}
  end

  def [](row_index, col_index)
    @cells[[row_index, col_index]]
  end

  def []=(row_index, col_index, value)
    @cells[[row_index, col_index]] = value
  end
end

cells = Cells.new
cells[10, 20] = "foo"
cells[10, 20] # => "foo"

メリデメ

メリット：

飛び飛びのデータを効率良く扱える
- cells[10, 1000] = "foo"; cells[100, 10000] = "bar" としても2要素の追加で済む
小数点や文字列なども扱える
- cells[1.53, 2.01]
- cells["A", "Z"]

デメリット：

2次元配列利用のアプローチに比べてパフォーマンスが悪い（特にデータ量が多くなる場合）
- 詳細：他アプローチとのパフォーマンス比較（本記事）
配列キーが破壊的に変更されないよう留意する必要がある
- 詳細： [『Ruby 基本のき: オブジェクト周りを遊びながら理解する』(5.4節)（別記事）] (http://qiita.com/yut_h1979/items/aafece7404a1002c0105#54-hash-%E3%81%AE%E3%82%AD%E3%83%BC%E3%81%AB-mutable-%E3%81%AA-string-%E3%82%AA%E3%83%96%E3%82%B8%E3%82%A7%E3%82%AF%E3%83%88%E3%82%92%E4%BD%BF%E3%81%A3%E3%81%A6%E3%82%82%E5%A4%A7%E4%B8%88%E5%A4%AB%E3%81%AA%E3%81%AE)

どんなデータ構造に使えそうか

2次元セルデータ
プロットデータ（n次元拡張も容易にできそう）
地図データ（経度・緯度）

他アプローチとのパフォーマンス比較

以下3パターンについて 1,000 x 1,000 セルの値格納・値参照の処理時間を計測しました。

① 配列キーのHashを利用するアプローチ
② カンマ区切り文字列キーのHashを利用するアプローチ
③ 2次元配列を利用するアプローチ

一番速かったのは「③ 2次元配列を利用するアプローチ」でした。
ちなみに標準ライブラリの Matrix, CSV::Table クラスなどはこのアプローチを使っているようです。

実装スクリプト：

module Cells

  # ① 配列キーのHashを利用するアプローチ（クラス名変えて再掲）
  class ByHashOfArrayKey
    def initialize
      @cells = {}
    end

    def [](row_index, col_index)
      @cells[[row_index, col_index]]
    end

    def []=(row_index, col_index, value)
      @cells[[row_index, col_index]] = value
    end
  end

  # ② カンマ区切り文字列キーのHashを利用するアプローチ
  class ByHashOfStringKey
    def initialize
      @cells = {}
    end

    def [](row_index, col_index)
      @cells["#{row_index},#{col_index}"]
    end

    def []=(row_index, col_index, value)
      @cells["#{row_index},#{col_index}"] = value
    end
  end

  # ③ 2次元配列を利用するアプローチ
  class By2DArray
    def initialize
      @cells = []
    end

    def [](row_index, col_index)
      (@cells[row_index] ||= [])[col_index]
    end

    def []=(row_index, col_index, value)
      (@cells[row_index] ||= [])[col_index] = value
    end
  end
end

測定スクリプト：

require 'benchmark'

def fill_up_cells(cells, row_size, col_size, value)
  (1..row_size).each do |r|
    (1..col_size).each do |c|
      cells[r, c] = value
    end
  end
end

def look_up_cells(cells, row_size, col_size)
  (1..row_size).each do |r|
    (1..col_size).each do |c|
      cells[r, c]
    end
  end
end

Benchmark.bm(25) do |x|
  cells_1 = Cells::ByHashOfArrayKey.new
  cells_2 = Cells::ByHashOfStringKey.new
  cells_3 = Cells::By2DArray.new

  [cells_1, cells_2, cells_3].each do |cells|
    x.report(cells.class.name) { fill_up_cells(cells, 1_000, 1_000, true) }
  end

  [cells_1, cells_2, cells_3].each do |cells|
    x.report(cells.class.name) { look_up_cells(cells, 1_000, 1_000) }
  end
end

測定結果：

OS X Yosemite 10.10.3
2.6 GHz Intel Core i5
8 GB 1600 MHz DDR3
ruby 2.2.2p95 (2015-04-13 revision 50295)

1,000x1,000セルの値格納

                                user     system      total        real
Cells::ByHashOfArrayKey     1.280000   0.030000   1.310000 (  1.312445)
Cells::ByHashOfStringKey    1.290000   0.050000   1.340000 (  1.346375)
Cells::By2DArray            0.130000   0.000000   0.130000 (  0.133934)

1,000x1,000セルの値参照

                                user     system      total        real
Cells::ByHashOfArrayKey     1.280000   0.010000   1.290000 (  1.292671)
Cells::ByHashOfStringKey    0.990000   0.030000   1.020000 (  1.029152)
Cells::By2DArray            0.110000   0.000000   0.110000 (  0.103042)

まとめ

「Hash のキーに配列を利用するアプローチ」の使いどころやメリデメなどを書きました。

google_drive や roo での活用例を知るまでこのアプローチは悪手だと思っていましたが、
パフォーマンスや配列キーの破壊的変更などに留意した上で使うのは悪くないかもと思いました。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up