More than 5 years have passed since last update.

Rでハッシュテーブルをつかう

Last updated at 2014-10-01Posted at 2014-10-01

Rのlistは名前でのlookupもできるが、リストなのでO(N)の計算量がかかって効率が悪い。

CRANにあるhashパッケージをインストールすれば本当のハッシュテーブが使える。

install

install.packages("hash")
library(hash)

現時点での最新版(hash-2.2.6) は R ≥ 2.12.0 を必要とするので、古いバージョンのRで使いたければ、ソースをダウンロードして以下のようにするか、

install.packages("path/to/source", repos=NULL, type="source")

バージョンを指定してインストールすればよい

install.packages("hash", version="2.0.2")

sample

sample.R

h <- hash()
bh["a"] <- 123
h["b"] <- "str"
h["c"] <- list(a=1, b=2)

print(h)
# <hash> containing 3 key-value pair(s).
#   a : 123
#   b : str
#   c : 1 2

print(h[["b"]])   # => str


h["c"] <- NULL    # remove the entry
has.key("x", h)   # check whether the key exists

h2 <- h           # pass the reference
h2["y"] <- 456
print(h["y"])     # => 456

print(class(keys(h)))    # => character
print(class(values(h)))  # => list

補足

keyに使えるのは文字列のみ
通常のRのオブジェクトと違って、hashはpass by referenceのような挙動になる。つまりhashを別の名前の変数に代入してもコピーはされず参照先は同一のオブジェクトになる。

簡易速度比較

以下のスクリプトを実行してみる

hash.R

library(hash)

dat <- hash()
for (i in 1:10000) {
    dat[as.character(i)] <- i
}

sum <- 0
for (i in 1:10000) {
    sum <- sum + dat[[as.character(i)]]
}

print(sum)

list.R

dat <- list()
for (i in 1:10000) {
    dat[as.character(i)] <- i
}

sum <- 0
for (i in 1:10000) {
    sum <- sum + dat[[as.character(i)]]
}

print(sum)

結果

$ time R --no-save < list.R > /dev/null
R --no-save < list.R > /dev/null  9.70s user 0.07s system 100% cpu 9.772 total
$ time R --no-save < hash.R > /dev/null
hash-2.1.0 provided by Decision Patterns

R --no-save < hash.R > /dev/null  1.85s user 0.03s system 100% cpu 1.880 total

期待通り速くなった。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up