
scipy.sparse is not optimized for dot products

Posted at 2014-07-02

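import timeit
import numpy as np
import scipy.sparse

def getSparse(length, size, todense=False):
    # One random column index per row, so every row holds a single 1.
    array = np.random.randint(0, size, length)
    # CSR triple (data, indices, indptr); one entry per row, so indptr is 0..length.
    response = scipy.sparse.csr_matrix(
        (np.ones(length, dtype=array.dtype), array, np.arange(length + 1)),
        shape=(length, size), dtype=array.dtype)
    return response.todense() if todense else response

def testDense():
    # dense (300 x 1000) times dense (1000 x 300)
    x = np.dot(np.random.rand(300000).reshape(300, 1000), getSparse(1000, 300, True))

def testSparse():
    # dense (300 x 1000) times scipy.sparse CSR (1000 x 300), passed directly to np.dot
    x = np.dot(np.random.rand(300000).reshape(300, 1000), getSparse(1000, 300, False))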

print(timeit.timeit(testDense, setup = 'import __main__', number = 1))
# 0.08102297782897949
print(timeit.timeit(testSparse, setup = 'import __main__', number = 1))
# 30.572995901107788

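I expected the dot operation to get faster with a sparse matrix, but it became dramatically slower instead: about 30.6 seconds versus about 0.08 seconds for the dense version, several hundred times slower. In theory the sparse version should be able to run faster, so I suspect the implementation simply does not support this case.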
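One possible reading of this result, offered as an assumption and not something measured in this post: the SciPy documentation warns that `np.dot` is not aware of sparse matrices, so the slowdown may come from `np.dot` itself and not from scipy.sparse. The sketch below uses the sparse matrix's own `dot` method instead, computing `dense · sparse` as `(sparseᵀ · denseᵀ)ᵀ`. The helper name `one_hot_csr` and the fixed seeds are hypothetical and exist only for illustration.

```python
import numpy as np
import scipy.sparse

def one_hot_csr(length, size, seed=0):
    """Hypothetical helper: CSR matrix with a single 1 per row."""
    rng = np.random.RandomState(seed)
    cols = rng.randint(0, size, length)
    return scipy.sparse.csr_matrix(
        (np.ones(length), cols, np.arange(length + 1)),
        shape=(length, size))

dense = np.random.RandomState(1).rand(300, 1000)
sparse = one_hot_csr(1000, 300)

# Let the sparse matrix drive the multiplication:
# dense @ sparse == (sparse.T @ dense.T).T
product = sparse.T.dot(dense.T).T

# Reference result from a fully dense product, for comparison only.
reference = np.dot(dense, sparse.toarray())
print(np.allclose(product, reference))
```

No timing is claimed here; wrapping each variant in `timeit` in the same way as the benchmark above would show whether it helps in a given environment.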

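As an aside, what I actually want to do is aggregate the values of an ndarray along a particular dimension: for example, turning data with dimensions (store × date) into (region × date). In my use case, computing with an ordinary dense matrix was fine, but this behavior is likely to become a problem once the data grows.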
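The (store × date) to (region × date) aggregation can be written as a product with an indicator matrix. The following is a minimal sketch under assumed data: the array `sales`, the mapping `store_to_region`, and the sizes are all hypothetical. It also shows `np.add.at` as an alternative that avoids a matrix product entirely.

```python
import numpy as np
import scipy.sparse

# Hypothetical data: 4 stores x 3 dates.
sales = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.],
                  [10., 11., 12.]])
# Hypothetical mapping: stores 0 and 2 belong to region 0, stores 1 and 3 to region 1.
store_to_region = np.array([0, 1, 0, 1])
n_stores = len(store_to_region)
n_regions = 2

# Indicator matrix (region x store): entry (r, s) is 1 when store s is in region r.
indicator = scipy.sparse.csr_matrix(
    (np.ones(n_stores), (store_to_region, np.arange(n_stores))),
    shape=(n_regions, n_stores))

# Sparse-driven product: (region x store) times (store x date) gives (region x date).
by_region = indicator.dot(sales)

# Alternative without a matrix product: accumulate store rows into region rows.
by_region_add = np.zeros((n_regions, sales.shape[1]))
np.add.at(by_region_add, store_to_region, sales)

print(by_region)
# [[ 8. 10. 12.]
#  [14. 16. 18.]]
```

Here the sparse matrix is on the left and its own `dot` method is called, so `np.dot` never receives a scipy.sparse object.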
