
More than 5 years have passed since last update.

Speeding up R with OpenBLAS


0. Introduction

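BLAS is the library R uses for matrix computations. Replacing it with OpenBLAS, an implementation tuned for the specific CPU it runs on, can be expected to speed up R's computations.
Here I replace R's default BLAS with OpenBLAS and check whether it actually got faster.
Everything was run on Amazon Linux on EC2. The same steps should presumably work on CentOS.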

$ cat /etc/system-release
$ cat /proc/cpuinfo | head
Result
Amazon Linux AMI release 2014.09
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping        : 4
microcode       : 0x416
cpu MHz         : 2500.066
cache size      : 25600 KB
physical id     : 0

0-1. References

1. Checking the current BLAS

First, open a terminal and start R.
In another terminal, run the following to get the process ID of the R session.

$ ps aux | grep exec/R
Result
hoxom 16466  0.0  0.8 245724 33832 pts/0    S+   15:09   0:00 /usr/lib64/R/bin/exec/R

The process ID is 16466, so run the following to see which BLAS the process has loaded.

$ lsof -p 16466 | grep 'blas\|lapack'
Result
R       16466 hoxom  mem    REG  202,1    130856 411738 /usr/lib64/atlas/libcblas.so.3.0
R       16466 hoxom  mem    REG  202,1    120224 411742 /usr/lib64/atlas/libf77blas.so.3.0
R       16466 hoxom  mem    REG  202,1   5301912 411744 /usr/lib64/atlas/liblapack.so.3.0
R       16466 hoxom  mem    REG  202,1    357800 411434 /usr/lib64/libblas.so.3.5.0
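
The check above boils down to filtering the lsof output for library paths. As a rough sketch (the helper name `blas_paths` is made up for illustration and is not part of the original procedure), the filter can be wrapped in a small function that prints only the path column:

```shell
#!/bin/sh
# Hypothetical helper: print the paths of BLAS/LAPACK libraries
# found in lsof output read from stdin.
blas_paths() {
    # The library path is the last whitespace-separated field of each lsof line.
    awk '$NF ~ /blas|lapack/ { print $NF }'
}

# Demo on two lines copied from the output above (no running R needed).
blas_paths <<'EOF'
R       16466 hoxom  mem    REG  202,1    130856 411738 /usr/lib64/atlas/libcblas.so.3.0
R       16466 hoxom  mem    REG  202,1    357800 411434 /usr/lib64/libblas.so.3.5.0
EOF
```

With a live R session, piping `lsof -p 16466` into `blas_paths` should list the same libraries; the PID is of course whatever `ps` reported on your machine.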

2. Running the benchmark

Download the benchmark script:

$ curl http://r.research.att.com/benchmarks/R-benchmark-25.R -O

Run the benchmark:

$ cat R-benchmark-25.R | time R --slave
Result
   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  1.133 
2400x2400 normal distributed random matrix ^1000____ (sec):  0.74 
Sorting of 7,000,000 random values__________________ (sec):  0.861000000000001 
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  12.639 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  5.18433333333333 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  1.71649282684099 

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.461333333333333 
Eigenvalues of a 640x640 random matrix______________ (sec):  0.985666666666669 
Determinant of a 2500x2500 random matrix____________ (sec):  2.21833333333333 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  2.00233333333333 
Inverse of a 1600x1600 random matrix________________ (sec):  1.81900000000001 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  1.53120397795481 

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.704666666666659 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.347333333333343 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  1.01500000000001 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.442666666666668 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.557999999999993 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.558339366108916 


Total time for all 15 tests_________________________ (sec):  31.1116666666667 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  1.13638222045501 
                      --- End of test ---

162.20user 1.82system 2:44.03elapsed 99%CPU (0avgtext+0avgdata 1588288maxresident)k
0inputs+0outputs (0major+1025049minor)pagefaults 0swaps
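
To compare runs later, it helps to keep the benchmark output in a file and pull out the headline number. The following is a minimal sketch, assuming the output format shown above; the function name `total_time` and the log file name mentioned afterwards are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the "Total time" figure (in seconds)
# from saved R-benchmark-25 output, given as a file argument or on stdin.
total_time() {
    # Split on ':' and strip the padding around the number in the second field.
    awk -F: '/^Total time for all/ { gsub(/[ \t]/, "", $2); print $2 }' "$@"
}

# Demo on the line from the run above.
printf '%s\n' 'Total time for all 15 tests_________________________ (sec):  31.1116666666667 ' | total_time
```

For a real run, the benchmark output could first be saved with something like `R --slave < R-benchmark-25.R | tee before.log`, after which `total_time before.log` prints the total.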

3. Installing OpenBLAS

$ git clone https://github.com/xianyi/OpenBLAS
$ cd OpenBLAS
$ make
$ sudo make install # installs into /opt/OpenBLAS/lib by default
# To choose a different install location:
# sudo make install PREFIX=/path/to/your/installation
Result
make -j 2 -f Makefile.install install
make[1]: Entering directory `/home/hoxom/OpenBLAS'
Generating openblas_config.h in /opt/OpenBLAS/include
Generating f77blas.h in /opt/OpenBLAS/include
Generating cblas.h in /opt/OpenBLAS/include
Copying LAPACKE header files to /opt/OpenBLAS/include
Copying the static library to /opt/OpenBLAS/lib
Copying the shared library to /opt/OpenBLAS/lib
Generating OpenBLASConfig.cmake in /opt/OpenBLAS/lib/cmake/openblas
Install OK!
make[1]: Leaving directory `/home/hoxom/OpenBLAS'

3-1. References

4. Switching the BLAS library

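Register both libraries with alternatives so that switching between them is easy. The trailing number is the priority; OpenBLAS gets the higher one (40 versus 10).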

$ sudo alternatives --install /usr/lib64/libblas.so.3 libblas.so.3 /usr/lib64/libblas.so.3.5.0 10
$ sudo alternatives --install /usr/lib64/libblas.so.3 libblas.so.3 /opt/OpenBLAS/lib/libopenblas.so 40

After that, the update-alternatives command switches between them.

$ sudo update-alternatives --config libblas.so.3
Result
There are 2 programs which provide 'libblas.so.3'.

  Selection    Command
-----------------------------------------------
   1           /usr/lib64/libblas.so.3.5.0
*+ 2           /opt/OpenBLAS/lib/libopenblas.so

Enter to keep the current selection[+], or type selection number: 
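
alternatives does its work through symbolic links: /usr/lib64/libblas.so.3 points into /etc/alternatives, which in turn points at the selected library. Following that chain is a quick way to see the current selection without starting R. This is a minimal sketch, assuming GNU `readlink -f` is available; the helper name `resolve_lib` and the scratch-directory file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the real file a library link finally points to.
resolve_lib() {
    readlink -f "$1"
}

# Demo in a scratch directory that imitates the chain alternatives maintains:
#   libblas.so.3 -> alternatives-link -> libopenblas.so
tmp=$(mktemp -d)
touch "$tmp/libopenblas.so"
ln -s "$tmp/libopenblas.so" "$tmp/alternatives-link"
ln -s "$tmp/alternatives-link" "$tmp/libblas.so.3"
resolve_lib "$tmp/libblas.so.3"   # prints the path of the scratch libopenblas.so
rm -rf "$tmp"
```

On the machine configured above, `resolve_lib /usr/lib64/libblas.so.3` would be expected to end up somewhere under /opt/OpenBLAS/lib once OpenBLAS is selected.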

4-1. References

5. Confirming the switch

Restart R, get the new process ID, and check the loaded libraries again.

$ lsof -p 3531 | grep 'blas\|lapack'
Result
R       3531 hoxom  mem    REG  202,1    130856 411738 /usr/lib64/atlas/libcblas.so.3.0
R       3531 hoxom  mem    REG  202,1    120224 411742 /usr/lib64/atlas/libf77blas.so.3.0
R       3531 hoxom  mem    REG  202,1   5301912 411744 /usr/lib64/atlas/liblapack.so.3.0
R       3531 hoxom  mem    REG  202,1  12589474 439901 /opt/OpenBLAS/lib/libopenblas_sandybridgep-r0.2.13.so

6. Benchmark

$ cat R-benchmark-25.R | time R --slave
Result
   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  1.128 
2400x2400 normal distributed random matrix ^1000____ (sec):  0.738666666666667 
Sorting of 7,000,000 random values__________________ (sec):  0.859666666666666 
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  0.621333333333335 
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  0.385333333333332 
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  0.733445329451335 

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.459000000000001 
Eigenvalues of a 640x640 random matrix______________ (sec):  0.538000000000001 
Determinant of a 2500x2500 random matrix____________ (sec):  0.363666666666667 
Cholesky decomposition of a 3000x3000 matrix________ (sec):  0.382 
Inverse of a 1600x1600 random matrix________________ (sec):  0.308000000000005 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.399508951591998 

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  0.699999999999998 
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  0.343666666666669 
Grand common divisors of 400,000 pairs (recursion)__ (sec):  1.006 
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  0.435333333333335 
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  0.417999999999999 
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  0.503151472487133 


Total time for all 15 tests_________________________ (sec):  8.68666666666667 
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  0.528280203797421 
                      --- End of test ---

57.44user 6.22system 0:52.52elapsed 121%CPU (0avgtext+0avgdata 1625872maxresident)k
0inputs+0outputs (0major+1027688minor)pagefaults 0swaps

7. Results

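The benchmark ran about 2.8 times faster by user CPU time (162.20 s down to 57.44 s). By wall-clock time (2:44.03 down to 0:52.52) the gain is about 3.1 times, and by the benchmark's own total for the 15 tests (31.11 s down to 8.69 s) about 3.6 times. The improvement is concentrated in the BLAS-heavy tests (cross-product, linear regression, determinant, Cholesky decomposition, inverse, eigenvalues); sorting, FFT, and most of section III barely changed.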
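
The ratios can be recomputed from the figures in the two runs above. This is a small sketch using awk for the floating-point arithmetic; the helper names `to_seconds` and `speedup` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the two runs.

# Convert an elapsed time in "M:SS.ss" form (as printed by time) to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# Print before/after as a ratio with two decimals.
speedup() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.2f\n", b / a }'
}

speedup 162.20 57.44                                      # user CPU time  -> 2.82
speedup "$(to_seconds 2:44.03)" "$(to_seconds 0:52.52)"   # wall-clock     -> 3.12
speedup 31.1116666666667 8.68666666666667                 # total, 15 tests -> 3.58
```

Which ratio to quote depends on what matters: user CPU time counts work across all cores, while wall-clock time is what someone waiting for the result actually experiences.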

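Is it fine to leave LAPACK and the like unreplaced? (The lsof output in section 5 still shows ATLAS's liblapack loaded.) If someone who knows could tell me, I would appreciate it.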

8. Bibliography

I referred to Chapter 6, 和田計也「Rエンジニアがおさえておきたいインフラの話」("Infrastructure topics R engineers should know"), p. 133 onward.
