
Belated notes on running HPL on the Raspberry Pi 4 (2023)

Posted at 2023-11-05

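1. Introduction

Three 8 GB Raspberry Pi 4 boards landed on my desk, so for a change I built a cluster out of them. When it came to what to run on it, I went for the standard choice, HPL, and got thrown around by the mix of old and new information on the net. In the end, keeping only the steps that actually work, it turns out HPL can be set up very simply and quickly. Back in the Raspberry Pi 3 days, setup took a whole day because you had to compile ATLAS yourself, among other things. With the Raspberry Pi 4 the OS is 64-bit and the libraries are all properly packaged, so a single apt command gets the environment ready. I am leaving this here partly as a note to myself.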

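2. Prerequisites

  • Three Raspberry Pi 4 8GB boards
      Use a power supply of around 20 W. A cooling fan is also essential.
  • Three micro SD cards
      HPL does not need storage performance, so there is no need to be picky here.
      Install Ubuntu Server 22.04.3 LTS (64-bit) on them beforehand.
  • Ethernet
      Use a wired connection for stability.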

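3. The full set of setup commands

The commands below take you all the way to verifying that HPL runs on a single board.
At the vi HPL.dat step, enter the contents listed after the commands.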

$ sudo apt update
$ sudo apt upgrade -y
$ sudo apt install -y make gcc libopenmpi-dev libopenblas-dev
$ wget https://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz
$ tar zxvf hpl-2.3.tar.gz
$ cd hpl-2.3/
$ ./configure
$ make
$ cd testing/
$ vi HPL.dat
$ ./xhpl
HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
10000        Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

Here is the result.

================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   10000
NB     :     128
PMAP   : Row-major process mapping
P      :       1
Q      :       1
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Crout
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       10000   128     1     1              46.42             1.4365e+01
HPL_pdgesv() start time Sun Nov  5 13:10:11 2023

HPL_pdgesv() end time   Sun Nov  5 13:10:57 2023

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   5.53092509e-03 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================

In this run a single board reaches about 14.4 GFlops (1.4365e+01 in the Gflops column).
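As a sanity check, the Gflops column can be reproduced from N and Time. The sketch below assumes the usual LU operation count for a dense solve, 2/3·N³ + 3/2·N² floating-point operations; the function name `hpl_gflops` is made up for this illustration. Small differences from the printed value are expected because the Time column is rounded.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Estimate GFlops for an order-n dense solve.

    Assumes the operation count 2/3*n^3 + 3/2*n^2 (LU factorization plus solve).
    """
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


# Values taken from the single-board result above: N=10000, Time=46.42 s.
single = hpl_gflops(10000, 46.42)
print(f"single board: {single:.2f} GFlops")
```

The same function can be applied to any other row of HPL output to cross-check the reported rate.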

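4. Clustering the three boards

Passwordless SSH connection to the RasPi

Follow the article above and register the primary's key on each secondary. When that is done, confirm that SSH connections from the primary to each secondary go through without a password prompt.
Next, create a machinefile that lists the hosts.

<IP address of the primary>
<IP address of secondary 1>
<IP address of secondary 2>

For example:
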
192.168.0.2
192.168.0.3
192.168.0.4
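The machinefile above lists bare addresses and lets Open MPI decide how many processes each host takes. Open MPI hostfiles also accept an optional `slots=N` field per line, which makes that count explicit. The following is only a sketch of generating such a file; the helper name `machinefile_text` and the choice of 4 slots per host (one per Raspberry Pi 4 core) are assumptions, not something the original setup used.

```python
def machinefile_text(hosts, slots_per_host=4):
    """Build hostfile text with one 'address slots=N' line per host.

    slots_per_host=4 is an assumption matching the four cores of a Raspberry Pi 4.
    """
    return "".join(f"{host} slots={slots_per_host}\n" for host in hosts)


# Example addresses from the machinefile shown above.
hosts = ["192.168.0.2", "192.168.0.3", "192.168.0.4"]
print(machinefile_text(hosts), end="")
```

Writing the returned text to a file named machinefile would give a variant of the file above with the per-host process count spelled out.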

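Once that is done, go to hpl-2.3/testing/ on the primary and create an HPL.dat for the three boards. This assumes the setup from section 3 has been done on every board, so that xhpl exists at the same path on each of them.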

HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
50000        Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
3            Ps
4            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
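The main change from the single-board file is the problem size N and the process grid. HPL stores an N x N matrix of doubles, so N is bounded by the cluster's total memory. The sketch below estimates that footprint and suggests an N; the 80 % memory fraction is a commonly quoted rule of thumb rather than something from this article, treating each board as exactly 8 GiB is an assumption, and the helper names are made up for illustration.

```python
import math

BYTES_PER_DOUBLE = 8


def matrix_bytes(n: int) -> int:
    """Bytes needed for the N x N double-precision coefficient matrix."""
    return n * n * BYTES_PER_DOUBLE


def suggest_n(total_ram_bytes: int, fraction: float = 0.8, nb: int = 128) -> int:
    """Largest N (rounded down to a multiple of NB) whose matrix fits in
    `fraction` of total RAM. fraction=0.8 is an assumed rule of thumb."""
    elements = int(total_ram_bytes * fraction) // BYTES_PER_DOUBLE
    n = math.isqrt(elements)
    return (n // nb) * nb


total = 3 * 8 * 2**30  # three boards, assumed 8 GiB each
print(f"N=50000 uses {matrix_bytes(50000) / total:.0%} of total RAM")
print(f"suggested N at 80%: {suggest_n(total)}")
```

Under these assumptions N = 50000 sits a little below the 80 % mark, which leaves headroom for the OS on each board.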

Then run it with the following command.

$ mpirun --np 12 --hostfile machinefile ./xhpl
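The process count here ties back to HPL.dat: the grid Ps x Qs = 3 x 4 needs 12 processes, which is what --np 12 supplies (3 boards with 4 cores each). As a small sketch, the candidate grids for a given process count can be enumerated as below; the function name `process_grids` is hypothetical, and the preference for a nearly square grid with P no larger than Q is general HPL tuning advice, not a measurement from this article.

```python
def process_grids(np_total: int) -> list[tuple[int, int]]:
    """Return every (P, Q) with P * Q == np_total and P <= Q."""
    grids = []
    p = 1
    while p * p <= np_total:
        if np_total % p == 0:
            grids.append((p, np_total // p))
        p += 1
    return grids


# 3 boards x 4 cores = 12 MPI processes.
for p, q in process_grids(12):
    print(f"P={p} Q={q}")
```

The last pair in the list is the most nearly square one, which for 12 processes is the 3 x 4 grid used in the HPL.dat above.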

xhpl starts running across the three boards. It takes a while (about 35 minutes in this run). The result looks like this.

================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   50000
NB     :     128
PMAP   : Row-major process mapping
P      :       3
Q      :       4
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Crout
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       50000   128     3     4            2128.47             3.9153e+01
HPL_pdgesv() start time Sun Nov  5 08:49:39 2023

HPL_pdgesv() end time   Sun Nov  5 09:25:08 2023

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   3.30604606e-03 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================

The three-board cluster reached about 39.2 GFlops (3.9153e+01 in the Gflops column).
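To put the two results side by side, the sketch below computes a rough scaling efficiency: the cluster rate divided by three times the single-board rate. This is only indicative, because the two runs used different problem sizes (N = 10000 versus N = 50000) and different process layouts, so it should not be read as a strict strong-scaling or weak-scaling figure. The function name is made up for this illustration.

```python
def scaling_efficiency(cluster_gflops: float, single_gflops: float, nodes: int) -> float:
    """Cluster rate relative to `nodes` times the single-node rate."""
    return cluster_gflops / (single_gflops * nodes)


# Gflops values taken from the two HPL outputs in this article.
eff = scaling_efficiency(39.153, 14.365, 3)
print(f"rough efficiency: {eff:.1%}")
```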

Closing

So we now live in a world where a few single-board computers on your desk add up to about 40 GFlops. The Raspberry Pi 5 looks promising too.
