
Running LINPACK on the PCs I have on hand

Posted at 2019-07-13

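I was wondering whether a prebuilt binary might be lying around somewhere, and found an article titled
【基本】CPUやGPUの理論値FLOPSの計算方法と測定方法
("[Basics] How to calculate and measure the theoretical FLOPS of CPUs and GPUs").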

  • I actually measured LINPACK on my own PC (Windows).

Under that subheading, the article says Intel provides a build intended for Xeon:
Intel® LINPACK Benchmark Download – License Agreement

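I see, so a fully tuned binary exists. I wanted to run it, but I could not quite work out how to use it, so I downloaded the Linux version and ran it on Ubuntu instead.
As I learned later, to my regret, this is not something you can casually run on the side.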

Acer Nitro5 Core i5 8300H

Sample data file lininput_xeon64.

Current date/time: Tue Jul 16 17:36:04 2019

CPU frequency: 3.989 GHz
Number of CPUs: 1
Number of cores: 4
Number of threads: 4

Parameters are set to:

Number of tests: 15

Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1
Maximum memory requested that can be used=16200901024, at the size=45000

=================== Timing linear equation system solver ===================

Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
1000 1000 4 0.007 92.4596 1.502479e-12 5.123838e-02 pass
1000 1000 4 0.005 125.7945 1.502479e-12 5.123838e-02 pass
1000 1000 4 0.005 128.9757 1.502479e-12 5.123838e-02 pass
1000 1000 4 0.005 129.1973 1.502479e-12 5.123838e-02 pass
2000 2000 4 0.040 132.6302 5.141276e-12 4.472280e-02 pass
2000 2000 4 0.041 129.8964 5.141276e-12 4.472280e-02 pass
5000 5008 4 0.536 155.5857 2.567213e-11 3.579772e-02 pass
5000 5008 4 0.545 152.9813 2.567213e-11 3.579772e-02 pass
10000 10000 4 4.316 154.5192 1.034086e-10 3.646293e-02 pass
10000 10000 4 4.213 158.2908 1.034086e-10 3.646293e-02 pass
15000 15000 4 14.515 155.0456 2.206862e-10 3.475844e-02 pass
15000 15000 4 15.133 148.7150 2.206862e-10 3.475844e-02 pass
18000 18008 4 24.324 159.8681 2.839891e-10 3.110030e-02 pass
18000 18008 4 24.294 160.0637 2.839891e-10 3.110030e-02 pass
20000 20016 4 33.223 160.5565 3.839458e-10 3.398761e-02 pass
20000 20016 4 33.210 160.6187 3.839458e-10 3.398761e-02 pass
22000 22008 4 44.239 160.4836 4.191068e-10 3.069793e-02 pass
22000 22008 4 52.114 136.2331 4.191068e-10 3.069793e-02 pass
25000 25000 4 84.023 123.9891 5.526876e-10 3.142937e-02 pass
25000 25000 4 77.841 133.8358 5.526876e-10 3.142937e-02 pass
26000 26000 4 93.571 125.2384 6.643859e-10 3.493541e-02 pass
26000 26000 4 72.181 162.3513 6.643859e-10 3.493541e-02 pass
27000 27000 4 80.921 162.1760 6.191166e-10 3.019127e-02 pass
30000 30000 1 110.933 162.2761 8.044883e-10 3.171301e-02 pass
35000 35000 1 176.163 162.2688 1.155255e-09 3.353531e-02 pass
40000 40000 1 260.541 163.7740 1.430789e-09 3.182123e-02 pass
45000 45000 1 14033.765 4.3291 1.628164e-09 2.864583e-02 pass

Performance Summary (GFlops)

Size LDA Align. Average Maximal
1000 1000 4 119.1068 129.1973
2000 2000 4 131.2633 132.6302
5000 5008 4 154.2835 155.5857
10000 10000 4 156.4050 158.2908
15000 15000 4 151.8803 155.0456
18000 18008 4 159.9659 160.0637
20000 20016 4 160.5876 160.6187
22000 22008 4 148.3584 160.4836
25000 25000 4 128.9125 133.8358
26000 26000 4 143.7948 162.3513
27000 27000 4 162.1760 162.1760
30000 30000 1 162.2761 162.2761
35000 35000 1 162.2688 162.2688
40000 40000 1 163.7740 163.7740
45000 45000 1 4.3291 4.3291

Residual checks PASSED
End of tests 2019/07/16 22:19


  • Ubuntu Linux on Windows 10

$ ./runme_xeon64
This is a SAMPLE run script for running a shared-memory version of
Intel(R) Distribution for LINPACK* Benchmark. Change it to reflect
the correct number of CPUs/threads, problem input files, etc..
*Other names and brands may be claimed as the property of others.
./runme_xeon64: 28: [: -gt: unexpected operator
Fri Jul 12 22:10:10 DST 2019
Sample data file lininput_xeon64.

Current date/time: Fri Jul 12 22:10:10 2019

CPU frequency: 3.989 GHz
Number of CPUs: 1
Number of cores: 4
Number of threads: 4

Parameters are set to:

Number of tests: 15
Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1

Maximum memory requested that can be used=16200901024, at the size=45000

=================== Timing linear equation system solver ===================

Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
1000 1000 4 0.009 Sec 77.4449 GFlops 1.502479e-12 5.123838e-02 pass
1000 1000 4 0.007 Sec 99.1057 GFlops 1.502479e-12 5.123838e-02 pass
1000 1000 4 0.007 Sec 97.7584 GFlops 1.502479e-12 5.123838e-02 pass
1000 1000 4 0.005 Sec 125.8793 GFlops 1.502479e-12 5.123838e-02 pass
2000 2000 4 0.061 Sec 87.2840 GFlops 5.141276e-12 4.472280e-02 pass
2000 2000 4 0.049 Sec 109.8297 GFlops 5.141276e-12 4.472280e-02 pass
5000 5008 4 0.662 Sec 125.9371 GFlops 2.567213e-11 3.579772e-02 pass
5000 5008 4 0.804 Sec 103.7036 GFlops 2.567213e-11 3.579772e-02 pass
10000 10000 4 5.038 Sec 132.3704 GFlops 1.034086e-10 3.646293e-02 pass
10000 10000 4 5.161 Sec 129.2063 GFlops 1.034086e-10 3.646293e-02 pass
15000 15000 4 21.230 Sec 106.0046 GFlops 2.206862e-10 3.475844e-02 pass
15000 15000 4 20.339 Sec 110.6467 GFlops 2.206862e-10 3.475844e-02 pass
18000 18008 4 39.051 Sec 99.5791 GFlops 2.839891e-10 3.110030e-02 pass
18000 18008 4 31.372 Sec 123.9542 GFlops 2.839891e-10 3.110030e-02 pass
20000 20016 4 48.966 Sec 108.9351 GFlops 3.839458e-10 3.398761e-02 pass
20000 20016 4 47.882 Sec 111.4016 GFlops 3.839458e-10 3.398761e-02 pass
22000 22008 4 66.404 Sec 106.9153 GFlops 4.191068e-10 3.069793e-02 pass
22000 22008 4 60.647 Sec 117.0651 GFlops 4.191068e-10 3.069793e-02 pass
25000 25000 4 93.090 Sec 111.9123 GFlops 5.526876e-10 3.142937e-02 pass
25000 25000 4 84.308 Sec 123.5697 GFlops 5.526876e-10 3.142937e-02 pass
26000 26000 4 92.848 Sec 126.2134 GFlops 6.643859e-10 3.493541e-02 pass
26000 26000 4 93.009 Sec 125.9950 GFlops 6.643859e-10 3.493541e-02 pass
27000 27000 4 114.919 Sec 114.1970 GFlops 6.191166e-10 3.019127e-02 pass
30000 30000 1 153.655 Sec 117.1575 GFlops 8.044883e-10 3.171301e-02 pass
35000 35000 1 244.501 Sec 116.9146 GFlops 1.199457e-09 3.481843e-02 pass
40000 40000 1 1218.164 Sec 35.0280 GFlops 1.430789e-09 3.182123e-02 pass

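Toward the end, the PC itself became so sluggish that it was no longer usable.
I gave up partway through the test with array size 45000.
The machine seems to deliver roughly 100 GFlops consistently, but with large arrays the data no longer fits in memory, heavy swapping sets in, and performance drops to less than half.
The last test would most likely have turned out even worse.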
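
To see why the largest sizes start swapping, it helps to estimate how much memory the matrix alone needs. The following is a minimal sketch, assuming double-precision elements (8 bytes each) stored as an LDA x N array; the function name `matrix_bytes` is made up for illustration and is not part of the benchmark.

```python
# Rough memory footprint of the dense matrix for each problem size.
# Assumption: 8-byte double-precision elements, stored as LDA x N.

BYTES_PER_DOUBLE = 8


def matrix_bytes(n: int, lda: int) -> int:
    """Bytes needed for a matrix with n columns and leading dimension lda."""
    return BYTES_PER_DOUBLE * lda * n


# (problem size, leading dimension) pairs taken from the log above.
SIZES = [(30000, 30000), (35000, 35000), (40000, 40000), (45000, 45000)]

for n, lda in SIZES:
    gib = matrix_bytes(n, lda) / 2**30
    print(f"N={n}: {matrix_bytes(n, lda)} bytes ({gib:.1f} GiB)")
```

For N=45000 this gives 16,200,000,000 bytes, which accounts for almost all of the 16200901024 bytes that the log reports as the maximum memory requested. The installed RAM is not stated here, but a matrix of that size leaves little or no headroom on a 16 GB machine once the OS is counted.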

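Since the benchmark is made by Intel, I may have no right to complain, but while the program runs on non-Xeon processors, it apparently does not run on AMD processors. A pity.
I wanted to pit it against a Ryzen and see how fast that would be.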

With no other choice, I tried the Intel processors I have on hand.

Sandy Bridge Core i7 2600K

Sample data file lininput_xeon64.
Current date/time: Sat Jul 13 18:04:32 2019
CPU frequency: 3.491 GHz
Number of CPUs: 1
Number of cores: 4
Number of threads: 4
Parameters are set to:
Number of tests: 15
Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1
Maximum memory requested that can be used=7200601024, at the size=30000
=================== Timing linear equation system solver ===================

Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
1000 1000 4 0.046 14.6478 1.070422e-12 3.650412e-02 pass
1000 1000 4 0.011 58.8439 1.070422e-12 3.650412e-02 pass
1000 1000 4 0.016 42.0040 1.070422e-12 3.650412e-02 pass
1000 1000 4 0.012 57.2930 1.070422e-12 3.650412e-02 pass
2000 2000 4 0.333 16.0505 4.491907e-12 3.907409e-02 pass
2000 2000 4 0.101 52.8872 4.491907e-12 3.907409e-02 pass
5000 5008 4 1.440 57.8912 2.066952e-11 2.882198e-02 pass
5000 5008 4 1.212 68.7985 2.066952e-11 2.882198e-02 pass
10000 10000 4 10.566 63.1154 9.739365e-11 3.434199e-02 pass
10000 10000 4 14.245 46.8129 9.739365e-11 3.434199e-02 pass
15000 15000 4 35.600 63.2145 2.097041e-10 3.302874e-02 pass
15000 15000 4 39.833 56.4966 2.097041e-10 3.302874e-02 pass
18000 18008 4 55.005 70.6958 3.084368e-10 3.377762e-02 pass
18000 18008 4 46.225 84.1240 3.084368e-10 3.377762e-02 pass
20000 20016 4 60.971 87.4865 3.810194e-10 3.372857e-02 pass
20000 20016 4 69.257 77.0193 3.810194e-10 3.372857e-02 pass
22000 22008 4 81.899 86.6872 4.221506e-10 3.092088e-02 pass
22000 22008 4 87.022 81.5843 4.221506e-10 3.092088e-02 pass
25000 25000 4 149.883 69.5071 6.010080e-10 3.417717e-02 pass
25000 25000 4 152.261 68.4214 6.010080e-10 3.417717e-02 pass
26000 26000 4 171.803 68.2100 6.133844e-10 3.225359e-02 pass
26000 26000 4 161.636 72.5006 6.133844e-10 3.225359e-02 pass
27000 27000 4 171.397 76.5678 6.508096e-10 3.173678e-02 pass
30000 30000 1 246.497 73.0304 7.763313e-10 3.060306e-02 pass

Performance Summary (GFlops)

Size LDA Align. Average Maximal
1000 1000 4 43.1972 58.8439
2000 2000 4 34.4689 52.8872
5000 5008 4 63.3448 68.7985
10000 10000 4 54.9642 63.1154
15000 15000 4 59.8556 63.2145
18000 18008 4 77.4099 84.1240
20000 20016 4 82.2529 87.4865
22000 22008 4 84.1358 86.6872
25000 25000 4 68.9642 69.5071
26000 26000 4 70.3553 72.5006
27000 27000 4 76.5678 76.5678
30000 30000 1 73.0304 73.0304
Residual checks PASSED
End of tests
2019/07/13 18:38
For this generation, 100 GFlops looks hard to reach.
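
A back-of-the-envelope peak figure puts that in context. The sketch below multiplies clock, core count, and double-precision FLOPs per cycle per core. It assumes 8 FLOPs per cycle for Sandy Bridge (AVX without FMA: one 4-wide add plus one 4-wide multiply) and 16 for the Core i5 8300H (two 4-wide FMA units), and it takes the "CPU frequency" line from each log at face value, even though the sustained clock under AVX load may well be lower. The name `peak_gflops` is hypothetical.

```python
# Theoretical double-precision peak = GHz x cores x FLOPs per cycle per core.
# Assumptions (not from the benchmark output):
#   Sandy Bridge, AVX without FMA: 4-wide add + 4-wide mul = 8 FLOPs/cycle
#   Core i5 8300H, AVX2 with two FMA units: 2 x 4 x 2     = 16 FLOPs/cycle


def peak_gflops(ghz: float, cores: int, flops_per_cycle: int) -> float:
    """Upper bound in GFlops, assuming every core runs at `ghz`."""
    return ghz * cores * flops_per_cycle


# name: (reported GHz, cores, assumed FLOPs/cycle, best measured GFlops)
CPUS = {
    "Core i5 8300H": (3.989, 4, 16, 163.7740),
    "Core i7 2600K": (3.491, 4, 8, 87.4865),
}

for name, (ghz, cores, fpc, measured) in CPUS.items():
    peak = peak_gflops(ghz, cores, fpc)
    print(f"{name}: peak {peak:.1f} GFlops, measured {measured / peak:.0%} of peak")
```

Under these assumptions the 2600K tops out at about 112 GFlops on paper, so a measured 100 GFlops would require roughly 90% efficiency.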

Critea VF-AG Core i7 4700MQ

Sample data file lininput_xeon64.
Current date/time: Mon Jul 15 23:17:23 2019
CPU frequency: 3.172 GHz
Number of CPUs: 1
Number of cores: 4
Number of threads: 4
Parameters are set to:
Number of tests: 15
Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1
Maximum memory requested that can be used=16200901024, at the size=45000

=================== Timing linear equation system solver ===================

Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
1000 1000 4 0.008 85.0169 1.502479e-12 5.123838e-02 pass
1000 1000 4 0.007 95.6646 1.502479e-12 5.123838e-02 pass
1000 1000 4 0.007 93.6088 1.502479e-12 5.123838e-02 pass
1000 1000 4 0.007 95.8483 1.502479e-12 5.123838e-02 pass
2000 2000 4 0.051 103.8164 5.141276e-12 4.472280e-02 pass
2000 2000 4 0.051 104.1368 5.141276e-12 4.472280e-02 pass
5000 5008 4 0.750 111.1409 2.567213e-11 3.579772e-02 pass
5000 5008 4 0.731 113.9918 2.567213e-11 3.579772e-02 pass
10000 10000 4 6.127 108.8381 1.034086e-10 3.646293e-02 pass
10000 10000 4 6.723 99.1960 1.034086e-10 3.646293e-02 pass
15000 15000 4 28.406 79.2234 2.206862e-10 3.475844e-02 pass
15000 15000 4 31.787 70.7968 2.206862e-10 3.475844e-02 pass
18000 18008 4 73.501 52.9061 2.839891e-10 3.110030e-02 pass
18000 18008 4 70.655 55.0374 2.839891e-10 3.110030e-02 pass
20000 20016 4 94.526 56.4303 3.839458e-10 3.398761e-02 pass
20000 20016 4 90.065 59.2251 3.839458e-10 3.398761e-02 pass
22000 22008 4 117.070 60.6444 4.191068e-10 3.069793e-02 pass
22000 22008 4 151.761 46.7817 4.191068e-10 3.069793e-02 pass
25000 25000 4 173.265 60.1270 5.526876e-10 3.142937e-02 pass
25000 25000 4 289.820 35.9461 5.526876e-10 3.142937e-02 pass
26000 26000 4 225.227 52.0305 6.643859e-10 3.493541e-02 pass
26000 26000 4 232.029 50.5052 6.643859e-10 3.493541e-02 pass
27000 27000 4 251.399 52.2017 6.191166e-10 3.019127e-02 pass
30000 30000 1 298.730 60.2612 8.044883e-10 3.171301e-02 pass
35000 35000 1 465.466 61.4133 1.155255e-09 3.353531e-02 pass
40000 40000 1 696.626 61.2522 1.430789e-09 3.182123e-02 pass
45000 45000 1 3433.360 17.6952 1.628164e-09 2.864583e-02 pass

Performance Summary (GFlops)

Size LDA Align. Average Maximal
1000 1000 4 92.5347 95.8483
2000 2000 4 103.9766 104.1368
5000 5008 4 112.5664 113.9918
10000 10000 4 104.0170 108.8381
15000 15000 4 75.0101 79.2234
18000 18008 4 53.9718 55.0374
20000 20016 4 57.8277 59.2251
22000 22008 4 53.7131 60.6444
25000 25000 4 48.0366 60.1270
26000 26000 4 51.2679 52.0305
27000 27000 4 52.2017 52.2017
30000 30000 1 60.2612 60.2612
35000 35000 1 61.4133 61.4133
40000 40000 1 61.2522 61.2522
45000 45000 1 17.6952 17.6952

Residual checks PASSED
End of tests
2019/07/16 01:44
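Haswell turns out to be surprisingly fast.
While the arrays are small it hits 100 GFlops without trouble, but performance keeps falling as the arrays grow.
The drop becomes noticeable once a single run takes more than about 10 seconds.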
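
The GFlops column can be reproduced from the problem size and the elapsed time, which makes it easy to check individual rows. This is a minimal sketch assuming the conventional LINPACK operation count of 2/3·n³ + 2·n² for solving a dense n x n system; the helper names are made up for illustration.

```python
# Recompute GFlops from problem size and elapsed time.
# Assumption: the conventional LINPACK operation count 2/3*n^3 + 2*n^2.


def linpack_flops(n: int) -> float:
    """Floating-point operations credited for solving an n x n system."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def gflops(n: int, seconds: float) -> float:
    """Achieved GFlops for problem size n solved in `seconds`."""
    return linpack_flops(n) / seconds / 1e9


# (size, time in seconds) rows copied from the Core i7 4700MQ log above.
ROWS = [(10000, 6.127), (18000, 73.501), (40000, 696.626), (45000, 3433.360)]

for n, seconds in ROWS:
    print(f"N={n}: {gflops(n, seconds):.1f} GFlops")
```

The recomputed values agree with the logged ones to within the rounding of the Time(s) column, so the collapse at size 45000 comes entirely from the elapsed time growing far faster than the operation count.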
