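Introduction
A vulnerability called Meltdown has apparently been found in Intel CPUs, and the mitigation may cause serious performance problems. Plenty of people are probably benchmarking this already, but I decided to try it myself.
Note that this is an old environment: two Sandy Bridge-generation Xeon E5-2660 CPUs (2.2GHz, 8C16T each) running CentOS 6.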
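About the LINPACK benchmark
LINPACK measures how fast a machine solves a dense system of linear equations, and it puts a heavy load on both the CPU and memory bandwidth. This time I use the Intel® Optimized LINPACK Benchmark for Linux, specifically linpack_11.0.3, the same version I used when I built this machine. Running the benchmark is just a matter of executing runme_xeon64. Note, however, that the machine needs at least 16GB of physical memory.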
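Where the 16GB requirement and the GFlops column come from can be checked with a little arithmetic. The sketch below assumes (my assumption, not something stated by Intel here) that the dominant allocation is a single n × LDA matrix of 8-byte doubles, and that GFlops is computed with the standard LINPACK operation count 2/3·n³ + 2·n². The function names are hypothetical.

```python
# Assumptions: one n x lda matrix of 8-byte doubles dominates memory use, and
# the GFlops column uses the standard LINPACK count 2/3*n^3 + 2*n^2.


def matrix_bytes(n: int, lda: int) -> int:
    """Bytes for an n x lda matrix of double-precision (8-byte) values."""
    return 8 * n * lda


def linpack_flops(n: int) -> float:
    """Standard LINPACK floating-point operation count for an n x n system."""
    return 2.0 / 3.0 * n**3 + 2.0 * n**2


def gflops(n: int, seconds: float) -> float:
    """GFlops for solving an n x n system in the given wall-clock time."""
    return linpack_flops(n) / seconds / 1e9


n = lda = 45000
size = matrix_bytes(n, lda)
print(f"matrix for n={n}: {size:,} bytes ({size / 2**30:.2f} GiB)")
# 297.994 s is the Time(s) logged for size 45000 in the pre-update bare-metal run.
print(f"GFlops: {gflops(n, 297.994):.2f}")
```

For n = LDA = 45000 this gives 16,200,000,000 bytes (about 15.09 GiB), just under the 16200901024 bytes the benchmark reports as its maximum request, and about 203.88 GFlops for 297.994 s, matching the 203.8768 logged in the first run below.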
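Comparison results
According to Red Hat's site, the Meltdown mitigation was delivered in kernel-2.6.32-696.18.7.el6.x86_64. At the time of writing, this fixed kernel has not yet reached the CentOS 6 repositories, at least (it does appear to be available for CentOS 7).
For now, I run the benchmark on my machine with the unpatched kernel-2.6.32-696.16.1.el6.x86_64.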
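To check whether a given kernel already includes the fix, its version-release string can be compared against 2.6.32-696.18.7.el6. This is a simplified sketch: it compares only the numeric fields after dropping the `.el6...` suffix, and it is not a full implementation of RPM's version comparison (epochs, letters, and tildes are ignored). `numeric_fields` and `has_meltdown_fix` are hypothetical names.

```python
from __future__ import annotations

import re

# First CentOS/RHEL 6 kernel with the Meltdown fix, per the Red Hat information cited above.
FIXED = "2.6.32-696.18.7.el6"


def numeric_fields(version_release: str) -> tuple[int, ...]:
    """Turn '2.6.32-696.18.7.el6.x86_64' into (2, 6, 32, 696, 18, 7).

    Simplification: the '.elN...' dist/arch suffix is dropped and only
    digit runs are compared; this is NOT a full rpmvercmp implementation.
    """
    core = re.sub(r"\.el\d+.*$", "", version_release)
    return tuple(int(part) for part in re.findall(r"\d+", core))


def has_meltdown_fix(version_release: str) -> bool:
    """True if the kernel is at least FIXED (hypothetical helper)."""
    return numeric_fields(version_release) >= numeric_fields(FIXED)


for kernel in ("2.6.32-642.13.1.el6", "2.6.32-696.16.1.el6.x86_64", "2.6.32-696.18.7.el6.x86_64"):
    print(kernel, has_meltdown_fix(kernel))
```

On a running system the string to test is what `uname -r` prints, for example 2.6.32-696.16.1.el6.x86_64, which this sketch reports as unpatched.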
This is a SAMPLE run script for SMP LINPACK. Change it to reflect
the correct number of CPUs/threads, problem input files, etc..
Thu Jan 4 17:29:53 JST 2018
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Thu Jan 4 17:29:53 2018
CPU frequency: 2.999 GHz
Number of CPUs: 2
Number of cores: 16
Number of threads: 32
Parameters are set to:
Number of tests: 15
Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1
Maximum memory requested that can be used=16200901024, at the size=45000
=================== Timing linear equation system solver ===================
Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
1000 1000 4 0.023 28.4723 8.724688e-13 2.975343e-02 pass
1000 1000 4 0.018 37.6328 8.724688e-13 2.975343e-02 pass
1000 1000 4 0.017 38.9671 8.724688e-13 2.975343e-02 pass
1000 1000 4 0.017 38.2876 8.724688e-13 2.975343e-02 pass
2000 2000 4 0.053 101.4005 4.701128e-12 4.089406e-02 pass
2000 2000 4 0.052 103.0739 4.701128e-12 4.089406e-02 pass
5000 5008 4 0.623 133.8619 2.434170e-11 3.394253e-02 pass
5000 5008 4 0.633 131.8303 2.434170e-11 3.394253e-02 pass
10000 10000 4 3.863 172.6226 8.916344e-11 3.143993e-02 pass
10000 10000 4 3.861 172.6983 8.916344e-11 3.143993e-02 pass
15000 15000 4 14.544 154.7288 2.165846e-10 3.411244e-02 pass
15000 15000 4 13.828 162.7475 2.165846e-10 3.411244e-02 pass
18000 18008 4 23.296 166.9203 2.945255e-10 3.225417e-02 pass
18000 18008 4 23.203 167.5927 2.945255e-10 3.225417e-02 pass
20000 20016 4 32.185 165.7339 3.831049e-10 3.391318e-02 pass
20000 20016 4 32.032 166.5251 3.831049e-10 3.391318e-02 pass
22000 22008 4 42.983 165.1728 4.066827e-10 2.978791e-02 pass
22000 22008 4 42.664 166.4062 4.066827e-10 2.978791e-02 pass
25000 25000 4 61.417 169.6253 5.501781e-10 3.128666e-02 pass
25000 25000 4 61.447 169.5433 5.501781e-10 3.128666e-02 pass
26000 26000 4 69.610 168.3475 5.851288e-10 3.076783e-02 pass
26000 26000 4 69.455 168.7243 5.851288e-10 3.076783e-02 pass
27000 27000 4 79.124 165.8588 6.532881e-10 3.185765e-02 pass
30000 30000 1 104.469 172.3171 7.329930e-10 2.889466e-02 pass
35000 35000 1 159.257 179.4945 1.115330e-09 3.237635e-02 pass
40000 40000 1 208.059 205.0856 1.359319e-09 3.023172e-02 pass
45000 45000 1 297.994 203.8768 1.876477e-09 3.301464e-02 pass
Performance Summary (GFlops)
Size LDA Align. Average Maximal
1000 1000 4 35.8399 38.9671
2000 2000 4 102.2372 103.0739
5000 5008 4 132.8461 133.8619
10000 10000 4 172.6604 172.6983
15000 15000 4 158.7381 162.7475
18000 18008 4 167.2565 167.5927
20000 20016 4 166.1295 166.5251
22000 22008 4 165.7895 166.4062
25000 25000 4 169.5843 169.6253
26000 26000 4 168.5359 168.7243
27000 27000 4 165.8588 165.8588
30000 30000 1 172.3171 172.3171
35000 35000 1 179.4945 179.4945
40000 40000 1 205.0856 205.0856
45000 45000 1 203.8768 203.8768
Residual checks PASSED
End of tests
Done: Thu Jan 4 18:07:59 JST 2018
kernel-2.6.32-696.18.7.el6.x86_64 appears to have been added to the updates repository in the early hours of January 5 JST (04-Jan-2018 19:42 UTC), so I updated, rebooted, and ran the test again.
This is a SAMPLE run script for SMP LINPACK. Change it to reflect
the correct number of CPUs/threads, problem input files, etc..
Fri Jan 5 10:14:10 JST 2018
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Fri Jan 5 10:14:10 2018
CPU frequency: 1.200 GHz
Number of CPUs: 2
Number of cores: 16
Number of threads: 32
Parameters are set to:
Number of tests: 15
Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1
Maximum memory requested that can be used=16200901024, at the size=45000
=================== Timing linear equation system solver ===================
Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
1000 1000 4 0.031 21.5036 8.724688e-13 2.975343e-02 pass
1000 1000 4 0.019 34.6105 8.724688e-13 2.975343e-02 pass
1000 1000 4 0.019 34.7159 8.724688e-13 2.975343e-02 pass
1000 1000 4 0.019 35.4197 8.724688e-13 2.975343e-02 pass
2000 2000 4 0.052 101.9037 4.701128e-12 4.089406e-02 pass
2000 2000 4 0.052 102.3420 4.701128e-12 4.089406e-02 pass
5000 5008 4 0.616 135.4189 2.434170e-11 3.394253e-02 pass
5000 5008 4 0.622 134.0885 2.434170e-11 3.394253e-02 pass
10000 10000 4 3.893 171.3056 8.916344e-11 3.143993e-02 pass
10000 10000 4 3.867 172.4651 8.916344e-11 3.143993e-02 pass
15000 15000 4 14.059 160.0744 2.165846e-10 3.411244e-02 pass
15000 15000 4 14.510 155.0924 2.165846e-10 3.411244e-02 pass
18000 18008 4 23.791 163.4498 2.945255e-10 3.225417e-02 pass
18000 18008 4 23.628 164.5806 2.945255e-10 3.225417e-02 pass
20000 20016 4 32.758 162.8351 3.831049e-10 3.391318e-02 pass
20000 20016 4 32.702 163.1117 3.831049e-10 3.391318e-02 pass
22000 22008 4 43.922 161.6420 4.066827e-10 2.978791e-02 pass
22000 22008 4 43.850 161.9055 4.066827e-10 2.978791e-02 pass
25000 25000 4 62.763 165.9878 5.501781e-10 3.128666e-02 pass
25000 25000 4 62.212 167.4592 5.501781e-10 3.128666e-02 pass
26000 26000 4 71.553 163.7752 5.851288e-10 3.076783e-02 pass
26000 26000 4 72.195 162.3197 5.851288e-10 3.076783e-02 pass
27000 27000 4 82.244 159.5668 6.532881e-10 3.185765e-02 pass
30000 30000 1 107.351 167.6913 7.329930e-10 2.889466e-02 pass
35000 35000 1 165.246 172.9897 1.115330e-09 3.237635e-02 pass
40000 40000 1 210.337 202.8639 1.359319e-09 3.023172e-02 pass
45000 45000 1 304.250 199.6849 1.876477e-09 3.301464e-02 pass
Performance Summary (GFlops)
Size LDA Align. Average Maximal
1000 1000 4 31.5624 35.4197
2000 2000 4 102.1229 102.3420
5000 5008 4 134.7537 135.4189
10000 10000 4 171.8853 172.4651
15000 15000 4 157.5834 160.0744
18000 18008 4 164.0152 164.5806
20000 20016 4 162.9734 163.1117
22000 22008 4 161.7738 161.9055
25000 25000 4 166.7235 167.4592
26000 26000 4 163.0475 163.7752
27000 27000 4 159.5668 159.5668
30000 30000 1 167.6913 167.6913
35000 35000 1 172.9897 172.9897
40000 40000 1 202.8639 202.8639
45000 45000 1 199.6849 199.6849
Residual checks PASSED
End of tests
Done: Fri Jan 5 10:52:51 JST 2018
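Performance does appear to drop, but only by about 2% (0.4-3.8% in the Average column for problem sizes of 10000 and above), which hardly seems like a serious slowdown. This result is consistent with Red Hat's knowledge base article.
Since libvirt and qemu-kvm were updated along with the kernel, I suspected that virtualized environments might be affected more strongly.
Benchmarking on a virtualized guest
While I was at it, I created a virtualized guest and compared the results there as well.
The virtualization host runs KVM. The guest has 32 vCPUs and 128GB of memory, with CentOS 7.4 installed on iSCSI storage (HDD) connected over 10GbE. After a minimal installation, no package updates were applied.
No other guests were running on the host during the benchmarks below.
Before updating the host's kernel and related packages
For practical reasons, the unpatched host is running somewhat older packages.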
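The percentages above can be reproduced from the two Performance Summary tables. This sketch parses the Average column for problem sizes 10000 and up, copied from the logs (the smaller sizes finish in under a second, so they are left out), and prints the relative change per size. `parse_summary` and `relative_change` are hypothetical helper names.

```python
from __future__ import annotations

# "Performance Summary" rows (Size LDA Align. Average Maximal) for sizes >= 10000,
# copied from the runs before and after the kernel update.
BEFORE = """\
10000 10000 4 172.6604 172.6983
15000 15000 4 158.7381 162.7475
18000 18008 4 167.2565 167.5927
20000 20016 4 166.1295 166.5251
22000 22008 4 165.7895 166.4062
25000 25000 4 169.5843 169.6253
26000 26000 4 168.5359 168.7243
27000 27000 4 165.8588 165.8588
30000 30000 1 172.3171 172.3171
35000 35000 1 179.4945 179.4945
40000 40000 1 205.0856 205.0856
45000 45000 1 203.8768 203.8768
"""
AFTER = """\
10000 10000 4 171.8853 172.4651
15000 15000 4 157.5834 160.0744
18000 18008 4 164.0152 164.5806
20000 20016 4 162.9734 163.1117
22000 22008 4 161.7738 161.9055
25000 25000 4 166.7235 167.4592
26000 26000 4 163.0475 163.7752
27000 27000 4 159.5668 159.5668
30000 30000 1 167.6913 167.6913
35000 35000 1 172.9897 172.9897
40000 40000 1 202.8639 202.8639
45000 45000 1 199.6849 199.6849
"""


def parse_summary(text: str) -> dict[int, float]:
    """Map problem size -> Average GFlops from 'Size LDA Align. Average Maximal' rows."""
    averages = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5:
            averages[int(fields[0])] = float(fields[3])
    return averages


def relative_change(before: dict[int, float], after: dict[int, float]) -> dict[int, float]:
    """Percentage change for each size present in both runs (negative = slower)."""
    return {n: (after[n] - before[n]) / before[n] * 100 for n in before if n in after}


changes = relative_change(parse_summary(BEFORE), parse_summary(AFTER))
for size, pct in sorted(changes.items()):
    print(f"{size:>6}: {pct:+.1f}%")
mean = sum(changes.values()) / len(changes)
print(f"mean: {mean:+.1f}%")
```

Every size from 10000 up is slower after the update, from about 0.4% (10000) to 3.8% (27000), and the mean comes out at roughly -2.1%.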
# yum list installed kernel libvirt qemu-kvm
Installed Packages
kernel.x86_64 2.6.32-642.13.1.el6 @updates
libvirt.x86_64 0.10.2-60.el6 @base
qemu-kvm.x86_64 2:0.12.1.2-2.491.el6_8.3 @updates
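LINPACK benchmark
To run the LINPACK benchmark above, the guest's vCPU model is set to SandyBridge, the same as the host.
For some reason the log says the run started at 14:05 and took more than an hour and a half, but it should actually have started around 15:00…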
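The article does not show how the guest is defined. Assuming it is managed through libvirt (which is among the host packages listed above), the 32 vCPUs, 128GB of memory, and SandyBridge CPU model would correspond roughly to the domain XML elements generated below. This is a partial fragment for illustration only, not a complete or bootable definition, and `guest_fragment` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def guest_fragment(vcpus: int, memory_gib: int, cpu_model: str) -> str:
    """Build a partial libvirt <domain> element (not a complete definition)."""
    domain = ET.Element("domain", type="kvm")
    memory = ET.SubElement(domain, "memory", unit="GiB")
    memory.text = str(memory_gib)
    vcpu = ET.SubElement(domain, "vcpu")
    vcpu.text = str(vcpus)
    # Present a fixed, named CPU model to the guest instead of the default one.
    cpu = ET.SubElement(domain, "cpu", mode="custom", match="exact")
    model = ET.SubElement(cpu, "model")
    model.text = cpu_model
    return ET.tostring(domain, encoding="unicode")


print(guest_fragment(32, 128, "SandyBridge"))
```

The `<cpu>` element is the part relevant here: naming the SandyBridge model exposes that CPU's instruction set to the guest, which is also why the guest reports itself as "Intel Xeon E312xx (Sandy Bridge)" in the UnixBench output below.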
Fri Jan 5 14:05:46 JST 2018
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Fri Jan 5 14:05:46 2018
CPU frequency: 1.183 GHz
Number of CPUs: 32
Number of cores: 32
Number of threads: 32
Parameters are set to:
Number of tests: 15
Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1
Maximum memory requested that can be used=16200901024, at the size=45000
=================== Timing linear equation system solver ===================
Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
1000 1000 4 1.079 0.6195 8.724688e-13 2.975343e-02 pass
1000 1000 4 0.304 2.2011 8.724688e-13 2.975343e-02 pass
1000 1000 4 0.009 71.4727 8.724688e-13 2.975343e-02 pass
1000 1000 4 0.009 70.7288 8.724688e-13 2.975343e-02 pass
2000 2000 4 0.045 119.0130 4.700934e-12 4.089237e-02 pass
2000 2000 4 0.047 113.0734 4.700934e-12 4.089237e-02 pass
5000 5008 4 1.012 82.3808 2.905478e-11 4.051455e-02 pass
5000 5008 4 0.904 92.2604 2.905478e-11 4.051455e-02 pass
10000 10000 4 6.006 111.0320 8.916344e-11 3.143993e-02 pass
10000 10000 4 9.940 67.0859 8.916344e-11 3.143993e-02 pass
15000 15000 4 20.360 110.5331 2.245460e-10 3.536637e-02 pass
15000 15000 4 12.060 186.6063 2.245460e-10 3.536637e-02 pass
18000 18008 4 24.829 156.6172 3.547512e-10 3.884962e-02 pass
18000 18008 4 23.544 165.1681 3.547512e-10 3.884962e-02 pass
20000 20016 4 24.910 214.1382 3.717613e-10 3.290902e-02 pass
20000 20016 4 33.420 159.6077 3.717613e-10 3.290902e-02 pass
22000 22008 4 36.094 196.6980 4.602465e-10 3.371124e-02 pass
22000 22008 4 34.890 203.4869 4.602465e-10 3.371124e-02 pass
25000 25000 4 57.939 179.8094 6.014913e-10 3.420466e-02 pass
25000 25000 4 62.237 167.3898 6.014913e-10 3.420466e-02 pass
26000 26000 4 68.882 170.1270 6.095033e-10 3.204951e-02 pass
26000 26000 4 53.737 218.0738 6.095033e-10 3.204951e-02 pass
27000 27000 4 83.299 157.5464 6.309912e-10 3.077034e-02 pass
30000 30000 1 91.139 197.5192 8.002768e-10 3.154699e-02 pass
35000 35000 1 143.780 198.8155 1.077145e-09 3.126789e-02 pass
40000 40000 1 193.893 220.0695 1.361293e-09 3.027561e-02 pass
45000 45000 1 294.865 206.0405 1.779995e-09 3.131714e-02 pass
Performance Summary (GFlops)
Size LDA Align. Average Maximal
1000 1000 4 36.2555 71.4727
2000 2000 4 116.0432 119.0130
5000 5008 4 87.3206 92.2604
10000 10000 4 89.0589 111.0320
15000 15000 4 148.5697 186.6063
18000 18008 4 160.8926 165.1681
20000 20016 4 186.8729 214.1382
22000 22008 4 200.0925 203.4869
25000 25000 4 173.5996 179.8094
26000 26000 4 194.1004 218.0738
27000 27000 4 157.5464 157.5464
30000 30000 1 197.5192 197.5192
35000 35000 1 198.8155 198.8155
40000 40000 1 220.0695 220.0695
45000 45000 1 206.0405 206.0405
Residual checks PASSED
End of tests
Done: Fri Jan 5 15:37:40 JST 2018
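The odd duration of this run stands out when the start and end timestamps of all four LINPACK logs in this article are compared. The sketch below simply subtracts the logged times; the dictionary labels are my own.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
# Start / "Done:" timestamps copied from the four LINPACK logs in this article.
RUNS = {
    "bare metal, before update": ("2018-01-04 17:29:53", "2018-01-04 18:07:59"),
    "bare metal, after update": ("2018-01-05 10:14:10", "2018-01-05 10:52:51"),
    "guest, host before update": ("2018-01-05 14:05:46", "2018-01-05 15:37:40"),
    "guest, host after update": ("2018-01-05 17:10:02", "2018-01-05 17:49:58"),
}


def elapsed(start: str, end: str) -> timedelta:
    """Wall-clock time between two logged timestamps."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


for label, (start, end) in RUNS.items():
    print(f"{label:<28} {elapsed(start, end)}")
```

The other three runs take 38 to 40 minutes, while this one appears to take 1:31:54. If it really started around 15:00, it would have taken about 38 minutes, in line with the others.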
UnixBench
While I was at it, I also compared results with UnixBench 5.1.3.
BYTE UNIX Benchmarks (Version 5.1.3)
System: localhost.localdomain: GNU/Linux
OS: GNU/Linux -- 3.10.0-693.el7.x86_64 -- #1 SMP Tue Aug 22 21:09:27 UTC 2017
Machine: x86_64 (x86_64)
Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
CPU 0: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 1: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 2: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 3: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 4: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 5: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 6: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 7: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 8: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 9: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 10: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 11: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 12: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 13: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 14: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 15: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 16: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 17: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 18: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 19: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 20: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 21: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 22: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 23: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 24: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 25: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 26: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 27: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 28: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 29: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 30: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 31: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
15:57:04 up 1:55, 3 users, load average: 0.23, 0.76, 9.16; runlevel 3
------------------------------------------------------------------------
Benchmark Run: Fri Jan 05 2018 15:57:04 - 16:24:07
32 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables 31960454.1 lps (10.0 s, 7 samples)
Double-Precision Whetstone 3880.5 MWIPS (4.6 s, 7 samples)
Execl Throughput 1122.7 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 1012632.2 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 283216.9 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 2497954.9 KBps (30.0 s, 2 samples)
Pipe Throughput 1448859.8 lps (10.0 s, 7 samples)
Pipe-based Context Switching 132236.3 lps (10.0 s, 7 samples)
Process Creation 3279.8 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 2552.0 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 1307.1 lpm (60.0 s, 2 samples)
System Call Overhead 2214891.1 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 31960454.1 2738.7
Double-Precision Whetstone 55.0 3880.5 705.5
Execl Throughput 43.0 1122.7 261.1
File Copy 1024 bufsize 2000 maxblocks 3960.0 1012632.2 2557.2
File Copy 256 bufsize 500 maxblocks 1655.0 283216.9 1711.3
File Copy 4096 bufsize 8000 maxblocks 5800.0 2497954.9 4306.8
Pipe Throughput 12440.0 1448859.8 1164.7
Pipe-based Context Switching 4000.0 132236.3 330.6
Process Creation 126.0 3279.8 260.3
Shell Scripts (1 concurrent) 42.4 2552.0 601.9
Shell Scripts (8 concurrent) 6.0 1307.1 2178.5
System Call Overhead 15000.0 2214891.1 1476.6
========
System Benchmarks Index Score 1052.4
------------------------------------------------------------------------
Benchmark Run: Fri Jan 05 2018 16:24:07 - 16:24:07
32 CPUs in system; running 32 parallel copies of tests
After updating the host's kernel and related packages
# yum list installed kernel libvirt qemu-kvm
Installed Packages
kernel.x86_64 2.6.32-696.18.7.el6 @updates
libvirt.x86_64 0.10.2-62.el6_9.1 @updates
qemu-kvm.x86_64 2:0.12.1.2-2.503.el6_9.4 @updates
LINPACK benchmark
Fri Jan 5 17:10:02 JST 2018
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Fri Jan 5 17:10:02 2018
CPU frequency: 1.185 GHz
Number of CPUs: 32
Number of cores: 32
Number of threads: 32
Parameters are set to:
Number of tests: 15
Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1
Maximum memory requested that can be used=16200901024, at the size=45000
=================== Timing linear equation system solver ===================
Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
1000 1000 4 0.364 1.8369 8.724688e-13 2.975343e-02 pass
1000 1000 4 1.000 0.6690 8.724688e-13 2.975343e-02 pass
1000 1000 4 1.437 0.4653 8.724688e-13 2.975343e-02 pass
1000 1000 4 1.319 0.5070 8.724688e-13 2.975343e-02 pass
2000 2000 4 0.042 126.0391 4.700934e-12 4.089237e-02 pass
2000 2000 4 0.047 113.0396 4.700934e-12 4.089237e-02 pass
5000 5008 4 0.762 109.4769 2.905478e-11 4.051455e-02 pass
5000 5008 4 0.742 112.3330 2.905478e-11 4.051455e-02 pass
10000 10000 4 4.011 166.2445 8.916344e-11 3.143993e-02 pass
10000 10000 4 3.957 168.5287 8.916344e-11 3.143993e-02 pass
15000 15000 4 10.794 208.4864 2.245460e-10 3.536637e-02 pass
15000 15000 4 10.799 208.3938 2.245460e-10 3.536637e-02 pass
18000 18008 4 18.125 214.5514 3.547512e-10 3.884962e-02 pass
18000 18008 4 30.476 127.5958 3.547512e-10 3.884962e-02 pass
20000 20016 4 25.014 213.2472 3.717613e-10 3.290902e-02 pass
20000 20016 4 25.086 212.6328 3.717613e-10 3.290902e-02 pass
22000 22008 4 32.748 216.7970 4.602465e-10 3.371124e-02 pass
22000 22008 4 32.912 215.7182 4.602465e-10 3.371124e-02 pass
25000 25000 4 62.711 166.1261 6.014913e-10 3.420466e-02 pass
25000 25000 4 46.967 221.8114 6.014913e-10 3.420466e-02 pass
26000 26000 4 51.578 227.2016 6.095033e-10 3.204951e-02 pass
26000 26000 4 51.710 226.6210 6.095033e-10 3.204951e-02 pass
27000 27000 4 58.047 226.0823 6.309912e-10 3.077034e-02 pass
30000 30000 1 78.164 230.3074 8.002768e-10 3.154699e-02 pass
35000 35000 1 125.388 227.9778 1.077145e-09 3.126789e-02 pass
40000 40000 1 215.707 197.8145 1.361293e-09 3.027561e-02 pass
45000 45000 1 282.994 214.6835 1.779995e-09 3.131714e-02 pass
Performance Summary (GFlops)
Size LDA Align. Average Maximal
1000 1000 4 0.8696 1.8369
2000 2000 4 119.5393 126.0391
5000 5008 4 110.9050 112.3330
10000 10000 4 167.3866 168.5287
15000 15000 4 208.4401 208.4864
18000 18008 4 171.0736 214.5514
20000 20016 4 212.9400 213.2472
22000 22008 4 216.2576 216.7970
25000 25000 4 193.9687 221.8114
26000 26000 4 226.9113 227.2016
27000 27000 4 226.0823 226.0823
30000 30000 1 230.3074 230.3074
35000 35000 1 227.9778 227.9778
40000 40000 1 197.8145 197.8145
45000 45000 1 214.6835 214.6835
Residual checks PASSED
End of tests
Done: Fri Jan 5 17:49:58 JST 2018
UnixBench
BYTE UNIX Benchmarks (Version 5.1.3)
System: localhost.localdomain: GNU/Linux
OS: GNU/Linux -- 3.10.0-693.el7.x86_64 -- #1 SMP Tue Aug 22 21:09:27 UTC 2017
Machine: x86_64 (x86_64)
Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
CPU 0: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 1: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 2: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 3: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 4: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 5: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 6: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 7: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 8: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 9: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 10: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 11: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 12: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 13: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 14: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 15: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 16: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 17: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 18: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 19: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 20: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 21: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 22: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 23: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 24: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 25: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 26: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 27: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 28: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 29: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 30: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
CPU 31: Intel Xeon E312xx (Sandy Bridge) (4400.0 bogomips)
x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
18:08:06 up 1:24, 1 user, load average: 0.00, 0.83, 9.15; runlevel 3
------------------------------------------------------------------------
Benchmark Run: Fri Jan 05 2018 18:08:06 - 18:35:07
32 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables 32439270.4 lps (10.0 s, 7 samples)
Double-Precision Whetstone 3944.9 MWIPS (4.9 s, 7 samples)
Execl Throughput 1133.5 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 1029433.5 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 287239.0 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 2759356.5 KBps (30.0 s, 2 samples)
Pipe Throughput 1501040.6 lps (10.0 s, 7 samples)
Pipe-based Context Switching 129568.2 lps (10.0 s, 7 samples)
Process Creation 3351.8 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 2616.8 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 1357.5 lpm (60.0 s, 2 samples)
System Call Overhead 2257813.0 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 32439270.4 2779.7
Double-Precision Whetstone 55.0 3944.9 717.3
Execl Throughput 43.0 1133.5 263.6
File Copy 1024 bufsize 2000 maxblocks 3960.0 1029433.5 2599.6
File Copy 256 bufsize 500 maxblocks 1655.0 287239.0 1735.6
File Copy 4096 bufsize 8000 maxblocks 5800.0 2759356.5 4757.5
Pipe Throughput 12440.0 1501040.6 1206.6
Pipe-based Context Switching 4000.0 129568.2 323.9
Process Creation 126.0 3351.8 266.0
Shell Scripts (1 concurrent) 42.4 2616.8 617.2
Shell Scripts (8 concurrent) 6.0 1357.5 2262.5
System Call Overhead 15000.0 2257813.0 1505.2
========
System Benchmarks Index Score 1078.1
------------------------------------------------------------------------
Benchmark Run: Fri Jan 05 2018 18:35:07 - 18:35:07
32 CPUs in system; running 32 parallel copies of tests
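The two single-copy UnixBench scores can be compared directly. Judging from the tables above, each INDEX is RESULT ÷ BASELINE × 10, and the sketch assumes the overall score is the geometric mean of the twelve indexes (this is how I understand UnixBench to combine them). `index_score` is a hypothetical helper.

```python
import math

# BASELINE and RESULT columns from the two "System Benchmarks Index Values" tables
# (single-copy runs). Order: Dhrystone, Whetstone, Execl, File Copy 1024/256/4096,
# Pipe, Context Switching, Process Creation, Shell 1/8, System Call.
BASELINE = [116700.0, 55.0, 43.0, 3960.0, 1655.0, 5800.0,
            12440.0, 4000.0, 126.0, 42.4, 6.0, 15000.0]
BEFORE = [31960454.1, 3880.5, 1122.7, 1012632.2, 283216.9, 2497954.9,
          1448859.8, 132236.3, 3279.8, 2552.0, 1307.1, 2214891.1]
AFTER = [32439270.4, 3944.9, 1133.5, 1029433.5, 287239.0, 2759356.5,
         1501040.6, 129568.2, 3351.8, 2616.8, 1357.5, 2257813.0]


def index_score(results: list[float], baselines: list[float]) -> float:
    """Geometric mean of the per-test indexes (result / baseline * 10)."""
    indexes = [r / b * 10 for r, b in zip(results, baselines)]
    return math.exp(sum(math.log(i) for i in indexes) / len(indexes))


before = index_score(BEFORE, BASELINE)
after = index_score(AFTER, BASELINE)
print(f"before: {before:.1f}  after: {after:.1f}  change: {(after / before - 1) * 100:+.1f}%")
```

This closely reproduces the logged scores of 1052.4 and 1078.1. The guest on the patched host actually scores about 2.4% higher, so this single-copy run shows no measurable penalty; the difference is probably within run-to-run variation.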
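Conclusion
At least, virtualized guests do not seem to become noticeably slower right away either. A difference might show up if many guests were put under load at the same time, but that is hard to test with my setup, so I skipped it. Alternatively, as reported for Windows desktop PCs, a significant difference might appear with fast storage.
What concerned me more was how unstable the results were when running the LINPACK benchmark in the guest on the virtualization host.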
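That instability can be put into numbers by comparing the two trials LINPACK runs for each mid-sized problem. This sketch takes the paired GFlops values for sizes 10000 to 26000 from the four logs and reports the largest gap between trials, as a percentage of the faster one. The metric and the helper name `max_spread` are my own choices for illustration.

```python
# GFlops of the two trials per problem size, copied from the LINPACK logs
# (sizes 10000-26000, where every run has two trials).
TRIALS = {
    "bare metal, before update": {
        10000: (172.6226, 172.6983), 15000: (154.7288, 162.7475),
        18000: (166.9203, 167.5927), 20000: (165.7339, 166.5251),
        22000: (165.1728, 166.4062), 25000: (169.6253, 169.5433),
        26000: (168.3475, 168.7243),
    },
    "bare metal, after update": {
        10000: (171.3056, 172.4651), 15000: (160.0744, 155.0924),
        18000: (163.4498, 164.5806), 20000: (162.8351, 163.1117),
        22000: (161.6420, 161.9055), 25000: (165.9878, 167.4592),
        26000: (163.7752, 162.3197),
    },
    "guest, host before update": {
        10000: (111.0320, 67.0859), 15000: (110.5331, 186.6063),
        18000: (156.6172, 165.1681), 20000: (214.1382, 159.6077),
        22000: (196.6980, 203.4869), 25000: (179.8094, 167.3898),
        26000: (170.1270, 218.0738),
    },
    "guest, host after update": {
        10000: (166.2445, 168.5287), 15000: (208.4864, 208.3938),
        18000: (214.5514, 127.5958), 20000: (213.2472, 212.6328),
        22000: (216.7970, 215.7182), 25000: (166.1261, 221.8114),
        26000: (227.2016, 226.6210),
    },
}


def max_spread(trials: dict[int, tuple[float, float]]) -> float:
    """Largest (max - min) / max between paired trials, in percent."""
    return max((max(p) - min(p)) / max(p) * 100 for p in trials.values())


spreads = {label: max_spread(t) for label, t in TRIALS.items()}
for label, spread in spreads.items():
    print(f"{label:<28} worst trial-to-trial gap: {spread:.1f}%")
```

On bare metal the two trials never differ by more than about 5%, while in the guest they differ by roughly 40% at their worst, both before and after the host update.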