
The kernel update for the Meltdown CPU vulnerability really did reduce performance



Environment

DELL PowerEdge R300

cpu: Xeon X3363 2.83GHz, 4 cores, code name Yorkfield(-CL)

https://ark.intel.com/ja/products/35279/Intel-Xeon-Processor-X3363-12M-Cache-2_83-GHz-1333-MHz-FSB

mem: 12GB

hdd: SAS6G 15000rpm

os: ubuntu 16.04

kernel: 4.4.0-104-generic (before the upgrade) -> 4.13.0-25-generic (after the upgrade)


Steps

Check the kernel version with uname -a

Linux un11 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux



Run spectre-meltdown-checker

wget https://raw.githubusercontent.com/speed47/spectre-meltdown-checker/master/spectre-meltdown-checker.sh

sudo sh ./spectre-meltdown-checker.sh



Result

The first two entries are Spectre and the last one is Meltdown. The verdict: vulnerable on all three.


Spectre and Meltdown mitigation detection tool v0.21

Checking for vulnerabilities against live running kernel Linux 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017 x86_64

CVE-2017-5753 [bounds check bypass] aka 'Spectre Variant 1'

* Checking count of LFENCE opcodes in kernel: NO (only 38 opcodes found, should be >= 70)

> STATUS: VULNERABLE (heuristic to be improved when official patches become available)

CVE-2017-5715 [branch target injection] aka 'Spectre Variant 2'

* Mitigation 1

* Hardware (CPU microcode) support for mitigation: NO

* Kernel support for IBRS: NO

* IBRS enabled for Kernel space: NO

* IBRS enabled for User space: NO

* Mitigation 2

* Kernel compiled with retpoline option: NO

* Kernel compiled with a retpoline-aware compiler: NO

> STATUS: VULNERABLE (IBRS hardware + kernel support OR kernel with retpoline are needed to mitigate the vulnerability)

CVE-2017-5754 [rogue data cache load] aka 'Meltdown' aka 'Variant 3'

* Kernel supports Page Table Isolation (PTI): NO

* PTI enabled and active: NO

> STATUS: VULNERABLE (PTI is needed to mitigate the vulnerability)

A false sense of security is worse than no security at all, see --disclaimer
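When comparing runs before and after an upgrade, it can help to reduce the report to one status per CVE. The following is a minimal sketch, assuming the report has been saved as plain text in the format pasted above (no colour codes); the function name `summarize_status` and the `SAMPLE` text are made up for this illustration and are not part of spectre-meltdown-checker.

```python
import re


def summarize_status(report: str) -> dict:
    """Map each CVE id found in the report text to its STATUS verdict."""
    statuses = {}
    current_cve = None
    for line in report.splitlines():
        cve = re.match(r"\s*(CVE-\d{4}-\d+)", line)
        if cve:
            # A new CVE section starts here; remember which one we are in.
            current_cve = cve.group(1)
            continue
        status = re.match(r"\s*>\s*STATUS:\s*(NOT VULNERABLE|VULNERABLE)", line)
        if status and current_cve:
            statuses[current_cve] = status.group(1)
    return statuses


# Shortened copy of the report above (assumption: same line layout).
SAMPLE = """\
CVE-2017-5753 [bounds check bypass] aka 'Spectre Variant 1'
> STATUS: VULNERABLE (heuristic to be improved when official patches become available)
CVE-2017-5715 [branch target injection] aka 'Spectre Variant 2'
> STATUS: VULNERABLE (IBRS hardware + kernel support OR kernel with retpoline are needed to mitigate the vulnerability)
CVE-2017-5754 [rogue data cache load] aka 'Meltdown' aka 'Variant 3'
> STATUS: VULNERABLE (PTI is needed to mitigate the vulnerability)
"""

if __name__ == "__main__":
    for cve, verdict in summarize_status(SAMPLE).items():
        print(cve, verdict)
```

Other versions of the checker may print a different layout, in which case the two patterns would need adjusting.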




Run the benchmark

wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/byte-unixbench/UnixBench5.1.3.tgz

tar xvzf UnixBench5.1.3.tgz

The results are listed at the end of the page.



Upgrade

sudo apt install linux-generic-hwe-16.04-edge



Reboot



Check the kernel version

Linux un11 4.13.0-25-generic #29~16.04.2-Ubuntu SMP Tue Jan 9 12:16:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux



Run spectre-meltdown-checker


Spectre and Meltdown mitigation detection tool v0.24

Checking for vulnerabilities against live running kernel Linux 4.13.0-25-generic #29~16.04.2-Ubuntu SMP Tue Jan 9 12:16:39 UTC 2018 x86_64

CVE-2017-5753 [bounds check bypass] aka 'Spectre Variant 1'

* Checking count of LFENCE opcodes in kernel: NO (only 42 opcodes found, should be >= 70)

> STATUS: VULNERABLE (heuristic to be improved when official patches become available)

CVE-2017-5715 [branch target injection] aka 'Spectre Variant 2'

* Mitigation 1

* Hardware (CPU microcode) support for mitigation: NO

* Kernel support for IBRS: NO

* IBRS enabled for Kernel space: NO

* IBRS enabled for User space: NO

* Mitigation 2

* Kernel compiled with retpoline option: NO

* Kernel compiled with a retpoline-aware compiler: NO

> STATUS: VULNERABLE (IBRS hardware + kernel support OR kernel with retpoline are needed to mitigate the vulnerability)

CVE-2017-5754 [rogue data cache load] aka 'Meltdown' aka 'Variant 3'

* Kernel supports Page Table Isolation (PTI): YES

* PTI enabled and active: YES

> STATUS: NOT VULNERABLE (PTI mitigates the vulnerability)

A false sense of security is worse than no security at all, see --disclaimer


Only Meltdown has been mitigated.
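Besides the checker script, later kernels report mitigation state themselves under /sys/devices/system/cpu/vulnerabilities. The sketch below reads that directory, assuming the running kernel provides it; as far as I know the interface appeared upstream in 4.15, so the 4.13 kernel used here may not have it, in which case the function simply returns an empty result. The function name `read_mitigations` is made up for this example.

```python
from pathlib import Path

# Assumption: this sysfs directory exists only on kernels that ship the
# vulnerabilities interface (newer than the 4.13 build in this post).
SYSFS_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigations(directory: Path = SYSFS_DIR) -> dict:
    """Return {file name: contents} for every file in the directory."""
    if not directory.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(directory.iterdir())
        if entry.is_file()
    }


if __name__ == "__main__":
    report = read_mitigations()
    if not report:
        print("kernel does not expose", SYSFS_DIR)
    for name, state in report.items():
        print(f"{name}: {state}")
```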



Run the benchmark


Benchmark results


Before mitigation

   BYTE UNIX Benchmarks (Version 5.1.3)

System: miner1: GNU/Linux
OS: GNU/Linux -- 4.4.0-104-generic -- #127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017
Machine: x86_64 (x86_64)
Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
CPU 0: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz (5666.2 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 1: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz (5666.2 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 2: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz (5666.2 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 3: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz (5666.2 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
19:05:19 up 21 days, 22:41, 1 user, load average: 0.00, 0.00, 0.00; runlevel 5

------------------------------------------------------------------------
Benchmark Run: 水 1月 10 2018 19:05:19 - 19:33:34
4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables 28119902.0 lps (10.0 s, 7 samples)
Double-Precision Whetstone 3158.7 MWIPS (9.9 s, 7 samples)
Execl Throughput 2974.6 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 783667.8 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 221717.8 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 1180483.7 KBps (30.0 s, 2 samples)
Pipe Throughput 1573155.7 lps (10.0 s, 7 samples)
Pipe-based Context Switching 198800.3 lps (10.0 s, 7 samples)
Process Creation 7286.8 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 9579.3 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 2890.5 lpm (60.0 s, 2 samples)
System Call Overhead 2293641.9 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 28119902.0 2409.6
Double-Precision Whetstone 55.0 3158.7 574.3
Execl Throughput 43.0 2974.6 691.8
File Copy 1024 bufsize 2000 maxblocks 3960.0 783667.8 1979.0
File Copy 256 bufsize 500 maxblocks 1655.0 221717.8 1339.7
File Copy 4096 bufsize 8000 maxblocks 5800.0 1180483.7 2035.3
Pipe Throughput 12440.0 1573155.7 1264.6
Pipe-based Context Switching 4000.0 198800.3 497.0
Process Creation 126.0 7286.8 578.3
Shell Scripts (1 concurrent) 42.4 9579.3 2259.3
Shell Scripts (8 concurrent) 6.0 2890.5 4817.6
System Call Overhead 15000.0 2293641.9 1529.1
========
System Benchmarks Index Score 1332.2

------------------------------------------------------------------------
Benchmark Run: 水 1月 10 2018 19:33:34 - 20:01:50
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables 110748798.1 lps (10.0 s, 7 samples)
Double-Precision Whetstone 12669.0 MWIPS (9.9 s, 7 samples)
Execl Throughput 10551.4 lps (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 627694.8 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 177384.4 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 1454342.1 KBps (30.0 s, 2 samples)
Pipe Throughput 6318157.0 lps (10.0 s, 7 samples)
Pipe-based Context Switching 1062264.5 lps (10.0 s, 7 samples)
Process Creation 19246.2 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 26011.5 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 3215.3 lpm (60.0 s, 2 samples)
System Call Overhead 4000482.5 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 110748798.1 9490.0
Double-Precision Whetstone 55.0 12669.0 2303.5
Execl Throughput 43.0 10551.4 2453.8
File Copy 1024 bufsize 2000 maxblocks 3960.0 627694.8 1585.1
File Copy 256 bufsize 500 maxblocks 1655.0 177384.4 1071.8
File Copy 4096 bufsize 8000 maxblocks 5800.0 1454342.1 2507.5
Pipe Throughput 12440.0 6318157.0 5078.9
Pipe-based Context Switching 4000.0 1062264.5 2655.7
Process Creation 126.0 19246.2 1527.5
Shell Scripts (1 concurrent) 42.4 26011.5 6134.8
Shell Scripts (8 concurrent) 6.0 3215.3 5358.9
System Call Overhead 15000.0 4000482.5 2667.0
========
System Benchmarks Index Score 2937.5


After mitigation

   BYTE UNIX Benchmarks (Version 5.1.3)

System: miner1: GNU/Linux
OS: GNU/Linux -- 4.13.0-25-generic -- #29~16.04.2-Ubuntu SMP Tue Jan 9 12:16:39 UTC 2018
Machine: x86_64 (x86_64)
Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
CPU 0: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz (5666.5 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 1: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz (5666.5 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 2: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz (5666.5 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 3: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz (5666.5 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
10:27:19 up 12:29, 1 user, load average: 0.00, 0.00, 0.00; runlevel 5

------------------------------------------------------------------------
Benchmark Run: 木 1月 11 2018 10:27:19 - 10:55:32
4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables 26211740.0 lps (10.0 s, 7 samples)
Double-Precision Whetstone 3166.2 MWIPS (9.9 s, 7 samples)
Execl Throughput 3624.6 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 417875.3 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 115622.4 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 880767.8 KBps (30.0 s, 2 samples)
Pipe Throughput 670758.3 lps (10.0 s, 7 samples)
Pipe-based Context Switching 190568.3 lps (10.0 s, 7 samples)
Process Creation 7535.9 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 8931.3 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 2451.1 lpm (60.0 s, 2 samples)
System Call Overhead 616139.0 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 26211740.0 2246.1
Double-Precision Whetstone 55.0 3166.2 575.7
Execl Throughput 43.0 3624.6 842.9
File Copy 1024 bufsize 2000 maxblocks 3960.0 417875.3 1055.2
File Copy 256 bufsize 500 maxblocks 1655.0 115622.4 698.6
File Copy 4096 bufsize 8000 maxblocks 5800.0 880767.8 1518.6
Pipe Throughput 12440.0 670758.3 539.2
Pipe-based Context Switching 4000.0 190568.3 476.4
Process Creation 126.0 7535.9 598.1
Shell Scripts (1 concurrent) 42.4 8931.3 2106.4
Shell Scripts (8 concurrent) 6.0 2451.1 4085.2
System Call Overhead 15000.0 616139.0 410.8
========
System Benchmarks Index Score 966.3

------------------------------------------------------------------------
Benchmark Run: 木 1月 11 2018 10:55:32 - 11:23:47
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables 105167823.9 lps (10.0 s, 7 samples)
Double-Precision Whetstone 12665.7 MWIPS (9.9 s, 7 samples)
Execl Throughput 9885.1 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 601864.0 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 171021.2 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 1324158.9 KBps (30.0 s, 2 samples)
Pipe Throughput 2700705.7 lps (10.0 s, 7 samples)
Pipe-based Context Switching 676091.6 lps (10.0 s, 7 samples)
Process Creation 18847.7 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 20126.8 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 2822.7 lpm (60.1 s, 2 samples)
System Call Overhead 2136850.9 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 105167823.9 9011.8
Double-Precision Whetstone 55.0 12665.7 2302.9
Execl Throughput 43.0 9885.1 2298.9
File Copy 1024 bufsize 2000 maxblocks 3960.0 601864.0 1519.9
File Copy 256 bufsize 500 maxblocks 1655.0 171021.2 1033.4
File Copy 4096 bufsize 8000 maxblocks 5800.0 1324158.9 2283.0
Pipe Throughput 12440.0 2700705.7 2171.0
Pipe-based Context Switching 4000.0 676091.6 1690.2
Process Creation 126.0 18847.7 1495.8
Shell Scripts (1 concurrent) 42.4 20126.8 4746.9
Shell Scripts (8 concurrent) 6.0 2822.7 4704.5
System Call Overhead 15000.0 2136850.9 1424.6
========
System Benchmarks Index Score 2360.1
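For reading the tables above: each INDEX value is the result divided by the baseline, times 10, and the overall Index Score matches the geometric mean of the twelve indexes. Here is a small sketch that recomputes the "before mitigation, 1 parallel copy" score from the pasted numbers. The helper names are made up for this illustration, and the geometric-mean formula is inferred from the figures rather than taken from the UnixBench source.

```python
import math


def index(result: float, baseline: float) -> float:
    """UnixBench-style index: result relative to the baseline, scaled by 10."""
    return result / baseline * 10.0


def overall_score(indexes) -> float:
    """Geometric mean of the per-test indexes."""
    indexes = list(indexes)
    return math.exp(sum(math.log(v) for v in indexes) / len(indexes))


# (baseline, result) pairs from the 4.4.0-104 run with 1 parallel copy.
BEFORE_1COPY = [
    (116700.0, 28119902.0),  # Dhrystone 2 using register variables
    (55.0, 3158.7),          # Double-Precision Whetstone
    (43.0, 2974.6),          # Execl Throughput
    (3960.0, 783667.8),      # File Copy 1024 bufsize 2000 maxblocks
    (1655.0, 221717.8),      # File Copy 256 bufsize 500 maxblocks
    (5800.0, 1180483.7),     # File Copy 4096 bufsize 8000 maxblocks
    (12440.0, 1573155.7),    # Pipe Throughput
    (4000.0, 198800.3),      # Pipe-based Context Switching
    (126.0, 7286.8),         # Process Creation
    (42.4, 9579.3),          # Shell Scripts (1 concurrent)
    (6.0, 2890.5),           # Shell Scripts (8 concurrent)
    (15000.0, 2293641.9),    # System Call Overhead
]

if __name__ == "__main__":
    indexes = [index(result, baseline) for baseline, result in BEFORE_1COPY]
    print(f"recomputed score: {overall_score(indexes):.1f}")  # report says 1332.2
```

Because the score is a geometric mean, one test collapsing (such as System Call Overhead) pulls the total down less than it would in an arithmetic mean.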


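Comparison

Values are from the runs with 4 parallel copies. The last column is the post-update value as a percentage of the pre-update value (100 means no change).

| Test | Description | Before update | After update | After / before (%) |
|---|---|---|---|---|
| Dhrystone 2 using register variables | Integer arithmetic (loops/s) | 110748798.1 | 105167823.9 | 95.0 |
| Double-Precision Whetstone | Floating-point arithmetic (MWIPS) | 12669.0 | 12665.7 | 100.0 |
| Execl Throughput | execl calls (loops/s) | 10551.4 | 9885.1 | 93.7 |
| File Copy 1024 bufsize 2000 maxblocks | File copy (KBps) | 627694.8 | 601864.0 | 95.9 |
| File Copy 256 bufsize 500 maxblocks | File copy (KBps) | 177384.4 | 171021.2 | 96.4 |
| File Copy 4096 bufsize 8000 maxblocks | File copy (KBps) | 1454342.1 | 1324158.9 | 91.0 |
| Pipe Throughput | Repeated pipe I/O (loops/s) | 6318157.0 | 2700705.7 | 42.7 |
| Pipe-based Context Switching | Pipe I/O between two processes (loops/s) | 1062264.5 | 676091.6 | 63.6 |
| Process Creation | Process creation (loops/s) | 19246.2 | 18847.7 | 97.9 |
| Shell Scripts (1 concurrent) | Running shell commands (loops/min) | 26011.5 | 20126.8 | 77.4 |
| Shell Scripts (8 concurrent) | The above, 8 at once (loops/min) | 3215.3 | 2822.7 | 87.8 |
| System Call Overhead | Calling getpid (loops/s) | 4000482.5 | 2136850.9 | 53.4 |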
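The last column of the comparison is the post-update value as a percentage of the pre-update value, so the actual loss is 100 minus that figure. A short sketch that recomputes both from the 4-copy results; the function names are made up for this example.

```python
# (before, after) results from the runs with 4 parallel copies.
RESULTS_4COPIES = {
    "Dhrystone 2 using register variables": (110748798.1, 105167823.9),
    "Double-Precision Whetstone": (12669.0, 12665.7),
    "Execl Throughput": (10551.4, 9885.1),
    "File Copy 1024 bufsize 2000 maxblocks": (627694.8, 601864.0),
    "File Copy 256 bufsize 500 maxblocks": (177384.4, 171021.2),
    "File Copy 4096 bufsize 8000 maxblocks": (1454342.1, 1324158.9),
    "Pipe Throughput": (6318157.0, 2700705.7),
    "Pipe-based Context Switching": (1062264.5, 676091.6),
    "Process Creation": (19246.2, 18847.7),
    "Shell Scripts (1 concurrent)": (26011.5, 20126.8),
    "Shell Scripts (8 concurrent)": (3215.3, 2822.7),
    "System Call Overhead": (4000482.5, 2136850.9),
}


def retained_percent(before: float, after: float) -> float:
    """Post-update result as a percentage of the pre-update result."""
    return round(after / before * 100, 1)


def loss_percent(before: float, after: float) -> float:
    """Percentage of throughput lost after the update."""
    return round(100 - after / before * 100, 1)


if __name__ == "__main__":
    for name, (before, after) in RESULTS_4COPIES.items():
        print(f"{name:40s} {retained_percent(before, after):6.1f}% kept, "
              f"{loss_percent(before, after):5.1f}% lost")
```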

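Every 4-copy number went down on the same machine, so the performance loss looks real.

Most items dropped by only a single-digit percentage, but anything pipe-based struggled: Pipe Throughput fell to 42.7% and Pipe-based Context Switching to 63.6% of the earlier values, and System Call Overhead fell to 53.4%.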
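Another way to look at the hard-hit tests is time per loop rather than loops per second. The sketch below converts the 1-copy results for Pipe Throughput and System Call Overhead into nanoseconds per loop and shows how much was added by the update. The function names are made up for this example, and the figures are per loop iteration, not per system call, since a loop can contain more than one call.

```python
def ns_per_loop(loops_per_second: float) -> float:
    """Average time of one benchmark loop in nanoseconds."""
    return 1e9 / loops_per_second


def extra_ns(before_lps: float, after_lps: float) -> float:
    """Nanoseconds added to each loop after the update."""
    return ns_per_loop(after_lps) - ns_per_loop(before_lps)


# (before, after) in loops per second, from the runs with 1 parallel copy.
SINGLE_COPY = {
    "Pipe Throughput": (1573155.7, 670758.3),
    "System Call Overhead": (2293641.9, 616139.0),
}

if __name__ == "__main__":
    for name, (before, after) in SINGLE_COPY.items():
        print(f"{name}: {ns_per_loop(before):.0f} ns -> {ns_per_loop(after):.0f} ns "
              f"(+{extra_ns(before, after):.0f} ns per loop)")
```

On this machine that works out to roughly one extra microsecond per loop for both tests.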

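The biggest drops are in the tests that enter the kernel on every loop (pipes and system calls). That fits how PTI works: it switches page tables on each kernel entry and exit, so every system call gets more expensive. The cost seems to come from that rather than from speculative execution itself being turned off.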
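To get a rough feel for system call cost without installing UnixBench, a loop around a cheap system call can be timed before and after a kernel change. This is only a sketch: it uses `os.getppid()` as the cheap call, the function name `syscalls_per_second` is made up, and Python interpreter overhead dominates the result, so the numbers are not comparable to UnixBench figures. Only the ratio between two kernels on the same machine means anything.

```python
import os
import time


def syscalls_per_second(duration: float = 0.2, batch: int = 1000) -> float:
    """Call os.getppid() in batches for about `duration` seconds and
    return the measured calls per second."""
    calls = 0
    start = time.perf_counter()
    deadline = start + duration
    while time.perf_counter() < deadline:
        for _ in range(batch):
            os.getppid()  # one real system call per iteration
        calls += batch
    elapsed = time.perf_counter() - start
    return calls / elapsed


if __name__ == "__main__":
    print(f"{syscalls_per_second():,.0f} getppid calls per second")
```

Running it a few times and taking the best value reduces noise from other processes.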


Other notes

Spectre (Bounds Check Bypass) looks like it will be mitigated first?

Apparently the following three features are needed to defend against Spectre (Branch Target Injection) attacks:

IBRS

STIBP

IBPB
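On kernels that know about these features, they show up as flags in /proc/cpuinfo once the microcode provides them. A minimal sketch that looks for them, assuming the lowercase flag names `ibrs`, `ibpb` and `stibp` used by recent Linux kernels; the 4.13 kernel in this post may not report them at all, and the function name is made up for this example.

```python
def spectre_v2_flags(cpuinfo_text: str) -> dict:
    """Report whether ibrs, ibpb and stibp appear in the first 'flags' line."""
    wanted = ("ibrs", "ibpb", "stibp")
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            flags = set(value.split())
            return {name: name in flags for name in wanted}
    # No flags line found (for example, not an x86 Linux system).
    return {name: False for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""
    for name, present in spectre_v2_flags(text).items():
        print(f"{name}: {'yes' if present else 'no'}")
```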

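Naturally, no BIOS update has come out for the DELL PowerEdge R300.

Using a trampoline (retpoline) reportedly avoids the problem without a performance drop.

But it apparently will still take a while before kernel and distribution updates arrive.

Let's send drinks to the people working hard on this.

Microcode released by Intel on January 8: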

https://downloadcenter.intel.com/download/27431/Linux-Processor-Microcode-Data-File