
A look at what changed in Linux kernel 3.0 and later

Last updated at 2018-06-22, posted at 2016-12-30

This is long, so the key points are in bold.
To keep the scope manageable, GPU and virtualization items are left out.

The networking part got long, so it has been split out:
http://qiita.com/bringer1092/items/b6cd96a7f7db7121e8a7

The I/O part got long, so it has been split out:
http://qiita.com/bringer1092/items/4a62ec6ab62b896ab611

Hardware support

4.6 USB 3.1 SuperSpeedPlus (10 Gbps) support

Adds USB 3.1 support, including SuperSpeedPlus (10 Gbps).

4.12 USB Type-C support

Adds USB Type-C support.
From what I read about Type-C, it also covers the USB PD standard (up to 100 W of power) and video output.

Security

What belongs in this list depends a lot on the granularity each reader cares about, but here is my selection.

3.11 New O_TMPFILE open(2) flag to reduce temporary file vulnerabilities

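A safer way to implement temporary files.
A file opened with the O_TMPFILE flag is created but never shows up in the filesystem, and it is deleted as soon as it is closed.
For comparison, MySQL creates a file, opens it and deletes it right away. Because the file is already unlinked, nobody else can see it, and only the MySQL process that opened it first can still use it.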
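The two approaches above can be sketched in Python roughly as follows. This is a minimal sketch, not how MySQL actually does it: `open_anonymous_tempfile` is a hypothetical helper name, and the fallback branch is my assumption about what to do when the kernel or the filesystem does not support `O_TMPFILE`.

```python
import os
import tempfile


def open_anonymous_tempfile(directory):
    """Return a file descriptor for a temp file that has no name in `directory`."""
    o_tmpfile = getattr(os, "O_TMPFILE", None)  # only defined on Linux builds
    if o_tmpfile is not None:
        try:
            # With O_TMPFILE the path names the directory, not the file itself.
            return os.open(directory, o_tmpfile | os.O_RDWR, 0o600)
        except OSError:
            pass  # kernel older than 3.11, or a filesystem without support

    # Fallback: the create-then-unlink trick. There is a short window in
    # which the name is visible, which is what O_TMPFILE avoids.
    fd, path = tempfile.mkstemp(dir=directory)
    os.unlink(path)
    return fd
```

Either way the caller gets a descriptor that keeps working until it is closed, while a directory listing shows nothing.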

3.19 Support for the Intel Memory Protection Extensions

Support for Intel MPX (Memory Protection Extensions).

4.0 Live patching

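Live kernel patching.
Patches can be applied to the kernel without a reboot.
Among paid offerings this is the biggest improvement in recent years.
And yes: unless you set up the mechanism yourself, even Ubuntu only lets you do this on 3 machines for free. On RHEL it is a paid feature.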

4.1 Ext4 encryption support

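Encryption support in the Ext4 filesystem.
It is implemented directly in the filesystem rather than as an add-on layer, which makes it faster than the existing dm-crypt and eCryptfs.
Its selling points are speed and the ability to encrypt per file or per directory.
How to set it up is described here: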
http://askubuntu.com/questions/643577/how-to-create-ext4-encrypted-partition-on-ubuntu-15-04-with-new-4-1-kernel

4.2 Stacking of security modules

Previously only one security module could be selected in the Linux kernel. Multiple (stacked) modules are now supported.

4.9 Protection keys syscall support

Provides a more complete API for using protection keys.

4.12 Progress in Live kernel patching

Live patching can now guarantee per-task consistency.

4.14 Add support for AMD Secure Memory Encryption

Support for AMD SME (Secure Memory Encryption).

4.15 Meltdown/Spectre

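Mitigations for the Meltdown and Spectre CPU vulnerabilities. The mitigations mainly cost memory performance.
The attacks can only be run by code executing locally, such as JavaScript, so the fix is a must on PCs that run a browser.
The mitigations can be disabled with the kernel boot options pti=off and spectre_v2=off.
A /sys/devices/system/cpu/vulnerabilities/ directory was added. It shows the vulnerabilities that affect the current CPU and the mitigations in use.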
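To see what that directory reports, something like the following should be enough. It is a minimal sketch: `read_vulnerabilities` is a hypothetical helper name, and returning an empty dict when the directory is missing is my own choice for kernels that predate it.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_vulnerabilities(base=VULN_DIR):
    """Map each vulnerability file name to the kernel's one-line status."""
    base = Path(base)
    if not base.is_dir():
        return {}  # directory absent: older kernel, or not Linux at all
    report = {}
    for entry in sorted(base.iterdir()):
        try:
            report[entry.name] = entry.read_text().strip()
        except OSError:
            report[entry.name] = "unreadable"
    return report
```

On a kernel with these patches, `read_vulnerabilities()` returns entries such as `meltdown` and `spectre_v2`, each with a short status line written by the kernel.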

Memory

3.0 Cleancache

Something like memcache. See the following for an explanation:
http://gihyo.jp/dev/serial/01/linuxcon_basic/0006?page=2

3.1 page allocator: fix significant stalls while copying large amounts of data on NUMA machines

Prevents significant stalls while copying large amounts of data on NUMA machines.

3.5 Frontswap support. Frontswap is so named because it can be thought of as the opposite of a "backing" store for a swap device.

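Frontswap support.
The behavior of vm.swappiness=0 changed in this release, so vm.swappiness=1 has been the recommended setting for this and later versions.
The change became well known because it also went into CentOS 6.4.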

3.6 Allow swap readahead IOPS to be merged, it improves throughput and at the same time lowers CPU consumption

Allows swap readahead I/O requests to be merged, which improves throughput and lowers CPU consumption at the same time.

3.11 Kswapd and page reclaim behaviour has been screwy in one way or the other for a long time

Works around the problem of kswapd pinning a CPU at 100%.

3.12 Better Out-Of-Memory handling

Changes to the behavior of the OOM killer.

3.12 swap: change block allocation algorithm for SSD

Reduces swap overhead by changing the block allocation algorithm, for SSDs only.

3.13 Improved performance in NUMA systems

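NUMA improvements.
On current x86 architectures memory hangs off each CPU.
With multiple physical CPUs, the overhead of accessing memory attached to another CPU is not negligible.
Parameters were therefore added so that a task's data can be placed in the memory local to its own CPU.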

A detailed description of the parameters is here:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/Documentation/sysctl/kernel.txt?id=10fc05d0e551146ad6feb0ab8902d28a2d3c5624

kernel.numa_balancing_scan_delay_ms = 1000
kernel.numa_balancing_scan_period_max_ms = 60000
kernel.numa_balancing_scan_period_min_ms = 1000
kernel.numa_balancing_scan_size_mb = 256
kernel.numa_balancing_settle_count = 4

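They are set like this.
This looks useful for middleware that can keep many CPUs busy,
but probably pointless for anything that uses so much memory that it needs the memory of other CPUs anyway.
kernel.numa_balancing_settle_count appears to have been removed in 3.14.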
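Since the set of knobs differs between kernel versions (settle_count being one example), it helps to check which ones the running kernel actually exposes. A minimal sketch, assuming the usual mapping of dotted sysctl names to files under /proc/sys. `sysctl_path` and `read_sysctls` are hypothetical helper names.

```python
from pathlib import Path

NUMA_KNOBS = [
    "kernel.numa_balancing_scan_delay_ms",
    "kernel.numa_balancing_scan_period_max_ms",
    "kernel.numa_balancing_scan_period_min_ms",
    "kernel.numa_balancing_scan_size_mb",
    "kernel.numa_balancing_settle_count",
]


def sysctl_path(name, root="/proc/sys"):
    """Translate a dotted sysctl name into its path under /proc/sys."""
    return Path(root).joinpath(*name.split("."))


def read_sysctls(names, root="/proc/sys"):
    """Return {name: value}; the value is None when the knob does not exist."""
    values = {}
    for name in names:
        try:
            values[name] = sysctl_path(name, root).read_text().strip()
        except OSError:
            values[name] = None
    return values
```

`read_sysctls(NUMA_KNOBS)` then shows at a glance which parameters are present. A `None` entry means the kernel does not provide that knob at this location.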

3.14 Add overcommit_kbytes sysctl variable

With memory sizes now reaching terabytes, overcommit_kbytes was added to allow finer control than the percentage-based setting.
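A little arithmetic shows why a percentage is too coarse on big machines. The function below is a simplified model of the commit limit (the allowed share of RAM plus swap) and deliberately ignores huge pages and reserves. `commit_limit_kib` is a hypothetical name, not a kernel interface.

```python
def commit_limit_kib(ram_kib, swap_kib, overcommit_ratio=50, overcommit_kbytes=0):
    """Simplified commit limit in KiB: allowed share of RAM plus all of swap."""
    if overcommit_kbytes:
        allowed = overcommit_kbytes  # absolute setting takes precedence
    else:
        allowed = ram_kib * overcommit_ratio // 100
    return allowed + swap_kib


KIB_PER_TIB = 1024 ** 3
ram = 4 * KIB_PER_TIB  # a machine with 4 TiB of RAM

# Smallest possible adjustment when tuning by percentage: one point.
step_kib = commit_limit_kib(ram, 0, overcommit_ratio=51) - commit_limit_kib(ram, 0, overcommit_ratio=50)
print(f"1% step on 4 TiB: {step_kib / 1024 ** 2:.2f} GiB")
```

On this hypothetical 4 TiB machine a single percentage point is about 41 GiB, while overcommit_kbytes can be set in 1 KiB steps.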

3.15 Improved working set size detection

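Changes how inactive memory is handled.
The kernel now tracks how long ago a page became inactive,
which lets it reclaim more efficiently. Reportedly this should help.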

3.15 NUMA: make smarter decisions on NUMA migrations in order to maximize performance of workloads that do not fit in one NUMA node

Maximizes the performance of workloads that do not fit in one NUMA node.
Disabling NUMA might be faster, though.

4.0 cgroups: Per memory cgroup slab shrinkers

cgroup memory management gets a new design and a new interface.
There is no benefit unless you switch to the new interface.

4.3 Compaction cleanups and performance optimizations

Cleanups and optimizations for memory compaction.

4.5 proc: There are several shortcomings with the accounting of shared memory

Improves the accounting of shared memory, which had several shortcomings.

4.5 pipes: limit the per-user amount of pages allocated in pipes.

Sets the maximum number of pages that an unprivileged user can have allocated in pipes.
Two sysctl parameters were added:
pipe-user-pages-soft
pipe-user-pages-hard
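The limits are counted in pages, so it is easier to judge them after converting to bytes. A minimal sketch, assuming the two parameters live under /proc/sys/fs. `pipe_user_limits` and `pages_to_mib` are hypothetical helper names.

```python
import os
from pathlib import Path


def pipe_user_limits(root="/proc/sys/fs"):
    """Read both per-user pipe page limits; None if a knob is missing."""
    limits = {}
    for knob in ("pipe-user-pages-soft", "pipe-user-pages-hard"):
        try:
            limits[knob] = int((Path(root) / knob).read_text())
        except (OSError, ValueError):
            limits[knob] = None
    return limits


def pages_to_mib(pages, page_size=None):
    """Convert a page count to MiB using the system page size by default."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return pages * page_size / (1024 * 1024)
```

For example, a limit of 16384 pages with 4 KiB pages works out to 64 MiB of pipe buffers per user.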

4.6 Improve the reliability of the Out Of Memory task killer

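Improved reliability of the OOM killer.
The algorithm was improved: the memory of the process chosen by the OOM killer is freed first so it can be reused.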

4.7 Support bigger cache working sets and protect against writes

Supports bigger cache working sets and protects them against writes.
Certain PostgreSQL workloads had problems, so the memory management code was changed.

4.7 Rework OOM detection to make it more reliable

OOM detection was reworked to make it more reliable.

4.7 slab allocator (not to confuse with the default slub allocator): reduce lock contention in alloc path to improve concurrent allocation.

For object classes larger than 128 bytes, performance improves by more than 50%.

4.8 Support for using Transparent Huge Pages in the page cache

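Support for using transparent huge pages in the page cache.
Transparent huge pages make 2 MB pages available,
but the page cache, which is often the biggest user of memory on a system,
was not supported.
Support has now been added for the tmpfs/shmem page cache.
The huge= mount option controls the huge page allocation policy of a tmpfs.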
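To check which policy each tmpfs is mounted with, the option can be picked out of /proc/mounts. A minimal sketch: `tmpfs_huge_policies` is a hypothetical helper name, and reporting `never` when no huge= option is listed is my assumption about the default.

```python
def tmpfs_huge_policies(mounts_text):
    """Return {mount_point: huge policy} for tmpfs lines in /proc/mounts format."""
    policies = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mount_point fstype options dump pass
        if len(fields) < 4 or fields[2] != "tmpfs":
            continue
        options = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            options[key] = value
        # Assumption: no huge= option listed means the default, "never".
        policies[fields[1]] = options.get("huge", "never")
    return policies
```

Typical use is `tmpfs_huge_policies(open("/proc/mounts").read())`. A tmpfs mounted with `huge=always` shows up with the value `always`.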

4.11 Scalable swapping for SSDs

Swapping was reworked to scale better on SSDs.

4.13 Improved block layer and background writes error handling

Previously, when a background write failed, the error was reported only the first time.
Errors are now tracked with a new mechanism, so fsync reports them reliably.
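For applications the practical consequence is that the result of fsync is worth checking. A minimal sketch of a write that only counts as successful once fsync has returned. `durable_write` is a hypothetical helper name.

```python
import os


def durable_write(path, data):
    """Write `data` to `path`; raise OSError unless fsync() also succeeds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write less than requested
            view = view[written:]
        # A failed background writeback is reported here as an OSError,
        # so the caller must not treat the data as safe before this returns.
        os.fsync(fd)
    finally:
        os.close(fd)
```

The caller wraps `durable_write(...)` in `try/except OSError` and retries or aborts. Ignoring the exception would throw away exactly the information this kernel change makes reliable.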

4.14 Bigger memory limits

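Raises the x86-64 memory limits through 5-level paging: virtual address space goes from 256 TiB to 128 PiB, and physical memory from 64 TiB to 4 PiB.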
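These sizes are plain powers of two, so they are easy to sanity-check. The bit widths used below (48 and 57 virtual address bits, 46 and 52 physical address bits for 4-level and 5-level paging) are my understanding of the 4.14 release notes, not something stated in this article. `human` is a hypothetical helper.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def human(nbytes):
    """Format an exact power-of-1024 multiple with a binary unit suffix."""
    unit = 0
    while nbytes >= 1024 and nbytes % 1024 == 0 and unit < len(UNITS) - 1:
        nbytes //= 1024
        unit += 1
    return f"{nbytes} {UNITS[unit]}"


for label, bits in [
    ("4-level virtual", 48),
    ("4-level physical", 46),
    ("5-level virtual", 57),
    ("5-level physical", 52),
]:
    print(f"{label}: 2^{bits} bytes = {human(2 ** bits)}")
```

With those widths, 2^46 bytes is the old 64 TiB physical limit and 2^57 bytes is the new 128 PiB virtual limit.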

4.15 Speed up page cache truncation

Speeds up page cache truncation.

4.15 Make able to disable NUMA stats for improved performance

Writing 0 to /proc/sys/vm/numa_stat disables NUMA statistics, which improves performance.
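Flipping the switch from a script could look like this. It is a minimal sketch: `set_numa_stat` is a hypothetical helper name, and returning False instead of raising when the file is missing or not writable (no root, older kernel) is my own design choice.

```python
from pathlib import Path

NUMA_STAT = Path("/proc/sys/vm/numa_stat")


def set_numa_stat(enabled, knob=NUMA_STAT):
    """Try to turn NUMA statistics on or off; return True if the write worked."""
    try:
        Path(knob).write_text("1" if enabled else "0")
        return True
    except OSError:
        # Not root, or the kernel does not have this knob.
        return False
```

`set_numa_stat(False)` is the scripted equivalent of writing 0 to the file by hand, and it needs root privileges.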

CPU

3.2 Process bandwidth (limits the amount of CPU each process can use)

I assumed cgroups could already do this.
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/scheduler/sched-bwc.txt?id=HEAD
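As I read the linked document, the limit is expressed as a quota of run time per period, written to `cpu.cfs_quota_us` and `cpu.cfs_period_us` in the cgroup. The helper below only does the arithmetic. `cfs_quota_us` as a function name is hypothetical, and the 1 ms and 1 s bounds are my reading of that document.

```python
def cfs_quota_us(cpus, period_us=100_000):
    """Quota in microseconds that grants `cpus` worth of CPU time per period."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    if not 1_000 <= period_us <= 1_000_000:
        raise ValueError("period must be between 1 ms and 1 s")
    quota = round(cpus * period_us)
    if quota < 1_000:
        raise ValueError("quota would fall below 1 ms; use a longer period")
    return quota


# 20% of one CPU with a 50 ms period, and two full CPUs with a 500 ms period.
print(cfs_quota_us(0.2, period_us=50_000))
print(cfs_quota_us(2, period_us=500_000))
```

The resulting number goes into `cpu.cfs_quota_us` and the period into `cpu.cfs_period_us`. A quota larger than the period simply means more than one CPU.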

3.4 x86 CPU driver autoprobing

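CPUs keep gaining features, but some of them, such as CRC32 acceleration, were going unused.
This adds detection that identifies the exact CPU so that all of its features can be used.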
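Which features the kernel detected can be seen in the `flags` line of /proc/cpuinfo on x86. A minimal sketch for checking a few of the features mentioned in this article. `cpu_flags` is a hypothetical helper name, and the flag names in the usage note are the ones I know from x86 systems.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line, e.g. a non-x86 machine


def has_features(cpuinfo_text, wanted):
    """Map each wanted flag to True/False depending on CPU support."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}
```

For example, `has_features(open("/proc/cpuinfo").read(), ["sse4_2", "smap", "avx512f"])` tells whether the CRC32 instruction (part of SSE4.2), SMAP and AVX-512 are available.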

3.7 Intel "supervisor mode access prevention" support

Intel SMAP support.
Recommended Intel documentation: Intel® Architecture Instruction Set Extensions Programming Reference

3.15 Intel AVX-512 vector instructions support

AVX-512 support.

3.16 Control groups allow to create groups of arbitrary processes and apply CPU

Control groups move to a single (unified) hierarchy.

3.18 Add support for gcc 5

gcc 5 support.

3.19 HSA driver for AMD GPU devices

Support for the AMD HSA architecture.

4.2 Queued spinlocks become the default spinlock implementation

Queue-based spinlocks on the x86 architecture.
Improves the scheduler on NUMA machines with two or more CPUs.

4.6 dma-buf: new ioctl to manage cache coherency between CPU and GPU

A new ioctl for managing cache coherency between the CPU and the GPU.

4.10 Support for Intel Cache Allocation Technology

An Intel feature for setting policies on the L2/L3 CPU caches. For example, a real-time task can be given its own dedicated cache space.

4.16 epoll: Massively increase nested epoll performance

Massively improves the performance of nested epoll.
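"Nested" here means that one epoll instance is itself registered in another. A minimal, Linux-only sketch of that setup using Python's `select.epoll`. `nested_epoll_demo` is a hypothetical name, and the example only shows the structure, not the performance difference.

```python
import os
import select


def nested_epoll_demo():
    """Watch a pipe via an inner epoll that is itself watched by an outer epoll."""
    read_end, write_end = os.pipe()
    inner = select.epoll()
    outer = select.epoll()
    try:
        inner_fd = inner.fileno()
        inner.register(read_end, select.EPOLLIN)
        # An epoll descriptor is pollable, so it can be registered like any fd.
        outer.register(inner_fd, select.EPOLLIN)

        os.write(write_end, b"x")  # makes the pipe, and thus `inner`, readable

        outer_events = outer.poll(1.0)  # reports the inner epoll fd
        inner_events = inner.poll(0)    # reports the pipe's read end
        return outer_events, inner_events, inner_fd, read_end
    finally:
        outer.close()
        inner.close()
        os.close(read_end)
        os.close(write_end)
```

The outer instance wakes up for the inner descriptor, and polling the inner one then yields the actual pipe event.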

4.15 New architecture: RISC-V

Support for the RISC-V CPU architecture. Adoption may grow in the future.

Compression

4.11 update LZ4 compressor module

Updates LZ4 and makes an "LZ4 fast xx" setting available.
As the name says it is faster, at the cost of a lower compression ratio.