What Every Programmer Should Know About Memory

Last updated at 2014-08-29. Posted at 2014-08-24.

"What Every Programmer Should Know About Memory", Ulrich Drepper, 2007
http://www.akkadia.org/drepper/cpumemory.pdf

Abstract
As CPU cores become both faster and more numerous, the limiting factor for most programs is now, and will be for some time, memory access. Hardware designers have come up with ever more sophisticated memory handling and acceleration techniques –such as CPU caches– but these cannot work optimally without some help from the programmer. Unfortunately, neither the structure nor the cost of using the memory subsystem of a computer or the caches on CPUs is well understood by most programmers. This paper explains the structure of memory subsystems in use on modern commodity hardware, illustrating why CPU caches were developed, how they work, and what programs should do to achieve optimal performance by utilizing them.

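(Reference translation)

Summary
As CPU cores become faster and more numerous, memory access is, now and for the foreseeable future, the limiting factor for most programs. Hardware designers have devised increasingly sophisticated memory-handling and acceleration techniques, such as CPU caches, but making effective use of them requires help from the programmer. Unfortunately, most programmers do not sufficiently understand the structure and usage costs of computer memory systems and CPU caches. This paper explains the structure of the memory subsystems used in recent commodity hardware, including why CPU caches were developed and how they work, and then describes what programmers should do to get optimal performance from them.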

Table of contents (up to Level 2)
1 Introduction
2 Commodity Hardware Today
2.1 RAM Types
2.2 DRAM Access Technical Details
2.3 Other Main Memory Users
3 CPU Caches
3.1 CPU Caches in the Big Picture
3.2 Cache Operation at High Level
3.3 CPU Cache Implementation Details
3.4 Instruction Cache
3.5 Cache Miss Factors
4 Virtual Memory
4.1 Simplest Address Translation
4.2 Multi-Level Page Tables
4.3 Optimizing Page Table Access
4.4 Impact Of Virtualization
5 NUMA Support
5.1 NUMA Hardware
5.2 OS Support for NUMA
5.3 Published Information
5.4 Remote Access Costs
6 What Programmers Can Do
6.1 Bypassing the Cache
6.2 Cache Access
6.3 Prefetching
6.4 Multi-Thread Optimizations
6.5 NUMA Programming
7 Memory Performance Tools
7.1 Memory Operation Profiling
7.2 Simulating CPU Caches
7.3 Measuring Memory Usage
7.4 Improving Branch Prediction
7.5 Page Fault Optimization
8 Upcoming Technology
8.1 The Problem with Atomic Operations
8.2 Transactional Memory
8.3 Increasing Latency
8.4 Vector Operations
A Examples and Benchmark Programs
A.1 Matrix Multiplication
A.2 Debug Branch Prediction
A.3 Measure Cache Line Sharing Overhead
B Some OProfile Tips
B.1 Oprofile Basics
B.2 How It Looks Like
B.3 Starting To Profile
C Memory Types
D libNUMA Introduction
E Index
F Bibliography
G Revision History
