x86(i386)アーキテクチャ向けにGCCが出力するコードで、スタックフレームが16byteアライメントされる理由を求めて。1
※ 本記事の正確性は保証できません。誤り指摘や追加情報があればコメントまでお願いします。
経緯
- SysV ABI当初ではワード(4byte)アライメントのみが要請されていた。
- MMX, SSEなどのSIMD命令登場により、スタック変数でも16byteアライメントが要請されるケースが発生。またアライメント必須でない命令であってもパフォーマンス的には16byteアライメントが好ましい。
- Mac OS X i386 ABIでは16byteアライメントを明確に要請。
- GCCでサポートするLinux/BSD/OSX間の一貫性の観点から16byteアライメントを採用。
- 新しいSysV psABIでは16byteアライメントに変更。
GCC Bugzilla
- #27537 XMM alignment fault when compiling for i386 with -Os
- #38496 Gcc misaligns arrays when stack is forced follow the x8632 ABI
- #40838 gcc shouldn't assume that the stack is aligned
ABI仕様
x86-64
System V Application Binary Interface, AMD64 Architecture Processor Supplement, Draft Version 0.99.6:
3.2.2 The Stack Frame
In addition to registers, each function has a frame on the run-time stack. This stack grows downwards from high addresses. Figure 3.3 shows the stack organization. The end of the input argument area shall be aligned on a 16 (32, if__m256
is passed on stack) byte boundary. In other words, the value(%rsp + 8)
is always a multiple of 16 (32) when control is transferred to the function entry point. The stack pointer,%rsp
, always points to the end of the latest allocated stack frame.
i386
SYSTEM V APPLICATION BINARY INTERFACE, Intel386 Architecture Processor Supplement:
Registers and the Stack Frame
Some registers have assigned roles in the standard calling sequence:
%esp
: The stack pointer holds the limit of the current stack frame, which is the address of the stack’s bottom-most, valid word. At all times, the stack pointer should point to a word-aligned area.
OS X ABI Function Call Guide, IA-32 Function Calling Conventions:
The function calling conventions used in the IA-32 environment are the same as those used in the System V IA-32 ABI, with the following exceptions:
- The stack is 16-byte aligned at the point of function calls
System V Application Binary Interface, Intel386 Architecture Processor Supplement, Version 1.1:
2.2.2 The Stack Frame
In addition to registers, each function has a frame on the run-time stack. This stack grows downwards from high addresses. Table 2.2 shows the stack organization.
The end of the input argument area shall be aligned on a 16 (32 or 64, if__m256
or__m512
is passed on stack) byte boundary. In other words, the value(%esp + 4)
is always a multiple of 16 (32 or 64) when control is transferred to the function entry point. The stack pointer,%esp
, always points to the end of the latest allocated stack frame.
-
teratail ELFの関数プロローグについて への回答がきっかけ ↩