Checking the instructions of a hello world program at the machine-code level

Posted at 2018-06-14, last updated at 2018-06-14

This is a supplement to https://qiita.com/knknkn1162/items/c67ae7c2ef71a713adf8 , but it is written so that it can be read on its own.

Assembly

hello.asm

section .data
  hello_world db "Hello world!", 10
  ; See also https://www.tutorialspoint.com/assembly_programming/assembly_strings.htm
  hello_world_len equ $ - hello_world ; $ is the current address (the byte just past the end of hello_world), so $ - hello_world is the string length

section .text
  global _start

_start:
  mov rax, 1 ; sys_write
  mov rdi, 1
  mov rsi, hello_world
  mov rdx, hello_world_len
  syscall

  mov rax, 60 ; sys_exit
  mov rdi, 0
  syscall

The source above is assembled and linked with

nasm -f elf64 hello.asm
ld -o hello hello.o

and the resulting machine code is:

hello

# Contents of the .text section
$ objdump -d ./hello
00000000004000b0 <_start>:
  4000b0:	b8 01 00 00 00       	mov    $0x1,%eax
  4000b5:	bf 01 00 00 00       	mov    $0x1,%edi
  4000ba:	48 be d8 00 60 00 00 	movabs $0x6000d8,%rsi # mov rsi, hello_world
  4000c1:	00 00 00
  4000c4:	ba 0d 00 00 00       	mov    $0xd,%edx
  4000c9:	0f 05                	syscall
  4000cb:	b8 3c 00 00 00       	mov    $0x3c,%eax
  4000d0:	bf 00 00 00 00       	mov    $0x0,%edi
  4000d5:	0f 05                	syscall
$ readelf -x.text hello
Hex dump of section '.text':
  0x004000b0 b8010000 00bf0100 000048be d8006000 ..........H...`.
  0x004000c0 00000000 ba0d0000 000f05b8 3c000000 ............<...
  0x004000d0 bf000000 000f05                     .......

$ readelf -x.data hello.o

Hex dump of section '.data':
  0x00000000 48656c6c 6f20776f 726c6421 0a       Hello world!.

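That is the result. This article checks how the mnemonics (mov, syscall and so on) correspond to the individual bytes, because I got a little stuck on points like these:

It is mov every time, yet the opcode byte differs (b8, bf, ...)
There is no movabs in my source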

Environment

Same as https://qiita.com/knknkn1162/items/c67ae7c2ef71a713adf8#%E7%92%B0%E5%A2%83 .

reference

Strictly speaking, everything should be checked against vol. 2 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, but doing all of it that way is laborious.
As a shortcut, I use http://ref.x86asm.net/coder64.html .

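Terminology

instruction .. a complete encoded instruction such as b8 01 00 00 00
opcode .. an instruction also contains prefixes and other fields; the opcode is the primary part of the instruction (see the figure)

The figure is taken from the Intel® 64 and IA-32 Architectures Software Developer’s Manual, vol. 2A, section 2.1.

mnemonic .. the human-readable identifier of an opcode
operand .. an argument of the mnemonic. For mov, the first argument is the destination operand and the second is the source operand. This is the order in the NASM source (Intel syntax); the objdump output uses AT&T syntax, where the source comes first and the destination second.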

Lines 1-2

Starting with the first line.

  4000b0:	b8 01 00 00 00       	mov    $0x1,%eax

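Looking up the definition of mov at http://ref.x86asm.net/coder64.html#xB8 ,

the opcode is listed as B8+r.¹ The r stands for the register code and occupies the low 3 bits of the 1-byte opcode (so it ranges from 0 to 7).
(Note that B8 = 1011 1000, so its low 3 bits are 000.) In short, just add the register code to 0xB8.

From the register code table, take the row that reads EAX None 0. That gives B8+r = B8+0 = B8.

The bytes that follow are not an address but the immediate value: the 32-bit value 01 00 00 00 (little endian, so 0x00000001).

Line 2 (4000b5: bf 01 00 00 00 mov $0x1,%edi) is almost the same. The only difference is that the row is edi None 7, so the opcode is B8+7 = 0xBF.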
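The B8+r rule can be written down as a short Python sketch. This is only an illustration, not an assembler: the names `REG32` and `encode_mov_r32_imm32` are hypothetical, and the table covers only the eight legacy 32-bit registers with register codes 0 to 7.

```python
import struct

# Register codes of the eight legacy 32-bit registers.
# The code ends up in the low 3 bits of the opcode byte.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_mov_r32_imm32(reg: str, imm: int) -> bytes:
    """Encode `mov r32, imm32` as opcode B8+r followed by a little-endian imm32."""
    opcode = 0xB8 + REG32[reg]
    return bytes([opcode]) + struct.pack("<I", imm & 0xFFFFFFFF)


print(encode_mov_r32_imm32("eax", 1).hex(" "))  # b8 01 00 00 00
print(encode_mov_r32_imm32("edi", 1).hex(" "))  # bf 01 00 00 00
```

Both results match the first two lines of the objdump output.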

Lines 3-4

  4000ba:	48 be d8 00 60 00 00 	movabs $0x6000d8,%rsi # mov rsi, hello_world
  4000c1:	00 00 00

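objdump prints it on two lines, but because the immediate is 64-bit it is a single 10-byte instruction: 48 be d8 00 60 00 00 00 00 00. The last eight bytes, d8 00 60 00 00 00 00 00, are the little-endian encoding of $0x6000d8.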
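The little-endian reading of the eight immediate bytes can be confirmed with a couple of lines of Python (a minimal check; the variable names are only for illustration):

```python
# The eight immediate bytes exactly as they appear in the instruction.
raw = bytes.fromhex("d8 00 60 00 00 00 00 00")

# Little endian: the least significant byte comes first.
value = int.from_bytes(raw, byteorder="little")
print(hex(value))  # 0x6000d8
```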

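What remains is how to read 48 be.

Looking up 48 in the reference,

it appears to be a prefix that is used when the operand is 64-bit.
Let's look a bit closer. 48 = 0b01001000.

Now compare this with the layout of the REX prefix²: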

	-	W	R	X	B
	0100	1	0	0	0

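Because the operand is 64-bit, the W bit flag is 1. The upper 4 bits are the constant 0100, so the byte is 0100 1000 in binary, which is 0x48 in hexadecimal and matches the byte in the dump.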
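As a small sketch of the same bit layout, a REX prefix byte can be split into its flags like this. The function name `decode_rex` is hypothetical and the check only accepts bytes whose upper 4 bits are 0100:

```python
def decode_rex(byte: int) -> dict:
    """Split a REX prefix byte (bit layout 0100WRXB) into its four flag bits."""
    if byte >> 4 != 0b0100:
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,
        "X": (byte >> 1) & 1,
        "B": byte & 1,         # lowest bit, used for R8 to R15
    }


print(decode_rex(0x48))  # {'W': 1, 'R': 0, 'X': 0, 'B': 0}
```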

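movabs is simply the 64-bit version of mov³. It is the name that GNU as and objdump (AT&T syntax) use for mov with a 64-bit immediate, which is why it never appears in the NASM source.

The second byte, be, comes from the register code table again: the register code of RSI is 6, so B8+6 = BE.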
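Putting the REX.W prefix, the B8+r opcode and the 64-bit immediate together gives the following sketch. The names `REG64`, `REX_W` and `encode_mov_r64_imm64` are hypothetical, and R8 to R15 (which would also need REX.B) are deliberately left out:

```python
import struct

# Register codes of the eight legacy 64-bit registers (R8 to R15 are not handled).
REG64 = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
         "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}

REX_W = 0x48  # 0100 1000: REX prefix with only the W flag set


def encode_mov_r64_imm64(reg: str, imm: int) -> bytes:
    """Encode `mov r64, imm64` (shown as movabs by objdump): REX.W, B8+r, imm64."""
    opcode = 0xB8 + REG64[reg]
    return bytes([REX_W, opcode]) + struct.pack("<Q", imm & 0xFFFFFFFFFFFFFFFF)


print(encode_mov_r64_imm64("rsi", 0x6000d8).hex(" "))
# 48 be d8 00 60 00 00 00 00 00
```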

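Line 5

4000c4: ba 0d 00 00 00 mov $0xd,%edx is read in exactly the same way as lines 1-2: the register code of EDX is 2, so the opcode is B8+2 = BA, followed by the 32-bit immediate 0xd.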
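The immediate 0xd is hello_world_len from the source. As a quick cross-check (just a sketch that reuses the bytes printed by readelf -x.data), the length of the .data contents can be computed directly:

```python
# Bytes of the .data section, copied from the readelf hex dump.
data = bytes.fromhex("48656c6c 6f20776f 726c6421 0a")

print(data)            # b'Hello world!\n'
print(len(data))       # 13
print(hex(len(data)))  # 0xd
```

The 12 characters of "Hello world!" plus the trailing newline (10) give 13 bytes, which is the 0xd loaded into edx.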

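Line 6

Just read the entry for the syscall opcode (0f 05) at http://ref.x86asm.net/coder64.html#x0F05 as it is. 0F works like a prefix but is strictly an escape opcode byte that introduces a multi-byte opcode; here it is followed by the single byte 05. Section 2.1.2 of the Intel developer's manual describes the escape byte with wording such as "An escape opcode byte 0FH as the primary opcode, plus two additional opcode bytes."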

Line 7 onward

From line 7 onward the same rules apply, so the details are omitted.
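To tie the rules together, here is a toy decoder that walks through the whole .text section. It is a sketch under strong assumptions: the function name `decode` is hypothetical, it understands only the three encodings discussed in this article (B8+r with imm32, REX.W plus B8+r with imm64, and 0f 05), and it raises an error on anything else. It prints Intel-style operand order, as in the NASM source.

```python
REG = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]  # index = register code


def decode(code: bytes):
    """Decode only the encodings used in this article; raise on anything else."""
    out, i = [], 0
    while i < len(code):
        start = i
        rex_w = False
        if code[i] == 0x48:        # REX.W prefix: 64-bit operand size
            rex_w = True
            i += 1
        op = code[i]
        if 0xB8 <= op <= 0xBF:     # mov r, imm (opcode B8+r)
            size = 8 if rex_w else 4
            imm = int.from_bytes(code[i + 1:i + 1 + size], "little")
            reg = ("r" if rex_w else "e") + REG[op - 0xB8]
            out.append((start, f"mov {reg}, {imm:#x}"))
            i += 1 + size
        elif code[i:i + 2] == b"\x0f\x05" and not rex_w:
            out.append((start, "syscall"))
            i += 2
        else:
            raise ValueError(f"unsupported byte {op:#04x} at offset {start}")
    return out


# Bytes of the .text section, copied from the readelf hex dump.
TEXT = bytes.fromhex(
    "b8010000 00bf0100 000048be d8006000"
    "00000000 ba0d0000 000f05b8 3c000000"
    "bf000000 000f05"
)

for offset, text in decode(TEXT):
    print(f"{0x4000b0 + offset:x}: {text}")
```

The printed addresses (4000b0, 4000b5, 4000ba, 4000c4, ...) line up with the instruction boundaries in the objdump listing.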

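Summary

So, briefly:

It is mov every time, yet the opcode byte differs (b8, bf, ...) -> The opcode is B8+[register code]. The register code depends on the register, so look it up in the table.
There is no movabs in my source -> It is just the 64-bit version of mov
What is the leading 48? -> A prefix (REX with the W flag set) that indicates a 64-bit operand

That is all there is to it. For people who are used to reading machine code this is probably obvious, but on first sight it is easy to get caught on small details like these.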

¹ rex.b is the bit flag in the lowest bit of the REX prefix. It is used for the registers R8 to R15.
² The notes under the figure used for opcode in the Terminology section say: "1. The REX prefix is optional, but if used must be immediately before the opcode; see Section 2.2.1, “REX Prefixes” for additional information."
³ https://sourceware.org/binutils/docs/as/i386_002dVariations.html says: "In 64-bit code, ‘movabs’ can be used to encode the ‘mov’ instruction with the 64-bit displacement or immediate operand." See also https://reverseengineering.stackexchange.com/questions/2627/what-is-the-meaning-of-movabs-in-gas-x86-att-syntax .
