More than 5 years have passed since last update.

フューチャー株式会社

hash/maphashコードリーディング

Last updated at 2020-02-25Posted at 2020-02-25

1 / 22

お前誰よ

渋川よしき
- フューチャー株式会社で仕事してます
- いろいろ本を書いたりしています
  - みなさん買ってくださっていますよね？
- 今日は、家族がいろいろ体調を崩していて時間が取れなかったのでQiitaスライドで

今回の内容

フューチャー社内で行ったコードリーディングの勉強会の再演です。現在、標準ライブラリをみんなで読んで解説しています。
わたしは1.14で追加されたhash/maphashを説明しました。

1.14の変更

Go 1.14のリリースノート: https://tip.golang.org/doc/go1.14
hash/maphash https://tip.golang.org/pkg/hash/maphash/

ここによると、1.14ではぼちぼち新しいメソッドやRISC-V実験サポート開始や、lock周りやdeferの高速化、テストが後処理ができるようになるなどがあるが、完全な新機能という点では、hash/maphashだけのようなので、ここを見てみることにします。

Readmeの説明

バイトシーケンスに対して、ランダムな文字列を生成する。ハッシュテーブルのようなデータ構造を自分で作る人向けに、任意の文字列やバイト列を、分散した数値に変換する。
ただし、sha256やsha512のような暗号目的には使えないとされています。64ビットでビット数が少ないので。

フォルダ構造

実体は1ファイル。

maphash.go
maphash_test.go
smhasher_test.go

テストを見る(maphash_test.go)

インタフェースのチェックはもう基本テクですね

// Make sure a Hash implements the hash.Hash and hash.Hash64 interfaces. |
var _ hash.Hash = &Hash{}
var _ hash.Hash64 = &Hash{}

仕様を確認

1文字ずつ入れても、3文字入れても結果は同じ

func TestHashGrouping(t *testing.T) {
	b := []byte("foo")
	h1 := new(Hash)
	h2 := new(Hash)
	h2.SetSeed(h1.Seed())
	h1.Write(b)
	for _, x := range b {
		err := h2.WriteByte(x)
		if err != nil {
			t.Fatalf("WriteByte: %v", err)
		}
	}
	if h1.Sum64() != h2.Sum64() {
		t.Errorf("hash of \"foo\" and \"f\",\"o\",\"o\" not identical")
	}
}

fooを[]byteで入れても、文字列で入れても結果は同じ。

func TestHashBytesVsString(t *testing.T) {
	s := "foo"
	b := []byte(s)
	h1 := new(Hash)
	h2 := new(Hash)
	h2.SetSeed(h1.Seed())
	n1, err1 := h1.WriteString(s)
	if n1 != len(s) || err1 != nil {
		t.Fatalf("WriteString(s) = %d, %v, want %d, nil", n1, err1, len(s))
	}
	n2, err2 := h2.Write(b)
	if n2 != len(b) || err2 != nil {
		t.Fatalf("Write(b) = %d, %v, want %d, nil", n2, err2, len(b))
	}
	if h1.Sum64() != h2.Sum64() {
		t.Errorf("hash of string and bytes not identical")
	}
}

テストを見る(smhasher_test.go)

ハッシュ関数に対する拷問テストのツールがあり、それをGoに移植しているとのこと。元はGoogle製で、非暗号的ハッシュ関数の性能（分散や、速度）を計測するらしい。

// Smhasher is a torture test for hash functions.
// https://code.google.com/p/smhasher/
// This code is a port of some of the Smhasher tests to Go.

内容的には、ハッシュの衝突を記録する構造体があって、循環する文字列、密度が薄い（Sparse）文字列、1ビットだけ変更した文字列で半数のビットが書き換わるか・・・といったテストがある模様。

テストコードで見つけた配慮。

	if runtime.GOARCH == "wasm" {
		t.Skip("Too slow on wasm")
	}
	if testing.Short() {
		t.Skip("Skipping in short mode")
	}

ここから実装の方を見ていきます

Hash構造体

type Hash struct {
	_     [0]func() // not comparable
	seed  Seed      // initial seed used for this hash
	state Seed      // current hash of all flushed bytes
	buf   [64]byte  // unflushed byte buffer
	n     int       // number of unflushed bytes
}

比較不能

本当かどうか試してみた。 https://play.golang.org/p/Obem8g4xdXm

確かに比較不能というコンパイルエラー。

./prog.go:16:23: invalid operation:
    a == b (struct containing [0]func() cannot be compared)

固定長配列なら、中の要素が比較可能ならOK
スライスはダメ（nilとの比較のみ）
単体の要素でも関数などは比較不能(nilとの比較のみ)
単体の変数ならnilとの比較のみ可能、という具体的なエラーメッセージになるが、構造体にラップされており、その内部の要素に比較不能なものがあると、理由なく「比較不能」というメッセージになる

ハッシュの計算

Write()、WriteByte()、WriteString()があるが、どれもやっていることは変わらないので、シンプルなWriteByteで見てみる。

バッファ(64バイト)分データが溜まったらflush()メソッドを呼ぶ。その後はバッファにデータを積む。

hash.Hashインタフェースを満たすためのシグニチャになっているだけで、errorを返す宣言になっているが、エラーは絶対に発生しない。

// WriteByte adds b to the sequence of bytes hashed by h.
// It never fails; the error result is for implementing io.ByteWriter.
func (h *Hash) WriteByte(b byte) error {
	if h.n == len(h.buf) {
		h.flush()
	}
	h.buf[h.n] = b
	h.n++
	return nil
}

flush()は、rthashという関数を呼んでいる。溜まったバッファの値と、現在のstateの情報をわたし、現在のstateを更新する。現在のstate情報も参照するため、64バイトごとに同じデータを送っても、結果が循環することはない。

// precondition: buffer is full.
func (h *Hash) flush() {
	if h.n != len(h.buf) {
		panic("maphash: flush of partially full buffer")
	}
	h.initSeed()
	h.state.s = rthash(h.buf[:], h.state.s)
	h.n = 0
}

rthash関数はこんな実装。 unsafe.Sizeof(uintptr(0)) == 8 で64ビットかどうかを判定している模様。32ビットだと、上位ビットと下位ビットを計算してから合成して64ビットにしている。

func rthash(b []byte, seed uint64) uint64 {
	if len(b) == 0 {
		return seed
	}
	// The runtime hasher only works on uintptr. For 64-bit
	// architectures, we use the hasher directly. Otherwise,
	// we use two parallel hashers on the lower and upper 32 bits.
	if unsafe.Sizeof(uintptr(0)) == 8 {
		return uint64(runtime_memhash(unsafe.Pointer(&b[0]), uintptr(seed), uintptr(len(b))))
	}
	lo := runtime_memhash(unsafe.Pointer(&b[0]), uintptr(seed), uintptr(len(b)))
	hi := runtime_memhash(unsafe.Pointer(&b[0]), uintptr(seed>>32), uintptr(len(b)))
	return uint64(hi)<<32 | uint64(lo)
}

runtime_memhashは次のようになっています。実体のない関数定義にlinknameディレクティブ。これはunsafeがついているパッケージのみで利用でき、外部パッケージのプライベートな関数を呼び出すためのスタブ関数を生成してくれるとのこと。これにより、プライベート関数が呼び出せる。

//go:linkname runtime_memhash runtime.memhash
//go:noescape
func runtime_memhash(p unsafe.Pointer, seed, s uintptr) uintptr

memhashの実体はアセンブリ言語で実装されている。各CPUごとの実装がある（memhashFallbackというGo実装もある）。

// func memhash(p unsafe.Pointer, h, s uintptr) uintptr
// hash function using AES hardware instructions
TEXT runtime·memhash(SB),NOSPLIT,$0-32
	CMPB	runtime·useAeshash(SB), $0
	JEQ	noaes
	MOVQ	p+0(FP), AX	// ptr to data
	MOVQ	s+16(FP), CX	// size
	LEAQ	ret+24(FP), DX
	JMP	aeshashbody<>(SB)
noaes:
	JMP	runtime·memhashFallback(SB)

最終的にはAES-NIという、TLS暗号の中で使われる最近のCPUで使われる専用命令でハッシュ計算をしていました。

まとめ

やっていることはシンプルで、mapで使われているハッシュ関数に公開APIがついた程度。完全な新機能を見ていたはずだが・・・

テストとか、中のテクニックはなかなか興味深いものもあったかも

明日からは役に立たない知識を中心にまとめました

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up