組み込み関数append()のパフォーマンス

Last updated at 2015-01-31Posted at 2014-01-15

この記事はもう古い

goのバージョンアップでこの辺の動きは変わっているようです。
以下のエントリで解説されています。
https://qiita.com/tanaton/items/e0bbd83e9b7158cd19c9

以下はもう正確じゃないかもしれません

@yosida95と@KOBA789とgolangでのslice実装について話していた。
これはそのメモと整理。

先に結論

sliceを作るときは、make()する時点で十分な領域を確保したほうが良い。
appendのメモリ確保はそんなに賢くなく、呼び出しの度に必ず確保を行うため、パフォーマンスは悪い。
以下の簡単なベンチでは約4倍のパフォーマンス差がある。

faster.go

package main

const SIZE = 1e8
func main() {
	s := make( []int, 0, SIZE )
	for i:=0; i<SIZE; i++ {
		s = append( s, 1 )
	}
}

faster.goを実行

umisama-ubuntu{umisama}% time ./faster
./faster  0.36s user 0.22s system 98% cpu 0.586 total

slower.go

package main

const SIZE = 1e8
func main() {
	s := make( []int, 0, 0 )
	for i:=0; i<SIZE; i++ {
		s = append( s, 1 )
	}
}

slower.goを実行

umisama-ubuntu{umisama}% time ./slower
./slower  1.14s user 1.01s system 99% cpu 2.152 total

経緯

コンパイラの実装を読んで、どういう動きをしているのか確かめている。
goのスライスは、runtime.hで以下のように定義されている。

src/pkg/runtime/runtime.h

struct	Slice
{				// must not move anything
	byte*	array;		// actual data
	uintgo	len;		// number of elements
	uintgo	cap;		// allocated number of elements
};

スライスがmake()されるとき、その第二引数capの量のメモリを確保される。

src/okg/runtime/slice.c

static void
makeslice1(SliceType *t, intgo len, intgo cap, Slice *ret)
{
	ret->len = len;
	ret->cap = cap;
	ret->array = runtime·cnewarray(t->elem, cap);
}

ここまでは良い、まぁ予想通り。
で、append()はどう実装されているかというと、こんな感じ(コメントに付いている擬似コード、Cは省略)

src/cmd/gc/walk.c

// expand append(l1, l2...) to
//   init {
//     s := l1
//     if n := len(l1) + len(l2) - cap(s); n > 0 {
//       s = growslice(s, n)
//     }
//     s = s[:len(l1)+len(l2)]
//     memmove(&s[len(l1)], &l2[0], len(l2)*sizeof(T))
//   }

呼ばれるgrowsliceへは、必要な長さのみが指示される。
growsliceは毎回、先のmakeslice1()で全く新しくメモリ領域を確保し、そこへ内容をコピーする。（一部略）

src/pkg/runtime/slice.c

void
runtime·growslice(SliceType *t, Slice old, int64 n, Slice ret)
{
	int64 cap;
	cap = old.cap + n;

	//略

	growslice1(t, old, cap, &ret);

	//略
}

static void
growslice1(SliceType *t, Slice x, intgo newcap, Slice *ret)
{
	intgo m;

	m = x.cap;
	
	// Using newcap directly for m+m < newcap handles
	// both the case where m == 0 and also the case where
	// m+m/4 wraps around, in which case the loop
	// below might never terminate.
	if(m+m < newcap)
		m = newcap;
	else {
		do {
			if(x.len < 1024)
				m += m;
			else
				m += m/4;
		} while(m < newcap);
	}
	makeslice1(t, x.len, m, ret);
	runtime·memmove(ret->array, x.array, ret->len * t->elem->size);
}

最初のmake()の契機以降で呼ばれたappend()は、毎回必ずメモリの再確保と、内容のフルコピーを行うようだ。

所感

多めに獲得してくれると、毎回呼ばれなくてパフォーマンスが上がりそうな気がした。
そういうappend()的関数を自作するのも良いかもしれない。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up