
Multithreaded measurement with Google Benchmark

Posted at 2024-07-09

Google Benchmark

I tried out Google's benchmark library.

I installed it by following the README.

# Check out the library.
$ git clone https://github.com/google/benchmark.git
# Go to the library root directory
$ cd benchmark
# Make a build directory to place the build output.
$ cmake -E make_directory "build"
# Generate build system files with cmake, and download any dependencies.
$ cmake -E chdir "build" cmake -DBENCHMARK_DOWNLOAD_DEPENDENCIES=on -DCMAKE_BUILD_TYPE=Release ../
# or, starting with CMake 3.13, use a simpler form:
# cmake -DCMAKE_BUILD_TYPE=Release -S . -B "build"
# Build the library.
$ cmake --build "build" --config Release

Usage is simple: include the library header from any C++ code and register your benchmark functions.

mybenchmark.cc

#include <benchmark/benchmark.h>

#include <string>

static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state)
    std::string copy(x);
}

// Initialize and register the benchmark instance
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();

The code compiles and runs as is. The compile command adds benchmark/include to the include path and links against the library built above.

$ g++ mybenchmark.cc -std=c++11 -isystem benchmark/include \
  -Lbenchmark/build/src -lbenchmark -lpthread -o mybenchmark

Running it prints the wall-clock time, the CPU time, and the iteration count. For the single-threaded case I pin the process to one core, just to be safe.

$ taskset -c 0 ./mybenchmark 
:
Benchmark              Time             CPU   Iterations
--------------------------------------------------------
BM_StringCopy       8.43 ns         8.43 ns     82244798

The results can also be printed as CSV or JSON with --benchmark_format.

$ taskset -c 0 ./mybenchmark --benchmark_format=csv
:
"BM_StringCopy",81924408,8.39236,8.39254,ns,,,,,

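Iterations is the number of times the measured loop body ran, and Time is the average over those iterations. The iteration count can apparently also be set explicitly, like this: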

BENCHMARK(BM_StringCopy)->Iterations(100000000);
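
To make the relationship between the iteration count and the reported time concrete, here is a minimal sketch that uses only the standard library. It is not how Google Benchmark is implemented. `MeanNsPerIteration` is a hypothetical helper introduced for illustration: it runs a callable a fixed number of times and divides the total elapsed time by that count.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of Google Benchmark): runs `body` exactly
// `iterations` times and returns the mean wall-clock time per call in ns.
template <typename Body>
double MeanNsPerIteration(Body body, std::int64_t iterations) {
  using Clock = std::chrono::steady_clock;
  const Clock::time_point start = Clock::now();
  for (std::int64_t i = 0; i < iterations; ++i) {
    body();
  }
  // Convert the elapsed time to floating-point nanoseconds.
  const std::chrono::duration<double, std::nano> elapsed = Clock::now() - start;
  return elapsed.count() / static_cast<double>(iterations);
}
```

Calling it from `main` with a lambda, for example `MeanNsPerIteration([] { std::string copy("hello"); }, 1000000)`, gives a rough per-call figure. The real library additionally chooses the iteration count for you and guards against compiler optimizations, so the two numbers should not be expected to match.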

Multithreaded execution

Multithreaded runs can be measured with ThreadRange. ThreadRange(1, 8) runs the benchmark with 1, 2, 4, and 8 threads in turn.

#include <benchmark/benchmark.h>

int total = 1 << 28;
static void BM_Loop(benchmark::State& state) {
  int sum = 0;
  for (auto _ : state) {
    int p = 0;
    for (int i = 0; i < total; ++i) {
      p++;
    }
    sum += p;
  }
}

BENCHMARK(BM_Loop)->ThreadRange(1, 8);

BENCHMARK_MAIN();

The results were as follows.

Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_Loop/threads:1  422985199 ns    422978180 ns            2
BM_Loop/threads:2  215144377 ns    430289045 ns            2
BM_Loop/threads:4  109576930 ns    438312587 ns            4
BM_Loop/threads:8   55103040 ns    438397040 ns            8

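Internally, the library (BenchmarkRunner) appears to assign the threads automatically, and that also seems to be where the synchronization is handled. The user guide describes the behavior as follows:

In a multithreaded test, it is guaranteed that none of the threads will start until all have reached the start of the benchmark loop, and all will have finished before any thread exits the benchmark loop.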
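
The first half of that guarantee (no thread starts until all have arrived) can be illustrated with a small standard-library sketch. This is not Google Benchmark's implementation, which has its own internal synchronization. `StartBarrier` and `CountThreadsStartedTogether` are hypothetical names used only here.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical one-shot barrier, for illustration only.
class StartBarrier {
 public:
  explicit StartBarrier(int expected) : expected_(expected) {}

  // Blocks until `expected` threads have called Wait().
  void Wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    ++arrived_;
    if (arrived_ == expected_) {
      cv_.notify_all();  // Last arrival releases everyone.
    } else {
      cv_.wait(lock, [this] { return arrived_ == expected_; });
    }
  }

  int arrived() {
    std::lock_guard<std::mutex> lock(mutex_);
    return arrived_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  const int expected_;
  int arrived_ = 0;
};

// Starts `num_threads` workers and returns how many of them saw every
// worker already at the barrier when their "measured" section began.
int CountThreadsStartedTogether(int num_threads) {
  StartBarrier barrier(num_threads);
  std::mutex result_mutex;
  int started_together = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&] {
      barrier.Wait();  // No thread passes until all have arrived.
      const bool all_arrived = (barrier.arrived() == num_threads);
      std::lock_guard<std::mutex> lock(result_mutex);
      if (all_arrived) ++started_together;
    });
  }
  for (std::thread& w : workers) w.join();
  return started_together;
}
```

Calling `CountThreadsStartedTogether(8)` from `main` should return 8, since no worker gets past `Wait()` before the last one arrives. A symmetric barrier at the end of the loop would cover the second half of the guarantee.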

Environment

I used an 8-core CPU with one hardware thread per core, so 8 hardware threads in total.

  Model name:           Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
    Thread(s) per core: 1
    Core(s) per socket: 8
    Socket(s):          1
