
A record of my fight with a hard-to-spot bug

Posted at 2019-07-18

Symptom

Let's start with the source code.

c++17

// clang++ -std=c++17 -O3 -Wall -lpthread && ./a.out 10000

#include <chrono>
#include <condition_variable>
#include <cstdlib> // atoi
#include <iostream>
#include <mutex>
#include <thread>

using namespace std;
using namespace std::chrono_literals;

class foo {
  thread th_;
  mutex mutex_;
  condition_variable cv_;

public:
  void notify() { cv_.notify_all(); }
  foo()
      : th_([this]() {
          unique_lock<mutex> lock(mutex_);
          if (cv_status::no_timeout == cv_.wait_for(lock, 100us)) {
            cout << "received\n";
            return;
          }
          cout << "time out\n";
        }) {}
  ~foo() { stop(); }
  void stop() {
    if (th_.joinable()) {
      th_.join();
    }
  }
};

int main(int argc, char const *argv[]) {
  for (int i = 0; i < (argc<2 ? 10 : atoi(argv[1])); ++i) {
    std::cout << i << " ";
    foo f;
    this_thread::yield();
    f.notify();
    this_thread::sleep_for(200us);
  }
}

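With the essential part isolated like this, the problem may look obvious, but let me tell the story anyway.
What I wanted was the following:

An instance of the class foo owns a thread.
The thread starts in the constructor.
When the main thread calls notify, the thread does some processing and exits.
If no notify arrives in time, the thread times out and exits.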

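I wrote reasonable-looking tests, and they all passed.
Or so I thought: roughly once in every thousand to ten thousand runs, the program dies. With the code above, passing 10000 as argv[1] usually makes it die partway through.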

For example, it dies like this:

4155 time out
4156 time out
4157 time out
4158 time out
libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
4159 Abort trap: 6

Incidentally, on my machine it does not die when built with g++-9, for some reason.
Because the crash depended on the compiler, I even suspected a compiler bug.

Cause

A well-trained C++ user may see it at a glance: the cause was the order of the member declarations.

c++17

  thread th_;
  mutex mutex_;
  condition_variable cv_;

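Non-static members are constructed in the order they are declared, so with this order the constructor of th_ is called first.
That constructor starts the thread.
Once the thread has started, the constructor of th_ is done, so mutex_ and cv_ are constructed next.
While they are being constructed, the thread owned by th_ is already running.
That thread then reaches unique_lock<mutex> lock(mutex_); and cv_.wait_for(lock, 100us).
In most cases cv_ and mutex_ have finished constructing by then, but once in a while the thread gets there first.
When that happens, lock and wait_for touch a cv_ or mutex_ whose constructor has not finished yet, which is undefined behavior, and the program dies.

At least, that is what I think is going on.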
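The ordering rule itself is easy to check in isolation. Below is a minimal sketch, not the original code: tracer, foo_layout and construction_order are hypothetical names I made up for illustration. Each member only records its own name when it is constructed, so the log shows the order in which a class laid out like foo builds its members.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: appends its name to a log when it is constructed.
struct tracer {
  tracer(std::vector<std::string> &log, std::string name) {
    log.push_back(std::move(name));
  }
};

// Same member layout as the buggy foo, with tracers standing in for the
// real thread, mutex and condition_variable.
struct foo_layout {
  tracer th_;
  tracer mutex_;
  tracer cv_;

  explicit foo_layout(std::vector<std::string> &log)
      : th_(log, "th_"), mutex_(log, "mutex_"), cv_(log, "cv_") {}
};

// Returns the names in the order the members were actually constructed.
std::vector<std::string> construction_order() {
  std::vector<std::string> log;
  foo_layout layout(log);
  return log;
}
```

Calling construction_order() gives th_, mutex_, cv_ in that order. In the real class, the thread is therefore already running during the time it takes to construct the other two members.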

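Fix

The cause is a bad declaration order, so the fix is to put the members in the right order.
All that matters is that the other members are fully initialized by the time the thread owned by th_ starts running, so for example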

c++17

  mutex mutex_;
  condition_variable cv_;
  thread th_;

declaring them in this order makes the problem go away.
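For reference, here is a sketch of what the whole class might look like with the members reordered. It is not the exact code from my project. To make the result checkable I replaced the cout calls with a counter: finished_ and the driver function run_fixed are hypothetical additions that do not appear in the original.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

using namespace std::chrono_literals;

class foo {
  // Everything the thread touches is declared before th_, so it is fully
  // constructed before the thread starts running.
  std::mutex mutex_;
  std::condition_variable cv_;
  std::atomic<int> &finished_; // hypothetical: counts threads that ran to the end
  std::thread th_;

public:
  explicit foo(std::atomic<int> &finished)
      : finished_(finished), th_([this] {
          std::unique_lock<std::mutex> lock(mutex_);
          // Either outcome (notified or timed out) counts as finished.
          cv_.wait_for(lock, 100us);
          ++finished_;
        }) {}
  ~foo() { stop(); }
  void notify() { cv_.notify_all(); }
  void stop() {
    if (th_.joinable()) {
      th_.join();
    }
  }
};

// Hypothetical driver mirroring the loop in main: returns how many worker
// threads ran to completion.
int run_fixed(int iterations) {
  std::atomic<int> finished{0};
  for (int i = 0; i < iterations; ++i) {
    foo f(finished);
    std::this_thread::yield();
    f.notify();
    std::this_thread::sleep_for(200us);
  }
  return finished.load();
}
```

With this layout, run_fixed(10000) should return 10000 instead of aborting partway through, because every thread finds mutex_ and cv_ already constructed.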

Lesson

Pay attention to the order in which the constructors of member variables are called.
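One way to depend less on the declaration order, which I am adding here as a sketch rather than as what I actually did, is to start the thread in the constructor body instead of in the member initializer list. By the time the body runs, every member has been constructed, whatever the declaration order is. The names late_start_foo, done_ and run_late_start are hypothetical.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

using namespace std::chrono_literals;

class late_start_foo {
  std::thread th_; // declared first on purpose: still safe, see constructor
  std::mutex mutex_;
  std::condition_variable cv_;
  std::atomic<bool> done_{false}; // hypothetical: set when the thread ends

public:
  late_start_foo() {
    // th_ was default-constructed (no thread yet). All members exist by the
    // time the constructor body runs, so starting the thread here is safe.
    th_ = std::thread([this] {
      std::unique_lock<std::mutex> lock(mutex_);
      cv_.wait_for(lock, 100us);
      done_ = true;
    });
  }
  ~late_start_foo() { stop(); }
  void notify() { cv_.notify_all(); }
  void stop() {
    if (th_.joinable()) {
      th_.join();
    }
  }
  bool done() const { return done_; }
};

// Hypothetical driver: counts instances whose thread ran to completion.
int run_late_start(int iterations) {
  int completed = 0;
  for (int i = 0; i < iterations; ++i) {
    late_start_foo f;
    f.notify();
    f.stop(); // join, so done() is final
    if (f.done()) {
      ++completed;
    }
  }
  return completed;
}
```

The destructor still joins before any member is destroyed, so the thread never outlives mutex_ or cv_ in this version either.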
