
Fixing YOLO training with multiple annotations per image

Posted at 2018-09-07

Training YOLO

First, as the official site explains, YOLO stands for "You Only Look Once" and is a deep learning method for detecting objects in images.

YOLO is trained with:
./darknet detector train 【DATA file】 【CFG file】 【initial weights】

Training requires the images to learn from, a BBox file (plain text) for each image describing its objects, and a file listing the object classes.

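Each BBox must be written as a space-separated line:
【class_id】【center_x】【center_y】【rect_w】【rect_h】
Note that the four coordinates are not pixel positions in the image but ratios of the image width and height (values from 0 to 1).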

A BBox is computed from the rectangle enclosing the object.
Rectangle corners (in pixels): (left_w, low_h), (right_w, high_h)

center_x_point = (left_w + right_w)/2
center_y_point = (low_h + high_h)/2
center_x = center_x_point/image_width
center_y = center_y_point/image_height

rect_w = (right_w - left_w)/image_width
rect_h = (high_h - low_h)/image_height

A single BBox file can, of course, define multiple boxes, one per line.

Stuck after defining multiple annotations

Setup: Ubuntu 16.04 with two NVIDIA GeForce GTX 1080 Ti GPUs

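When I defined four or five BBoxes of the same class in one image and trained on it, some of the objects were never recognized.
Increasing the number of training iterations did not help at all.
When I swapped the order of the BBox definitions, the previously missed objects were learned, but objects that had been learned before were now missed instead.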

So I started debugging.

A memory bug?

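darknet, the framework that implements YOLO, is written in C.
I would rather not have had to consider it, but a memory-management bug seemed possible.
I checked whether any buffer allocated with malloc or calloc was too small.
Nothing looked wrong, so this did not seem to be the cause.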

A bug in the source?

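The `detector` command used for training is implemented in examples/detector.c, and debugging it showed that it runs multithreaded.
The next suspect, then, was whether the code calls C functions in a thread-safe way.
C has many functions that are not thread-safe, so I went through the standard library functions in use one by one and checked whether each was thread-safe.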

Found it:
fscanf
Reportedly, this function is not thread-safe.

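As a test, I compared the number of lines read with fopen and fgets against the number of lines read with fscanf.
For a 4-line file, the former correctly counted 4, but the latter counted only 2.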

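So I replaced every use of fscanf in the training code path with fgets, which reads one line, followed by sscanf, which parses that line.
The actual code (src/data.c, lines 139-167):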

Before

  box_label *read_boxes(char *filename, int *n)
  {
      FILE *file = fopen(filename, "r");
      if(!file) file_error(filename);
      float x, y, h, w;
      int id;
      int count = 0;
      int size = 64;
      box_label *boxes = calloc(size, sizeof(box_label));
      while(fscanf(file, "%d %f %f %f %f", &id, &x, &y, &w, &h) == 5){ // <- the line that was changed
          if(count == size) {
              size = size * 2;
              boxes = realloc(boxes, size*sizeof(box_label));
          }
          boxes[count].id = id;
          boxes[count].x = x;
          boxes[count].y = y;
          boxes[count].h = h;
          boxes[count].w = w;
          boxes[count].left   = x - w/2;
          boxes[count].right  = x + w/2;
          boxes[count].top    = y - h/2;
          boxes[count].bottom = y + h/2;
          ++count;
      }
      fclose(file);
      *n = count;
      return boxes;
  }

After

  box_label *read_boxes(char *filename, int *n)
  {
      FILE *file = fopen(filename, "r");
      if(!file) file_error(filename);
      float x, y, h, w;
      int id;
      int count = 0;
      int size = 64;
      box_label *boxes = calloc(size, sizeof(box_label));

      char buf[1024];
      // Original loop, replaced because fscanf is reportedly not thread-safe:
//      while(fscanf(file, "%d %f %f %f %f", &id, &x, &y, &w, &h) == 5){
      // Read one line at a time with fgets
      while(NULL != fgets(buf, sizeof(buf), file)) {
          // sscanf is thread-safe: it parses the string in buf, not a FILE.
          // Skip lines without five fields (e.g. blank lines) instead of
          // building a box from stale or uninitialized values.
          if(sscanf(buf, "%d %f %f %f %f", &id, &x, &y, &w, &h) != 5) continue;
          if(count == size) {
              size = size * 2;
              boxes = realloc(boxes, size*sizeof(box_label));
          }
          boxes[count].id = id;
          boxes[count].x = x;
          boxes[count].y = y;
          boxes[count].h = h;
          boxes[count].w = w;
          boxes[count].left   = x - w/2;
          boxes[count].right  = x + w/2;
          boxes[count].top    = y - h/2;
          boxes[count].bottom = y + h/2;
          ++count;
      }
      fclose(file);
      *n = count;
      return boxes;
  }

I made the same change in the following functions, also in src/data.c:
fill_truth_iseg
fill_truth_mask
load_data_compare
load_tags_paths
get_segmentation_image
get_segmentation_image2

After the fix, I rebuilt with make and trained again.
Training took slightly longer, but the model now appears to learn correctly.

That's all.

References

fscanf, IBM Knowledge Center (z/OS 2.1)
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.bpxbd00/fscanf.htm
