
Detectron2: An Introduction with the TACO Dataset

Posted at 2022-01-05

Detectron2 is an AI library developed by Meta (Facebook) for detecting objects in images.
Reference: facebookresearch/detectron2

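Few people could build a better model than FB on their own, so it is great that the work of FB's researchers is free to use. This article explains the basics of Detectron2 and builds an object detection model using the TACO dataset of litter images. All of the code is on GitHub and was written for an environment that can run Google Colab. If you want to run it on Colab, remember to change the notebook setting to "CPU". Installing Detectron2 on Colab takes about five minutes, so having the right setting from the start helps. If you have your own environment, install the "Requirments.txt" environment file from the GitHub repo.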

Code links

[Notebook](https://github.com/JarvisSan22/JC_Learn_Detectron2/blob/main/Detectron2_TACO.ipynb) [Git Hub repo JC_Learn_Detectron2](https://github.com/JarvisSan22/JC_Learn_Detectron2)

Detectron2

Detectron2 is built on PyTorch and includes multiple object detection models. There are models for bounding boxes, keypoint detection, instance segmentation, panoptic segmentation, and more; this article uses an instance segmentation model.

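Detectron2 offers several instance segmentation models (nine as of 2022/01/04). This article uses the one called "mask_rcnn_R_101_FPN_3x", which has a three-stage structure. The first stage, the "FPN" part of the name, is a convolutional neural network known as a Feature Pyramid Network. Its output is a set of feature maps, which are passed to the next stage, the "RoIAlign Layer". RoI stands for "Regions of Interest", the areas of the image that are of interest. The Align step then lines those regions up with the feature map to correct any misalignment. The output of that layer splits into two branches. The first (left side of the image below) is the "Mask branch", which colors in the region of each object, producing what is called a "segmentation map". The second (right side of the image below) produces the bbox (the box around the object) and the classification (the label name of the object) together.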
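
To make the "Align" step a little more concrete: RoIAlign avoids rounding region coordinates to whole feature-map cells and instead reads the feature map at fractional positions using bilinear interpolation. The following is a minimal pure-Python sketch of that sampling step only, not Detectron2's implementation; the function name `bilinear_sample` and the tiny 2x2 feature map are made up for illustration.

```python
def bilinear_sample(feature_map, x, y):
    """Read a 2D grid (list of rows) at a fractional (x, y) position."""
    height, width = len(feature_map), len(feature_map[0])
    # Clamp to the valid range so the four neighbours always exist.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    dx, dy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


# Hypothetical 2x2 feature map, sampled exactly between its four cells.
toy_map = [[0.0, 10.0],
           [20.0, 30.0]]
center_value = bilinear_sample(toy_map, 0.5, 0.5)
print(center_value)
```

Sampling at the midpoint returns the average of the four surrounding cells, which is the behaviour that lets a region keep its exact position instead of snapping to the grid.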

This model was developed by researchers at Meta (formerly Facebook). If you can read English, please do have a look at the original paper: Kaiming He 2017.

Using Detectron2

TACO is an open-source dataset of litter images and annotations. As of 2022/01/04 it contains 1500 images and 4780 annotations, and it offers two category options. This article uses TACO's super categories: 28 labels such as Can, Paper, Plastic, and so on.

The code below sets up TACO. If you want to save it somewhere else, update the part after "--dataset_path {}".

git clone https://github.com/pedropro/TACO.git
# Run from the directory that contains the clone; the later code expects ./TACO/data
python3 TACO/download.py --dataset_path TACO/data/annotations.json

Dataset: Preparing a COCO dataset

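Detectron2 can read the COCO dataset format directly. A COCO dataset is a dictionary with three parts: "images", "annotations", and "categories". "images" and "annotations" are arrays in which each image and each annotation is stored as a dictionary. The first step in using Detectron2 is to convert the data into this format.

Example of an "images" entry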

{'id': 0, 'width': 1537, 'height': 2049, 'file_name': 'batch_1/000006.jpg'}

Example of an annotation

{'id': 1,
 'image_id': 0,
 'category_id': 3,
 'segmentation': [[561.0,1238.0,  568.0, ............]],
 'area': 403954.0,
 'bbox': [517.0, 127.0, 447.0, 1322.0],
 'iscrowd': 0,
 'bbox_mode': 1}

Example of cocodata["categories"]

[{'name': 'Aluminium foil', 'id': 0},
 {'name': 'Battery', 'id': 1},
 {'name': 'Blister pack', 'id': 2},
 {'name': 'Bottle', 'id': 3},
 {'name': 'Bottle cap', 'id': 4},
 {'name': 'Broken glass', 'id': 5},
 {'name': 'Can', 'id': 6},
 {'name': 'Carton', 'id': 7},
 {'name': 'Cigarette', 'id': 8},
 {'name': 'Cup', 'id': 9},
 {'name': 'Food waste', 'id': 10},
 {'name': 'Glass jar', 'id': 11},
 {'name': 'Lid', 'id': 12},
 {'name': 'Other plastic', 'id': 13},
 {'name': 'Paper', 'id': 14},
 {'name': 'Paper bag', 'id': 15},
 {'name': 'Plastic bag & wrapper', 'id': 16},
 {'name': 'Plastic container', 'id': 17},
 {'name': 'Plastic glooves', 'id': 18},
 {'name': 'Plastic utensils', 'id': 19},
 {'name': 'Pop tab', 'id': 20},
 {'name': 'Rope & strings', 'id': 21},
 {'name': 'Scrap metal', 'id': 22},
 {'name': 'Shoe', 'id': 23},
 {'name': 'Squeezable tube', 'id': 24},
 {'name': 'Straw', 'id': 25},
 {'name': 'Styrofoam piece', 'id': 26},
 {'name': 'Unlabeled litter', 'id': 27}]
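
Because the three parts refer to each other by id, a quick consistency check can catch problems before the data reaches Detectron2. The following is a minimal sketch, assuming only the keys shown in the examples above; the helper name `check_coco_dict` and the second (deliberately broken) annotation are made up for illustration.

```python
def check_coco_dict(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    for key in ("images", "annotations", "categories"):
        if key not in coco:
            problems.append(f"missing top-level key: {key}")
    if problems:
        return problems
    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    # Every annotation must point at an existing image and category.
    for ann in coco["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
    return problems


toy_coco = {
    "images": [{"id": 0, "width": 1537, "height": 2049, "file_name": "batch_1/000006.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 0, "category_id": 3, "bbox": [517.0, 127.0, 447.0, 1322.0]},
        # Made-up broken entry: image 9 does not exist in "images".
        {"id": 2, "image_id": 9, "category_id": 3, "bbox": [0.0, 0.0, 10.0, 10.0]},
    ],
    "categories": [{"name": "Bottle", "id": 3}],
}
print(check_coco_dict(toy_coco))
```

An empty list means every annotation points at an image and a category that exist.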

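The TACO dataset is stored in a format close to the standard one, but its annotations have no bbox_mode and its categories also need to be updated. In my experience it also contains duplicate annotations, which have to be removed. Let's load the TACO dataset in Python and prepare it.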

imports

# import some common libraries
import os, json, cv2, random
import copy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
plt.rcParams.update({'font.size': 22})

Load the data

dataset_path = './TACO/data'
anns_file_path = dataset_path + '/' + 'annotations.json'

# Read annotations
with open(anns_file_path, 'r') as f:
    dataset = json.loads(f.read())

categories = dataset['categories']
anns = dataset['annotations']
imgs = dataset['images']

Convert to pandas

img_df=pd.DataFrame(imgs)
ang_df=pd.DataFrame(anns)
cat_df=pd.DataFrame(categories)
print('Number of super categories:',len(cat_df.supercategory.unique()))
print('Number of categories:',len(cat_df.name.unique()))
print('Number of annotations:', len(ang_df))
print('Number of images:', len(img_df))

img_df.head()

Remove duplicate annotations


val,counts=np.unique(ang_df.id,return_counts=True)
val[counts>1] #array([ 309, 4040])
ang_df=ang_df[~ang_df.id.duplicated(keep="first")]
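
For readers less familiar with pandas, the "keep the first row for each id" rule used above can be sketched in plain Python. This is only an illustration of the idea; the helper name `drop_duplicate_ids` and the toy annotations are hypothetical.

```python
def drop_duplicate_ids(annotations):
    """Keep the first annotation seen for each id and drop later repeats."""
    seen = set()
    unique = []
    for ann in annotations:
        if ann["id"] in seen:
            continue
        seen.add(ann["id"])
        unique.append(ann)
    return unique


# Toy data: id 309 appears twice, as in the duplicate check above.
toy_anns = [
    {"id": 308, "image_id": 10},
    {"id": 309, "image_id": 11},
    {"id": 309, "image_id": 12},
]
deduped = drop_duplicate_ids(toy_anns)
print([a["id"] for a in deduped])
```

Note that this rule drops the later entry even if it describes a different object that merely shares an id; if that matters for your data, assigning fresh ids is an alternative worth considering.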

Display the images and annotations (this uses ipywidgets, a UI for interacting with plots; if you want to know more, read this article).


import colorsys
from ipywidgets import interact  

def get_optimal_font_scale(text, width): 
    #Source https://stackoverflow.com/questions/52846474/how-to-resize-text-for-cv2-puttext-according-to-the-image-size-in-opencv-python

    for scale in reversed(range(0, 60, 1)):
      textSize = cv2.getTextSize(text, fontFace=cv2.FONT_HERSHEY_PLAIN, fontScale=scale/10, thickness=1)
      new_width = textSize[0][0]
      #print(new_width)
      if (new_width <= width):
          return scale/10
    return 1



def CV2_SegBBoxClassImshow(image,n_bbox,n_segs,n_classes,ax=None,bbox_type="XYWH",figsize=(10,10)):
    #Add to subplot or create new plot 
    if not ax:
        fig,ax=plt.subplots(1,1,figsize=figsize)
    #copy image, make sure its an array 
    new_image=np.array(image)
 
    
    #Add bbox and segmentations in look  
    for box,seg,cat in zip(n_bbox,n_segs,n_classes):
      
        c=colorsys.hsv_to_rgb(np.random.random(),1,255)
        
        #print(c)
        # = ( int (c [ 0 ]), int (c[ 1 ]), int (c [ 2 ])) 
        box=np.array(box,dtype=np.int32)
        #bbox type XXWH
        if bbox_type=="XYWH": #(x0,y0,w,h)
            new_image=cv2.rectangle(new_image,
                        (box[0],box[1]),
                        (box[0]+box[2],box[1]+box[3]),
                        c,
                        thickness=5
                        )
        else: #XYXY
            new_image=cv2.rectangle(new_image,
                         (box[0],box[1]),
                        (box[2],box[3]),
                        c,
                        thickness=5
                        )
       
        #Alpha effect for fillPoly
        #Each annotation holds one or more polygons; cv2.fillPoly needs int32 points
        polys=[np.array(p,dtype=np.int32).reshape(-1,2) for p in seg]
        overlay=new_image.copy()
        cv2.fillPoly(overlay,polys,c) #Add segmentation
        
        scale=get_optimal_font_scale(cat, box[2]) #Text scaling to BBox width 
        cv2.putText(new_image,cat,(int(box[0]),int(box[1])),cv2.FONT_HERSHEY_PLAIN,scale,c,
                    thickness=2,lineType=cv2.LINE_AA)


        new_image=cv2.addWeighted(overlay,0.4,new_image,1-0.4,0)
    
        
    #Show image 
    ax.imshow(new_image[:,:,::-1])
    ax.set_title(f"image {new_image.shape}, BBoxs {len(n_bbox)}")
    
   

    
  
data_dic=dataset
# Interactive plot by images id 
@interact(image_idx=(0, len(data_dic["images"])-1))
def DataDict_intract_view(image_idx):
    img_d=data_dic["images"][image_idx]
    id=img_d["id"]
    
    img=cv2.imread(dataset_path+"/"+img_d["file_name"]) #Read image 

    #Get annotations 
    annot=list([ a for a in data_dic["annotations"] if a["image_id"]==id ])
    #segs=np.array(list([a["segmentation"] for a in annot]))
    n_bbox=list([a["bbox"] for a in annot])
    n_segs=list([a['segmentation'] for a in annot])
    #n_classes=list([a['category_id'] for a in annot])
    #Class names
    n_classes=[ cat_df.iloc[annot[i]["category_id"]].supercategory for i in range(0, len(annot))]
    #print(n_bbox)
    CV2_SegBBoxClassImshow(img,n_bbox,n_segs,n_classes,bbox_type="XYWH")

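After processing and displaying the data, we have to convert it into a format Detectron2 can read. First, change the label names to the super categories. Next, add bbox_mode to the annotations. Then save the data as JSON. Finally, set up the COCO dataset.

Rename the labels


# Super categories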
ang_df["super_category"]=ang_df["category_id"].apply(lambda value: categories[value]["supercategory"] )
super_categorys=[]
ang_df["super_category_id"]=0
for i,cat in enumerate(sorted(ang_df["super_category"].unique())):
    super_categorys.append({"name":cat,"id":i})
    ang_df.loc[ang_df["super_category"]==cat,"super_category_id"]=i
ang_df["category_id"]=ang_df["super_category_id"]
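
The same relabelling can be sketched without pandas, which may make the mapping easier to follow. This is a sketch under assumptions: the helper name `build_super_category_map` and the three toy categories (with made-up ids) are for illustration only. Unlike the code above, which indexes `categories` by list position, it looks each category up by its `id`, so it does not rely on ids matching positions.

```python
def build_super_category_map(categories):
    """Map each category id to a new id based on its sorted supercategory name."""
    super_names = sorted({cat["supercategory"] for cat in categories})
    super_id_by_name = {name: i for i, name in enumerate(super_names)}
    new_categories = [{"name": name, "id": i} for name, i in super_id_by_name.items()]
    old_to_new = {cat["id"]: super_id_by_name[cat["supercategory"]] for cat in categories}
    return new_categories, old_to_new


# Toy category list with made-up ids.
toy_categories = [
    {"id": 0, "name": "Clear plastic bottle", "supercategory": "Bottle"},
    {"id": 1, "name": "Glass bottle", "supercategory": "Bottle"},
    {"id": 2, "name": "Aluminium foil", "supercategory": "Aluminium foil"},
]
new_categories, old_to_new = build_super_category_map(toy_categories)
print(new_categories)
print(old_to_new)
```

Both bottle categories collapse into the single "Bottle" label, which is how the label set shrinks to the super categories.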

BBox type


# BBOx mode
from detectron2.structures import BoxMode

ang_df["bbox_mode"]=BoxMode.XYWH_ABS
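
`BoxMode.XYWH_ABS` (the value 1 seen as `'bbox_mode': 1` in the annotation example) tells Detectron2 that each box is stored as absolute pixel values `[x0, y0, width, height]`, which is how COCO stores boxes. The other common layout, `XYXY_ABS`, stores the two corners instead. The conversion is simple arithmetic, sketched below in plain Python; the helper names are made up, and the box is the one from the annotation example earlier.

```python
def xywh_to_xyxy(box):
    """[x0, y0, width, height] -> [x0, y0, x1, y1]"""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box):
    """[x0, y0, x1, y1] -> [x0, y0, width, height]"""
    x0, y0, x1, y1 = box
    return [x0, y0, x1 - x0, y1 - y0]


coco_box = [517.0, 127.0, 447.0, 1322.0]
corners = xywh_to_xyxy(coco_box)
print(corners)
```

Declaring the wrong mode does not raise an error; the boxes are simply read with the wrong meaning, so it is worth checking one box by hand.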


cocodata={"images":img_df[["id","width","height","file_name"]].to_dict("records"),#Cut index 
         "annotations":ang_df[['id', 'image_id', 'category_id', 'segmentation', 'area', 'bbox',
       'iscrowd', 'bbox_mode']].to_dict("records"),
          "categories":super_categorys, "info":dataset["info"]}

Save as JSON and split into training and evaluation datasets

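# Save the data as JSON
save_name="TACO_annotations.json"
with open(save_name,"w") as f:
    json.dump(cocodata,f)


# Train eval split
img_id=[img["id"] for img in cocodata["images"]] #Use the real image ids (they start at 0)
import random
random.seed(42) #Seed the generator that random.sample uses below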


def SplitDatadir(data_dict,img_ids):
    newdata_dict=copy.deepcopy(data_dict) #.copy()
    print("start",len(newdata_dict["images"]),len(newdata_dict["annotations"]))
    #newdata_dict["images"]={}
    #newdata_dict["annotations"]={}
    newdata_dict["images"]=list(img for img in data_dict["images"] if img["id"] in img_ids)
    
    newdata_dict["annotations"]=list(ann for ann in data_dict["annotations"] if ann['image_id'] in img_ids)
    
    print("finish",len(newdata_dict["images"]),len(newdata_dict["annotations"]))
    return newdata_dict

# Split data set by 80% train, 20% eval
train_id=random.sample(img_id,int((len(img_id))*0.8))
val_id=list([id for id in img_id if id not in train_id])
print(len(train_id),len(val_id))

train_dataset=SplitDatadir(cocodata,train_id)
json.dump(train_dataset,open(save_name.replace(".","_train."),"w"))
val_dataset=SplitDatadir(cocodata,val_id)
json.dump(val_dataset,open(save_name.replace(".","_val."),"w"))
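
If you want the split to be repeatable from run to run, one option is a dedicated random generator with a fixed seed. The following is a minimal sketch, not the code used for the results in this article; the helper name `split_ids` is hypothetical, and the 1500 ids stand in for the image ids.

```python
import random


def split_ids(ids, train_fraction=0.8, seed=42):
    """Shuffle ids with a private, seeded generator and cut them into two sets."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return set(shuffled[:cut]), set(shuffled[cut:])


toy_ids = list(range(1500))  # stand-in for the real image ids
train_ids, val_ids = split_ids(toy_ids)
print(len(train_ids), len(val_ids))
```

Returning sets also makes the `in` checks inside a function like `SplitDatadir` fast, since set membership does not scan the whole list.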

Set up the COCO dataset

image_root=dataset_path
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.data.datasets import register_coco_instances
label_paths={
   "train":save_name.replace(".","_train."),
   "val":save_name.replace(".","_val."),
    "all":save_name  #label_path,
}
for d in list(label_paths.keys()):
  dataset_name=f"TACO_{d}"
  if dataset_name in DatasetCatalog.list():
    #DatasetCatalog.remove(dataset_name)
    DatasetCatalog.remove(dataset_name)
  register_coco_instances(dataset_name, {},label_paths[d], image_root)

Model: Build and train the model

Setting up a Detectron2 model is simple. The Detectron2 Model Zoo has many pre-trained models; you pick the model you want, set the hyperparameters, and run training.


from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
from detectron2 import model_zoo
from datetime import datetime

cfg = get_cfg()
model="COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
cfg.merge_from_file(model_zoo.get_config_file(f"{model}")) #Get base model
cfg.DATASETS.TRAIN = ("TACO_train",)
cfg.DATASETS.TEST = ("TACO_val",)
cfg.DATALOADER.NUM_WORKERS = 0
cfg.OUTPUT_DIR=f"./output/{model}"
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)   

cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(model) #Pre-trained weights
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256  #RoI proposals sampled per image
cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(super_categorys) #Number of categories

cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5 #Test threshold 
cfg.SOLVER.MAX_ITER =500 #Number of training iterations (mini-batches), not epochs

# Learn rate settings 
cfg.SOLVER.LR_SCHEDULER_NAME= "WarmupMultiStepLR"  # Learning  WarmupMultiStepLR (default) or WarmupCosineLR
cfg.SOLVER.STEPS = [] #Reduction [ ] ~ constant LR
cfg.SOLVER.BASE_LR =0.001 
cfg.SOLVER.IMS_PER_BATCH =  2 #Batch size
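
`cfg.SOLVER.MAX_ITER` counts training iterations (mini-batches) rather than passes over the dataset, so it helps to work out roughly how much of the data the model sees. The sketch below is plain arithmetic; it assumes the 1500 images mentioned earlier and the 80% training split, so the exact number for your download may differ. The helper name `approx_epochs` is made up.

```python
def approx_epochs(max_iter, ims_per_batch, num_train_images):
    """Rough number of passes over the training set for a given schedule."""
    return max_iter * ims_per_batch / num_train_images


num_train_images = int(1500 * 0.8)  # assumption: 1500 images, 80% used for training
epochs = approx_epochs(max_iter=500, ims_per_batch=2, num_train_images=num_train_images)
print(f"{epochs:.2f} passes over the training set")
```

With these settings the model sees less than one full pass over the training images, which is worth keeping in mind when reading the evaluation results.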

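To train, pass the config to the "DefaultTrainer" class, which runs the training. (You can also write your own custom Trainer. This article does not cover that; if you want to know more, see the Detectron2 documentation: link.)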


start=datetime.today()
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
end=datetime.today()
modelruntime=end-start   

print(f"{'X'*20}")
print(f"{'X'*20}")
print(f"{'X'*5} runtime {'X'*6}")
print(f"{'X'*2} {modelruntime.__str__()} {'X'*2}")
print(f"{'X'*20}")
print(f"{'X'*20}")

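Detectron2's DefaultTrainer does not evaluate the training results. After training, you can define a COCOEvaluator yourself and run the evaluation. The results come back as a dictionary, which I converted to a pandas DataFrame to display.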


from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

eval_name="TACO_val"
evaluator=COCOEvaluator(eval_name, ("bbox","segm"), 
                        False, output_dir=cfg.OUTPUT_DIR)
val_loader = build_detection_test_loader(cfg,eval_name)

evalation=inference_on_dataset(trainer.model, val_loader, evaluator)

# Pandas plot 
df_evalation=pd.DataFrame(evalation)
df_evalation=df_evalation.drop(["APs","APm","APl"],axis=0)
df_evalation.to_csv(cfg.OUTPUT_DIR+"/evaluation.csv")

fig,ax=plt.subplots(1,1,figsize=(10,10))
df_evalation.plot.bar(rot=90,ax=ax)
ax.grid()
ax.set_ylabel("Average Precision (AP)")
fig.savefig(cfg.OUTPUT_DIR+"/AveragePercion_plot.jpg")
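
The dictionary returned by the evaluation has one entry per task ("bbox", "segm"), and each holds the overall metrics plus one "AP-<class name>" entry per label. Labels with no usable result can come back as NaN. The sketch below shows one way to pull out only the per-class values; the helper name `per_class_ap` is made up, and the numbers are hypothetical, NOT results from this article.

```python
import math


def per_class_ap(task_results):
    """Keep only the 'AP-<class name>' entries that have a usable value."""
    per_class = {}
    for key, value in task_results.items():
        if key.startswith("AP-") and not math.isnan(value):
            per_class[key[len("AP-"):]] = value
    return per_class


# Hypothetical numbers for illustration only.
toy_results = {
    "bbox": {"AP": 1.2, "AP50": 2.5, "AP-Bottle": 8.0, "AP-Cup": 3.0, "AP-Shoe": float("nan")},
}
bbox_per_class = per_class_ap(toy_results["bbox"])
print(bbox_per_class)
```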

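According to the results, the model cannot predict every label, which pulls the mean AP (Average Precision) down. Only the Bottle, Cup, and Plastic bag labels show any results. I think there are two causes. The first is bias in the number of labels. The second is the difficulty of detecting small objects.

To investigate the label-count bias, we can plot the number of annotations per label. The result below shows that the counts are not equal across labels.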


vals,counts=np.unique(ang_df["super_category"],return_counts=True)
cat_count_df=pd.DataFrame(data={"val":vals,"counts":counts})
cat_count_df=cat_count_df.sort_values("counts")
cat_count_df.plot.barh(x="val",figsize=(10,10))
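
Alongside the plot, a single number can summarise how uneven the labels are. The following is a small sketch using only the standard library; the helper name `imbalance_summary` and the toy label list are made up, and with the real data you could pass in the list of super category names instead.

```python
from collections import Counter


def imbalance_summary(labels):
    """Count labels and report how much larger the biggest class is than the smallest."""
    counts = Counter(labels)
    most_common_label, max_count = counts.most_common(1)[0]
    min_count = min(counts.values())
    return {
        "classes": len(counts),
        "largest": most_common_label,
        "max_over_min": max_count / min_count,
    }


# Toy labels, not the real TACO counts.
toy_labels = ["Bottle"] * 6 + ["Cup"] * 3 + ["Shoe"] * 1
summary = imbalance_summary(toy_labels)
print(summary)
```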

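Detecting small objects is a common problem. Large objects are easy to see, but the model tends to treat small objects as part of the image background, so they become hard to detect. Plotting the annotation areas for labels with many annotations and for labels with few annotations illustrates this.

Areas for labels with fewer than 100 annotations

Areas for labels with 100 or more annotations

It is hard to draw a conclusion from the plots, but on the frequency axis, the labels for which the model produced results have high frequencies, and not only at small areas.
In the plot for labels with 100 or more annotations, the frequency of areas above the small range is higher than in the plot for labels with fewer than 100 annotations. On top of that, the unequal label counts introduce a large bias.

This article does not solve these problems, but the next article will show ways to improve the results. Once it is written, the link will be added below.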

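Predict: Running predictions on images

Once the model is trained, we want to use it. Let's predict the objects in an image.
Just as there is a DefaultTrainer, Detectron2 has a DefaultPredictor, which takes the model config.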

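from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2 import model_zoo

# create config
cfg = get_cfg()
# Same config name as in the training step, so the weights path below matches cfg.OUTPUT_DIR
# (the original notebook picks it from a list, Models_Segmentation[7], that is not shown here)
model="COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
cfg.merge_from_file(model_zoo.get_config_file(model))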

cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
cfg.MODEL.WEIGHTS =  os.path.join("output",model, "model_final.pth")  #
cfg.MODEL.DEVICE = "cpu" # we use a CPU Detectron copy
cfg.MODEL.ROI_HEADS.NUM_CLASSES =28 #num_of_classes   
predictor = DefaultPredictor(cfg)
print("Predictor has been initialized.")

The "predictor" takes an image as input and returns the prediction results.

ran_img=np.random.choice(imgs)  #Random image test 
img=cv2.imread(dataset_path +"/"+ran_img["file_name"])
pred_result = predictor(img)
pred_result
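
The result is a dictionary whose "instances" entry holds the predicted boxes, scores, classes, and masks. To read it as plain text, the class ids and scores can be turned into a short list of label names. The sketch below works on plain Python lists so it can be tried without a trained model; the helper name `summarize_predictions`, the toy class names, and the numbers are made up for illustration.

```python
def summarize_predictions(class_ids, scores, class_names, min_score=0.5):
    """Pair each prediction with its label name, keeping confident ones, best first."""
    summary = []
    for class_id, score in zip(class_ids, scores):
        if score >= min_score:
            summary.append((class_names[class_id], round(score, 2)))
    return sorted(summary, key=lambda item: item[1], reverse=True)


# With a real result the two lists would come from, for example:
#   instances = pred_result["instances"].to("cpu")
#   class_ids = instances.pred_classes.tolist()
#   scores = instances.scores.tolist()
toy_class_names = ["Aluminium foil", "Battery", "Blister pack", "Bottle"]
toy_summary = summarize_predictions([3, 0, 3], [0.91, 0.42, 0.67], toy_class_names)
print(toy_summary)
```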

Detectron2 has a class called "Visualizer" for displaying prediction results; we use it here to show the result.

from detectron2.utils.visualizer import Visualizer
v = Visualizer(img)
out = v.draw_instance_predictions(pred_result['instances'])
plt.figure(figsize=(15,15))
fig=plt.imshow(out.get_image()[:, :, ::-1])

If you want to see the results for every image, the code below uses ipywidgets to run predictions as you scroll through the image IDs.

from ipywidgets import interact  
@interact(image_idx=(0, len(imgs)-1))
def DataDict_intract_view(image_idx):
    img_dir=imgs[image_idx]
    img=cv2.imread(dataset_path +"/"+img_dir["file_name"])
    pred_result = predictor(img)
    v = Visualizer(img)
    out = v.draw_instance_predictions(pred_result['instances'])
    plt.figure(figsize=(15,15))
    fig=plt.imshow(out.get_image()[:, :, ::-1])
