How to create a COCO-format dataset and verify (visualize) that it was built correctly

Posted at 2023-02-13

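While building a COCO-format dataset from my own data, I kept wondering whether I was really doing it right,
so this is a memo of an approach that works.
Note that I only have the Object Detection task in mind here.
I have never done a Segmentation task.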

How to build the data for Object Detection and Segmentation

The code below is slightly modified from the code in this article.

import json
import os
import collections as cl
import cv2

def get_info():
  tmp = cl.OrderedDict()
  tmp["description"] = "my own dataset"
  tmp["url"] = ""
  tmp["version"] = "1"
  tmp["year"] = 2023
  tmp["contributor"] = ""
  tmp["date_created"] = "2023/02/09"
  return tmp

def get_licenses():
  tmp = cl.OrderedDict()
  tmp["id"] = 0
  tmp["url"] = ""
  tmp["name"] = ""
  return tmp

def get_image_data(id, image_path, image_file, w, h):
  tmp = cl.OrderedDict()
  tmp["license"] = 0
  tmp["id"] = id
  tmp["file_name"] = image_file
  tmp["width"] = w
  tmp["height"] = h
  tmp["date_captured"] = ""
  tmp["coco_url"] = ""
  tmp["flickr_url"] = ""
  return tmp

def get_annotation(id, image_id, category_id, bbox, segmentation):
  tmp = cl.OrderedDict()

  tmp["segmentation"] = segmentation
  tmp["id"] = id
  tmp["image_id"] = image_id
  tmp["category_id"] = category_id
  tmp["area"] = bbox[2] * bbox[3]
  tmp["iscrowd"] = 0
  tmp["bbox"] =  bbox
  return tmp

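def get_categories():
  tmps = []
  # COCO expects "supercategory" to be a string, not a list.
  sup = "my super category"
  categories = ["category0", "category1", "category2"]
  # Ids follow the list order so that they line up with the YOLO class indices.
  for i, name in enumerate(categories):
    tmp = cl.OrderedDict()
    tmp["id"] = i
    tmp["supercategory"] = sup
    tmp["name"] = name
    tmps.append(tmp)
  return tmps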
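
For orientation, here is a minimal sketch of the JSON layout that helpers like these assemble, with one image and one box. The file name, image size, and box values are made-up illustration values, and the optional "info" and "licenses" sections are left out for brevity.

```python
import json

# Hypothetical one-image, one-box dataset; every value below is illustrative.
minimal_coco = {
    "images": [
        {"id": 0, "file_name": "example.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,      # must match an "id" in "images"
            "category_id": 1,   # must match an "id" in "categories"
            # [x, y, width, height] in pixels; x and y are the top-left corner
            "bbox": [100.0, 120.0, 50.0, 80.0],
            "area": 50.0 * 80.0,
            "iscrowd": 0,
            "segmentation": [],
        },
    ],
    "categories": [
        {"id": 0, "name": "category0", "supercategory": "my super category"},
        {"id": 1, "name": "category1", "supercategory": "my super category"},
    ],
}

print(json.dumps(minimal_coco, indent=2))
```

The links that matter are the ids: each annotation points at one image through "image_id" and at one category through "category_id".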

The following is an example of converting the per-image annotation text files used by YOLOv5 into COCO format.

image_dir = "my_dataset/images/train/"
label_dir = "my_dataset/labels/train/"
json_path = "my_dataset_train.json"

image_list = os.listdir(image_dir)
print(len(image_list))
info = get_info()
licenses = get_licenses()
categories = get_categories()
images = []
annotations = []

for i, image_file in enumerate(image_list):
  image_path = os.path.join(image_dir, image_file)
  label_file = os.path.splitext(os.path.basename(image_file))[0] + ".txt"
  label_path = os.path.join(label_dir, label_file)
  if os.path.exists(label_path):
    img = cv2.imread(image_path)
    if img is None:
      # cv2.imread returns None for files it cannot decode; skip them.
      continue
    img_h, img_w = img.shape[:2]

    image_data = get_image_data(i, image_path, image_file, img_w, img_h)
    with open(label_path) as source_file:
      for object_index, line in enumerate(source_file):
        staff = line.split()
        if not staff:
          continue  # skip blank lines
        class_idx = int(staff[0])

        # YOLO stores normalized (x_center, y_center, width, height); scale to pixels.
        x_center, y_center = float(staff[1]) * img_w, float(staff[2]) * img_h
        width, height = float(staff[3]) * img_w, float(staff[4]) * img_h
        # COCO bbox is [top-left x, top-left y, width, height] in pixels.
        x = round(x_center - width / 2, 2)
        y = round(y_center - height / 2, 2)
        width = round(width, 2)
        height = round(height, 2)
        bbox = [x, y, width, height]
        # Annotation ids must be unique; this assumes fewer than 1000 objects per image.
        annotation_id = i * 1000 + object_index
        annotation = get_annotation(annotation_id, i, class_idx, bbox, [])
        annotations.append(annotation)
    # Register each image once, not once per annotation.
    images.append(image_data)

json_data = {
    'info': info,
    'images': images,
    'licenses': [licenses],  # COCO stores licenses as a list of entries
    'annotations': annotations,
    'categories': categories,
}

with open(json_path, 'w', encoding='utf-8') as f:
    json.dump(json_data, f, ensure_ascii=False)
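
The box arithmetic in the conversion above is easy to get wrong, so it can help to check it on a single label by hand. The sketch below isolates that step in a helper; the function name `yolo_to_coco_bbox` is my own, not something from YOLOv5 or COCO.

```python
def yolo_to_coco_bbox(x_center, y_center, width, height, img_w, img_h):
    """Convert a normalized YOLO box to a COCO [x, y, width, height] box in pixels."""
    box_w = width * img_w
    box_h = height * img_h
    # COCO wants the top-left corner, so shift the center by half the box size.
    x = x_center * img_w - box_w / 2
    y = y_center * img_h - box_h / 2
    return [round(x, 2), round(y, 2), round(box_w, 2), round(box_h, 2)]


# A box centered in a 640x480 image that covers half of each dimension.
bbox = yolo_to_coco_bbox(0.5, 0.5, 0.5, 0.5, img_w=640, img_h=480)
print(bbox)  # [160.0, 120.0, 320.0, 240.0]
```

If a converted box comes out with a negative x or y, or extends past the image size, the most likely cause is mixing up the width and height when scaling.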

Visualize the dataset you created to check it.

You can visualize it with COCO-Viewer.

Usage

Clone it locally (it cannot be used on Colab, so clone it to your own machine).

python3 cocoviewer.py -i my_dataset/images/train -a my_dataset_train.json

-i specifies the image directory
-a specifies the JSON file.

[Image from the COCO-Viewer repository]

If it is not displayed correctly, something is wrong in how the dataset JSON was built, so download the official COCO dataset and compare your file against it.
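
Before comparing files by eye, a few structural checks can narrow down where the JSON went wrong. This is a minimal sketch using only plain dictionaries; `check_coco` and its `tol` parameter are hypothetical names of my own, and the checks cover only common mistakes (duplicate ids, dangling references, boxes outside the image), not the full COCO format.

```python
def check_coco(data, tol=0.05):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    image_ids = [img["id"] for img in data["images"]]
    if len(image_ids) != len(set(image_ids)):
        problems.append("duplicate image ids")
    images = {img["id"]: img for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    seen_annotation_ids = set()
    for ann in data["annotations"]:
        if ann["id"] in seen_annotation_ids:
            problems.append(f"duplicate annotation id {ann['id']}")
        seen_annotation_ids.add(ann["id"])
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
        img = images.get(ann["image_id"])
        if img is None:
            problems.append(f"annotation {ann['id']}: unknown image_id")
            continue
        x, y, w, h = ann["bbox"]
        # tol absorbs small overshoots caused by rounding the coordinates.
        outside = (x < -tol or y < -tol
                   or x + w > img["width"] + tol
                   or y + h > img["height"] + tol)
        if w <= 0 or h <= 0 or outside:
            problems.append(f"annotation {ann['id']}: bbox outside image")
    return problems


# Illustrative data: the second annotation points at an image that does not exist.
sample = {
    "images": [{"id": 0, "file_name": "a.jpg", "width": 100, "height": 100}],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 0, "bbox": [10, 10, 20, 20]},
        {"id": 1, "image_id": 5, "category_id": 0, "bbox": [10, 10, 20, 20]},
    ],
    "categories": [{"id": 0, "name": "category0"}],
}
print(check_coco(sample))
```

To run it on a real file, load the JSON with `json.load` and pass the resulting dictionary to `check_coco`; an empty list means none of these particular checks failed.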

🐣


I am a freelance engineer.
For work inquiries, contact me here:
rockyshikoku@gmail.com

I build machine learning and AR apps (Web/iOS).
I share information about machine learning and AR.

Twitter
Medium
GitHub
