Introduction
Installation notes for yolov8.
Requirements (as of April 2023)
CUDA==11.8
CuDNN==8.8
- CUDA: check that the system environment variables CUDA_PATH and CUDA_PATH_V11_8 are set (done automatically by the installer)
- CuDNN: add CUDA\V11.8\bin and CUDA\V11.8\libnvvp to the user Path environment variable (manual)
- To verify the CUDA installation, run nvcc --version at the command prompt
Installation steps
1. Install YOLOv8
pip install ultralytics
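To confirm the installation, the package's built-in environment check can be run; a minimal sketch (output depends on your environment):

import ultralytics

# Prints the ultralytics/Python/torch versions and whether CUDA is visible
ultralytics.checks()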
2. If PyTorch was installed without GPU support
2.1. Uninstall PyTorch only
pip uninstall torch torchvision
2.2. Reinstall the CUDA 11.8 build of PyTorch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
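After reinstalling, it is worth confirming that the CUDA build is actually active; a minimal check:

import torch

print(torch.__version__)          # a CUDA build reports a version such as 2.0.x+cu118
print(torch.cuda.is_available())  # should print True on a working GPU setup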
3. YOLO basics
object_detection.py
from ultralytics import YOLO
import cv2
# Load a pretrained model
model = YOLO("yolov8n.pt")
# names maps class ids to class labels
print(model.names)
print(len(model.names))
Output
Class labels from training on the COCO dataset:
{0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}
80
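Since names is a plain dict keyed by class id, individual ids can be looked up directly; a small sketch using the ids shown above:

# Class ids map straight into model.names
print(model.names[0])   # 'person'
print(model.names[5])   # 'bus'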
object_detection.py
from ultralytics import YOLO
import cv2
# Load a model
model = YOLO("yolov8n.pt")
# Run inference on an image
results = model.predict(source='../images/bus.jpg')
# Inspect the contents of results
print(len(results))
print(results[0])
Output
1
ultralytics.yolo.engine.results.Results object with attributes:
boxes: ultralytics.yolo.engine.results.Boxes object
keypoints: None
keys: ['boxes']
masks: None
names: {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}
orig_img: array([[[122, 148, 172],
[120, 146, 170],
[125, 153, 177],
...,
[157, 170, 184],
[158, 171, 185],
[158, 171, 185]],
...,
[ 99, 89, 95],
[ 96, 86, 92],
[102, 92, 98]]], dtype=uint8)
orig_shape: (1080, 810)
path: 'C:\\Users\\.....\\01.Object_Detection\\..\\images\\bus.jpg'
probs: None
save_dir: None
speed: {'preprocess': 2.521991729736328, 'inference': 116.05429649353027, 'postprocess': 4.006147384643555}
image 1/1 C:\Users\.....\01.Object_Detection\..\images\bus.jpg: 640x480 4 persons, 1 bus, 1 stop sign, 116.1ms
Speed: 2.5ms preprocess, 116.1ms inference, 4.0ms postprocess per image at shape (1, 3, 640, 480)
- boxes: A 2D tensor of bounding-box coordinates for each detection.
- keypoints: A list of detected keypoints for each object (populated for pose-estimation models).
- masks: A 3D tensor of detection masks, where each mask is a binary image.
- names: A dictionary of class names.
- orig_img: The original image as a numpy array.
- orig_shape: The original image shape in (height, width) format.
- path: The path to the image file.
- probs: A Probs object containing per-class probabilities for classification tasks.
- speed: A dictionary of preprocess, inference, and postprocess speeds in milliseconds per image.
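A common next step is to draw the detections and save the image; a minimal sketch using the plot() helper (the output filename here is arbitrary):

from ultralytics import YOLO
import cv2

model = YOLO("yolov8n.pt")
results = model.predict(source='../images/bus.jpg')

# plot() renders boxes and labels onto a copy of the original image (a BGR numpy array)
annotated = results[0].plot()
cv2.imwrite('bus_annotated.jpg', annotated)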
Now, let's look at what is inside boxes.
from ultralytics import YOLO
import cv2
# Load a model
model = YOLO("yolov8n.pt")
# Run inference and print the Boxes object of each result
results = model.predict(source='../images/bus.jpg')
for result in results:
    print(result.boxes)
Output
WARNING 'Boxes.boxes' is deprecated. Use 'Boxes.data' instead.
ultralytics.yolo.engine.results.Boxes object with attributes:
boxes: tensor([[1.7254e+01, 2.3059e+02, 8.0153e+02, 7.6847e+02, 8.7038e-01, 5.0000e+00],
[4.8737e+01, 3.9927e+02, 2.4450e+02, 9.0251e+02, 8.6907e-01, 0.0000e+00],
[6.7027e+02, 3.8027e+02, 8.0986e+02, 8.7569e+02, 8.5361e-01, 0.0000e+00],
[2.2139e+02, 4.0579e+02, 3.4472e+02, 8.5740e+02, 8.1945e-01, 0.0000e+00],
[6.3884e-02, 2.5464e+02, 3.2290e+01, 3.2504e+02, 3.4639e-01, 1.1000e+01],
[0.0000e+00, 5.5101e+02, 6.7097e+01, 8.7394e+02, 3.0120e-01, 0.0000e+00]], device='cuda:0')
cls: tensor([ 5., 0., 0., 0., 11., 0.], device='cuda:0')
conf: tensor([0.8704, 0.8691, 0.8536, 0.8194, 0.3464, 0.3012], device='cuda:0')
data: tensor([[1.7254e+01, 2.3059e+02, 8.0153e+02, 7.6847e+02, 8.7038e-01, 5.0000e+00],
[4.8737e+01, 3.9927e+02, 2.4450e+02, 9.0251e+02, 8.6907e-01, 0.0000e+00],
[6.7027e+02, 3.8027e+02, 8.0986e+02, 8.7569e+02, 8.5361e-01, 0.0000e+00],
[2.2139e+02, 4.0579e+02, 3.4472e+02, 8.5740e+02, 8.1945e-01, 0.0000e+00],
[6.3884e-02, 2.5464e+02, 3.2290e+01, 3.2504e+02, 3.4639e-01, 1.1000e+01],
[0.0000e+00, 5.5101e+02, 6.7097e+01, 8.7394e+02, 3.0120e-01, 0.0000e+00]], device='cuda:0')
id: None
is_track: False
orig_shape: (1080, 810)
shape: torch.Size([6, 6])
xywh: tensor([[409.3930, 499.5262, 784.2786, 537.8801],
[146.6187, 650.8860, 195.7628, 503.2408],
[740.0625, 627.9819, 139.5904, 495.4198],
[283.0580, 631.5974, 123.3267, 451.6083],
[ 16.1772, 289.8407, 32.2266, 70.3963],
[ 33.5483, 712.4762, 67.0965, 322.9335]], device='cuda:0')
xywhn: tensor([[0.5054, 0.4625, 0.9682, 0.4980],
[0.1810, 0.6027, 0.2417, 0.4660],
[0.9137, 0.5815, 0.1723, 0.4587],
[0.3495, 0.5848, 0.1523, 0.4182],
[0.0200, 0.2684, 0.0398, 0.0652],
[0.0414, 0.6597, 0.0828, 0.2990]], device='cuda:0')
xyxy: tensor([[1.7254e+01, 2.3059e+02, 8.0153e+02, 7.6847e+02],
[4.8737e+01, 3.9927e+02, 2.4450e+02, 9.0251e+02],
[6.7027e+02, 3.8027e+02, 8.0986e+02, 8.7569e+02],
[2.2139e+02, 4.0579e+02, 3.4472e+02, 8.5740e+02],
[6.3884e-02, 2.5464e+02, 3.2290e+01, 3.2504e+02],
[0.0000e+00, 5.5101e+02, 6.7097e+01, 8.7394e+02]], device='cuda:0')
xyxyn: tensor([[2.1301e-02, 2.1351e-01, 9.8955e-01, 7.1154e-01],
[6.0169e-02, 3.6969e-01, 3.0185e-01, 8.3565e-01],
[8.2749e-01, 3.5210e-01, 9.9982e-01, 8.1083e-01],
[2.7333e-01, 3.7573e-01, 4.2558e-01, 7.9389e-01],
[7.8869e-05, 2.3578e-01, 3.9865e-02, 3.0096e-01],
[0.0000e+00, 5.1019e-01, 8.2835e-02, 8.0921e-01]], device='cuda:0')
- boxes: the raw bounding-box tensor (deprecated; use data instead)
- cls: the class values of the boxes
- conf: the confidence values of the boxes
- id: the track IDs of the boxes (if available)
- xywh: the boxes in xywh format
- xywhn: the boxes in xywh format, normalized by the original image size
- xyxy: the boxes in xyxy format
- xyxyn: the boxes in xyxy format, normalized by the original image size
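Putting this together, each detection can be unpacked into a readable line; a sketch based on the attributes listed above:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model.predict(source='../images/bus.jpg')

for box in results[0].boxes:
    cls_id = int(box.cls[0])                 # class index, e.g. 5 -> 'bus'
    conf = float(box.conf[0])                # confidence score
    x1, y1, x2, y2 = box.xyxy[0].tolist()    # pixel coordinates
    print(f"{model.names[cls_id]} {conf:.2f} ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")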
Object Detection with webcam
object_detection_webcam.py
from ultralytics import YOLO
import cv2

# https://docs.ultralytics.com/modes/predict/#plotting-results
# cap = cv2.VideoCapture('../videos/motorbikes.mp4')
cap = cv2.VideoCapture(0)

# Load a model
model = YOLO("yolov8l.pt")

while True:
    success, img = cap.read()
    if success:
        # results = model(img)  # stream=True also works here
        results = model.predict(img)
        # https://docs.ultralytics.com/modes/predict/#inference-arguments

        # Visualize the results on the frame
        annotated_frame = results[0].plot()

        # Display the annotated frame
        cv2.imshow("YOLOv8 Inference", annotated_frame)

        # Press Q to exit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the capture device and close the window
cap.release()
cv2.destroyAllWindows()
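For long videos, the inference-arguments page linked above also documents stream=True, which returns a generator instead of a list and keeps memory usage flat; a brief sketch:

# With stream=True, predict() yields Results one frame at a time
for result in model.predict(source='../videos/motorbikes.mp4', stream=True):
    annotated = result.plot()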
References
Possibly the easiest-to-follow video on yolov8:
https://youtu.be/WgPbbWmnXJ8?t=4098