Grounding DINO: An AI That Detects Anything You Describe in Text

Posted at 2024-01-09 (last updated at 2024-01-09)

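It finds whatever you ask for

With this model, all you have to do is say a single word:

"lion"

That is enough.

What's more, it can also detect

"a lion with a mane"

and even

"the lion with its mouth open the widest".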
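In code, each of these requests is just a text string passed to the model as its caption. As a minimal sketch of preparing such a string: the GroundingDINO README recommends separating different phrases with a period when you want to look for several things at once. The helper name `build_caption` is hypothetical and not part of the library; it only illustrates that convention.

```python
def build_caption(phrases):
    """Join several phrases into one lowercase caption separated by ' . '.

    Hypothetical helper, not part of GroundingDINO. It follows the
    README's suggestion of separating phrases with a period.
    """
    # Normalize each phrase: trim whitespace, lowercase, drop a trailing period
    cleaned = [p.strip().lower().rstrip(".").strip() for p in phrases if p.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty phrase is required")
    return " . ".join(cleaned) + " ."


caption = build_caption(["lion", "Lion with mane"])
print(caption)  # lion . lion with mane .
```

The resulting string can be used wherever a single text prompt is expected.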

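No training required

Until now, object detection required preparing a dataset of images and bounding-box coordinates for the objects to detect, and then training a model on it.
To detect lions, for example, you had to collect lion data and teach it to the model.

This model, however, comes pre-trained and already knows a wide variety of objects, so no additional training is needed.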

Usage

Installation

# clone
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
# install libraries
pip install -e .
# download the model weights
mkdir weights
cd weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
cd ..

Running it

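from groundingdino.util.inference import load_model, load_image, predict, annotate
import cv2

# Load the Swin-T model from its config file and the downloaded weights
model = load_model("groundingdino/config/GroundingDINO_SwinT_OGC.py", "weights/groundingdino_swint_ogc.pth")
IMAGE_PATH = "lions.jpg"
TEXT_PROMPT = "lion with mane"  # plain-text description of what to detect
BOX_THRESHOLD = 0.35   # minimum score for a box to be kept
TEXT_THRESHOLD = 0.25  # minimum score for a word to be included in a box's label

# image_source is the original image, image is the preprocessed model input
image_source, image = load_image(IMAGE_PATH)
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=TEXT_PROMPT,
    box_threshold=BOX_THRESHOLD,
    text_threshold=TEXT_THRESHOLD
)

# Draw the detected boxes and labels on the original image and save the result
annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
cv2.imwrite("annotated_image.jpg", annotated_frame)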
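
To use the detections for something other than the annotated image (cropping each lion, for example), the boxes need to be in pixel coordinates. The sketch below assumes that `predict` returns each box as normalized `(cx, cy, w, h)` values between 0 and 1. That is my understanding of the library's output, but this article does not state it, so check it against your installed version. The function name `cxcywh_to_xyxy_pixels` is hypothetical and the code uses plain Python numbers rather than tensors.

```python
def cxcywh_to_xyxy_pixels(box, image_width, image_height):
    """Convert one normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2).

    Assumption: the input values are fractions of the image size in [0, 1].
    """
    cx, cy, w, h = box
    # Move from center/size form to corner form, then scale to pixels
    x1 = (cx - w / 2) * image_width
    y1 = (cy - h / 2) * image_height
    x2 = (cx + w / 2) * image_width
    y2 = (cy + h / 2) * image_height

    def clamp(value, upper):
        # Keep the coordinate inside the image and round to a whole pixel
        return int(round(min(max(value, 0), upper)))

    return (
        clamp(x1, image_width),
        clamp(y1, image_height),
        clamp(x2, image_width),
        clamp(y2, image_height),
    )


# A box centered in a 640x480 image and covering half of each dimension
print(cxcywh_to_xyxy_pixels((0.5, 0.5, 0.5, 0.5), 640, 480))  # (160, 120, 480, 360)
```

With the real outputs, one way to apply this would be to call `boxes.tolist()` and take the height and width from `image_source.shape[:2]`, assuming `image_source` is a height x width x channels array.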

🐣

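I am a freelance engineer.
I write a variety of articles about AI, so please take a look at my profile if you are interested.

If you have a request like any of the following, feel free to get in touch.
You want to develop an AI service, you want to build AI into your business to make it more efficient, you want to develop a smartphone app that uses AI,
you want to build an application that uses AR, you want to build a smartphone app but do not know whom to ask…

I can take on any of these at a reasonable price, with no intermediary costs.

For work inquiries, contact me here:
rockyshikoku@gmail.com

I build applications that use machine learning and AR technology.
I also share information about machine learning and AR.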

