
How to Use Open_Clip

Posted at 2023-02-20

Open_Clip is a model that converts images and text into embeddings that can be compared with each other.
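As a rough illustration of what "comparing" embeddings means, the sketch below computes the cosine similarity between hand-written toy vectors in plain Python. The vectors and the `cosine_similarity` helper are hypothetical and are not part of open_clip; real embeddings come from the model and have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings", made up for illustration only.
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a dog": [0.1, 0.9, 0.3],
    "a cat": [0.8, 0.2, 0.1],
}

# Score every text against the image and keep the closest one.
scores = {label: cosine_similarity(image_embedding, vec)
          for label, vec in text_embeddings.items()}
best_label = max(scores, key=scores.get)
print(scores, best_label)
```

The text whose vector points in nearly the same direction as the image vector gets the highest score.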

Usage

Installation

pip install open_clip_torch

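For example, the following code embeds an image (cat.jpeg) and three texts, computes the matrix product of the image embedding and the text embeddings, and applies softmax to get a probability for each text. The text with the highest probability is the best match.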

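import torch
from PIL import Image
import open_clip

# Load the model, its image preprocessing transform, and the matching tokenizer.
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-32-quickgelu', pretrained='laion400m_e32')
model.eval()  # switch to inference mode (models are created in train mode)
tokenizer = open_clip.get_tokenizer('ViT-B-32-quickgelu')

# Preprocess the image, add a batch dimension, and tokenize the candidate texts.
image = preprocess(Image.open("cat.jpeg")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so that the dot product below is a cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Scale the similarities and turn them into probabilities over the texts.
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

# Print plain decimals instead of scientific notation.
torch.set_printoptions(sci_mode=False)
print("Label probs:", text_probs)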

Label probs: tensor([[ 0.0000, 0.0000, 1.0000]])

Of ["a diagram", "a dog", "a cat"], the third text, "a cat", has the highest probability.
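The probabilities come out so close to 0 and 1 largely because of the `100.0 *` factor in the code above: cosine similarities lie between -1 and 1, and scaling them before softmax sharpens the distribution. The sketch below reproduces only that step in plain Python so the arithmetic is visible. The similarity values are invented for illustration and are not outputs of the model, and the `softmax` helper is hypothetical.

```python
import math

def softmax(values):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["a diagram", "a dog", "a cat"]
# Hypothetical cosine similarities between one image and the three texts.
similarities = [0.18, 0.20, 0.28]

# Without scaling, small differences in similarity barely separate the labels.
unscaled_probs = softmax(similarities)
# With the 100x scaling used above, the best label takes almost all the mass.
scaled_probs = softmax([100.0 * s for s in similarities])

# Pick the label with the highest probability.
best_index = max(range(len(labels)), key=lambda i: scaled_probs[i])
best_label = labels[best_index]

print("unscaled:", unscaled_probs)
print("scaled:  ", scaled_probs)
print("best label:", best_label)
```

With these made-up numbers, the unscaled probabilities stay close to one third each, while the scaled ones concentrate on "a cat". Taking the index of the largest probability is one simple way to map the result back to its text.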

🐣

I am a freelance engineer.
For work inquiries, contact me at
rockyshikoku@gmail.com

I build machine learning and AR apps (Web/iOS).
I share information about machine learning and AR.

