
Telling "incarnation" from "human" in the KaoKore dataset with CLIP (hands-on CLIP test)

Posted at 2021-04-23

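Introduction

No matter what anyone says, **I want to try classifying "human" versus "incarnation"**.
In the previous article, "Telling 'incarnation' from 'human' in the KaoKore dataset with CLIP (CLIP explanation)", I described how CLIP works. This article is the part where I actually try it.

I ran this in the following environment:

Google Colaboratory
Windows 10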

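Dataset used

The dataset is the KaoKore Dataset, which I also used in an earlier article, "Trying image classification with Microsoft Lobe on the KaoKore Dataset (顔コレデータセット)".

The dataset itself is described in that earlier article, so this time I go straight to classifying images into four classes.

I downloaded the images from the KaoKore Dataset GitHub site.

Let's try image classification!

This walkthrough leans heavily on the CLIP benchmarking Colab notebook that appears in Roboflow's blog post. That post tries classifying images of daisies and dandelions.
What originally got me started was npaka's note article, "CLIPを試す - OpenAIのZero-shot画像分類器" (Trying CLIP: OpenAI's zero-shot image classifier). It is easy to read and very helpful.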

Downloading CLIP

The download step is taken as-is from the blog post.

# install some dependencies; CLIP was released for PyTorch
import subprocess

CUDA_version = [s for s in subprocess.check_output(["nvcc", "--version"]).decode("UTF-8").split(", ") if s.startswith("release")][0].split(" ")[-1]
print("CUDA version:", CUDA_version)

if CUDA_version == "10.0":
    torch_version_suffix = "+cu100"
elif CUDA_version == "10.1":
    torch_version_suffix = "+cu101"
elif CUDA_version == "10.2":
    torch_version_suffix = ""
else:
    torch_version_suffix = "+cu110"

!pip install torch==1.7.1{torch_version_suffix} torchvision==0.8.2{torch_version_suffix} -f https://download.pytorch.org/whl/torch_stable.html ftfy regex
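
The one-line list comprehension above is dense, so here is a minimal sketch of the same idea split into two small functions: pull the "release X.Y" field out of the `nvcc --version` text, then map it to the wheel suffix. The helper names and the sample string are my own assumptions for illustration; only the version-to-suffix mapping comes from the cell above.

```python
import re

# Same mapping as the if/elif chain above; anything else falls back to +cu110.
SUFFIX_BY_CUDA = {"10.0": "+cu100", "10.1": "+cu101", "10.2": ""}


def parse_cuda_release(nvcc_output: str) -> str:
    """Extract the 'release X.Y' version from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return match.group(1)


def torch_suffix(cuda_version: str) -> str:
    """Map a CUDA version string to the PyTorch wheel suffix."""
    return SUFFIX_BY_CUDA.get(cuda_version, "+cu110")


# Illustrative sample line, not captured from a real session.
sample = "Cuda compilation tools, release 11.0, V11.0.221"
version = parse_cuda_release(sample)
print("CUDA version:", version, "suffix:", torch_suffix(version))
```

Keeping the parsing separate from the `subprocess` call makes it possible to check the logic without a GPU runtime.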

Imports

import numpy as np
import torch
import os

print("Torch version:", torch.__version__)

In my environment this printed:

Torch version: 1.7.1+cu110

Cloning CLIP with git

# clone the CLIP repository
!git clone https://github.com/openai/CLIP.git
%cd CLIP

Accessing the saved images

The images were stored on Google Drive, so I mount it.

from google.colab import drive
drive.mount('/content/drive')

On Google Drive, I placed the images in one folder per class.
The folder names are used as the class names for the labels.

The captions are written in "_tokenization.txt", one per line.

Read the file in as a list of captions.

candidate_captions = []
with open('_tokenization.txt') as f:
    candidate_captions = f.read().splitlines()

Build the list of class names to classify.

class_names = os.listdir('.')
class_names.remove('_tokenization.txt')
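
One thing to watch here: Python documents `os.listdir` as returning entries in arbitrary order, while the predicted caption index is later used to look up `class_names`. The sketch below is one hedged way to make the pairing explicit, by sorting the folder names and checking that there is exactly one caption per class. The helper names, demo folders, and captions are made up for illustration and are not from the original notebook.

```python
import os
import tempfile


def list_class_names(root: str) -> list:
    """Return the subfolder names of `root` in sorted order, skipping plain files."""
    return sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )


def pair_captions(class_names, captions):
    """Pair each class with the caption at the same position, checking the counts."""
    if len(class_names) != len(captions):
        raise ValueError("need exactly one caption per class")
    return dict(zip(class_names, captions))


# Demo on a throwaway folder; the folder names and captions are hypothetical.
with tempfile.TemporaryDirectory() as root:
    for name in ["warrior", "incarnation"]:
        os.mkdir(os.path.join(root, name))
    open(os.path.join(root, "_tokenization.txt"), "w").close()
    names = list_class_names(root)
    pairs = pair_captions(names, ["a picture of an incarnation", "a picture of a warrior"])
    print(pairs)
```

With sorted names, the caption file only has to follow alphabetical folder order, which is easier to verify by eye.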

Importing CLIP and setup

import torch
import clip
from PIL import Image
import glob

def argmax(iterable):
    return max(enumerate(iterable), key=lambda x: x[1])[0]

device = "cuda" if torch.cuda.is_available() else "cpu"
model, transform = clip.load("ViT-B/32", device=device)

correct = []

# define our target classifications; experiment with these strings of text as you see fit, but make sure they are in the same order as your class names above
text = clip.tokenize(candidate_captions).to(device)
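
For intuition about what the model does with these tokenized captions: CLIP scores an image against each caption by the cosine similarity of their embeddings, multiplies by a learned scale, and the softmax over captions gives the probabilities. Below is a toy pure-Python sketch of that scoring step. The 3-dimensional vectors are invented and the scale of 100.0 is an assumption for illustration; real embeddings come from the model.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_vec, text_vecs, scale=100.0):
    """Softmax over scaled cosine similarities between one image and each caption."""
    img = normalize(image_vec)
    logits = [
        scale * sum(a * b for a, b in zip(img, normalize(t)))
        for t in text_vecs
    ]
    return softmax(logits)


# Toy embeddings (hypothetical values, not real CLIP features).
image_vec = [0.9, 0.1, 0.2]
text_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
probs = zero_shot_probs(image_vec, text_vecs)
best = max(range(len(probs)), key=probs.__getitem__)
print("predicted caption index:", best)
```

Because the scale is large, small differences in cosine similarity turn into very confident probabilities, which is worth keeping in mind when reading the softmax output.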

Accuracy evaluation!

To keep the number of images per label equal, I evaluated on 535 images per label.
It is surprising that this little code is enough to evaluate accuracy.

for cls in class_names:
    print(cls)
    class_correct = []
    test_imgs = glob.glob('datasets/' + cls + '/*.jpg')
    for img in test_imgs:
        # preprocess the image and add a batch dimension
        image = transform(Image.open(img)).unsqueeze(0).to(device)
        with torch.no_grad():
            # model(image, text) encodes both inputs itself, so separate
            # encode_image / encode_text calls are not needed here
            logits_per_image, logits_per_text = model(image, text)
            probs = logits_per_image.softmax(dim=-1).cpu().numpy()

        # the caption index doubles as the class index, so the captions
        # must be listed in the same order as class_names
        pred = class_names[argmax(list(probs)[0])]
        is_correct = int(pred == cls)
        correct.append(is_correct)
        class_correct.append(is_correct)

    print('accuracy on class ' + cls + ' is :' + str(sum(class_correct)/len(class_correct)))
print('accuracy on all is : ' + str(sum(correct)/len(correct)))
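
The evaluation above uses the same number of images for every label. This post does not show how that subset was picked, so the following is only a sketch of one reproducible way to do it: draw a fixed-size random sample per class with a seeded generator. The function name, seed, and file names are assumptions.

```python
import random


def balanced_sample(paths_by_class, per_class, seed=0):
    """Draw the same number of image paths from every class, reproducibly."""
    rng = random.Random(seed)
    sampled = {}
    for cls, paths in paths_by_class.items():
        if len(paths) < per_class:
            raise ValueError(f"class {cls!r} has only {len(paths)} images")
        # sort first so the result does not depend on directory listing order
        sampled[cls] = rng.sample(sorted(paths), per_class)
    return sampled


# Hypothetical file names standing in for the real image paths.
demo = {
    "warrior": [f"w{i}.jpg" for i in range(10)],
    "incarnation": [f"i{i}.jpg" for i in range(6)],
}
subset = balanced_sample(demo, per_class=5)
print({cls: len(paths) for cls, paths in subset.items()})
```

In the notebook, the per-class lists would come from the `glob.glob` call, and `per_class` would be 535.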

Here is how it turned out.

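Only 2% of the incarnation images were classified as incarnation.
When I classified the same data with Microsoft Lobe earlier, a small number of images gave 100% accuracy, so this result was underwhelming.
Only warrior scored high, at **81%**. Perhaps CLIP had seen articles about warriors during pre-training?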
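
Per-class accuracy alone does not show where the misclassified images ended up, for example whether most incarnation images were labeled warrior. A small confusion tally would answer that. The sketch below assumes the true and predicted labels were collected into two lists during the loop, which the original code does not do; the labels here are made up.

```python
from collections import Counter, defaultdict


def confusion_counts(true_labels, predicted_labels):
    """Count how often each true class was predicted as each class."""
    table = defaultdict(Counter)
    for truth, pred in zip(true_labels, predicted_labels):
        table[truth][pred] += 1
    return table


def per_class_accuracy(table):
    """Fraction of each true class that was predicted correctly."""
    return {truth: row[truth] / sum(row.values()) for truth, row in table.items()}


# Made-up labels for illustration, not the results from this experiment.
truth = ["incarnation", "incarnation", "warrior", "warrior"]
preds = ["warrior", "incarnation", "warrior", "warrior"]
table = confusion_counts(truth, preds)
accuracy = per_class_accuracy(table)
print(dict(table["incarnation"]))
print(accuracy)
```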

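As the previous article noted, **CLIP cannot do fine-grained classification such as telling apart car models, flower species, or aircraft types.** Distinguishing a human from an incarnation seems to have been a bit too hard for CLIP.
Incidentally, slightly changing the captions in '_tokenization.txt' shifts the accuracy a little, but it never improved dramatically.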
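
To try caption variations more systematically than editing the file by hand, one option is to generate each caption file from a template. This is a minimal sketch under my own assumptions: the templates, class names, and file names are hypothetical, and each generated file would still need to be evaluated with the loop above.

```python
import os
import tempfile


def build_captions(class_names, template):
    """Fill one caption template with each class name, keeping class order."""
    return [template.format(label=name) for name in class_names]


def write_caption_file(path, captions):
    """Write one caption per line, the layout read back with splitlines()."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(captions) + "\n")


# Hypothetical class names and templates, for illustration only.
demo_classes = ["incarnation", "warrior"]
templates = [
    "a painting of the {label}",
    "the face of the {label} in a picture scroll",
]

with tempfile.TemporaryDirectory() as tmp:
    for i, template in enumerate(templates):
        path = os.path.join(tmp, f"_tokenization_{i}.txt")
        write_caption_file(path, build_captions(demo_classes, template))
        with open(path, encoding="utf-8") as f:
            loaded = f.read().splitlines()
        print(loaded)
```

Generating captions from the class list also keeps them in the same order as the classes, which the prediction lookup depends on.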

References

Thank you!

Original paper: Learning Transferable Visual Models From Natural Language Supervision
npaka's note article: CLIPを試す - OpenAIのZero-shot画像分類器
Roboflow's blog post: How to Try CLIP: OpenAI's Zero-Shot Image Classifier
