
Understanding BoW / BoVW (bag of visual words)

Posted at 2021-03-02

Introduction

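I am planning to write a program that extracts features with OpenCV's SIFT and classifies them with an SVM.
The plan is to extract features with SIFT, build BoW descriptors from the extracted descriptors, and then use those for SVM classification. Along the way I looked into what BoW descriptors are when BoW (bag of words) is applied to image features, so I am writing it down here as a memo.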

Development environment

Ubuntu 18.04.4 LTS
Python 3.6.9
opencv 4.5.1
dlib 19.21.1

Checking the actual output

First, I printed the BoW vocabulary generated from SIFT features.
This uses OpenCV's BOWKMeansTrainer, but I wrapped it in my own BowKmeansTrainer class to make it easier to use for my purposes.

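import logging

import cv2

logger = logging.getLogger(__name__)


class BowKmeansTrainer:
    """Thin wrapper around cv2.BOWKMeansTrainer and cv2.BOWImgDescriptorExtractor."""

    def __init__(self, dextractor, dmatcher, cluster_count):
        # Feature detector/extractor (e.g. SIFT) used for training and extraction
        self._dextractor = dextractor
        # Accumulates descriptors and clusters them into cluster_count codewords
        self._trainer = cv2.BOWKMeansTrainer(cluster_count)
        # Maps an image's descriptors onto the vocabulary to build a BoW descriptor
        self._extractor = cv2.BOWImgDescriptorExtractor(dextractor, dmatcher)

    def addSample(self, path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        keypoints, descriptors = self._dextractor.detectAndCompute(img, None)
        if descriptors is None:
            logger.debug('No descriptor generated')
        else:
            self._trainer.add(descriptors)

    def createVoc(self):
        # Run k-means over all added descriptors; each row of voc is a codeword
        voc = self._trainer.cluster()
        self._extractor.setVocabulary(voc)
        return voc

    def extractDescriptors(self, path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        keypoints = self._dextractor.detect(img)
        # Returns a (1, cluster_count) BoW descriptor for the image
        return self._extractor.compute(img, keypoints)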
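To see what the trainer accumulates, here is a minimal NumPy sketch of the idea behind addSample (an illustration, not OpenCV's code): each image contributes an (n_i, 128) array of SIFT descriptors, images with no descriptors are skipped, and everything ends up stacked into one large array before clustering. The helper name stack_descriptors and the random data are hypothetical.

```python
import numpy as np


def stack_descriptors(per_image_descriptors):
    """Stack per-image descriptor arrays into one (N, 128) float32 array.

    Entries that are None (no keypoints found) are skipped, as addSample does.
    This helper is hypothetical and only illustrates the accumulation step.
    """
    kept = [d for d in per_image_descriptors if d is not None]
    return np.vstack(kept).astype(np.float32)


# Hypothetical data: three images with 5 keypoints, none, and 7 keypoints.
rng = np.random.default_rng(0)
samples = [
    rng.random((5, 128), dtype=np.float32),
    None,
    rng.random((7, 128), dtype=np.float32),
]
stacked = stack_descriptors(samples)
print(stacked.shape)  # (12, 128)
```

The number of rows depends on how many keypoints each image yields, but the column count is always the descriptor dimension (128 for SIFT).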

The following code uses this class to create the vocabulary and print it.

    # create a sift 
    sift = cv2.SIFT_create()

    # create a flann matcher
    FLANN_INDEX_KDTREE = 1
    index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
    search_params = {}
    flann = cv2.FlannBasedMatcher(index_params, search_params)

    # create a Bow KMeans Trainer
    bow_trainer = BowKmeansTrainer(sift, flann, BOW_NUM_CLUSTERS)

    # add samples to the trainer
    for i in range(BOW_NUM_TRAINING_SAMPLES_PER_CLASS):
        pos_path, neg_path = file_manager.getFile(i)
        bow_trainer.addSample(pos_path)
        bow_trainer.addSample(neg_path)

    # create clusters
    voc = bow_trainer.createVoc()
    print('vocabulary.ndim({0}), shape({1}))'.format(voc.ndim, voc.shape))
    print(voc)

The actual output is as follows.

vocabulary.ndim(2), shape((40, 128)))
[[15.913044   7.         5.0869565 ...  1.3478261  2.7391305  2.9565217]
 [ 7.1785717  4.5535717  9.232143  ... 10.         5.339286  14.875001 ]
 [14.411765  16.411764   9.058824  ... 17.82353   12.441176   9.205882 ]
 ...
 [32.17647   43.911766  16.058825  ... 10.3529415 19.058825  47.32353  ]
 [ 6.257143   8.914286  14.514286  ...  8.228572   9.771429  12.2      ]
 [42.65625   24.59375   16.90625   ... 16.75      17.78125   16.375    ]]

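Since the number of clusters was set to 40 (cluster_count=40), a vocabulary made of 40 words was created.
A SIFT descriptor has 128 dimensions (see my separate article "Understanding SIFT descriptors (feature values)"), so the vocabulary is a 40x128 array.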

What is a vocabulary?

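A vocabulary is a collection of words. Here, rather than words, it is better thought of as a collection of codewords (OpenCV calls it a vocabulary, but it is also sometimes called a codebook).
In this example, a vocabulary of 40 codewords was created. Each codeword represents one cluster and is that cluster's centroid (geometric center).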
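To make "each codeword is a cluster centroid" concrete, here is a minimal k-means (Lloyd's algorithm) sketch in NumPy. It is not what BOWKMeansTrainer runs internally (OpenCV uses its own cv::kmeans with its own initialization and termination settings); the function name kmeans_vocabulary, the random initialization, and the fixed iteration count are my assumptions. It does show why the result has shape (cluster_count, 128).

```python
import numpy as np


def kmeans_vocabulary(descriptors, cluster_count, iterations=20, seed=0):
    """Minimal Lloyd's k-means returning (cluster_count, dim) codewords.

    Hypothetical helper for illustration; not OpenCV's implementation.
    """
    rng = np.random.default_rng(seed)
    # Initialize codewords with randomly chosen descriptors (no k-means++ here).
    idx = rng.choice(len(descriptors), size=cluster_count, replace=False)
    centroids = descriptors[idx].astype(np.float64)
    for _ in range(iterations):
        # Squared Euclidean distance from every descriptor to every centroid.
        dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(cluster_count):
            members = descriptors[labels == k]
            if len(members) > 0:
                # Each codeword is the mean (centroid) of its assigned descriptors.
                centroids[k] = members.mean(axis=0)
    return centroids.astype(np.float32)


# Hypothetical SIFT-like data: 500 descriptors of 128 dimensions.
rng = np.random.default_rng(1)
descs = rng.random((500, 128), dtype=np.float32) * 50
voc = kmeans_vocabulary(descs, cluster_count=40)
print(voc.ndim, voc.shape)  # 2 (40, 128)
```

Whatever the input size, the output has one row per cluster and one column per descriptor dimension, which matches the 40x128 vocabulary printed above.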

Next, let's also check the values of a BoW descriptor extracted using the vocabulary above.
I added the following code after creating the clusters.

    # get a BoW descriptor
    pos_path, neg_path = file_manager.getFile(100)
    bow_descriptor = bow_trainer.extractDescriptors(pos_path)
    print('bow_descriptor.ndim({0}), shape({1}))'.format(bow_descriptor.ndim, bow_descriptor.shape))
    print(bow_descriptor)

Here is the BoW descriptor output.

bow_descriptor.ndim(2), shape((1, 40)))

The SIFT descriptor had 128 dimensions, whereas the BoW descriptor has 40 dimensions (the number of clusters).
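One way to see why the dimension becomes the cluster count: a BoW descriptor can be thought of as a histogram in which each of the image's SIFT descriptors is assigned to its nearest codeword, and the per-codeword counts form a vector with one bin per cluster. The sketch below uses the hypothetical name bow_descriptor, uses brute-force Euclidean distance instead of the FLANN matcher, and divides the counts by the number of descriptors. I believe BOWImgDescriptorExtractor normalizes in a similar way, but treat that as an assumption.

```python
import numpy as np


def bow_descriptor(descriptors, vocabulary):
    """Return a (1, K) BoW descriptor for one image (hypothetical helper).

    descriptors: (n, d) array of the image's local descriptors (e.g. SIFT, d=128)
    vocabulary:  (K, d) array of codewords (cluster centroids)
    """
    # Squared Euclidean distance from every descriptor to every codeword.
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    # Index of the nearest codeword for each descriptor (brute force, not FLANN).
    nearest = dists.argmin(axis=1)
    # Count how many descriptors were assigned to each codeword.
    hist = np.bincount(nearest, minlength=len(vocabulary)).astype(np.float32)
    # Assumed normalization: divide by the number of descriptors so bins sum to 1.
    return (hist / len(descriptors)).reshape(1, -1)


# Toy 2-D example: 3 codewords, 4 descriptors.
vocab = np.array([[0, 0], [10, 0], [0, 10]], dtype=np.float32)
descs = np.array([[1, 0], [9, 1], [0, 1], [1, 9]], dtype=np.float32)
print(bow_descriptor(descs, vocab).tolist())  # [[0.5, 0.25, 0.25]]

# With a 40-word vocabulary of 128-D codewords the result has shape (1, 40).
rng = np.random.default_rng(0)
big = bow_descriptor(rng.random((300, 128)), rng.random((40, 128)))
print(big.shape)  # (1, 40)
```

Because the histogram has one bin per codeword, images with different numbers of keypoints all map to vectors of the same length, which is what makes them usable as fixed-size SVM inputs.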

Summary

Next, I'd like to try actually classifying images with an SVM.

