
Understanding SIFT descriptors (features)

Last updated at 2021-03-01, posted at 2021-03-01

Introduction

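I'm planning to write a program that extracts features with OpenCV's SIFT and classifies them with an SVM. Because I'll be using SIFT for feature extraction, I looked into what the features (descriptors) that SIFT extracts actually are, and I'm leaving these notes as a memo.
Note that the SIFT patent expired in 2020, which has made SIFT much easier to use (see here).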

Development environment

Ubuntu 18.04.4 LTS
Python 3.6.9
opencv 4.5.1
dlib 19.21.1

Checking the actual output

First, I extracted features with SIFT and printed the resulting descriptors.
Below is part of the source code I used.

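import cv2

path = 'sample.jpg'  # hypothetical path: replace with your own image file
sift = cv2.SIFT_create()
img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)  # SIFT works on a single-channel image
# None = no mask; descriptors is an (N, 128) float32 array, one row per keypoint
keypoints, descriptors = sift.detectAndCompute(img, None)
print('number of keypoints({0})'.format(len(keypoints)))
print('sift descriptors: ndim({0}), shape({1})'.format(descriptors.ndim, descriptors.shape))
print(descriptors)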

The output was as follows.

number of keypoints(195)
sift descriptors: ndim(2), shape((195, 128))
[[  0.   0.   0. ...   0.   0.   0.]
 [141.  59.   8. ...   0.   0.   0.]
 [  0.   0.   0. ...   1.   0.   9.]
 ...
 [  2. 108. 122. ...   0.   0.   5.]
 [  0.   0.   0. ...   0.   0.  72.]
 [  3.   0.   0. ...   1.   0.   4.]]

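195 keypoints were detected, and 128 values are stored for each of them. These values come from the 16 × 16 pixel neighborhood around the keypoint. The neighborhood is split into 16 blocks (1 block = 4 × 4 pixels), and each block is represented by an 8-bin orientation histogram:
16 blocks × 8 bins = 128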
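To make the 16 × 8 layout concrete, the sketch below reshapes one 128-value descriptor row into a 4 × 4 grid of 8-bin histograms. It assumes the values are stored block by block in row-major order (block row, block column, orientation bin), which is how I understand OpenCV to order them. A random vector stands in for a real row of `descriptors`.

```python
import numpy as np

# Stand-in for one row of `descriptors` (shape (128,)); real values come from detectAndCompute
rng = np.random.default_rng(0)
descriptor = rng.integers(0, 256, size=128).astype(np.float32)

# Assumed layout: 4 block rows x 4 block columns x 8 orientation bins
grid = descriptor.reshape(4, 4, 8)

print(grid.shape)  # (4, 4, 8)
print(grid[0, 0])  # the 8-bin histogram of the top-left 4x4 block
print(grid.size)   # 128 = 16 blocks x 8 bins
```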
What does the '8' in "8-bin" mean? It means the angle of the image gradient is quantized into 8 ranges of 45° each:
bin 1 = 0° to under 45°
bin 2 = 45° to under 90°
...
bin 8 = 315° to under 360°
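A tiny helper makes the binning explicit. It assumes half-open 45° bins starting at 0°, so an angle of exactly 45° falls into the second bin and not the first. The function name `orientation_bin` is my own and is not an OpenCV API.

```python
def orientation_bin(angle_deg: float) -> int:
    """Return the 0-based bin index (0..7) for an angle in degrees, using 45-degree bins."""
    angle = angle_deg % 360.0      # wrap any angle into the 0..360 range
    return int(angle // 45.0) % 8  # % 8 guards against a float wrapping to exactly 360.0


for a in (0, 44.9, 45, 90, 315, 359.9, 360):
    print(a, '-> bin', orientation_bin(a) + 1)  # +1 to match the 1-based numbering above
```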

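If θ is the gradient angle, then tan θ = (y-direction gradient) / (x-direction gradient), so the gradient angle is
θ = arctan(y-direction gradient / x-direction gradient)
However, arctan alone only returns angles between -90° and 90°. To cover the full 0 to 360° range used by the 8 bins, the angle is computed in practice with the two-argument atan2(y-direction gradient, x-direction gradient), which also takes the signs of both components into account.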
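The formula can be checked on a single pixel with Python's standard `math` module. In this minimal sketch, `gx` and `gy` are hypothetical x- and y-direction gradient values, and the helper names are my own. The example contrasts plain `atan` with `atan2` on two opposite gradients and also computes the gradient magnitude that later gets added into the histogram.

```python
import math


def gradient_orientation(gx: float, gy: float) -> float:
    """Gradient direction in degrees, wrapped into the 0..360 range."""
    return math.degrees(math.atan2(gy, gx)) % 360.0


def gradient_magnitude(gx: float, gy: float) -> float:
    """Gradient strength sqrt(gx**2 + gy**2)."""
    return math.hypot(gx, gy)


# Opposite gradients (gx, gy) = (1, 1) and (-1, -1) share the same ratio gy / gx,
# so atan alone returns the same angle (about 45 degrees) for both ...
print(math.degrees(math.atan(1 / 1)), math.degrees(math.atan(-1 / -1)))
# ... while atan2 tells them apart (about 45 vs about 225 degrees)
print(gradient_orientation(1, 1), gradient_orientation(-1, -1))
print(gradient_magnitude(3, 4))  # 5.0
```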

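For each of the 4 × 4 pixels in a block, the image gradient is computed, and its magnitude is added to the bin that matches its angle. The result is the "8-bin orientation histogram". In practice, each contribution is also weighted by its distance from the keypoint (a Gaussian weight, so that pixels farther from the keypoint contribute less). The output above appears to be the result of finally normalizing these histograms.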
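Putting the pieces together, here is a deliberately simplified sketch of how a 128-value descriptor could be built from a 16 × 16 grayscale patch centered on a keypoint. It is not OpenCV's implementation. It skips rotating the patch to the keypoint's dominant orientation, the scale-space smoothing, and the interpolation of each sample into neighboring bins and blocks. The Gaussian sigma (8, half the window width) and the "clamp at 0.2, then renormalize" step follow the commonly described SIFT recipe and should be read as assumptions. `sift_like_descriptor` is a hypothetical name.

```python
import numpy as np


def sift_like_descriptor(patch: np.ndarray) -> np.ndarray:
    """Simplified SIFT-style descriptor for a 16x16 patch (no rotation, no interpolation)."""
    assert patch.shape == (16, 16)
    p = patch.astype(np.float64)

    # Image gradients by finite differences; np.gradient returns d/d(row) first, then d/d(col)
    gy, gx = np.gradient(p)
    magnitude = np.hypot(gx, gy)
    orientation = np.degrees(np.arctan2(gy, gx)) % 360.0

    # Gaussian weight centred on the patch: pixels far from the keypoint count less
    coords = np.arange(16) - 7.5
    yy, xx = np.meshgrid(coords, coords, indexing='ij')
    sigma = 8.0  # assumption: half the 16-pixel window width
    weight = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))

    # 45-degree bins 0..7; % 8 guards against a float wrapping to exactly 360
    bins = (orientation // 45.0).astype(int) % 8

    hist = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            # Pixel (r, c) belongs to 4x4 block (r // 4, c // 4)
            hist[r // 4, c // 4, bins[r, c]] += magnitude[r, c] * weight[r, c]

    vec = hist.ravel()  # 16 blocks x 8 bins = 128 values
    norm = np.linalg.norm(vec)
    if norm > 0:
        vec = vec / norm
        vec = np.minimum(vec, 0.2)  # assumption: damp very large gradients (illumination robustness)
        vec = vec / np.linalg.norm(vec)
    return vec


rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(16, 16))
d = sift_like_descriptor(patch)
print(d.shape)  # (128,)
```

The result is a unit-length vector with 128 non-negative values. As far as I can tell from OpenCV's source, the library additionally scales the normalized vector by a constant and saturates it to the 0 to 255 range. That would explain why the printed values above are whole numbers such as 141. and 59.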

Summary

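When using OpenCV's SIFT in practice, I don't think you need to worry about the details of the descriptors, and I probably won't have to again. Still, having understood them to some extent, I feel SIFT has become easier for me to use than before.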

References

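I checked the OpenCV manual, but it only gives a brief explanation of about four lines. It is probably hard to follow without background knowledge, but after reading the above it should be much easier to understand.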

Now keypoint descriptor is created. A 16x16 neighbourhood around the keypoint is taken. It is divided into 16 sub-blocks of 4x4 size. For each sub-block, 8 bin orientation histogram is created. So a total of 128 bin values are available. It is represented as a vector to form keypoint descriptor. In addition to this, several measures are taken to achieve robustness against illumination changes, rotation etc.

The following pages were helpful:
https://aishack.in/tutorials/sift-scale-invariant-feature-transform-features/
https://ai.stanford.edu/~syyeung/cvweb/tutorial2.html
