More than 5 years have passed since last update.

「実践コンピュータビジョン」の全演習問題をやってみた。詳細編　第２章　画像の局所記述子

Last updated at 2017-11-06Posted at 2017-11-04

まえおき

先日投稿した「実践コンピュータビジョン」の全演習問題をやってみたの詳細です。なお、あくまで「やってみた」であって、模範解答ではありません。「やってみた」だけなので、うまくいかなかった、という結果もあります。

この記事は「実践コンピュータビジョン」（オライリージャパン、Jan Erik Solem著、相川愛三訳）の演習問題に関する記事です。この記事を書いた人間と書籍との間には何の関係もありません。著者のJan氏と訳者の相川氏にはコンピュータビジョンについて学ぶ機会を与えて頂いたことに感謝します。

実践コンピュータビジョンの原書("Programming Computer Vision with Python")の校正前の原稿は著者のJan Erik Solem氏がCreative Commons Licenseの元で公開されています。。

回答中で使用する画像の多く、また、sfm.py, sift.py, stereo.pyなどの教科書中で作られるファイルはこれらのページからダウンロードできます。
オライリージャパンのサポートページ
 Jan Erik Solem氏のホームページ
 相川愛三氏のホームページ

ここで扱うプログラムはすべてPython2.7上で動作を確認しています。一部の例外を除きほとんどがJupyter上で開発されました。また必要なライブラリーを適宜インストールしておく必要があります。Jupyter以外の点は、上記の書籍で説明されています。

結構な分量なので章毎に記事をアップしています。今回は第２章です。

２章　画像の局所記述子

２章ではやくもコンピュータビジョンらしくなってきました。

2.4.1 Modify the function for matching Harris corner points to also take a maximum pixel distance between points for them to be considered as correspondences in order to make matching more robust.

点と点の最大距離を設定して、ハリスコーナー点を対応付ける関数をよりロバストにするという問題

私の回答

https://github.com/moizumi99/CVBookExercise/blob/master/Chapter-2/CV%20Book%20Ch-2%20Ex%201.ipynb
教科書内で作成した関数harris¹を使用します。

from PIL import Image
from numpy import *
from pylab import *
import harris

def plot_matches_dst(im1, im2, locs1, locs2, matchscores, show_below=True, dst=10000):
    """ Show a figure with lines joining the accepted matches
    inpu: im1, im2 (images as arrays), locs1, locs2 (feature location),
    matchscores (as output from 'match()'),
    show_below (if images should be shown below matches). """

    im3 = harris.appendimages(im1, im2)
    if show_below:
        im3 = vstack((im3, im3))

    imshow(im3)

    cols1 = im1.shape[1]
    for i,m in enumerate(matchscores):
        d = (locs1[i][1]-locs2[m][1])**2 + (locs1[i][0]-locs2[m][0])**2
        if (m>0 and d<dst**2):
            plot([locs1[i][1],locs2[m][1]+cols1],[locs1[i][0],locs2[m][0]],'c')
    axis('off')

im1 = array(Image.open('sf_view1.jpg').convert('L'))
im2 = array(Image.open('sf_view2.jpg').convert('L'))
wid = 5
harrisim = harris.compute_harris_response(im1, 5)
filtered_coords1= harris.get_harris_points(harrisim, wid+1)
d1 = harris.get_descriptors(im1, filtered_coords1, wid)
harrisim = harris.compute_harris_response(im2, 5)
filtered_coords2= harris.get_harris_points(harrisim, wid+1)
d2 = harris.get_descriptors(im2, filtered_coords2, wid)
matches = harris.match_twosided(d1, d2)
figure(figsize=(16, 16))
gray()
plot_matches_dst(im1, im2, filtered_coords1, filtered_coords2, matches)
show()

figure(figsize=(16, 16))
gray()
plot_matches_dst(im1, im2, filtered_coords1, filtered_coords2, matches, True, 100)
show()

上の画像が距離の上限を指定しないもの。下の画像が指定（100以下）したものです。より確からしい点が選ばれているように思えます。

2.4.2 Incrementally apply stronger blur (or ROF de-noising) to an image and extract Harris corners. What happens?

画像ぼかしを強くしながらハリスコーナーを検出しろという問題

私の回答

from PIL import Image
from numpy import *
from pylab import *
from scipy.ndimage import filters
import harris

im = array(Image.open('../data/empire.jpg').convert('L'))
im2 = zeros(im.shape)
figure(figsize=(8,8))
gray()
for sigma in range(6):
    im2 = filters.gaussian_filter(im, sigma*2)
    harrisim = harris.compute_harris_response(im2)
    filtered_coords = harris.get_harris_points(harrisim, 10, 0.1)
    print len(filtered_coords)
    harris.plot_harris_points(im2, filtered_coords)
show()

380

390

428

446

521

539

検出される点の数が増えました。減ると思っていたので不思議です。

2.4.3 An alternative corner detector to Harris is the FAST corner detector. There are a number of implementations including a pure Python version available at http://www.edwardrosten.com/work/fast.html. Try this detector, play with the sensitivity threshold, and compare the corners with the ones from our Harris implementation.

ハリスの代わりにファーストコーナー検出器を使えというもの。

私の回答

指定のライブラリからfast12を使って実装しました
https://github.com/moizumi99/CVBookExercise/blob/master/Chapter-2/CV%20Book%20Ch%202%20Ex%203.ipynb

from PIL import Image
from numpy import *
from pylab import *
from scipy.ndimage import filters
import harris
# use fast12.py from http://www.edwardrosten.com/work/fast.html
import fast12

im = array(Image.open('../data/empire.jpg').convert('
im2 = zeros(im.shape)
figure(figsize=(8,8))
gray()
im2 = im.copy()
harrisim = harris.compute_harris_response(im2)
filtered_coords = harris.get_harris_points(harrisim, 10, 0.1)
print len(filtered_coords)
harris.plot_harris_points(im2, filtered_coords)
show()

380


im3 = zeros(im.shape)
im3 = im.copy()
figure(figsize=(8,8))
gray()
imshow(im3)
[filtered_coords, str] = fast12.detect(im3, 60)
plot([p[0] for p in filtered_coords], [p[1] for p in filtered_coords], "*")
show()

上の絵がハリス、下がファーストです。検出結果の癖が違うようです。

2.4.4 Create copies of an image with different resolutions (for example by halving the size a few times). Extract SIFT features for each image. Plot and match features to get a feel for how and when the scale independence breaks down.

画像の解像度を変えてSIFT特徴量を抽出して対応を描画してスケールに対する普遍性が壊れる感じを感じ取れ、だそうです

私の回答

教科書内で作成するファイルsift.py²とSIFT特徴量を扱うことのできるパッケージVLFeat³を使います。

実は最初はスケールを変えても検出結果が変わらなくて不思議だったのですが、原書をみて謎がとけました。解像度を変える部分について翻訳では「例えば、何倍かに拡大します」となっている部分、原文では"for example by halving the size a few times（例えば、何回かサイズを半減させてみる）"とあります。そのとおりにすると、スケール普遍性が壊れる様子がわかりました。

from PIL import Image
from numpy import *
from pylab import *
import harris
import sift

imname1 = 'climbing_1_small'
imname2 = 'climbing_2_small'

im1 = array(Image.open(imname1+'.jpg').convert('L'))
im2 = array(Image.open(imname2+'.jpg').convert('L'))

sift.process_image(imname1+'.jpg', imname1+'.sift')
l1,d1 = sift.read_features_from_file(imname1+'.sift')

sift.process_image(imname2+'.jpg', imname2+'.sift')
l2,d2 = sift.read_features_from_file(imname2+'.sift')

matches = sift.match_twosided(d1, d2)
figure()
gray()
sift.plot_matches(im1, im2, l1, l2, matches)
show()

img2 = Image.open(imname2+'.jpg')

mag = 2
img3 = img2.resize((img2.width/mag, img2.height/mag), Image.LANCZOS)
img3.save(imname2+'_'+str(mag)+'.jpg')
sift.process_image(imname2+'_'+str(mag)+'.jpg', imname2+'_'+str(mag)+'.sift')
l3,d3 = sift.read_features_from_file(imname2+'_'+str(mag)+'.sift')
matches = sift.match_twosided(d1, d3)
figure(figsize=(4,4))
gray()
sift.plot_matches(im1, im2, l1, l3*mag, matches)
show()

figure(figsize=(4,4))
gray()
subplot(1, 2, 1)
imshow(im1)
subplot(1, 2, 2)
imshow(im2)
show()

for mag in [4, 8, 16]:
    img3 = img2.resize((img2.width/mag, img2.height/mag), Image.LANCZOS)
    img3.save(imname2+'_'+str(mag)+'.jpg')
    sift.process_image(imname2+'_'+str(mag)+'.jpg', imname2+'_'+str(mag)+'.sift')
    l3,d3 = sift.read_features_from_file(imname2+'_'+str(mag)+'.sift')
    matches = sift.match_twosided(d1, d3)
    figure(figsize=(8,8))
    gray()
    title('Magnify ratio: '+str(mag))
    sift.plot_matches(im1, im2, l1, l3*mag, matches)
    show()

最初の図が原寸同士でSIFT特徴量を対応付けたもの、次が原寸の左画像に1/2サイズの右画像を対応づけたものです（描画は同じスケールで行っていますが検出は異なるスケールです）。対応付けできた点が減っています。
さらに、1/3や1/4にすると対応付けられれた点がほとんどなくなり、1/6及び1/8では一つも対応付けられませんでした。

2.4.5 The VLFeat command line tools also contain an implementation of Maximally Stable Extremal Regions (MSER), http://en.wikipedia.org/wiki/Maximally_stable_extremal_regions, a region detector that finds blob like regions. Create a function for extracting MSER regions and pass them to the descriptor part of SIFT using the "--read-frames" option and one function for plotting the ellipse regions.

VLFeatのMSERを使ってMSER領域をSIFT処理に渡すようにします

私の回答

import os
from PIL import Image
from numpy import *
from pylab import *
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import sift

im = Image.open('climbing_1_small.jpg').convert('L')
im.save('tmp.pgm')

imagename = "tmp.pgm"
cmmd = str("$HOME/bin/vlfeat-0.9.20/bin/glnxa64/mser --min-area=0.0001 --max-area=0.001 --frames=tmp.frame "+imagename)
os.system(cmmd)

def plot_features(im, f):
    
    def draw_ellipse(c, w, h, ang):
        t = arange(0, 1.01, .01)*2*pi
        x = w*cos(t)
        y = h*sin(t)
        x2 = x*cos(ang)-y*sin(ang)+c[0]
        y2 = x*sin(ang)+y*cos(ang)+c[1]
        plot(x2, y2, 'b', linewidth=2)
    
    imshow(im)
    norm = sqrt(im.width*im.height)
    for g in f:
        [x, y, s11, s12, s22] = g
        a = array([[s11, s12], [s12, s22]])
        e, v = np.linalg.eig(a)
        w = sqrt(e[0])*2
        h = sqrt(e[1])*2
        ang = math.atan2(v[1][0], v[0][0])
        draw_ellipse([x, y], w, h, ang)
    axis('off')


f = loadtxt("tmp.frame")
print len(f)


figure(figsize=(16,16))
gray()
plot_features(im, f)
show()

def process_image_mser(imagename, resultname, params="--read-frames=mser.frame"):
    """ Process an image and save the results in a file. """

    if imagename[-3:] != 'pgm':
        # create a pgm file
        im = Image.open(imagename).convert('L')
        im.save('tmp.pgm')
        imagename = 'tmp.pgm'
    
    cmmd = str("$HOME/bin/vlfeat-0.9.20/bin/glnxa64/sift "+imagename+" --output="+resultname+" "+params)
    os.system(cmmd)
    print 'processed', imagename, 'to', resultname

## The MAN page of VLfeat MSER says frame is [x, y, a11, a12, a21, a22]
## But, this does not seem right
## Probably it is [x, y, size, angle]
## see this page http://www.vlfeat.org/overview/sift.html

ffff = []
for g in f:
    [x, y, s11, s12, s22] = g
    a = array([[s11, s12], [s12, s22]])
    e, v = np.linalg.eig(a)
    idx = e.argsort()[::-1]
    w = sqrt(e[0])*2
#    h = sqrt(e[1])*2
    ang = math.atan2(v[1][0], v[0][0])
#    ffff.append([x, y, w*cos(ang), w*sin(ang), -h*sin(ang), h*cos(ang)])  
    ffff.append([x, y, w, ang])  

savetxt("mser.frame", ffff, fmt='%4.4f', delimiter=' ')

process_image_mser("tmp.pgm", "tmp.sift")
l1,d1 = sift.read_features_from_file('tmp.sift')

figure(figsize=(16, 16))
gray()
sift.plot_features(im,l1,circle=True)
show()

上の画像がMSERの領域を示したもので、下の画像がその領域をSIFT特徴量の計算に使ったものです。
コメントにも書いてありますが、MSERのMANのとおりにframeが[x, y, a11, a12, a21, a22]だと解釈するとうまく動作しませんでした。代わりに[x, y, size, angle]と解釈するともっともらしい結果が得られました。（ただこれで正しいという確証はありません）

2.4.6 Write a function that matches features between a pair of images and estimates the scale difference and in-plane rotation of the scene based on the correspondences.

特徴量の対応付からスケールや回転量を推定しなさい

私の回答

この問題は３章の画像の変形まですすめばもっと素直に解くことができるのですが、この問題に到達した時点ではわたしはそんなことは知りませんでしたので、むりやり解いています。
https://github.com/moizumi99/CVBookExercise/blob/master/Chapter-2/CV%20Book%20Ch%202%20Ex%206.ipynb

from PIL import Image
from numpy import *
from pylab import *
import harris
import sift

imname1 = 'climbing_1_small'
imname2 = 'climbing_3_small'

im1 = array(Image.open(imname1+'.jpg').convert('L'))
im2 = array(Image.open(imname2+'.jpg').convert('L'))

sift.process_image(imname1+'.jpg', imname1+'.sift')
l1,d1 = sift.read_features_from_file(imname1+'.sift')

sift.process_image(imname2+'.jpg', imname2+'.sift')
l2,d2 = sift.read_features_from_file(imname2+'.sift')
matches = sift.match_twosided(d1, d2)

figure(figsize=(16, 16))
gray()
sift.plot_matches(im1, im2, l1, l2, matches)
show()

figure(figsize=(16,16))
show_below=True
im3 = sift.appendimages(im1, im2)
if show_below:
    im3 = vstack((im3, im3))

imshow(im3)

cols1 = im1.shape[1]
p = []
for i,m in enumerate(matches):
    if m>0:
        plot([l1[i][0],l2[m][0]+cols1],[l1[i][1],l2[m][1]],'c')
        p.append([l1[i][0], l1[i][1], l2[m][0], l2[m][1]])
axis('off')
show()

x1 = np.average((array(p).T)[0][2:-2])
y1 = np.average((array(p).T)[1][2:-2])
x2 = np.average((array(p).T)[2][2:-2])
y2 = np.average((array(p).T)[3][2:-2])
print '(', x1, ', ', y1, ')'
print '(', x2, ', ', y2, ')'

scale=[]
theta=[]
for i,m in enumerate(matches):
    if m>0:
        r1 = math.sqrt((l1[i][0]-x1)**2 + (l1[i][1]-y1)**2)
        r2 = math.sqrt((l2[m][0]-x2)**2 + (l2[m][1]-y2)**2)
        if (r2>10):
            scale.append(r2/r1)
        if (l2[m][0]-x2>10 and l1[i][0]-x1>10):
            t = math.atan2(l2[m][1]-y2, l2[m][0]-x2)-math.atan2(l1[i][1]-y1, l1[i][0]-x1)
            if (t<-pi):
                t += 2*pi
            if (t>pi):
                t -= 2*pi
            theta.append(t)
            
s = np.average(array(scale)[4:-4])
t = np.average(array(theta)[4:-4])
print "scale ratio:", s
print "rotation:", 180.0*t/pi

( 524.691564516 , 563.709096774 )
( 574.438245161 , 797.683870968 )
scale ratio: 1.50556257297
rotation: 7.8798626229

画像を表示したあとで対応点毎に回転量と拡大量を推定して、最後に全点について平均をとっています。かなり不正確です。
やりなおそうかとも思ったのですが、第３章で似たようなことをするので、このままにしました。

2.4.7 Download images for a location of your choice and match them as in the White house example. Can you find a better criteria for linking images? How could you use the graph to choose representative images for geographic locations?

ある場所を選んで画像をダウンロードし、教科書中の例のようにマッチングするという問題です。

私の回答

2017年現在、Panoramioがサービスを終了しているのでジオタグを元に画像をまとめてダウンロードすることができませんでした。仕方がないのでFlickrからキーワード（今回はGolden Gate Bridge）を使って画像を１６枚ほどダウンロードし、pictureフォルダに格納しておきました。
なお、教科書中で作成するライブラリimtoolsを使用します。

from PIL import Image
from numpy import *
from pylab import *
import sift
import imtools
import pydot
import os

imlist = imtools.get_imlist('picture')
featlist = []
for imname in imlist:
  sname = imname+'.sift'
  sift.process_image(imname,sname)
  featlist.append(sname)

nbr_images = len(imlist)
matchscores = zeros((nbr_images, nbr_images))
for i in range(nbr_images):
    for j in range(i,nbr_images):
        print 'comparing ', imlist[i], imlist[j]
        
        l1,d1 = sift.read_features_from_file(featlist[i])
        l2,d2 = sift.read_features_from_file(featlist[j])
        
        matches = sift.match_twosided(d1, d2)
        
        nbr_matches = sum(matches > 0)
        print 'number of matches = ', nbr_matches
        matchscores[i, j] = int(nbr_matches)

for i in range(nbr_images):
    for j in range(i+j, nbr_images):
        matchscores[j,i] = matchscores[i,j]
threshold = 2
path = os.path.abspath('.') + '/'
g = pydot.Dot(graph_type='graph')
for i in range(nbr_images):
    for j in range(nbr_images):
        if matchscores[i,j] > threshold:
            im = Image.open(imlist[i])
            im.thumbnail((100, 100))
            filename = str(i)+'.png'
            im.save(filename)
            g.add_node(pydot.Node(str(i), fontcolor='transparent', shape='rectangle', image=path+filename))
            
            im = Image.open(imlist[j])
            im.thumbnail((100, 100))
            filename = str(j)+'.png'
            im.save(filename)
            g.add_node(pydot.Node(str(j), fontcolor='transparent', shape='rectangle', image=path+filename))
            
            g.add_edge(pydot.Edge(str(i), str(j)))
g.write_png('goldengate.png')

figure(figsize=(16,16))
imshow(Image.open('goldengate.png'))
show()

問題には、「画像をリンクする基準としてもっといいものはあるか？」「代表画像を選ぶのにグラフを使う方法は？」とあります。
リンクの方は色情報を使えばもっと精度があげられそうです。ただし、時間帯で変わるのでそこはトレードオフです。
代表画像についてはもっともリンクされた画像数の多い画像が代表、ということにすればよいかと思います。

２章終わり

これで２章終わりです。まだ始まったばかりですが、だいぶCVらしくて楽しくなってきました。

訳書のサポートページからダウンロードできまるようです( https://www.oreilly.co.jp/books/9784873116075/ ) ↩
これもサポートページからダウンロードできます ↩
http://www.vlfeat.org/ ↩

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

「実践コンピュータビジョン」の全演習問題をやってみた。詳細編 第２章 画像の局所記述子

まえおき

２章 画像の局所記述子

2.4.1 Modify the function for matching Harris corner points to also take a maximum pixel distance between points for them to be considered as correspondences in order to make matching more robust.

私の回答

2.4.2 Incrementally apply stronger blur (or ROF de-noising) to an image and extract Harris corners. What happens?

私の回答

私の回答

2.4.4 Create copies of an image with different resolutions (for example by halving the size a few times). Extract SIFT features for each image. Plot and match features to get a feel for how and when the scale independence breaks down.

私の回答

私の回答

2.4.6 Write a function that matches features between a pair of images and estimates the scale difference and in-plane rotation of the scene based on the correspondences.

私の回答

2.4.7 Download images for a location of your choice and match them as in the White house example. Can you find a better criteria for linking images? How could you use the graph to choose representative images for geographic locations?

私の回答

２章終わり

「実践コンピュータビジョン」の全演習問題をやってみた。詳細編　第２章　画像の局所記述子

２章　画像の局所記述子