More than 1 year has passed since last update.

回転不変位相限定相関（RIPOC）解説

Last updated at 2023-09-18Posted at 2020-04-23

はじめに

2つの画像がどれぐらいずれているかを推定する方法。

上段：ターゲット画像
中段：位置ずれがある画像
下段：位置ずれがある画像を変換してターゲット画像に重ねたもの

理論

下記論文に沿って解説を行う。
B.S.Reddy, B.N.Chatterji, "An FFT-based technique for translation, rotation, and scale-invariant image registration", IEEE Transactions on Image Processing, Volume 5, Issue 8, Aug 1996, p.1266-1271

画像の平行移動、回転、拡大はフーリエ変換と座標返還をうまく組み合わせると、すべてフーリエ変換後の関数の位相変換で表すことができる。これを見ていく。

画像はグレースケールとの1チャンネルとし、画像を関数$f(x,y)$で表す。
画像のずれとして、平行移動、回転、スケール変換を考える。

平行移動

関数$f(x,y)$のフーリエ変換を次の式で定義する。

F(\xi, \eta) = \int dx dy f(x,y)e^{-2\pi i (\xi x + \eta y)}

二つの関数の間に$f_2(x,y)=f(x-x_0, y-y_0), \forall (x,y)\in \mathbb{R}^2$の関係が成り立っているとする。これは画像$f_2$が元の画像$f_1$を$(x_0,y_0)$だけ平行移動した画像であることを表している。

フーリエ変換後の空間では、平行移動は位相変換に対応する。

\begin{aligned}
F_2(\xi, \eta) &= \int dx dy f_2(x,y)e^{-2\pi i (\xi x + \eta y)} \\
&= \int dx dy f_1(x-x_0,y-y_0)e^{-2\pi i (\xi x + \eta y)} \\
&= \int dx dy f_1(x,y)e^{-2\pi i \{\xi (x+x_0) + \eta (y+y_0)\}} \\
&= e^{-2\pi i (\xi x_0 + \eta y_0)}F_1(\xi, \eta)
\end{aligned}

したがって、平行移動した量は二つの関数のフーリエ変換から求められる。具体的にはcross-power spectrumを用いて位相を求める。

\frac{F_1(\xi, \eta)F_2^\ast(\xi, \eta)}{|F_1(\xi, \eta)F_2(\xi, \eta)|} 
= \frac{F_1(\xi, \eta)F_1^\ast(\xi, \eta)e^{2\pi i (\xi x_0 + \eta y_0)}}{|F_1(\xi, \eta)|^2}
= e^{2\pi i (\xi x_0 + \eta y_0)}

これを$F_2$にかけてフーリエ逆変換すれば、元の画像$f_1$が得られる。これを位相限定相関法（POC; Phase Only Correlation）という。

回転

平行移動に加え、回転も施したとする。

f_2(x,y) = f_1(x \cos\theta_0 + y\sin\theta_0 - x_0, -x\sin\theta_0 + y\cos\theta_0 -y_0)

フーリエ変換は次のようになる。

\begin{aligned}
F_2(\xi, \eta) &= \int dx dy f_2(x,y)e^{-2\pi i (\xi x + \eta y)} \\
&= \int dx dy f_1(x \cos\theta_0 + y\sin\theta_0 - x_0, -x\sin\theta_0 + y\cos\theta_0 -y_0)e^{-2\pi i (\xi x + \eta y)} \\
&= \int dx' dy' f_1(x', y')e^{-2\pi i \{ (\xi\cos\theta_0+\eta\sin\theta_0)(x'+x_0) + (-\xi\sin\theta_0+\eta\cos\theta_0)(y'+y_0)\}} \\
&= e^{-2\pi i \{ (\xi\cos\theta_0+\eta\sin\theta_0)x_0 + (-\xi\sin\theta_0+\eta\cos\theta_0)y_0\}}F_1(\xi\cos\theta_0+\eta\sin\theta_0, -\xi\sin\theta_0+\eta\cos\theta_0)
\end{aligned}

位相部分を無視するために絶対値を考える。$M(\xi,\eta)\equiv|F(\xi,\eta)|$とすれば次のように書ける。

M_2(\xi,\eta) = M_1(\xi\cos\theta_0+\eta\sin\theta_0, -\xi\sin\theta_0+\eta\cos\theta_0)

極座標（$\xi=\rho\cos\theta,\eta=\rho\sin\theta$）で表現すれば次のように書ける。

M_2(\rho,\theta) = M_1(\rho, \theta+\theta_0)

これで回転を平行移動で表すことができた。この$\theta_0$は位相限定相関法を用いて求めることができる。

スケール変換

$f_2$は$f_1$をスケール変換したものとする。原点を中心にして、x軸、y軸それぞれの方向に$(a,b)$倍スケール変換したとする。ここで$a>0, b>0$とする。

f_2(x,y) = f_1(ax, by)

フーリエ変換は次のようになる。

\begin{aligned}
F_2(\xi, \eta) 
&= \int dx dy f_2(x,y)e^{-2\pi i (\xi x + \eta y)} \\
&= \int dx dy f_1(ax, by)e^{-2\pi i (\xi x + \eta y)} \\
&= \frac{1}{ab}\int dx' dy' f_1(x', y')e^{-2\pi i (\frac{\xi}{a} x' + \frac{\eta}{b} y')} \\
&= \frac{1}{ab}F_1(\frac{\xi}{a}, \frac{\eta}{b})
\end{aligned}

座標を対数変換する。

\begin{aligned}
F_2(\log\xi, \log\eta) 
&= \frac{1}{ab}F_1(\log\xi-\log a, \log\eta-\log b) \\
\end{aligned}

分かりやすさのために変数を書き換える。

\begin{aligned}
F_2(x, y) 
&= \frac{1}{ab}F_1(x-c, y-d) \\
\end{aligned}

これよりスケール変換を平行移動と表せることがわかる。ここで、$1/ab$のスケール因子がかかっているが、cross-power spectrumの式を見てわかるように分子分母で打ち消しあうので、この因子は無視できる。したがって位相限定相関法で$c,d$を求められ、スケール量$a,b$が求められる。

a = e^c, b=e^d

さて、座標について整理しよう。
今$(x,y)$を$(x/a, y/a)$にスケール変換したとする。極座標では次のように表せる。

\begin{aligned}
\rho_1 &= (x^2+y^2)^{1/2} \\
\theta_1 &= \tan^{-1}(y/x) \\
\rho_2 &= ((x/a)^2+(y/a)^2)^{1/2}=\frac{1}{a}(x^2+y^2)^{1/2} = \frac{\rho_1}{a}\\
\theta_2 &= \tan^{-1}((y/a)/(x/a)) = \tan^{-1}(y/x) = \theta_1
\end{aligned}

したがって、$f_1$に平行移動、回転、スケール変換を施すと、フーリエ変換の絶対値は極座標で次のような関係になる。

\begin{aligned}
M_1(\rho, \theta) = M_2(\rho/a, \theta-\theta_0) 
\end{aligned}

動径方向座標の対数をとることで、さらに次のようになる。

\begin{aligned}
M_1(\xi, \theta) = M_2(\xi-d, \theta-\theta_0) 
\end{aligned}

ここで、$\xi=\log \rho,d=\log a$である。スケール変換と回転が平行移動の形で表せたので、位相限定相関法で求めることができる。

以上より、スケール変換と回転について求めたパラメータで逆変換を施す。後は平行移動が残っているので、再度位相限定相関法を施し、平行移動の逆変換を施すことで$f_2$を$f_1$に重ねることができる。

実装

こちらのサイトが参考になります。

from enum import Enum

import cv2
import numpy as np
from matplotlib import pyplot as plt


def ripoc(target, source):
    """Rotation Invariant Phase Only Correlation.
    Parameters
    ----------
    target : ndarray of shape (height, width, channel)
        Reference image. Channel is (B, G, R).
    source : int, shape (height, width, channel)
        This image will be transformed. Channel is (B, G, R).
    Returns
    -------
    x : float
        Shift of position x.
    y : float
        Shift of position y.
    angle : float
        Shift of angle.
    scale : float
        Shift of scale.

    :Authors:
        Original : kakasis <https://kakasi.hatenablog.com/entry/2019/09/26/131702>
        Modification (2023) : ground0state
    """
    # Convert to grayscale
    g_t = np.asarray(cv2.cvtColor(
        target, cv2.COLOR_BGR2GRAY), dtype=np.float32)
    g_s = np.asarray(cv2.cvtColor(
        source, cv2.COLOR_BGR2GRAY), dtype=np.float32)

    # Window function
    h, w = g_t.shape
    window = window_func(h, w, WindowType.HAMMING.value)

    # The magnitude of the FFT is too large, so we take log.
    f_t = np.fft.fftshift(20*np.log(np.abs(np.fft.fft2(g_t * window))))
    f_s = np.fft.fftshift(20*np.log(np.abs(np.fft.fft2(g_s * window))))

    # Maximum radius of the polar coordinate transformation
    # Ref : <https://docs.opencv.org/4.0.1/da/d54/group__imgproc__transform.html>
    m = inscribed_circle_radius(h, w)

    # Logarithmic polar transformation
    center = (w / 2, h / 2)
    flags = cv2.INTER_LANCZOS4 + cv2.WARP_POLAR_LOG
    p_t = cv2.warpPolar(f_t, (w, h), center, m, flags)
    p_s = cv2.warpPolar(f_s, (w, h), center, m, flags)
    # Phase Only Correlation Method
    (x, y), _ = cv2.phaseCorrelate(p_t, p_s, window)

    # Converting the scale since the y-axis represents 360 degrees
    angle = - y * 360 / h
    # Maximum of x-axis is 'Maximum radius of the polar coordinate transformation'
    # Converting the scale
    scale = float(np.exp(- x / m))
    # Compute the transformation matrix and convert it
    M = cv2.getRotationMatrix2D(center, -angle, 1./scale)
    t_s = cv2.warpAffine(g_s, M, (w, h))
    # Phase Only Correlation Method
    (x, y), _ = cv2.phaseCorrelate(g_t, t_s)
    return x, y, angle, scale


def inscribed_circle_radius(height, width):
    # 長方形の短い辺を見つける
    shorter_side = min(height, width)
    # 内接する円の半径は短い辺の半分
    radius = shorter_side / 2
    return radius


class WindowType(Enum):
    """Window functions.
    Ref : <https://docs.scipy.org/doc/numpy/reference/routines.window.html>
    """
    HAMMING = 'hamming'
    HANNING = 'hanning'
    BARTLETT = 'bartlett'
    BLACKMAN = 'blackman'


def window_func(h, w, kind="hamming"):
    """Rotation Invariant Phase Only Correlation.
    Parameters
    ----------
    h : int
        Height of Image.
    w : int
        Width of Image.
    kind : str, optional (default="hamming")
        Kind of window functions.
    Returns
    -------
    window : ndarray of shape (height, width)
         window.
    """

    if kind == WindowType.HAMMING.value:
        func = np.hamming
    elif kind == WindowType.BARTLETT.value:
        func = np.bartlett
    elif kind == WindowType.BLACKMAN.value:
        func = np.blackman
    elif kind == WindowType.HANNING.value:
        func = np.hanning
    else:
        raise ValueError("Does not match any window type.")

    window_y = func(h)
    window_x = func(w)
    window = window_y.reshape(h, 1) * window_x
    return window


def show_image(img, mode='gray'):
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    plt.imshow(img, mode)
    plt.show()


def alpha_blend(img1, img2, alpha):
    """
    Perform alpha blending between two images.

    Parameters:
    - img1, img2: Images to blend. They must have the same shape.
    - alpha: Blending ratio. 0 <= alpha <= 1.
             alpha = 0 means img1 is fully visible,
             alpha = 1 means img2 is fully visible.
    """
    if img1.shape != img2.shape:
        raise ValueError("Both images must have the same shape.")

    blended = cv2.addWeighted(img1, 1 - alpha, img2, alpha, 0)
    return blended


def transform_image(img, x_shift, y_shift, angle, scale):
    height = img.shape[0]
    width = img.shape[1]
    center = (int(width / 2), int(height / 2))
    trans = cv2.getRotationMatrix2D(center, angle, scale)
    trans += np.float32([[0, 0, x_shift], [0, 0, y_shift]])
    dst = cv2.warpAffine(img, trans, (width, height))
    return dst


def inverse_transform_image(img, pred_x_shift, pred_y_shift, pred_angle, pred_scale):
    height = img.shape[0]
    width = img.shape[1]
    center = (int(width / 2), int(height / 2))
    M = cv2.getRotationMatrix2D(
        center, -pred_angle, 1./pred_scale)
    M[0][2] -= pred_x_shift
    M[1][2] -= pred_y_shift
    dst = cv2.warpAffine(img, M, (width, height))
    return dst


img1 = cv2.imread("sample.jpg")
img2 = transform_image(
    img1,
    x_shift=8.0,
    y_shift=4.0,
    angle=7.0,
    scale=1.15
)
# original and transformed
show_image(img1)
show_image(img2)

# inverse transform
pred_x_shift, pred_y_shift, pred_angle, pred_scale = ripoc(img1, img2)
print(pred_x_shift, pred_y_shift, pred_angle, pred_scale)
dst = inverse_transform_image(
    img2, pred_x_shift, pred_y_shift, pred_angle, pred_scale)
show_image(dst)

# compare
res = alpha_blend(img1, dst, alpha=0.5)
show_image(res)

おわりに

結構使い勝手がよく、色々応用できそう。応用できすぎて特許がたくさんありそうなので、使うときは注意が必要かも。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up