More than 3 years have passed since last update.

【Python】VGG16 を用いた Multilayer Stacked Covariance Pooling (MSCP)

Posted at 2021-08-15

ImageNet で事前学習した学習済みモデル (VGG16 など) を特徴量抽出器として，土地利用の画像分類 (UC Merced Land Use Dataset¹ など) に使用する論文 ``Remote sensing scene classification using multilayer stacked covariance pooling²'' の紹介と Python での実装．

実行環境

Python 3.7.2
Keras 2.4.3

MSCP の概要

Multilayer Stacked Covariance Pooling (MSCP) のフレームワークを用いた画像分類は以下の手順で行われる．

学習済みモデルのいくつかの隠れ層から多層の特徴量マップを抽出する．
得られた特徴量マップを一つの多層特徴量マップにまとめ (stack)，まとめた特徴量マップから分散共分散行列を計算する．
計算された分散共分散行列を特徴量として，線形カーネルを用いたサポートベクターマシン (SVM) を学習する．

今回は，学習済みモデルとして VGG16 を使用し，UC Merced Land Use Dataset¹ を用いて画像分類を行う．
データセットのディレクトリ構造は

UCMerced_LandUse
└── Images
    ├── agricultural
    |   ├── agricultural00.tif
    |   ├── ...
    |   └── agricultural99.tif
    ├── airplane
    ├── ...

のようになっている．

特徴量マップの抽出

Keras で VGG16 を使用するにあたり，以下のサイトを参考にした．

VGG16 の層構造

Keras では，keras.applications.vgg16.VGG16 で VGG16 のモデルを使用することができる．今回は，全結合層は使用しないのでinclude_top=False，ImageNet で事前学習した重みを使用するため weights='imagenet' としている．

from keras.applications.vgg16 import VGG16


vgg16 = VGG16(include_top=False, weights='imagenet')
vgg16.summary()

# Model: "vgg16"
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #   
# =================================================================
# input_2 (InputLayer)         [(None, None, None, 3)]   0         
# _________________________________________________________________
# block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
# _________________________________________________________________
# block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
# _________________________________________________________________
# block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
# _________________________________________________________________
# block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
# _________________________________________________________________
# block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
# _________________________________________________________________
# block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
# _________________________________________________________________
# block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
# _________________________________________________________________
# block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
# _________________________________________________________________
# block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
# _________________________________________________________________
# block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
# _________________________________________________________________
# block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
# _________________________________________________________________
# block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
# _________________________________________________________________
# block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
# _________________________________________________________________
# block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
# _________________________________________________________________
# block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
# _________________________________________________________________
# block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
# _________________________________________________________________
# block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
# _________________________________________________________________
# block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
# =================================================================
# Total params: 14,714,688
# Trainable params: 14,714,688
# Non-trainable params: 0
# _________________________________________________________________

VGG16 の隠れ層の取得

論文中では，block3_conv3, block4_conv3, block5_conv3 の出力を特徴量マップとして使用している．そこで，block3_conv3, block4_conv3, block5_conv3 の出力をアウトプットとした新しいモデルを定義する．

from keras.models import Model


outputs = [vgg16.layers[i].output for i in (9, 13, 17)]
model = Model(inputs=vgg16.input, outputs=outputs)

実際に隠れ層の出力を確認してみる．入力画像には，keras.applications.vgg16.preprocess_input で VGG16 のモデルに適した前処理を施すことができる．

from typing import List

from keras.applications.vgg16 import preprocess_input
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image


def extract_feature(model: Model, 
                    img: np.ndarray) -> List[np.ndarray]:
    # reshape (H, W, 3) -> (1, H, W, 3)
    processed_img = np.expand_dims(img, axis=0)
    processed_img = preprocess_input(processed_img)

    preds = model.predict(processed_img)
    # reshape (1, H, W, 3) -> (H, W, 3)
    preds = [pred[0] for pred in preds]
    return preds


file = './UCMerced_LandUse/Images/river/river00.tif'
img = np.array(Image.open(file))

features = extract_feature(model, img)

n_cols = len(features)+1
plt.figure(figsize=(16, 4))

plt.subplot(1, n_cols, 1)
plt.imshow(img)
plt.title(f'Input\nshape={img.shape}')
plt.axis('off')

for i, feature in enumerate(features, start=2):
    plt.subplot(1, n_cols, i)
    plt.imshow(feature[:, :, 0], cmap='jet')
    plt.title(f'block{i+1}_conv3\nshape={feature.shape}')
    plt.axis('off')

plt.show()

分散共分散行列の計算

特徴量マップのスタック

block3_conv3, block4_conv3, block5_conv3 の出力をそれぞれ，$M_{3,3}, M_{4,3}, M_{5,3}$ とすると，上で見たように，$M_{3,3} \in \mathbb{R}^{64 \times 64 \times 256}, M_{4,3} \in \mathbb{R}^{32 \times 32 \times 512}, M_{5,3} \in \mathbb{R}^{16 \times 16 \times 512}$ である．これらの特徴量マップをスタックするために，バイリニア補間を用いて $s \times s$ にリサイズする．
さらに，特徴量マップのチャネル数 (今回の場合は 256, 512, 512) の和が計算する分散共分散行列の次元数となるので，チャネルごとの平均を取ることにより，$d$ 次元まで次元削減する．
結果，$M_{3,3}, M_{4,3}, M_{5,3}$ はそれぞれ $\hat M_{3,3}, \hat M_{4,3}, \hat M_{5,3} \in \mathbb{R}^{s \times s \times d}$ へとダウンサンプルされ，これらをスタックすることにより，
$$M = \left[\hat M_{3,3};\hat M_{4,3};\hat M_{5,3}\right] \in \mathbb{R}^{s \times s \times D}, \quad D = 3d$$
が特徴量マップとして得られる．
ここまでを Python でまとめると以下の通り．今回は $s=16, d=32$ としている．

import cv2


def stack(features: List[np.ndarray], 
          depth: int = 32, size: int = 16) -> np.ndarray:
    features = [cv2.resize(feature, (size, size)) for feature in features]
    features = np.dstack(
        [np.dstack([np.mean(splitted, axis=-1) 
                    for splitted in np.array_split(feature, depth, axis=-1)]) 
         for feature in features])
    return features


depth = 32
size = 16

features = stack(features, depth=depth, size=size)
features.shape
# (16, 16, 96)

分散共分散行列の計算

分散共分散行列を画像の特徴量として使用するというアイデアは，2006 年に ``Region covariance: A fast descriptor for detection and classification³'' で Region Covariance Descriptor として提案された．この論文中では，画像の RGB 値や輝度値の勾配などをスタックし，分散共分散行列を計算している．概要は以下の記事にまとめている．

【Python】Region Covariance: 分散共分散行列とコンピュータビジョン - Qiita

本論文で使用される特徴量は，Region Covariance で使われていた輝度値の勾配などの畳み込み演算を，畳み込みニューラルネットの隠れ層の出力で置き換えたものといえる．

特徴量マップ $M \in \mathbb{R}^{s \times s \times D}$ から分散共分散行列は以下の手順で計算される．

特徴量 $M \in \mathbb{R}^{s \times s \times D}$ を $[z_1, z_2, \ldots, z_N] \in \mathbb{R}^{N \times D}\ (N = s^2)$ にベクトル化する．

次式で分散共分散行列 $C \in \mathbb{R}^{D \times D}$ を計算する．

\begin{align}
\mu &= \frac1N \sum_{i=1}^N z_i \in \mathbb{R}^D \\
C &= \frac{1}{N-1} \sum_{i=1}^N (z_i - \mu) (z_i - \mu)^\top \in \mathbb{R}^{D \times D}
\end{align}

$C \leftarrow C + \epsilon I \ (\epsilon > 0)$ として，$C \in \mathbb{R}^{D \times D}$ の正定値性を保証する．

Python では，numpy.cov で分散共分散行列を計算できる．

epsilon = 1e-8

cov = np.cov(features.reshape(-1, depth * 3), rowvar=False)
cov += np.identity(depth * 3) * epsilon

SVM の適用

ユークリッド空間への変換

計算された分散共分散行列 $C \in \mathbb{R}^{D \times D}$ は正定値対称行列であり，ユークリッド空間の元ではないため，SVM をそのまま適用することができない．そこで，行列の対数を作用させることにより，ユークリッド空間の元 (対称行列) とする⁴．

$$\hat C = \mathop{\mathrm{logm}}C = U \log(\Sigma) U^\top \in \mathbb{R}^{D \times D}$$

ここで，$C=U\Sigma U^\top$ は分散共分散行列 $C$ の固有値分解である．行列の対数については，以下にまとめている．

【Python】行列指数関数・行列対数関数 - Qiita

Python では，エルミート行列用の固有値分解 np.linalg.eigh を用いて，行列の対数を実装できる．

def logm(x: np.ndarray) -> np.ndarray:
    w, v = np.linalg.eigh(x)
    return v @ (np.log(w) * v).T


cov = logm(cov)

ラベルの準備

クラス名 (agricultural, airplane, ...) に数字を割り当てる．

import os


img_path = './UCMerced_LandUse/Images/'
class_names = sorted(os.listdir(img_path))
class_names = [class_name for class_name in class_names 
               if os.path.isdir(os.path.join(img_path, class_name))]
class_name_dict = {class_name: i for i, class_name in enumerate(class_names)}

class_name_dict
# {'agricultural': 0,
#  'airplane': 1,
#  ...
#  'tenniscourt': 20}

SVM の適用

SVM を適用する準備ができたので，実際の流れを確認する．

まず，ファイル名からラベルを，画像から分散共分散行列を抽出する．

import glob

from tqdm.notebook import tqdm


# Hyperparameters
depth = 32
size = 16
epsilon = 1e-8

# Prepare model
vgg16 = VGG16(include_top=False, weights='imagenet')
outputs = [vgg16.layers[i].output for i in (9, 13, 17)]
model = Model(inputs=vgg16.input, outputs=outputs)

filenames = sorted(glob.glob('./UCMerced_LandUse/Images/*/*.tif'))

X, y = [], []
for file in tqdm(filenames):
    class_name = os.path.basename(os.path.dirname(file))
    y.append(class_name_dict[class_name])
    
    # Load an image
    img = np.array(Image.open(file))

    # Extract feature maps
    features = extract_feature(model, img)
    
    # Stack
    features = stack(features, depth=depth, size=size)

    # Covariance
    cov = np.cov(features.reshape(-1, depth * 3), rowvar=False)
    cov += np.identity(depth * 3) * epsilon
    
    # Logarithm
    cov = logm(cov)
    
    X.append(cov.ravel())

X = np.vstack(X)
y = np.array(y)

次に，データを学習用と検証用に分割する．

from sklearn.model_selection import train_test_split


train_X, test_X, train_y, test_y = train_test_split(
    X, y, train_size=0.8, random_state=0, stratify=y)

SVM で学習・予測を行う．

from sklearn.svm import SVC


svm = SVC(C=1, kernel='linear', random_state=0).fit(train_X, train_y)
pred_y = svm.predict(test_X)

Accuracy, Confusion Matrix を確認する．

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, confusion_matrix


print(f'Accuracy: {accuracy_score(test_y, pred_y)}')
# Accuracy: 0.9523809523809523

plt.figure(figsize=(8, 6))
sns.heatmap(confusion_matrix(test_y, pred_y), annot=True, square=True, 
            xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up