
Getting Started with Deep Learning Image Classification in MATLAB

Posted at 2021-10-16, last updated at 2022-06-10

What is MATLAB?

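MATLAB is numerical analysis software developed by MathWorks in the United States, and also the name of the programming language used in it. MATLAB provides numerical linear algebra, visualization of functions and data, algorithm development, graphical interfaces, and interfaces to other languages (C/C++/Java/Python). MATLAB mainly handles numerical computation, but the optional Symbolic Math Toolbox adds symbolic computation capabilities. As of 2019, MATLAB had more than 4 million users and was used at more than 100,000 companies, government agencies, and universities across a wide range of fields, including engineering, science, and economics.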

Source: Wikipedia

What is deep learning?

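Deep learning is a method that learns concepts at every level of granularity, from the overall picture of a target down to its details, by relating them in a hierarchical structure.[1][Note 1] The most widespread deep learning approach is machine learning with multilayer artificial neural networks (deep neural networks, DNNs), in the narrow sense networks with four or more layers.[2][Note 2][3] For multilayer neural networks, the direct origin was the stacked autoencoder devised by Geoffrey Hinton's research team in 2006.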

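Building-block techniques such as backpropagation had already been developed in the 20th century, but deep neural networks with four or more layers could not be trained adequately because of technical problems such as local optima and vanishing gradients, and their performance was poor. In the 21st century, however, they became trainable thanks to research on training multilayer neural networks by Hinton and others, starting with the stacked autoencoder, to the growth in computing power needed for training, and to the availability of training data through the growth of the Internet. As a result, they outperformed other methods by a wide margin on problems involving speech, images, and natural language,[4] and became widespread in the 2010s.[2] In academia, deep learning based on even more abstract mathematical concepts is being studied.[5][Note 3]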

Source: Wikipedia

Getting started

1. Loading the data

Load the data with the imageDatastore function:

imds = imageDatastore(location, Name, Value, ...);

In this article, all of the code goes into a file named sample.m.

sample.m

%% Load the data
imds = imageDatastore('./pictures', 'IncludeSubfolders', true, 'LabelSource', 'foldernames');

MATLAB official documentation
imageDatastore function

Organize the image folders as follows:

.
├── sample.m
└── pictures
    ├── pic01
    │   ├── ....jpg
    │   ├── ....jpg
    │       ....
    │   └── ....jpg
    ├── pic02
    │   ├── ....jpg
    │   ├── ....jpg
    │       ....
    │   └── ....jpg
    ├── pic03
    │   ├── ....jpg
    │   ├── ....jpg
    │       ....
    │   └── ....jpg
    ....
    └── picxx
        ├── ....jpg
        ├── ....jpg
            ....
        └── ....jpg

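The pictures folder contains four class folders, each holding multiple images of that class. 'IncludeSubfolders', true makes imageDatastore search these subfolders, and 'LabelSource', 'foldernames' labels each image with the name of the folder it is in (pic01, pic02, ...).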
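To make the folder-name labeling concrete, here is a minimal Python sketch of the same idea for a tree like the one above. It is an illustration only, not MATLAB's implementation; the set of image extensions is an assumption (imageDatastore has its own list of supported formats), and the folder contents are hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Assumption: which extensions count as images (imageDatastore has its own list).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def label_by_folder(root):
    """Return (path, label) pairs; the label is the name of the image's parent folder."""
    pairs = []
    for path in sorted(Path(root).rglob("*")):  # rglob also searches subfolders
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            pairs.append((path, path.parent.name))
    return pairs


# Hypothetical tree: pictures/pic01 with 3 images, pictures/pic02 with 2 images.
with tempfile.TemporaryDirectory() as tmp:
    for cls, n in [("pic01", 3), ("pic02", 2)]:
        folder = Path(tmp, "pictures", cls)
        folder.mkdir(parents=True)
        for i in range(n):
            (folder / f"img{i}.jpg").touch()
    pairs = label_by_folder(Path(tmp, "pictures"))
    print(Counter(label for _, label in pairs))  # Counter({'pic01': 3, 'pic02': 2})
```

Files with other extensions (for example a stray .txt) are skipped, which is why the extension filter matters.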

2. Splitting the data

Split the loaded images into training data and test data:

[trainData, testData] = splitEachLabel(imds, p);
[trainData, testData] = splitEachLabel(imds, p, 'randomized');

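p (0 < p < 1) is the proportion of each label's files that goes into the first output (the training data); the remaining files go into the second output (the test data).
For example, with p = 0.6, 60% of each label's images become training data and 40% become test data.
Adding 'randomized' assigns the files to the two outputs at random; without it, the files are assigned in their original order.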

MATLAB official documentation
splitEachLabel function

sample.m

%% Split the data
[trainData, testData] = splitEachLabel(imds, 0.6, 'randomized');
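The per-label split can be sketched in Python as follows. This is an illustration of the idea behind splitEachLabel, not its implementation: rounding p × (files per label) with round() is an assumption and may not match MATLAB for every count, and the file names are hypothetical.

```python
import random
from collections import defaultdict


def split_each_label(pairs, p, randomized=False, seed=None):
    """Split (item, label) pairs so that each label sends a fraction p to the first output.

    Rounding p * count with round() is an assumption; MATLAB may round differently.
    """
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    by_label = defaultdict(list)
    for item, label in pairs:
        by_label[label].append(item)

    rng = random.Random(seed)
    first, second = [], []
    for label, items in by_label.items():
        if randomized:
            items = items[:]  # shuffle a copy, keep the caller's list intact
            rng.shuffle(items)
        k = round(p * len(items))  # files of this label that go to the first output
        first += [(item, label) for item in items[:k]]
        second += [(item, label) for item in items[k:]]
    return first, second


# Hypothetical data: 10 images of pic01 and 5 images of pic02.
data = [(f"a{i}.jpg", "pic01") for i in range(10)] + [(f"b{i}.jpg", "pic02") for i in range(5)]
train, test = split_each_label(data, 0.6, randomized=True, seed=0)
print(len(train), len(test))  # 9 6  (6 + 3 training images, 4 + 2 test images)
```

Splitting per label keeps the class proportions the same in both sets, which a single split over all files would not guarantee.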

Transfer learning

Transfer learning (TL) is a research problem in machine learning (ML) that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.[1] For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks. This area of research bears some relation to the long history of psychological literature on transfer of learning, although practical ties between the two fields are limited. From the practical standpoint, reusing or transferring information from previously learned tasks for the learning of new tasks has the potential to significantly improve the sample efficiency of a reinforcement learning agent.[2]

Source: Wikipedia

Instead of building our own model, we take an existing pretrained model, modify part of its network, and retrain it on our images.
This is called transfer learning.

The model we use here is VGG-16.
It is the network in the middle of the figure below.

Source: MATLAB official website

What is VGG-16?

VGG-16 is a CNN model with 16 weight layers: 13 convolutional layers followed by 3 fully connected layers.

Source: "VGG16モデルを使用してオリジナル写真の画像認識を行ってみる" (Image recognition of your own photos using the VGG16 model)
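As a cross-check of the layer count, the Python sketch below walks the VGG-16 layout used in the Keras code in this article (3×3 convolutions with the listed filter counts, 2×2 max pooling with stride 2, two 4096-unit fully connected layers, then a `classes`-unit output) and counts weight layers and parameters for a 224×224×3 input. The config list is our own transcription of that code.

```python
# VGG-16 layout from the Keras code: numbers are 3x3 Conv2D filter counts,
# "M" is a 2x2 max pooling with stride 2 (halves height and width).
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
FC_UNITS = [4096, 4096]  # fc1 and fc2; the final layer has `classes` units


def vgg16_summary(input_size=224, in_channels=3, classes=1000):
    """Return (conv layers, fully connected layers, total parameters, final feature-map size)."""
    conv_layers, params = 0, 0
    channels, size = in_channels, input_size
    for v in VGG16_CFG:
        if v == "M":
            size //= 2
        else:
            params += 3 * 3 * channels * v + v  # kernel weights + one bias per filter
            channels = v
            conv_layers += 1

    features = size * size * channels  # length of the flattened feature map
    fc_layers = 0
    for units in FC_UNITS + [classes]:
        params += features * units + units  # weight matrix + biases
        features = units
        fc_layers += 1
    return conv_layers, fc_layers, params, size


print(vgg16_summary())  # (13, 3, 138357544, 7): 13 + 3 = 16 weight layers
```

Pooling and ReLU layers have no weights, which is why they are not part of the "16". Most of the roughly 138 million parameters sit in the first fully connected layer (7 × 7 × 512 inputs to 4096 units).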

If you are interested, take a look at the Keras source code on GitHub as well:

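# Excerpt of the VGG16 function from keras-applications (vgg16.py). It is not
# standalone: os, get_submodules_from_kwargs, _obtain_input_shape, WEIGHTS_PATH
# and WEIGHTS_PATH_NO_TOP come from module-level imports and constants in that file.
def VGG16(include_top=True,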
          weights='imagenet',
          input_tensor=None,
          input_shape=None,
          pooling=None,
          classes=1000,
          **kwargs):
    """Instantiates the VGG16 architecture.
    Optionally loads weights pre-trained on ImageNet.
    Note that the data format convention used by the model is
    the one specified in your Keras config at `~/.keras/keras.json`.
    # Arguments
        include_top: whether to include the 3 fully-connected
            layers at the top of the network.
        weights: one of `None` (random initialization),
              'imagenet' (pre-training on ImageNet),
              or the path to the weights file to be loaded.
        input_tensor: optional Keras tensor
            (i.e. output of `layers.Input()`)
            to use as image input for the model.
        input_shape: optional shape tuple, only to be specified
            if `include_top` is False (otherwise the input shape
            has to be `(224, 224, 3)`
            (with `channels_last` data format)
            or `(3, 224, 224)` (with `channels_first` data format).
            It should have exactly 3 input channels,
            and width and height should be no smaller than 32.
            E.g. `(200, 200, 3)` would be one valid value.
        pooling: Optional pooling mode for feature extraction
            when `include_top` is `False`.
            - `None` means that the output of the model will be
                the 4D tensor output of the
                last convolutional block.
            - `avg` means that global average pooling
                will be applied to the output of the
                last convolutional block, and thus
                the output of the model will be a 2D tensor.
            - `max` means that global max pooling will
                be applied.
        classes: optional number of classes to classify images
            into, only to be specified if `include_top` is True, and
            if no `weights` argument is specified.
    # Returns
        A Keras model instance.
    # Raises
        ValueError: in case of invalid argument for `weights`,
            or invalid input shape.
    """
    backend, layers, models, keras_utils = get_submodules_from_kwargs(kwargs)

    if not (weights in {'imagenet', None} or os.path.exists(weights)):
        raise ValueError('The `weights` argument should be either '
                         '`None` (random initialization), `imagenet` '
                         '(pre-training on ImageNet), '
                         'or the path to the weights file to be loaded.')

    if weights == 'imagenet' and include_top and classes != 1000:
        raise ValueError('If using `weights` as `"imagenet"` with `include_top`'
                         ' as true, `classes` should be 1000')
    # Determine proper input shape
    input_shape = _obtain_input_shape(input_shape,
                                      default_size=224,
                                      min_size=32,
                                      data_format=backend.image_data_format(),
                                      require_flatten=include_top,
                                      weights=weights)

    if input_tensor is None:
        img_input = layers.Input(shape=input_shape)
    else:
        if not backend.is_keras_tensor(input_tensor):
            img_input = layers.Input(tensor=input_tensor, shape=input_shape)
        else:
            img_input = input_tensor
    # Block 1
    x = layers.Conv2D(64, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block1_conv1')(img_input)
    x = layers.Conv2D(64, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block1_conv2')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

    # Block 2
    x = layers.Conv2D(128, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block2_conv1')(x)
    x = layers.Conv2D(128, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block2_conv2')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

    # Block 3
    x = layers.Conv2D(256, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block3_conv1')(x)
    x = layers.Conv2D(256, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block3_conv2')(x)
    x = layers.Conv2D(256, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block3_conv3')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)

    # Block 4
    x = layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block4_conv1')(x)
    x = layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block4_conv2')(x)
    x = layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block4_conv3')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)

    # Block 5
    x = layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block5_conv1')(x)
    x = layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block5_conv2')(x)
    x = layers.Conv2D(512, (3, 3),
                      activation='relu',
                      padding='same',
                      name='block5_conv3')(x)
    x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)

    if include_top:
        # Classification block
        x = layers.Flatten(name='flatten')(x)
        x = layers.Dense(4096, activation='relu', name='fc1')(x)
        x = layers.Dense(4096, activation='relu', name='fc2')(x)
        x = layers.Dense(classes, activation='softmax', name='predictions')(x)
    else:
        if pooling == 'avg':
            x = layers.GlobalAveragePooling2D()(x)
        elif pooling == 'max':
            x = layers.GlobalMaxPooling2D()(x)

    # Ensure that the model takes into account
    # any potential predecessors of `input_tensor`.
    if input_tensor is not None:
        inputs = keras_utils.get_source_inputs(input_tensor)
    else:
        inputs = img_input
    # Create model.
    model = models.Model(inputs, x, name='vgg16')

    # Load weights.
    if weights == 'imagenet':
        if include_top:
            weights_path = keras_utils.get_file(
                'vgg16_weights_tf_dim_ordering_tf_kernels.h5',
                WEIGHTS_PATH,
                cache_subdir='models',
                file_hash='64373286793e3c8b2b4e3219cbf3544b')
        else:
            weights_path = keras_utils.get_file(
                'vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5',
                WEIGHTS_PATH_NO_TOP,
                cache_subdir='models',
                file_hash='6d6bbae143d832006294945121d1f1fc')
        model.load_weights(weights_path)
        if backend.backend() == 'theano':
            keras_utils.convert_all_kernels_in_model(model)
    elif weights is not None:
        model.load_weights(weights)

    return model

3. Loading the model

sample.m

%% Load the neural network
net = vgg16;

MATLAB official documentation
vgg16 (requires the Deep Learning Toolbox Model for VGG-16 Network support package)

Inspecting the layers

net.Layers    % run this to display the network's layers

4. Modifying the model's layers

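Q. Why do the layers need to be modified?
A. Because the number of outputs of the pretrained model differs from the number of classes we want to predict.

There are other reasons as well, but this one always applies.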

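To illustrate, suppose the final classification layer of the loaded model outputs 3 classes.

If, as in the figure below, we want to predict whether a wine is red or white, there are only 2 classes, which does not match the loaded model.

In that case, the number of outputs must be changed from 3 to 2. (The pretrained VGG-16 itself outputs the 1000 ImageNet classes, so here the output changes from 1000 to the number of class folders under pictures.)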

Now for the code:

sample.m

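%% Modify the network layers
layers = net.Layers;
numClasses = numel(categories(imds.Labels));   % number of classes = number of label folders
layers(39) = fullyConnectedLayer(numClasses);  % replace fc8 (1000 ImageNet classes) with numClasses outputs
layers(41) = classificationLayer;              % new output layer; its classes are set from the training labels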

MATLAB official documentation
About layers
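To see why indices 39 and 41 are used, the Python sketch below rebuilds the list of layer names for MATLAB's vgg16 (41 layers) and applies the same two replacements. The layer names (conv1_1, fc8, prob, output, ...) are our assumption of what net.Layers shows, so check them against your own output. Note that MATLAB indexes from 1 and Python from 0.

```python
def matlab_vgg16_layer_names():
    """Layer names assumed for MATLAB's vgg16 (verify against net.Layers)."""
    names = ["input"]
    for block, n_conv in enumerate([2, 2, 3, 3, 3], start=1):
        for i in range(1, n_conv + 1):
            names += [f"conv{block}_{i}", f"relu{block}_{i}"]
        names.append(f"pool{block}")
    for fc in (6, 7):
        names += [f"fc{fc}", f"relu{fc}", f"drop{fc}"]
    names += ["fc8", "prob", "output"]
    return names


def replace_head(names, num_classes):
    """Mirror layers(39) = fullyConnectedLayer(numClasses); layers(41) = classificationLayer."""
    layers = list(names)                       # copy, like layers = net.Layers
    layers[39 - 1] = f"new_fc({num_classes})"  # MATLAB index 39 is Python index 38
    layers[41 - 1] = "new_classification"      # MATLAB index 41 is Python index 40
    return layers


def new_fc_params(num_classes, in_features=4096):
    """Weights + biases of the replacement fully connected layer (fc7 outputs 4096 values)."""
    return in_features * num_classes + num_classes


names = matlab_vgg16_layer_names()
print(len(names), names[38], names[40])  # 41 fc8 output
print(replace_head(names, 4)[37:41])     # ['drop7', 'new_fc(4)', 'prob', 'new_classification']
print(new_fc_params(4))                  # 16388 (instead of 4096 * 1000 + 1000 = 4097000)
```

The softmax layer at index 40 has no parameters and works for any number of classes, so it can stay as it is.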

5. Resizing the images to the model's input size

sample.m

%% Resize images to the network's input size
inputSize = net.Layers(1).InputSize(1:2);  % height and width only: [224 224] for VGG-16
augTrainData = augmentedImageDatastore(inputSize, trainData);
augTestData = augmentedImageDatastore(inputSize, testData);

MATLAB official documentation
augmentedImageDatastore function
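augmentedImageDatastore resizes each image to inputSize as it is read, so the original files on disk can have any size. The idea can be sketched in Python with nearest-neighbor sampling on a single-channel image stored as a list of rows. This is only an illustration: the 300×400 image is hypothetical, and MATLAB's own interpolation method may differ.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D image (list of rows) to out_h x out_w with nearest-neighbor sampling.

    Output pixel (r, c) copies input pixel (r * in_h // out_h, c * in_w // out_w).
    """
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


input_size = (224, 224)  # what net.Layers(1).InputSize(1:2) gives for VGG-16
photo = [[(r + c) % 256 for c in range(400)] for r in range(300)]  # hypothetical 300x400 image
resized = resize_nearest(photo, *input_size)
print(len(resized), len(resized[0]))           # 224 224
print(resize_nearest([[1, 2], [3, 4]], 4, 4))  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```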

6. Training options

sample.m

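%% Training options
% Validate on held-out images, not on the training set (here the test split;
% a separate validation split would be cleaner).
options = trainingOptions('sgdm', ...
    'MiniBatchSize',10, ...
    'MaxEpochs',6, ...
    'InitialLearnRate',1e-4, ...
    'Shuffle','every-epoch', ...
    'ValidationData', augTestData, ...
    'ValidationFrequency',3, ...
    'Verbose',false, ...
    'Plots','training-progress');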

MATLAB official documentation
trainingOptions function
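'sgdm' selects stochastic gradient descent with momentum. As we read the trainingOptions documentation, each step updates the parameters θ as θ(new) = θ − α·∇E(θ) + γ·(θ − θ(prev)), where α is InitialLearnRate and γ is Momentum (0.9 by default). The scalar Python sketch below applies that rule to the toy function f(θ) = (θ − 3)². It uses the exact gradient, so the stochastic part (gradients from random mini-batches) is left out, and the learning rate 0.1 suits this toy problem, not the 1e-4 used for VGG-16.

```python
def sgdm(grad, theta0, lr, momentum=0.9, steps=100):
    """Gradient descent with momentum on a scalar parameter.

    Each step: theta_new = theta - lr * grad(theta) + momentum * (theta - theta_prev).
    """
    theta_prev = theta = theta0
    for _ in range(steps):
        theta, theta_prev = theta - lr * grad(theta) + momentum * (theta - theta_prev), theta
    return theta


def grad_f(t):
    """Gradient of the toy objective f(t) = (t - 3)^2, whose minimum is at t = 3."""
    return 2 * (t - 3.0)


print(round(sgdm(grad_f, theta0=0.0, lr=0.1, steps=300), 4))  # 3.0
```

The momentum term reuses the previous step's direction, which tends to speed up progress along consistent gradient directions compared with plain gradient descent.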

7. Training

sample.m

%% Train
netTransfer = trainNetwork(augTrainData, layers, options);

MATLAB official documentation
trainNetwork function

Because 'Plots' is set to 'training-progress' in the training options, the training progress is plotted while the network trains.

Source: MATLAB
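The training-progress plot counts iterations (mini-batches), not epochs. The Python sketch below works out the schedule implied by the options used here, assuming, as we read the trainingOptions documentation, that images that do not fill a complete final mini-batch are skipped in each epoch. The 60-image training set is hypothetical (for example 4 classes × 25 images split with p = 0.6).

```python
def training_schedule(num_train, mini_batch_size=10, max_epochs=6, validation_frequency=3):
    """Iterations implied by the trainingOptions settings (sketch).

    Assumes images that do not fill a complete final mini-batch are skipped each epoch.
    """
    iters_per_epoch = num_train // mini_batch_size
    total_iters = iters_per_epoch * max_epochs
    # Iterations that are multiples of ValidationFrequency.
    validation_iters = list(range(validation_frequency, total_iters + 1, validation_frequency))
    return iters_per_epoch, total_iters, validation_iters


per_epoch, total, val = training_schedule(60)  # hypothetical 60 training images
print(per_epoch, total, len(val))  # 6 36 12
```

With such a small dataset, validating every 3 iterations means a validation pass roughly twice per epoch; MATLAB may add validation points of its own (for example at the end of training).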

8. Predicting on the test data

sample.m

%% Predict on the test data
preds = classify(netTransfer, augTestData);

MATLAB official documentation
classify function
The predicted labels are stored in preds.
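Measuring accuracy is left for the next article, but the basic measure is the fraction of test images whose predicted label matches the true label (in MATLAB this is commonly written as mean(preds == testData.Labels)). A Python sketch with hypothetical labels:

```python
def accuracy(predicted, actual):
    """Fraction of images whose predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


preds = ["pic01", "pic02", "pic02", "pic03"]   # hypothetical classify output
labels = ["pic01", "pic02", "pic03", "pic03"]  # hypothetical testData.Labels
print(accuracy(preds, labels))  # 0.75
```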

Full code

sample.m

%% Load the data
imds = imageDatastore('./pictures', 'IncludeSubfolders', true, 'LabelSource', 'foldernames');

%% Split the data
[trainData, testData] = splitEachLabel(imds, 0.6, 'randomized');

%% Load the neural network
net = vgg16;

%% Modify the network layers
layers = net.Layers;
numClasses = numel(categories(imds.Labels));
layers(39) = fullyConnectedLayer(numClasses);  % replace fc8 (1000 ImageNet classes)
layers(41) = classificationLayer;              % replace the output layer

%% Resize images to the network's input size
inputSize = net.Layers(1).InputSize(1:2);
augTrainData = augmentedImageDatastore(inputSize, trainData);
augTestData = augmentedImageDatastore(inputSize, testData);

%% Training options
% Validate on held-out images rather than on the training set.
options = trainingOptions('sgdm', ...
    'MiniBatchSize',10, ...
    'MaxEpochs',6, ...
    'InitialLearnRate',1e-4, ...
    'Shuffle','every-epoch', ...
    'ValidationData', augTestData, ...
    'ValidationFrequency',3, ...
    'Verbose',false, ...
    'Plots','training-progress');

%% Train
netTransfer = trainNetwork(augTrainData, layers, options);

%% Predict on the test data
preds = classify(netTransfer, augTestData);

In the next article, I will evaluate the prediction accuracy on the test data and use the trained model to classify new images.

Part 2
