Kaggle Challenge from Scratch #3 - Digit Recognizer (Data Preprocessing)

Last updated at 2019-02-24. Posted at 2019-01-13.

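Introduction

The previous article gave an overview of my attempt at the Digit Recognizer competition.
Starting with this article, I will go through the details, beginning with data preprocessing, along with the source code.
The full source code is published on Kaggle, so please refer to it as well.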

Data preprocessing

The following preprocessing steps are applied:

Image data normalization
One-hot encoding
Data augmentation

Image data normalization

Python3

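from sklearn.model_selection import train_test_split

# X_train, y_train and X_test are assumed to be loaded earlier (see the full source on Kaggle)
# Hold out part of the training data for validation, with a fixed seed for reproducibility
random_seed = 1
X_train_split, X_test_split, y_train_split, y_test_split = train_test_split(X_train, y_train, random_state=random_seed)

# Normalize pixel values from [0, 255] to [0, 1]
X_train_split = X_train_split/255
X_test_split = X_test_split/255
X_test = X_test/255
X_train_all = X_train/255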
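
As a quick check on the normalization step, the following minimal sketch shows that dividing by 255 maps 8-bit pixel values into the range [0, 1]. The array `pixels` is a made-up stand-in for the real image data, not part of the original notebook.

```python
import numpy as np

# Hypothetical stand-in for image data: two 2x2 "images" with 8-bit pixel values.
pixels = np.array([[[0, 64], [128, 255]],
                   [[255, 0], [32, 16]]], dtype=np.uint8)

# True division promotes the integers to floats in the range [0, 1].
normalized = pixels / 255

print(normalized.dtype, normalized.min(), normalized.max())
# float64 0.0 1.0
```

Keeping the inputs in a small, common range like this generally helps gradient-based training behave more stably.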

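One-hot encoding

When labels are given as plain integers, a machine learning model treats numerically close values, such as 2 and 3 or 6 and 7, as more similar than distant values such as 2 and 7.
However, the digits (0 to 9) classified from the images here are not similar just because their values are close, so we apply one-hot encoding to solve this.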

Python3

from keras.utils import to_categorical

num_classes = 10  # assumed here: one class per digit, 0 to 9

y_train_split = to_categorical(y_train_split, num_classes)
y_test_split = to_categorical(y_test_split, num_classes)

# Example output
to_categorical(y_train[0:3])
# array([[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
#        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
#        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
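
To make the idea concrete, here is a minimal NumPy-only sketch of one-hot encoding. It is an illustration rather than the Keras implementation; the helper name `one_hot` and the sample labels are assumptions made for this example.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([2, 3, 7], num_classes=10)
print(encoded)

# Every pair of distinct classes is now the same distance apart,
# so 2 is no "closer" to 3 than it is to 7.
dist_2_3 = np.linalg.norm(encoded[0] - encoded[1])
dist_2_7 = np.linalg.norm(encoded[0] - encoded[2])
print(dist_2_3 == dist_2_7)  # True
```

Because all class vectors are equidistant, the encoding no longer carries any ordering between the digits.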

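Data augmentation

In machine learning, the quantity and quality of the data directly affect model performance,
so we augment the original data (Data Augmentation) by transforming it artificially to improve performance.
Data augmentation adds transformed copies of the original images, produced by operations such as rotation, flipping, and zooming, which enlarges the training set so that the model learns to be robust to such variations.

We use ImageDataGenerator, the data augmentation class provided by Keras,
to generate transformed versions of the original images.
The transformations applied are rotation, zoom, horizontal shift, and vertical shift.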

Python3

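from keras.preprocessing.image import ImageDataGenerator
datagen_split = ImageDataGenerator(
    featurewise_center=False,
    featurewise_std_normalization=False,
    rotation_range=10, # randomly rotate images in the range (degrees, 0 to 180)
    zoom_range=0.1, # range for random zoom
    width_shift_range=0.1, # range for random horizontal shift (fraction of total width)
    height_shift_range=0.1, # range for random vertical shift (fraction of total height)
    )
# fit() computes dataset statistics; it is only required when the featurewise
# options or ZCA whitening are enabled, so it has no effect with these settings
datagen_split.fit(X_train_split)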
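
To give a feel for what a random shift does to an image, here is a simplified NumPy-only sketch. It is not how Keras implements `width_shift_range` and `height_shift_range`: the helper `random_shift` is hypothetical, it shifts by whole pixels only, and it fills the exposed border with zeros, whereas ImageDataGenerator defaults to `fill_mode='nearest'`.

```python
import numpy as np

def random_shift(image, max_fraction=0.1, rng=None):
    """Shift a 2-D image by a random whole number of pixels, filling with zeros."""
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape
    max_dy = int(height * max_fraction)
    max_dx = int(width * max_fraction)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))

    shifted = np.zeros_like(image)
    # Source and destination slices that keep the overlapping region inside the frame.
    src_rows = slice(max(0, -dy), min(height, height - dy))
    dst_rows = slice(max(0, dy), min(height, height + dy))
    src_cols = slice(max(0, -dx), min(width, width - dx))
    dst_cols = slice(max(0, dx), min(width, width + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted, (dy, dx)

# A 28x28 image with a bright square in the middle, standing in for a digit.
rng = np.random.default_rng(1)
image = np.zeros((28, 28), dtype=np.float32)
image[10:18, 10:18] = 1.0

shifted, (dy, dx) = random_shift(image, rng=rng)
print("shift:", dy, dx)
print("content preserved:", shifted.sum() == image.sum())
```

With a 28x28 image and a fraction of 0.1, the shift is at most 2 pixels in each direction, so the square stays fully inside the frame and only its position changes.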

Conclusion

In the next article, I will introduce the CNN model I implemented to classify the MNIST data.

References

・Keras Documentation, Image Preprocessing, https://keras.io/ja/preprocessing/image/
・Hands-on Machine Learning with Scikit-Learn
