
The val_acc computed from TensorFlow's model.predict() is completely different from the result of model.evaluate()

TensorFlow2.0

Posted at 2020-10-25

I got badly stuck on this, so here's a note.

What went wrong

A plain CNN that fine-tunes InceptionResNetV2 on training and validation data produced by ImageDataGenerator.

import os
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers import Dense, Input, Flatten, Dropout, Conv2D, MaxPooling2D, GlobalAveragePooling2D
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.models import Model
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.preprocessing.image import ImageDataGenerator

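# Parameters
data_dir = "images"  # one subdirectory per class
batch_size = 1024
epochs = 30
# Sort so the class order (and therefore the label indices) is deterministic
classes = sorted(d for d in os.listdir(data_dir) if not d.startswith('.'))
num_classes = len(classes)
class_mode = "categorical"  # one-hot labels, matching categorical_crossentropy
img_width, img_height = 256, 256
feature_dim = (img_width, img_height, 3)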

# Image data generator
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    horizontal_flip=True,
    validation_split=0.3
)

train_generator = datagen.flow_from_directory(
    data_dir,
    subset="training",
    target_size=(img_width, img_height),
    color_mode="rgb",
    classes=classes,
    class_mode=class_mode, 
    batch_size=batch_size,
    shuffle=True
)

validation_generator = datagen.flow_from_directory(
    data_dir,
    subset="validation",
    target_size=(img_width, img_height),
    color_mode="rgb",
    classes=classes,
    class_mode=class_mode,
    batch_size=batch_size,
    shuffle=True  # this setting turns out to be the cause of the problem in this post
)

base_model = InceptionResNetV2(include_top=False, weights="imagenet", input_shape=feature_dim)
# Freeze the first 775 layers; only the remaining layers are fine-tuned
for layer in base_model.layers[:775]:
    layer.trainable = False
layer_output = base_model.output
layer_output = Flatten()(layer_output)
layer_output = Dense(256, activation="relu")(layer_output)
layer_output = Dropout(0.5)(layer_output)
# Output layer: one unit per class, softmax for categorical_crossentropy
layer_output = Dense(num_classes, activation="softmax")(layer_output)

model = Model(base_model.input, layer_output)
loss = "categorical_crossentropy"
model.compile(
  loss=loss,
  optimizer="Adam",
  metrics=["accuracy"]
)

# Training
cp_cb = ModelCheckpoint(
    filepath="weights.hdf5",
    monitor="val_loss",
    save_best_only=True,
    save_weights_only=False,
    verbose=1,
    mode="min"
)

reduce_lr_cb = ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.5,
    patience=1,
    verbose=1
)

num_train_samples = train_generator.n
num_validation_samples = validation_generator.n
steps_per_epoch_train = (num_train_samples-1) // batch_size + 1
steps_per_epoch_validation = (num_validation_samples-1) // batch_size + 1
history = model.fit(
    train_generator,
    steps_per_epoch=steps_per_epoch_train,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=steps_per_epoch_validation,
    callbacks=[cp_cb, reduce_lr_cb]
)
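The steps_per_epoch_* expressions above use (n - 1) // batch_size + 1, which is integer ceiling division: a final, partially filled batch still counts as one step. A quick sanity check against math.ceil (the helper name steps_for is hypothetical, and 4196 is simply the validation-set size implied by the row sums of the confusion matrix further down):

```python
import math


def steps_for(num_samples: int, batch_size: int) -> int:
    # Integer ceiling division: the last, partial batch still counts as a step.
    return (num_samples - 1) // batch_size + 1


# Compare against math.ceil for a few sizes, with batch_size=1024 as in this post.
for n in (1, 1023, 1024, 1025, 4196):
    assert steps_for(n, 1024) == math.ceil(n / 1024)

print(steps_for(4196, 1024))  # 5
```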

Run it. Here are just the last 3 epochs of the output.

Epoch 28/30
10/10 [==============================] - ETA: 0s - loss: 0.1032 - accuracy: 0.9667 
Epoch 00028: val_loss did not improve from 0.12872

Epoch 00028: ReduceLROnPlateau reducing learning rate to 9.765625463842298e-07.
10/10 [==============================] - 216s 22s/step - loss: 0.1032 - accuracy: 0.9667 - val_loss: 0.1323 - val_accuracy: 0.9569
Epoch 29/30
10/10 [==============================] - ETA: 0s - loss: 0.1025 - accuracy: 0.9664 
Epoch 00029: val_loss did not improve from 0.12872

Epoch 00029: ReduceLROnPlateau reducing learning rate to 4.882812731921149e-07.
10/10 [==============================] - 229s 23s/step - loss: 0.1025 - accuracy: 0.9664 - val_loss: 0.1307 - val_accuracy: 0.9566
Epoch 30/30
10/10 [==============================] - ETA: 0s - loss: 0.1024 - accuracy: 0.9657 
Epoch 00030: val_loss improved from 0.12872 to 0.12831, saving model to ./drive/My Drive/InceptionResNetV2/weights-InceptionResNetV2.hdf5
10/10 [==============================] - 240s 24s/step - loss: 0.1024 - accuracy: 0.9657 - val_loss: 0.1283 - val_accuracy: 0.9590

val_accuracy is about 95%, so I figured everything was fine.

Curious about what kinds of mistakes the model was making, I computed a confusion matrix on the same validation dataset, and for some reason got a terrible result.


from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Predicted class index for every validation image
Y_pred = model.predict(validation_generator)
y_pred = np.argmax(Y_pred, axis=1)

print(y_pred)
print(validation_generator.classes)

# Rows: true labels, columns: predicted labels
cmx_data = tf.math.confusion_matrix(validation_generator.classes, y_pred)
print("Confusion Matrix")
print(cmx_data)

plt.figure(figsize=(10, 7))
sns.heatmap(cmx_data.numpy(), annot=True, cmap="Blues")
plt.show()

[2 1 1 ... 1 2 2]
[0 0 0 ... 2 2 2]
Confusion Matrix
tf.Tensor(
[[ 52 205 189]
 [234 823 837]
 [232 773 851]], shape=(3, 3), dtype=int32)

What on earth?

Solution

As a test, I computed the accuracy with model.evaluate(), and it came out at 95% as expected. But computing the accuracy by hand from the model.predict() output gave 41%.
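Computing accuracy "by hand" from a confusion matrix just means dividing the diagonal (correct predictions) by the total count. A minimal sketch using the matrix printed above (rows are true labels, columns are predictions, as returned by tf.math.confusion_matrix); the function name accuracy_from_confusion is only for illustration:

```python
import numpy as np

# Confusion matrix printed above (rows: true class, columns: predicted class).
cmx = np.array([
    [52, 205, 189],
    [234, 823, 837],
    [232, 773, 851],
])


def accuracy_from_confusion(cm: np.ndarray) -> float:
    # Correct predictions lie on the diagonal; accuracy = trace / total count.
    return float(np.trace(cm) / cm.sum())


print(f"{accuracy_from_confusion(cmx):.3f}")  # 0.411
```

That is 1726 correct out of 4196 validation images, the 41% mentioned above.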

I spent ages checking whether model.evaluate() and model.predict() use different algorithms, with no luck at all. Close to tears, I kept searching and found this on Stack Overflow:
tensorflow model.evaluate and model.predict very different results

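I had set shuffle=True in flow_from_directory(). Apparently, with shuffle=True the generator yields the images in a shuffled order (reshuffled every epoch), while validation_generator.classes always lists the labels in the original, unshuffled directory order. So the predictions returned by model.predict() no longer line up with the labels they are compared against. model.evaluate() is not affected because it takes each image and its label from the same batch, so the two stay paired.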
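The misalignment can be reproduced without TensorFlow. The sketch below assumes a hypothetical perfect model and uses class counts taken from the row sums of the confusion matrix above (446, 1894, 1856); it only simulates the ordering, not the real generator. When predictions come out in a random order, the expected hand-computed accuracy is roughly the sum of squared class proportions, (446/4196)² + (1894/4196)² + (1856/4196)² ≈ 0.41, which is in line with the 41% observed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for validation_generator.classes: labels in directory order
# (all class-0 samples first, then class 1, then class 2).
classes = np.repeat([0, 1, 2], [446, 1894, 1856])

# With shuffle=True, the generator yields samples in a random permutation.
order = rng.permutation(len(classes))

# Assume a perfect model: for each yielded sample it predicts the true label.
y_pred = classes[order]

# Comparing shuffled-order predictions with unshuffled labels misaligns them.
misaligned_acc = np.mean(y_pred == classes)

# Undoing the permutation restores the pairing between predictions and labels.
realigned = np.empty_like(y_pred)
realigned[order] = y_pred
aligned_acc = np.mean(realigned == classes)

print(aligned_acc)     # 1.0
print(misaligned_acc)  # close to 0.41, even though every prediction is correct
```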

When I switched to shuffle=False instead, the confusion matrix came out clean.
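Another option, which is not from the original post and is only a sketch, is to read each batch of images and labels together and predict on that same batch, so every prediction stays paired with its own label regardless of the shuffle setting. The function predict_with_labels is hypothetical. With Keras, batches could be something like (validation_generator[i] for i in range(len(validation_generator))) and predict_fn could be model.predict_on_batch, assuming class_mode="categorical" so that the labels are one-hot. The example below checks the idea with toy NumPy data instead of a real model.

```python
import numpy as np


def predict_with_labels(predict_fn, batches):
    """Collect predicted and true class indices from the same (x, y) batches.

    Assumes y is one-hot encoded (class_mode="categorical").
    """
    y_true, y_pred = [], []
    for x, y in batches:
        probs = predict_fn(x)
        y_pred.append(np.argmax(probs, axis=1))
        y_true.append(np.argmax(y, axis=1))
    return np.concatenate(y_true), np.concatenate(y_pred)


# Toy check: the "inputs" are the labels themselves, delivered in shuffled batches.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=10)
onehot = np.eye(3)[labels]
order = rng.permutation(10)
fake_batches = [(labels[order[i:i + 4]], onehot[order[i:i + 4]]) for i in range(0, 10, 4)]


def perfect_model(x):
    # Toy "model" that outputs the correct one-hot vector for each input.
    return np.eye(3)[x]


yt, yp = predict_with_labels(perfect_model, fake_batches)
print(np.mean(yt == yp))  # 1.0, despite the shuffled batch order
```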
