Solving FizzBuzz with a Neural Network

TL;DR

Let's try solving the FizzBuzz problem with a neural network.
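
The plan: encode each number's zero-padded decimal digits as one-hot vectors, feed them through a small fully connected network, and train it to classify the number into one of four classes (Number, Fizz, Buzz, FizzBuzz).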

Creating the Label Data

import pandas as pd

# Assign the correct FizzBuzz label to every integer from 1 to 10,000.
results = []

for i in range(1, 10000 + 1):
    if i % 3 == 0 and i % 5 == 0:
        results.append((i, 'FizzBuzz'))
    elif i % 3 == 0:
        results.append((i, 'Fizz'))
    elif i % 5 == 0:
        results.append((i, 'Buzz'))
    else:
        results.append((i, 'Number'))

data_df = pd.DataFrame(results, columns=['Number', 'Results'])
display(data_df.head(15))
    Number   Results
0        1    Number
1        2    Number
2        3      Fizz
3        4    Number
4        5      Buzz
5        6      Fizz
6        7    Number
7        8    Number
8        9      Fizz
9       10      Buzz
10      11    Number
11      12      Fizz
12      13    Number
13      14    Number
14      15  FizzBuzz
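
As a quick sanity check (my addition, not part of the original notebook), the class counts should follow the FizzBuzz ratios: out of every 15 consecutive integers, 8 are plain numbers, 4 are Fizz, 2 are Buzz, and 1 is FizzBuzz.

# Expected over 1..10000: Number 5333, Fizz 2667, Buzz 1334, FizzBuzz 666.
print(data_df['Results'].value_counts())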

Preprocessing

feature_title = 'Number'
label_title = 'Results'

# Map each label string to a class index, e.g. {'Number': 0, 'Fizz': 1, ...}.
printable_labels = {k: i for i, k in enumerate(data_df[label_title].unique())}
class_count = len(printable_labels)

display(data_df.head(15))
display(data_df.describe())
display(printable_labels)
    Number   Results
0        1    Number
1        2    Number
2        3      Fizz
3        4    Number
4        5      Buzz
5        6      Fizz
6        7    Number
7        8    Number
8        9      Fizz
9       10      Buzz
10      11    Number
11      12      Fizz
12      13    Number
13      14    Number
14      15  FizzBuzz
            Number
count  10000.00000
mean    5000.50000
std     2886.89568
min        1.00000
25%     2500.75000
50%     5000.50000
75%     7500.25000
max    10000.00000
{'Number': 0, 'Fizz': 1, 'Buzz': 2, 'FizzBuzz': 3}
from keras import utils

# One-hot encode the class indices: shape (10000, 4).
labels = utils.np_utils.to_categorical(
    [printable_labels[label] for label in data_df[label_title]],
    num_classes=class_count)

display(labels.shape)
display(labels)
Using TensorFlow backend.

(10000, 4)

array([[1., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       ...,
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]], dtype=float32)
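
A quick spot check (my addition): row 14 of the DataFrame is the number 15, so its one-hot label should pick out the FizzBuzz class at index 3.

print(labels[14])  # expected: [0. 0. 0. 1.]
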
import numpy as np

digits_count = 5

# One-hot encode each decimal digit: a number becomes a (5, 10) matrix,
# one row per zero-padded digit.
digits_map = utils.np_utils.to_categorical(range(10), num_classes=10)
features = np.zeros((len(data_df), digits_count, 10), dtype=np.int32)

for i, number in enumerate(data_df[feature_title]):
    for t, digit in enumerate(str(number).zfill(digits_count)):
        features[i, t] = digits_map[int(digit)]

print(features.shape)
print(features)
(10000, 5, 10)
[[[1 0 0 ... 0 0 0]
  [1 0 0 ... 0 0 0]
  [1 0 0 ... 0 0 0]
  [1 0 0 ... 0 0 0]
  [0 1 0 ... 0 0 0]]

 [[1 0 0 ... 0 0 0]
  [1 0 0 ... 0 0 0]
  [1 0 0 ... 0 0 0]
  [1 0 0 ... 0 0 0]
  [0 0 1 ... 0 0 0]]

 [[1 0 0 ... 0 0 0]
  [1 0 0 ... 0 0 0]
  [1 0 0 ... 0 0 0]
  [1 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 0]]

 ...

 [[1 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 1]
  [0 0 0 ... 0 0 1]
  [0 0 0 ... 0 0 1]
  [0 0 0 ... 0 1 0]]

 [[1 0 0 ... 0 0 0]
  [0 0 0 ... 0 0 1]
  [0 0 0 ... 0 0 1]
  [0 0 0 ... 0 0 1]
  [0 0 0 ... 0 0 1]]

 [[0 1 0 ... 0 0 0]
  [1 0 0 ... 0 0 0]
  [1 0 0 ... 0 0 0]
  [1 0 0 ... 0 0 0]
  [1 0 0 ... 0 0 0]]]
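
To confirm the encoding round-trips (again my addition), the one-hot rows of a single feature matrix can be collapsed back into the zero-padded digit string:

decoded = ''.join(str(row.argmax()) for row in features[14])
print(decoded)  # expected: '00015', i.e. the number 15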

Splitting the Data

from sklearn.model_selection import train_test_split

# Split by index so features and labels stay aligned: train_test_split
# applies the same shuffle to both sequences.
idx_features = range(len(data_df[feature_title]))
idx_labels = range(len(data_df[label_title]))
tmp_data = train_test_split(idx_features, idx_labels, train_size=0.9, test_size=0.1)

train_features = np.array([features[i] for i in tmp_data[0]])
valid_features = np.array([features[i] for i in tmp_data[1]])
train_labels = np.array([labels[i] for i in tmp_data[2]])
valid_labels = np.array([labels[i] for i in tmp_data[3]])

print(train_features.shape)
print(valid_features.shape)
print(train_labels.shape)
print(valid_labels.shape)
(9000, 5, 10)
(1000, 5, 10)
(9000, 4)
(1000, 4)
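
Since train_test_split shuffles all of its array arguments with the same permutation, the index dance above can be skipped; a more direct variant (mine, not the original code) passes the arrays themselves:

train_features, valid_features, train_labels, valid_labels = train_test_split(
    features, labels, train_size=0.9, test_size=0.1)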

Building the Network

from keras.models import Sequential
from keras.layers import Dense, Flatten

model = Sequential()
model.add(Flatten(input_shape=features[0].shape))  # (5, 10) -> 50 inputs
model.add(Dense(2048, activation='relu'))
model.add(Dense(2048, activation='relu'))
model.add(Dense(class_count, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['mae'])
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_1 (Flatten)          (None, 50)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 2048)              104448    
_________________________________________________________________
dense_2 (Dense)              (None, 2048)              4196352   
_________________________________________________________________
dense_3 (Dense)              (None, 4)                 8196      
=================================================================
Total params: 4,308,996
Trainable params: 4,308,996
Non-trainable params: 0
_________________________________________________________________
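
The parameter counts check out: the flattened input has 5 × 10 = 50 units, so dense_1 holds (50 + 1) × 2048 = 104,448 weights (including biases), dense_2 (2048 + 1) × 2048 = 4,196,352, and the output layer (2048 + 1) × 4 = 8,196.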

Training

from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard

model_filename = 'models/fizzbuzz-model.h5'

history = model.fit(train_features,
                    train_labels,
                    epochs=300,
                    validation_split=0.1,
                    batch_size=256,
                    callbacks=[
                        TensorBoard(log_dir='logs'),
                        EarlyStopping(patience=5, monitor='val_mean_absolute_error'),
                        ModelCheckpoint(model_filename, monitor='val_mean_absolute_error', save_best_only=True)
                    ])
Train on 8100 samples, validate on 900 samples
Epoch 1/300
8100/8100 [==============================] - 1s 137us/step - loss: 0.7933 - mean_absolute_error: 0.2498 - val_loss: 0.6256 - val_mean_absolute_error: 0.2134
Epoch 2/300
8100/8100 [==============================] - 0s 41us/step - loss: 0.6548 - mean_absolute_error: 0.2246 - val_loss: 0.6258 - val_mean_absolute_error: 0.2135
Epoch 3/300
8100/8100 [==============================] - 0s 39us/step - loss: 0.6422 - mean_absolute_error: 0.2232 - val_loss: 0.6292 - val_mean_absolute_error: 0.2229
Epoch 4/300
8100/8100 [==============================] - 0s 40us/step - loss: 0.6318 - mean_absolute_error: 0.2208 - val_loss: 0.6136 - val_mean_absolute_error: 0.2144
Epoch 5/300
8100/8100 [==============================] - 0s 41us/step - loss: 0.5816 - mean_absolute_error: 0.2072 - val_loss: 0.5271 - val_mean_absolute_error: 0.1884
[omitted]
Epoch 297/300
8100/8100 [==============================] - 0s 40us/step - loss: 1.4231e-07 - mean_absolute_error: 3.3136e-08 - val_loss: 1.2583e-06 - val_mean_absolute_error: 6.0084e-07
Epoch 298/300
8100/8100 [==============================] - 0s 40us/step - loss: 1.4168e-07 - mean_absolute_error: 3.2659e-08 - val_loss: 1.2518e-06 - val_mean_absolute_error: 5.9729e-07
Epoch 299/300
8100/8100 [==============================] - 0s 39us/step - loss: 1.4122e-07 - mean_absolute_error: 3.2316e-08 - val_loss: 1.2392e-06 - val_mean_absolute_error: 5.9097e-07
Epoch 300/300
8100/8100 [==============================] - 0s 39us/step - loss: 1.4064e-07 - mean_absolute_error: 3.1852e-08 - val_loss: 1.2335e-06 - val_mean_absolute_error: 5.8808e-07
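
Training ran all 300 epochs, so the EarlyStopping callback (patience=5 on val_mean_absolute_error) never fired; by the end both the training and validation losses are effectively zero.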

Evaluation

from sklearn.metrics import classification_report

# Compare predicted class indices against the true ones on the held-out set.
predicted_valid_labels = model.predict(valid_features).argmax(axis=1)
numeric_valid_labels = np.argmax(valid_labels, axis=1)
print(classification_report(numeric_valid_labels, predicted_valid_labels, target_names=list(printable_labels)))
                 precision    recall  f1-score   support

         Number       1.00      1.00      1.00       551
           Fizz       1.00      1.00      1.00       270
           Buzz       1.00      1.00      1.00       119
       FizzBuzz       1.00      1.00      1.00        60

    avg / total       1.00      1.00      1.00      1000
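
To close the loop, here is a small helper (my addition; the function name is mine) that encodes an arbitrary number the same way as the training data and asks the model for its answer. Numbers above 10000 fall outside the training range, so predictions there should not be trusted.

inv_labels = {v: k for k, v in printable_labels.items()}

def predict_fizzbuzz(n):
    # Encode n as 5 zero-padded one-hot digits, then take the argmax
    # of the softmax output and map it back to a label string.
    x = np.zeros((1, digits_count, 10), dtype=np.int32)
    for t, digit in enumerate(str(n).zfill(digits_count)):
        x[0, t] = digits_map[int(digit)]
    return inv_labels[model.predict(x).argmax(axis=1)[0]]

for n in [1, 3, 5, 15, 98, 99, 100]:
    print(n, predict_fizzbuzz(n))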

Jupyter Notebook
