Data Every Day: Sales of Summer Clothes in E-commerce Wish

tldr

This post works through Kaggle's Sales of summer clothes in E-commerce Wish dataset,
following Summer Clothing Sales Prediction - Data Every Day #026.

The execution environment is Google Colaboratory.

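# About the data

The goal is to predict clothing sales (units_sold). The other columns hold additional information about each product, such as its size.

Imports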

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import sklearn.preprocessing as sp
from sklearn.model_selection import train_test_split

import tensorflow as tf

Downloading the data

Mount Google Drive.

from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

Initialize the Kaggle API client and authenticate.
The credentials are stored as kaggle.json in Google Drive (/content/drive/My Drive/Colab Notebooks/Kaggle).

import os
kaggle_path = "/content/drive/My Drive/Colab Notebooks/Kaggle"
os.environ['KAGGLE_CONFIG_DIR'] = kaggle_path

from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate() 
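
The client looks for kaggle.json in the directory named by KAGGLE_CONFIG_DIR, so it can help to confirm the file is there before authenticating. A minimal sketch of such a check, assuming a hypothetical helper `has_kaggle_credentials` (not part of the kaggle package) and using a temporary directory in place of Google Drive:

```python
import os
import tempfile


def has_kaggle_credentials(config_dir):
    """Return True if config_dir contains a kaggle.json file.

    Hypothetical helper for illustration; not part of the kaggle package.
    """
    return os.path.isfile(os.path.join(config_dir, 'kaggle.json'))


# Demonstration on a temporary directory instead of Google Drive.
with tempfile.TemporaryDirectory() as demo_dir:
    missing = has_kaggle_credentials(demo_dir)   # no file yet
    with open(os.path.join(demo_dir, 'kaggle.json'), 'w') as f:
        f.write('{}')                            # placeholder content
    present = has_kaggle_credentials(demo_dir)   # file now exists

print(missing, present)
```

In the notebook, the same check could be run against kaggle_path before calling api.authenticate().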

Download the data with the Kaggle API.

dataset_id = 'jmmvutu/summer-products-and-sales-in-ecommerce-wish'
dataset = api.dataset_list_files(dataset_id)
file_name = dataset.files[0].name
file_path = os.path.join(api.get_default_download_dir(), file_name)
file_path
'/content/summer-products-with-rating-and-performance_2020-08.csv'
api.dataset_download_file(dataset_id, file_name, force=True, quiet=False)
100%|██████████| 351k/351k [00:00<00:00, 39.1MB/s]
Downloading summer-products-with-rating-and-performance_2020-08.csv.zip to /content
True
import zipfile

zip_path = '/content/' + file_name + '.zip'
with zipfile.ZipFile(zip_path) as existing_zip:
    existing_zip.extractall('/content')
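
As an alternative to extracting the archive first, pandas can read a CSV straight from a zip file when the archive contains a single file. The sketch below is not what the original walkthrough does; it builds a small stand-in archive (the file names and values are made up for illustration) to show the idea:

```python
import os
import tempfile
import zipfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a tiny single-file zip archive as a stand-in for the download.
    demo_zip = os.path.join(tmp_dir, 'demo.csv.zip')
    with zipfile.ZipFile(demo_zip, 'w') as zf:
        zf.writestr('demo.csv', 'price,units_sold\n16.0,100\n8.0,20000\n')

    # Compression is inferred from the .zip extension.
    demo = pd.read_csv(demo_zip)

print(demo.shape)
```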

Loading the data

Read the downloaded CSV file with pandas.

data = pd.read_csv(file_path)
data
title title_orig price retail_price currency_buyer units_sold uses_ad_boosts rating rating_count rating_five_count rating_four_count rating_three_count rating_two_count rating_one_count badges_count badge_local_product badge_product_quality badge_fast_shipping tags product_color product_variation_size_id product_variation_inventory shipping_option_name shipping_option_price shipping_is_express countries_shipped_to inventory_total has_urgency_banner urgency_text origin_country merchant_title merchant_name merchant_info_subtitle merchant_rating_count merchant_rating merchant_id merchant_has_profile_picture merchant_profile_picture product_url product_picture product_id theme crawl_month
0 2020 Summer Vintage Flamingo Print Pajamas Se... 2020 Summer Vintage Flamingo Print Pajamas Se... 16.00 14 EUR 100 0 3.76 54 26.0 8.0 10.0 1.0 9.0 0 0 0 0 Summer,Fashion,womenunderwearsuit,printedpajam... white M 50 Livraison standard 4 0 34 50 1.0 Quantité limitée ! CN zgrdejia zgrdejia (568 notes) 568 4.128521 595097d6a26f6e070cb878d1 0 NaN https://www.wish.com/c/5e9ae51d43d6a96e303acdb0 https://contestimg.wish.com/api/webimage/5e9ae... 5e9ae51d43d6a96e303acdb0 summer 2020-08
1 SSHOUSE Summer Casual Sleeveless Soirée Party ... Women's Casual Summer Sleeveless Sexy Mini Dress 8.00 22 EUR 20000 1 3.45 6135 2269.0 1027.0 1118.0 644.0 1077.0 0 0 0 0 Mini,womens dresses,Summer,Patchwork,fashion d... green XS 50 Livraison standard 2 0 41 50 1.0 Quantité limitée ! CN SaraHouse sarahouse 83 % avis positifs (17,752 notes) 17752 3.899673 56458aa03a698c35c9050988 0 NaN https://www.wish.com/c/58940d436a0d3d5da4e95a38 https://contestimg.wish.com/api/webimage/58940... 58940d436a0d3d5da4e95a38 summer 2020-08
2 2020 Nouvelle Arrivée Femmes Printemps et Été ... 2020 New Arrival Women Spring and Summer Beach... 8.00 43 EUR 100 0 3.57 14 5.0 4.0 2.0 0.0 3.0 0 0 0 0 Summer,cardigan,women beachwear,chiffon,Sexy w... leopardprint XS 1 Livraison standard 3 0 36 50 1.0 Quantité limitée ! CN hxt520 hxt520 86 % avis positifs (295 notes) 295 3.989831 5d464a1ffdf7bc44ee933c65 0 NaN https://www.wish.com/c/5ea10e2c617580260d55310a https://contestimg.wish.com/api/webimage/5ea10... 5ea10e2c617580260d55310a summer 2020-08
3 Hot Summer Cool T-shirt pour les femmes Mode T... Hot Summer Cool T Shirt for Women Fashion Tops... 8.00 8 EUR 5000 1 4.03 579 295.0 119.0 87.0 42.0 36.0 0 0 0 0 Summer,Shorts,Cotton,Cotton T Shirt,Sleeve,pri... black M 50 Livraison standard 2 0 41 50 NaN NaN CN allenfan allenfan (23,832 notes) 23832 4.020435 58cfdefdacb37b556efdff7c 0 NaN https://www.wish.com/c/5cedf17ad1d44c52c59e4aca https://contestimg.wish.com/api/webimage/5cedf... 5cedf17ad1d44c52c59e4aca summer 2020-08
4 Femmes Shorts d'été à lacets taille élastique ... Women Summer Shorts Lace Up Elastic Waistband ... 2.72 3 EUR 100 1 3.10 20 6.0 4.0 2.0 2.0 6.0 0 0 0 0 Summer,Plus Size,Lace,Casual pants,Bottom,pant... yellow S 1 Livraison standard 1 0 35 50 1.0 Quantité limitée ! CN youngpeopleshop happyhorses 85 % avis positifs (14,482 notes) 14482 4.001588 5ab3b592c3911a095ad5dadb 0 NaN https://www.wish.com/c/5ebf5819ebac372b070b0e70 https://contestimg.wish.com/api/webimage/5ebf5... 5ebf5819ebac372b070b0e70 summer 2020-08
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1568 Nouvelle Mode Femmes Bohême Pissenlit Imprimer... New Fashion Women Bohemia Dandelion Print Tee ... 6.00 9 EUR 10000 1 4.08 1367 722.0 293.0 185.0 77.0 90.0 0 0 0 0 bohemia,Plus Size,dandelionfloralprinted,short... navyblue S 50 Livraison standard 2 0 41 50 NaN NaN CN cxuelin99126 cxuelin99126 90 % avis positifs (5,316 notes) 5316 4.224605 5b507899ab577736508a0782 0 NaN https://www.wish.com/c/5d5fadc99febd9356cbc52ee https://contestimg.wish.com/api/webimage/5d5fa... 5d5fadc99febd9356cbc52ee summer 2020-08
1569 10 couleurs femmes shorts d'été lacent ceintur... 10 Color Women Summer Shorts Lace Up Elastic W... 2.00 56 EUR 100 1 3.07 28 11.0 3.0 1.0 3.0 10.0 0 0 0 0 Summer,Panties,Elastic,Lace,Casual pants,casua... lightblue S 2 Livraison standard 1 0 26 50 1.0 Quantité limitée ! CN sell best quality goods sellbestqualitygoods (4,435 notes) 4435 3.696054 54d83b6b6b8a771e478558de 0 NaN https://www.wish.com/c/5eccd22b4497b86fd48f16b4 https://contestimg.wish.com/api/webimage/5eccd... 5eccd22b4497b86fd48f16b4 summer 2020-08
1570 Nouveautés Hommes Siwmwear Beach-Shorts Hommes... New Men Siwmwear Beach-Shorts Men Summer Quick... 5.00 19 EUR 100 0 3.71 59 24.0 15.0 8.0 3.0 9.0 0 0 0 0 runningshort,Beach Shorts,beachpant,menbeachsh... white SIZE S 15 Livraison standard 2 0 11 50 NaN NaN CN shixueying shixueying 86 % avis positifs (210 notes) 210 3.961905 5b42da1bf64320209fc8da69 0 NaN https://www.wish.com/c/5e74be96034d613d42b52dfe https://contestimg.wish.com/api/webimage/5e74b... 5e74be96034d613d42b52dfe summer 2020-08
1571 Mode femmes d'été sans manches robes col en V ... Fashion Women Summer Sleeveless Dresses V Neck... 13.00 11 EUR 100 0 2.50 2 0.0 1.0 0.0 0.0 1.0 0 0 0 0 Summer,fashion women,Fashion,Lace,Dresses,Dres... white Size S. 36 Livraison standard 3 0 29 50 NaN NaN CN modai modai 77 % avis positifs (31 notes) 31 3.774194 5d56b32c40defd78043d5af9 0 NaN https://www.wish.com/c/5eda07ab0e295c2097c36590 https://contestimg.wish.com/api/webimage/5eda0... 5eda07ab0e295c2097c36590 summer 2020-08
1572 Pantalon de yoga pour femmes à la mode Slim Fi... Fashion Women Yoga Pants Slim Fit Fitness Runn... 7.00 6 EUR 100 1 4.07 14 8.0 3.0 1.0 0.0 2.0 0 0 0 0 Summer,Leggings,slim,Yoga,pants,Slim Fit,Women... red S 50 Livraison standard 2 0 41 50 NaN NaN CN AISHOPPINGMALL aishoppingmall 90 % avis positifs (7,023 notes) 7023 4.235939 5a409cf87b584e7951b2e25f 0 NaN https://www.wish.com/c/5e857321f53c3d2d8f25e7ed https://contestimg.wish.com/api/webimage/5e857... 5e857321f53c3d2d8f25e7ed summer 2020-08

1573 rows × 43 columns

Preparation

Dropping unneeded columns

columns_to_drop = [
    'title',
    'title_orig',
    'currency_buyer',
    'shipping_option_name',
    'urgency_text',
    'merchant_title',
    'merchant_name',
    'merchant_info_subtitle',
    'merchant_id',
    'merchant_profile_picture',
    'product_url',
    'product_picture',
    'product_id',
    'tags',
    'has_urgency_banner',
    'theme',
    'crawl_month',
    'origin_country',
]
data = data.drop(columns_to_drop, axis=1)

Encoding

Ordinal Features

data.isnull().sum()
price                            0
retail_price                     0
units_sold                       0
uses_ad_boosts                   0
rating                           0
rating_count                     0
rating_five_count               45
rating_four_count               45
rating_three_count              45
rating_two_count                45
rating_one_count                45
badges_count                     0
badge_local_product              0
badge_product_quality            0
badge_fast_shipping              0
product_color                   41
product_variation_size_id       14
product_variation_inventory      0
shipping_option_price            0
shipping_is_express              0
countries_shipped_to             0
inventory_total                  0
merchant_rating_count            0
merchant_rating                  0
merchant_has_profile_picture     0
dtype: int64
size_ordering = ['XXS', 'XS', 'S', 'M', 'L', 'XL', 'XXL']
def ordinal_encode(data, column, ordering):
    return data[column].apply(lambda x: ordering.index(x) if x in ordering else None)
data['product_variation_size_id'] = ordinal_encode(data, 'product_variation_size_id', size_ordering)
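
The sample rows above include size labels such as 'SIZE S' and 'Size S.', which ordinal_encode maps to None because they are not in size_ordering. One possible refinement, not part of the original walkthrough, is to normalize the labels first. The sketch below assumes a hypothetical helper `normalize_size` and only handles the simple "Size X" pattern; anything else is left unmatched:

```python
import re

SIZE_ORDERING = ['XXS', 'XS', 'S', 'M', 'L', 'XL', 'XXL']


def normalize_size(raw):
    """Map labels like 'Size S.' to 'S'; return None when unsure.

    Hypothetical helper for illustration only.
    """
    if not isinstance(raw, str):
        return None  # NaN and other non-string values stay missing
    # Upper-case, then turn runs of punctuation/whitespace into single spaces.
    text = re.sub(r'[^A-Z0-9]+', ' ', raw.upper())
    # Drop the literal word SIZE and keep whatever tokens remain.
    tokens = [t for t in text.split() if t != 'SIZE']
    if len(tokens) == 1 and tokens[0] in SIZE_ORDERING:
        return tokens[0]
    return None


raw_sizes = ['M', 'XS', 'SIZE S', 'Size S.', '2XL', None]
normalized = [normalize_size(s) for s in raw_sizes]
print(normalized)
```

Applying such a function to the column before ordinal_encode would keep more rows out of the mean-imputation step later on; labels like '2XL' are deliberately left as missing here rather than guessed.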

Onehot Features

def onehot_encode(data, column):
    dummies = pd.get_dummies(data[column])
    data = pd.concat([data, dummies], axis=1)
    data = data.drop(column, axis=1)
    return data
data = onehot_encode(data, 'product_color')
(data.dtypes == 'object').sum()
0
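
Note that product_color had 41 missing values before this step, yet no color column reports missing values afterwards. That is because pd.get_dummies, by default, encodes a missing value as a row of all zeros rather than as a category. A small sketch on made-up data shows the behaviour, along with the dummy_na option that adds an explicit column for missing values:

```python
import numpy as np
import pandas as pd

colors = pd.Series(['white', np.nan, 'green'], name='product_color')

# Default: the NaN row gets no indicator set at all.
plain = pd.get_dummies(colors)

# dummy_na=True adds one extra column that flags missing values.
with_na = pd.get_dummies(colors, dummy_na=True)

row_sums = plain.sum(axis=1).tolist()
print(row_sums)
print(with_na.shape[1])
```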

Handling missing values

data.isnull().sum()
price               0
retail_price        0
units_sold          0
uses_ad_boosts      0
rating              0
                   ..
wine                0
wine red            0
winered             0
winered & yellow    0
yellow              0
Length: 125, dtype: int64
null_columns = ['rating_five_count', 'rating_four_count', 'rating_three_count', 'rating_two_count', 'rating_one_count', 'product_variation_size_id']
for column in null_columns:
    data[column] = data[column].fillna(data[column].mean())
data.isnull().sum().sum()
0
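
The loop above replaces each missing value with the mean of the non-missing values in the same column. A small sketch on made-up numbers (the column names are borrowed from the dataset, the values are not) makes the effect concrete:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'rating_five_count': [26.0, np.nan, 5.0],
    'product_variation_size_id': [3.0, 1.0, np.nan],
})

for column in demo.columns:
    # Series.mean() skips NaN, so the fill value uses observed rows only.
    demo[column] = demo[column].fillna(demo[column].mean())

print(demo)
```

For the ordinal size column, the filled value is generally a fractional rank that does not correspond to any actual size, which is worth keeping in mind when interpreting that feature.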

Scaling

y = data['units_sold']
X = data.drop(['units_sold'], axis=1)
scaler = sp.MinMaxScaler()
X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
y.unique()
array([   100,  20000,   5000,     10,  50000,   1000,  10000, 100000,
           50,      1,      7,      2,      3,      8,      6])
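
With its default settings, MinMaxScaler rescales each column to the range [0, 1] by subtracting the column minimum and dividing by the column range. The NumPy sketch below reimplements that idea on made-up numbers; `min_max_scale` is a hypothetical name, and this is an illustration rather than the scikit-learn implementation:

```python
import numpy as np


def min_max_scale(values):
    """Scale each column of a 2-D array to [0, 1] (illustrative only)."""
    values = np.asarray(values, dtype=float)
    col_min = values.min(axis=0)
    col_range = values.max(axis=0) - col_min
    # Constant columns have zero range; divide by 1 so they map to 0.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return (values - col_min) / col_range


demo = np.array([[16.0, 50.0],
                 [8.0, 1.0],
                 [2.0, 50.0]])
scaled = min_max_scale(demo)
print(scaled)
```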

Encoding y

encoder = sp.LabelEncoder()
y = encoder.fit_transform(y)
y_mappings = {index: label for index, label in enumerate(encoder.classes_)}
y_mappings
{0: 1,
 1: 2,
 2: 3,
 3: 6,
 4: 7,
 5: 8,
 6: 10,
 7: 50,
 8: 100,
 9: 1000,
 10: 5000,
 11: 10000,
 12: 20000,
 13: 50000,
 14: 100000}
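
LabelEncoder numbers the distinct values in sorted order, which is why the mapping above runs from the smallest to the largest units_sold value. The same codes can be reproduced with NumPy alone, as in this sketch on a made-up subset of values:

```python
import numpy as np

units_sold = np.array([100, 20000, 5000, 10, 100, 1])

# classes: sorted distinct values; codes: index of each entry in classes.
classes, codes = np.unique(units_sold, return_inverse=True)
mappings = {index: int(label) for index, label in enumerate(classes)}

# Indexing classes with the codes recovers the original values.
decoded = classes[codes]

print(mappings)
print(codes.tolist())
```

Because the 15 distinct units_sold values become class indices, the model that follows treats the task as 15-way classification rather than regression.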

Training

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(124,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(15, activation='softmax'),
])

model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_6 (Dense)              (None, 16)                2000      
_________________________________________________________________
dense_7 (Dense)              (None, 16)                272       
_________________________________________________________________
dense_8 (Dense)              (None, 15)                255       
=================================================================
Total params: 2,527
Trainable params: 2,527
Non-trainable params: 0
_________________________________________________________________
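
The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The sketch below takes the 124 input features from input_shape=(124,), which matches the 125 columns reported after encoding minus the units_sold target. `dense_params` is a hypothetical helper name.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Input width followed by the width of each Dense layer.
layer_sizes = [124, 16, 16, 15]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts, sum(counts))
```

The three per-layer counts and their total agree with the Param # column and the Total params line in the summary.
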
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)
batch_size = 32
epochs = 300

history = model.fit(
    X_train, 
    y_train,
    validation_split=0.2,
    batch_size=batch_size,
    epochs=epochs,
    verbose=0,
)
plt.figure(figsize=(14, 10))

epochs_range = range(1, epochs + 1)
train_loss = history.history['loss']
val_loss = history.history['val_loss']

plt.plot(epochs_range, train_loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')

plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.show()

[Figure: training loss and validation loss by epoch]

Past a certain epoch, the validation loss starts to rise, which shows that the model is overfitting.
We look for the epoch at which the validation loss is lowest, just before overfitting sets in.

np.argmin(val_loss)
94
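
np.argmin returns a 0-based index, while the plot's x-axis starts at epoch 1, so an index of 94 corresponds to epoch 95. A small sketch with a made-up loss curve shows the conversion:

```python
import numpy as np

# Made-up validation losses for five epochs (illustration only).
demo_val_loss = [1.9, 1.4, 1.1, 1.2, 1.5]

best_index = int(np.argmin(demo_val_loss))  # 0-based position in the list
best_epoch = best_index + 1                 # 1-based epoch number

print(best_index, best_epoch)
```

Keras also provides tf.keras.callbacks.EarlyStopping, which can stop training once val_loss stops improving; it is not used in this walkthrough.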