Kaggle - Instacart: Summary of Top-Ranked Solutions

Last updated at 2017-09-27, posted at 2017-08-17

This article summarizes the top-ranked solutions to Kaggle's Instacart Market Basket Analysis¹ competition.
I hope you find it useful.

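What is Instacart Market Basket Analysis¹?

The task is to predict which products a user will order next.

Data overview²

Users: 200,000
Orders: 3.4 million
Products: 50,000

The article in ² is a helpful overview of the whole competition.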

[2nd place] 2nd Place Solution

Name: ONODERA
Kaggle Discussion: https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/38143
Code: https://github.com/KazukiOnodera/Instacart
Writeup: http://blog.kaggle.com/2017/09/21/instacart-market-basket-analysis-winners-interview-2nd-place-kazuki-onodera/
public LB: 0.4094053 -> private LB: 0.4082040

Features

https://github.com/KazukiOnodera/Instacart

Prediction models

Which products a user will buy (user product pair model)
Whether the user will order nothing (None prediction)³
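
As a rough illustration of these two targets, the sketch below builds one training row per (user, product) pair seen in a user's history, plus one "None" row per user. The input layout, the `order_rate` feature, and the function name `build_pair_rows` are assumptions made for this example and are not taken from the 2nd-place code. "None" is assumed here to mean that the next order contains no previously bought product.

```python
from collections import Counter


def build_pair_rows(prior_orders, next_order):
    """Build training rows for the two targets (hypothetical layout).

    prior_orders: dict user -> list of past orders (each a list of product ids)
    next_order:   dict user -> set of product ids in the order to predict
    """
    pair_rows, none_rows = [], []
    for user, orders in prior_orders.items():
        # Number of past orders that contained each product.
        counts = Counter(p for order in orders for p in set(order))
        target = next_order.get(user, set())
        for product, n_bought in sorted(counts.items()):
            pair_rows.append({
                "user": user,
                "product": product,
                "order_rate": n_bought / len(orders),  # toy feature
                "label": int(product in target),       # bought again?
            })
        # "None" label: the next order repeats nothing from the history.
        reordered = [p for p in target if p in counts]
        none_rows.append({"user": user, "label": int(not reordered)})
    return pair_rows, none_rows


prior = {"u1": [["a", "b"], ["a"]], "u2": [["c"]]}
nxt = {"u1": {"a", "z"}, "u2": {"d"}}
pairs, nones = build_pair_rows(prior, nxt)
```

Each list can then feed its own classifier: one for the pair rows and one for the None rows.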

Methods

F1 maximization⁴
XGBoost
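
F1 maximization here means choosing, per user, the set of products that maximizes the expected F1 score given the predicted purchase probabilities. The sketch below is not Faron's O(n²) algorithm⁴. It is a brute-force version that enumerates every outcome, so it is only practical for a handful of products. It assumes purchases are independent and scores an empty prediction against an empty truth as F1 = 1. Both are assumptions of this example.

```python
from itertools import product as cartesian


def expected_f1(probs, k):
    """Exact expected F1 of predicting the k most probable items.

    Assumes independent purchases and enumerates all 2^n outcomes.
    """
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    chosen = set(order[:k])
    total = 0.0
    for outcome in cartesian([0, 1], repeat=len(probs)):
        # Probability of this exact purchase pattern.
        weight = 1.0
        for p, y in zip(probs, outcome):
            weight *= p if y else 1.0 - p
        n_true = sum(outcome)
        n_hit = sum(outcome[i] for i in chosen)
        if n_true + k == 0:
            f1 = 1.0  # assumption: empty prediction vs. empty truth is a match
        else:
            f1 = 2.0 * n_hit / (n_true + k)  # F1 = 2TP / (|pred| + |true|)
        total += weight * f1
    return total


def best_k(probs):
    """Return (k, expected F1) for the best top-k cut-off."""
    scores = [expected_f1(probs, k) for k in range(len(probs) + 1)]
    k = max(range(len(scores)), key=scores.__getitem__)
    return k, scores[k]


k, score = best_k([0.9, 0.1])
```

For `probs = [0.9, 0.1]` this keeps only the first product, with an expected F1 of 0.87. For a single product with probability 0.05 it predicts nothing.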

Notes

Reportedly, the hyperparameters were not tuned much.

[3rd place] 3rd-Place Solution Overview

Name: sjv
Kaggle Discussion: https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/38097
Code: https://github.com/sjvasquez/instacart-basket-prediction
public LB: 0.4092608 -> private LB: 0.4081041

Methods

F1 maximization⁴
Model Stacking⁵
First-level models
- RNN(LSTM)
- CNN(6-layer)
- SGNS⁶(Skip-Gram with Negative Sampling)
- NNMF⁷(Non-Negative Matrix Factorization)
Second-level models
- LightGBM
- Feedforward NN
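
Model stacking⁵ trains second-level models on the out-of-fold predictions of the first-level models. The sketch below shows that data flow with toy stand-ins: two per-key mean-rate models instead of the RNN/CNN/SGNS/NNMF models, and a grid-searched blend weight instead of LightGBM or a feedforward NN. All function names and the row layout are hypothetical.

```python
from collections import defaultdict


def make_rate_model(key):
    """Toy first-level model: mean label per value of `key`."""
    def fit(rows, labels):
        sums, counts = defaultdict(float), defaultdict(int)
        for row, y in zip(rows, labels):
            sums[row[key]] += y
            counts[row[key]] += 1
        prior = sum(labels) / len(labels)  # fallback for unseen keys

        def predict(row):
            k = row[key]
            return sums[k] / counts[k] if counts.get(k) else prior
        return predict
    return fit


def out_of_fold(fit, rows, labels, n_folds=2):
    """Predict every row with a model that never saw that row."""
    oof = [0.0] * len(rows)
    for fold in range(n_folds):
        train = [i for i in range(len(rows)) if i % n_folds != fold]
        predict = fit([rows[i] for i in train], [labels[i] for i in train])
        for i in range(fold, len(rows), n_folds):
            oof[i] = predict(rows[i])
    return oof


def fit_blend_weight(oof_a, oof_b, labels, steps=20):
    """Toy second-level model: w minimizing squared error of w*a + (1-w)*b."""
    def loss(w):
        return sum((w * a + (1 - w) * b - y) ** 2
                   for a, b, y in zip(oof_a, oof_b, labels))
    return min((s / steps for s in range(steps + 1)), key=loss)


rows = [{"user": u, "product": p} for u, p in
        [("u1", "a"), ("u2", "a"), ("u1", "b"), ("u2", "b"), ("u3", "a"), ("u3", "b")]]
labels = [1, 1, 0, 0, 1, 0]
oof_user = out_of_fold(make_rate_model("user"), rows, labels)
oof_prod = out_of_fold(make_rate_model("product"), rows, labels)
w_user = fit_blend_weight(oof_user, oof_prod, labels)
```

In this toy data the label depends only on the product, so the second level learns to put all of its weight on the product model.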

[4th place] 4-th Place Tips

Name: GeorgeGui
Kaggle Discussion: https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/38102
public LB: 0.4092608 -> private LB: 0.4074451

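Methods

Probability adjustment based on basket size
- $1.6^{1/n^2}$
- $n$ is the basket size (the number of products placed in the basket)
- If there is no next order, $n=1$
- $n$ is negatively correlated with the probability of buying next time
- $1.6$ was found by grid search
- Buying a certain product raises the purchase probability of specific other products.
F1 maximization⁴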
CatBoost^2
LDA
NNMF⁷
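
To get a feel for the basket-size factor $1.6^{1/n^2}$ listed above, the sketch below evaluates it for a few basket sizes. The source does not state how the factor is combined with the predicted probability. Multiplying and clipping at 1, as the hypothetical `adjust` function does, is purely an assumption of this example.

```python
def basket_factor(n, base=1.6):
    """Factor base^(1/n^2) for basket size n (n >= 1)."""
    return base ** (1.0 / n ** 2)


def adjust(prob, n, base=1.6):
    """Assumed usage: scale a predicted probability and clip it at 1."""
    return min(1.0, prob * basket_factor(n, base))


factors = {n: basket_factor(n) for n in (1, 2, 3, 10)}
```

The factor equals 1.6 for $n=1$ and decays quickly toward 1 as the basket grows (about 1.12 for $n=2$ and 1.05 for $n=3$), so small baskets receive the largest boost.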

What did not help

lag1 data

[9th place] 9th place Approach

Name: KazAnova
Kaggle Discussion: https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/38100
Qiita Memo: https://qiita.com/namakemono/items/b3112655fb06a51fd72f
public LB: 0.4082750 -> private LB: 0.4070609

Features

sh1ng feature, https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/37697
arboretum feature, https://github.com/sh1ng/arboretum
Current consecutive-purchase record⁸ (streak, leak information)
https://www.kaggle.com/paulantoine/light-gbm-benchmark-0-3692/code
https://www.kaggle.com/nickycan/lb-0-3805009-python-edition
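
A streak feature of the kind listed above can be sketched as the number of most recent consecutive orders that contain a product. This definition and the function name `current_streak` are assumptions for illustration and may differ from the kernel cited in ⁸.

```python
def current_streak(orders, product):
    """Count how many of the latest orders in a row contain `product`.

    orders: list of sets of product ids, oldest first.
    """
    streak = 0
    for order in reversed(orders):
        if product not in order:
            break  # the run of consecutive purchases ends here
        streak += 1
    return streak


history = [{"a"}, {"a", "b"}, {"a", "b"}]
streaks = {p: current_streak(history, p) for p in ("a", "b", "c")}
```

A product missing from the latest order gets a streak of 0, no matter how often it was bought earlier.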

Methods

F1 maximization⁴
LightGBM
H2O⁹
Keras NN
Model Stacking⁵

What did not help

Separate models per product/aisle/department
Collaborative filtering²

[12th place] 12th solution

Name: plantsgo
Kaggle Discussion: https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/38110
Code: https://github.com/plantsgo/Instacart-Market-Basket-Analysis/tree/master
Qiita Memo: https://qiita.com/namakemono/items/f8fdeba0dbffb6e644a4
public LB: 0.4080915 -> private LB: 0.4064402

Prediction models

Which products a user will buy (user product pair model)
Whether the user will order nothing (None prediction)³

Methods

Model Stacking
- Ensemble of 4 models
  - arbor: arboretum ¹⁰
  - lgbm: LightGBM ¹¹
  - label1: trained with the data split by user
  - label2: trained with all data shuffled
- Weights computed from the LB (probably overfitting...)
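
A weighted ensemble of this kind can be sketched as a weighted average of the per-model probabilities. The weights and predictions below are hypothetical placeholders. The source only says that the weights were derived from the leaderboard and does not give their values or the procedure.

```python
def weighted_blend(predictions, weights):
    """Weighted average of per-model probability lists.

    predictions: dict model name -> list of probabilities (same length each)
    weights:     dict model name -> non-negative weight
    """
    total = sum(weights[name] for name in predictions)
    n_rows = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * preds[i] for name, preds in predictions.items()) / total
        for i in range(n_rows)
    ]


# Hypothetical values for two of the four models.
preds = {"arbor": [0.2, 0.8], "lgbm": [0.4, 0.6]}
blend = weighted_blend(preds, {"arbor": 1.0, "lgbm": 3.0})
```

Tuning such weights against the public LB risks fitting its particular subset of the test data, which is the overfitting concern noted above. Tuning them on a held-out validation set avoids that.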

References

1. Kaggle Instacart, https://www.kaggle.com/c/instacart-market-basket-analysis
2. Kruegger, How to lift up to Bronze area in a week, https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/36848
3. Anderson, Predicting "None", https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/35716
4. Faron, Get Expected F1-Score in O(n²), https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/37221
5. Gorman, A Kaggler's Guide to Model Stacking in Practice, http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/, 2016.
6. Mikolov, Distributed representations of words and phrases and their compositionality, 2013.
7. Lee, Algorithms for Non-negative Matrix Factorization, http://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf, 2001.
8. Streak, https://www.kaggle.com/mmueller/order-streaks-feature/code
9. H2O, http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science.html
10. sh1ng, arboretum, https://github.com/sh1ng/arboretum
11. sh1ng, Baseline 0.4029970, https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/37697
