This post summarizes the top-ranked solutions to Kaggle's Instacart Market Basket Analysis [1].
I hope you find it useful.
What is Instacart Market Basket Analysis [1]?
A competition to predict which products a user will order next.
Data overview [2]
- Users: about 200,000
- Orders: about 3.4 million
- Products: about 50,000
Article [2] is a useful reference for getting the overall picture.
[2nd place] 2nd Place Solution
- Name: ONODERA
- Kaggle Discussion: https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/38143
- Code: https://github.com/KazukiOnodera/Instacart
- Writeup: http://blog.kaggle.com/2017/09/21/instacart-market-basket-analysis-winners-interview-2nd-place-kazuki-onodera/
- public LB: 0.4094053 -> private LB: 0.4082040
Features
Prediction models
- Which products the user will buy (user product pair model)
- Whether the user will order nothing (None prediction) [3]
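The two prediction targets above can be illustrated with a small sketch. It assumes a simplified input where each user's past orders and the order to predict are already available as Python collections; the function name `build_pairs`, the input layout, and the single `times_bought` feature are hypothetical and are not taken from the 2nd place code.

```python
from collections import defaultdict

def build_pairs(prior_orders, next_orders):
    """Build (user, product) training rows and a per-user None label.

    prior_orders: dict user_id -> list of past orders (each a list of product ids)
    next_orders:  dict user_id -> set of product ids in the order to predict
    Both structures are hypothetical stand-ins for the competition tables.
    """
    pair_rows = []    # (user_id, product_id, times_bought, label)
    none_labels = {}  # user_id -> 1 if no previously bought product is reordered
    for user_id, orders in prior_orders.items():
        counts = defaultdict(int)
        for order in orders:
            for product_id in set(order):
                counts[product_id] += 1
        target = next_orders.get(user_id, set())
        reordered = 0
        # Candidates are limited to products the user has bought before.
        for product_id in sorted(counts):
            label = int(product_id in target)
            reordered += label
            pair_rows.append((user_id, product_id, counts[product_id], label))
        none_labels[user_id] = int(reordered == 0)
    return pair_rows, none_labels

prior = {"u1": [["milk", "eggs"], ["milk"]], "u2": [["tea"]]}
nxt = {"u1": {"milk", "bread"}, "u2": {"coffee"}}
rows, none_labels = build_pairs(prior, nxt)
```

In this toy data, `u2` reorders nothing from their history, so their None label is 1 even though the next order is not empty.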
Methods
- F1 maximization [4]
- XGBoost
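To make the idea of F1 maximization concrete, here is a minimal brute-force sketch. It is not the O(n²) algorithm from [4]: it enumerates all 2^n purchase outcomes, so it only works for a handful of items. It assumes purchases are independent given the predicted probabilities, represents an empty order with the token `"None"`, and searches only over top-k prefixes of the ranked items. All function names are hypothetical.

```python
from itertools import product

def f1(selected, truth):
    """F1 between two sets; an empty order is represented by {"None"}."""
    tp = len(selected & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(selected)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)

def expected_f1(selected, probs):
    """Exact expected F1 of `selected`, enumerating all 2^n outcomes."""
    items = list(probs)
    total = 0.0
    for outcome in product([0, 1], repeat=len(items)):
        weight = 1.0
        truth = set()
        for item, bought in zip(items, outcome):
            weight *= probs[item] if bought else 1.0 - probs[item]
            if bought:
                truth.add(item)
        if not truth:
            truth = {"None"}
        total += weight * f1(selected, truth)
    return total

def best_basket(probs):
    """Pick the top-k prefix (with or without "None") with the highest expected F1."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    candidates = [{"None"}]
    for k in range(1, len(ranked) + 1):
        candidates.append(set(ranked[:k]))
        candidates.append(set(ranked[:k]) | {"None"})
    return max(candidates, key=lambda c: expected_f1(c, probs))

probs = {"a": 0.9, "b": 0.1}
print(sorted(best_basket(probs)))  # ['a']
```

The point of the exercise is that the best basket is not simply "everything above 0.5": with a single item at probability 0.5, predicting both the item and `"None"` scores a higher expected F1 than predicting either one alone.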
Notes
- Reportedly, the hyperparameters were not tuned much.
[3rd place] 3rd-Place Solution Overview
- Name: sjv
- Kaggle Discussion: https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/38097
- Code: https://github.com/sjvasquez/instacart-basket-prediction
- public LB: 0.4092608 -> private LB: 0.4081041
Methods
[4th place] 4-th Place Tips
- Name: GeorgeGui
- Kaggle Discussion: https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/38102
- public LB: 0.4092608 -> private LB: 0.4074451
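Methods
- Probability adjustment based on basket size
  - $1.6^{1/n^2}$
  - $n$ is the basket size (the number of products placed in the basket)
  - If there is no next order, $n=1$
  - $n$ is negatively correlated with the probability of a purchase in the next order
  - The constant $1.6$ was found by grid search
- Buying a given product raises the purchase probability of certain other products.
- F1 maximization [4]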
- CatBoost^2
- LDA
- NNMF [7]
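The basket-size adjustment in the list above can be sketched as follows. The source only gives the factor $1.6^{1/n^2}$; applying it as a multiplier on the predicted probability and clipping the result to 1.0 are assumptions made here for illustration, and `adjust_probability` is a hypothetical name.

```python
def adjust_probability(p, basket_size, base=1.6):
    """Scale a predicted purchase probability by base ** (1 / n**2).

    Assumptions (not confirmed by the source): the factor is applied
    multiplicatively, and the result is clipped to stay a probability.
    """
    n = max(int(basket_size), 1)  # n = 1 when there is no next order
    factor = base ** (1.0 / n ** 2)
    return min(p * factor, 1.0)

for n in (1, 2, 5, 10):
    print(n, adjust_probability(0.3, n))
```

The factor is 1.6 at $n=1$ and decays quickly toward 1 as the basket grows, so the boost mostly affects users with small baskets.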
What did not help
- lag1 data
[9th place] 9th place Approach
- Name: KazAnova
- Kaggle Discussion: https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/38100
- Qiita Memo: https://qiita.com/namakemono/items/b3112655fb06a51fd72f
- public LB: 0.4082750 -> private LB: 0.4070609
Features
- sh1ng feature, https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/37697
- arboretum feature, https://github.com/sh1ng/arboretum
- Current consecutive-purchase record [8] (streak, leaked information)
- https://www.kaggle.com/paulantoine/light-gbm-benchmark-0-3692/code
- https://www.kaggle.com/nickycan/lb-0-3805009-python-edition
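As a rough illustration of the streak feature listed above, the sketch below counts how many of a user's most recent consecutive orders contain a given product. This is a simplified reading of "streak"; the kernel in [8] may define it differently (for example, over a fixed window of recent orders), and `current_streak` is a hypothetical name.

```python
def current_streak(order_history, product_id):
    """Count how many of the most recent consecutive orders contain product_id.

    order_history: list of sets of product ids, oldest order first.
    """
    streak = 0
    for order in reversed(order_history):
        if product_id in order:
            streak += 1
        else:
            break
    return streak

history = [{"milk"}, {"milk", "eggs"}, {"eggs"}, {"milk", "eggs"}, {"milk"}]
print(current_streak(history, "milk"))  # 2
print(current_streak(history, "eggs"))  # 0
```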
Methods
What did not help
- Separate models per product/aisle/department
- Collaborative filtering2
[12th place] 12th solution
- Name: plantsgo
- Kaggle Discussion: https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/38110
- Code: https://github.com/plantsgo/Instacart-Market-Basket-Analysis/tree/master
- Qiita Memo: https://qiita.com/namakemono/items/f8fdeba0dbffb6e644a4
- public LB: 0.4080915 -> private LB: 0.4064402
Prediction models
- Which products the user will buy (user product pair model)
- Whether the user will order nothing (None prediction) [3]
Methods
- Model Stacking
References
- [1] Kaggle Instacart, https://www.kaggle.com/c/instacart-market-basket-analysis
- [2] Kruegger, How to lift up to Bronze area in a week, https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/36848
- [3] Anderson, Predicting "None", https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/35716
- [4] Faron, Get Expected F1-Score in O(n²), https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/37221
- [5] Gorman, A Kaggler's Guide to Model Stacking in Practice, http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/, 2016.
- [6] Mikolov, Distributed representations of words and phrases and their compositionality, 2013.
- [7] Lee, Algorithms for Non-negative Matrix Factorization, http://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf, 2001.
- [8] Streak, https://www.kaggle.com/mmueller/order-streaks-feature/code
- [9] H2O, http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science.html
- [10] sh1ng, arboretum, https://github.com/sh1ng/arboretum
- [11] sh1ng, Baseline 0.4029970, https://www.kaggle.com/c/instacart-market-basket-analysis/discussion/37697