Atrae Advent Calendar 2017

Day 17

Trying Out PredictionIO, the Machine Learning Software Acquired by Salesforce

Posted at 2017-12-17

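What is PredictionIO?

In short, it is a relatively new machine learning server that is available as open source.
It was acquired by Salesforce.com on 2016/02/19 (article).

For details, here is a quote from the official documentation.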

What is Apache PredictionIO?

Apache PredictionIO is an open source Machine Learning Server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task. It lets you:

  • quickly build and deploy an engine as a web service on production with customizable templates;
  • respond to dynamic queries in real-time once deployed as a web service;
  • evaluate and tune multiple engine variants systematically;
  • unify data from multiple platforms in batch or in real-time for comprehensive predictive analytics;
  • speed up machine learning modeling with systematic processes and pre-built evaluation measures;
  • support machine learning and data processing libraries such as Spark MLLib and OpenNLP;
  • implement your own machine learning models and seamlessly incorporate them into your engine;
  • simplify data infrastructure management.

Apache PredictionIO can be installed as a full machine learning stack, bundled with Apache Spark, MLlib, HBase, Spray and Elasticsearch, which simplifies and accelerates scalable machine learning infrastructure management.

What I want to do

  • Use the data we already have to improve recommendation accuracy from a variety of angles

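From installing PredictionIO to implementation

There are several ways to set up an environment that runs PredictionIO. This time I used Docker, which seemed simple and likely to involve few dependencies.

The installation steps for each option are described in the official documentation.

  1. Install Docker
  2. Start a Docker container and log in to its shell
docker run -it -p 8000:8000 steveny/predictionio /bin/bash
  3. Start the EventServer
# Start
pio-start-all

# Check
pio status

# If every check passes, PredictionIO is up and running

# Try the template that finds similar items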
pio template get apache/predictionio-template-similar-product ~/MySimilarProduct

cd ~/MySimilarProduct/

pio app new MyApp1
[INFO] [HBLEvents] The table pio_event:events_1 doesn't exist yet. Creating now...
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$]       Name: MyApp1
[INFO] [App$]         ID: 1
[INFO] [App$] Access Key: XqACwgQ5DsHZ3LP8yTQTcIGiEHzbT2dH-0GRK2ejkBHaHzUBWpEJp-y8uWXNIVo8
# This Access Key is used later.

Edit engine.json as follows.

$ vi engine.json
---
  "datasource": {
    "params" : {
      "appName": "MyApp1"
    }
  },
---
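
If you would rather script this edit than open vi, the same change can be sketched in Python. This is a minimal sketch, assuming engine.json is plain JSON containing the datasource.params.appName key shown above. The helper name set_app_name and the placeholder config are hypothetical, not part of PredictionIO.

```python
import json


def set_app_name(engine_config: dict, app_name: str) -> dict:
    """Return a copy of the engine config with datasource.params.appName set."""
    updated = json.loads(json.dumps(engine_config))  # deep copy via JSON round trip
    updated.setdefault("datasource", {}).setdefault("params", {})["appName"] = app_name
    return updated


# Hypothetical minimal config for illustration; a real engine.json has more keys.
example = {"datasource": {"params": {"appName": "PLACEHOLDER"}}}
print(json.dumps(set_app_name(example, "MyApp1"), indent=2))
```

To apply it to the real file, load engine.json with json.load, pass the result through set_app_name, and write it back with json.dump.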

Import some test data.

python data/import_eventserver.py --access_key ACCESS_KEY
# Use the Access Key of the MyApp1 app created earlier.
...
User u10 views item i20
User u10 views item i17
User u10 views item i22
User u10 views item i31
User u10 views item i18
User u10 views item i29
160 events are imported.
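
Each "User u10 views item i20" line above corresponds to one event stored in the Event Server. The following is a rough sketch of what sending one such event could look like, assuming the Event Server accepts JSON events at /events.json with an accessKey query parameter and listens on its default port 7070. The helper names build_view_event and send_event are hypothetical, and the bundled import script may do this differently.

```python
import json
import urllib.parse
import urllib.request


def build_view_event(user_id: str, item_id: str) -> dict:
    """Build the JSON body for a "user views item" event."""
    return {
        "event": "view",
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
    }


def send_event(event: dict, access_key: str,
               base_url: str = "http://localhost:7070") -> dict:
    """POST one event to the Event Server (port 7070 is an assumption)."""
    query = urllib.parse.urlencode({"accessKey": access_key})
    request = urllib.request.Request(
        f"{base_url}/events.json?{query}",
        data=json.dumps(event).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Only build and print the payload here; send_event needs a running Event Server.
print(json.dumps(build_view_event("u10", "i20")))
```

With the servers running inside the container, send_event(build_view_event("u10", "i20"), ACCESS_KEY) would submit a single event, where ACCESS_KEY is the key printed by pio app new.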

Build, train, and deploy the Engine Template

pio build --verbose
pio train
pio deploy --port 8000 # any port can be specified; to specify an IP address, use --ip 1.2.3.4
[INFO] [HttpListener] Bound to /0.0.0.0:8000
[INFO] [MasterActor] Bind successful. Ready to serve.

Now query the engine for similar items

$ curl -H "Content-Type: application/json" \
-d '{ "items": ["i1"], "num": 4 }' \
http://localhost:8000/queries.json

You can also specify multiple items and get items similar to them

curl -H "Content-Type: application/json" \
-d '{ "items": ["i1", "i3"], "num": 10}' \
http://localhost:8000/queries.json

{"itemScores":[{"item":"i12","score":1.1700499715209998},{"item":"i21","score":1.1153550716504106},{"item":"i43","score":1.1153550716504106},{"item":"i14","score":1.0773502691896257},{"item":"i39","score":1.0773502691896257},{"item":"i26","score":1.0773502691896257},{"item":"i44","score":1.0773502691896257},{"item":"i38","score":0.9553418012614798},{"item":"i36","score":0.9106836025229592},{"item":"i46","score":0.9106836025229592}]}
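
The same query can be issued from Python instead of curl. This is a minimal sketch using only the standard library, assuming the engine is deployed at http://localhost:8000 as above. The function names query_similar and top_items are hypothetical, and the sample data is a shortened copy of the response shown above.

```python
import json
import urllib.request


def query_similar(items, num, url="http://localhost:8000/queries.json"):
    """POST a similar-item query to the deployed engine and return the parsed JSON."""
    body = json.dumps({"items": list(items), "num": num}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def top_items(response: dict, min_score: float = 0.0):
    """Return (item, score) pairs with score >= min_score, keeping the server's order."""
    return [
        (entry["item"], entry["score"])
        for entry in response["itemScores"]
        if entry["score"] >= min_score
    ]


# Shortened copy of the response shown above, so this runs without a server.
sample = {
    "itemScores": [
        {"item": "i12", "score": 1.1700499715209998},
        {"item": "i21", "score": 1.1153550716504106},
        {"item": "i38", "score": 0.9553418012614798},
    ]
}
print(top_items(sample, min_score=1.0))
```

Against a running engine, top_items(query_similar(["i1", "i3"], 10)) would return the ranked pairs directly.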

You can also restrict the results to items that have specific attributes

curl -H "Content-Type: application/json" \
-d '{
  "items": ["i1", "i3"],
  "num": 10,
  "categories" : ["c4", "c3"]
}' \
http://localhost:8000/queries.json

{"itemScores":[{"item":"i21","score":1.1153550716504106},{"item":"i14","score":1.0773502691896257},{"item":"i26","score":1.0773502691896257},{"item":"i39","score":1.0773502691896257},{"item":"i44","score":1.0773502691896257},{"item":"i45","score":0.7886751345948129},{"item":"i47","score":0.7618016810571367},{"item":"i9","score":0.7618016810571367},{"item":"i28","score":0.7618016810571367},{"item":"i6","score":0.7618016810571367}]}

You can also exclude a specific list of items

curl -H "Content-Type: application/json" \
-d '{
  "items": ["i1", "i3"],
  "num": 10,
  "categories" : ["c4", "c3"],
  "blackList": ["i21", "i26", "i40"]
}' \
http://localhost:8000/queries.json

# Changing blackList to whiteList restricts the results to the listed items instead

{"itemScores":[{"item":"i39","score":1.0773502691896257},{"item":"i44","score":1.0773502691896257},{"item":"i14","score":1.0773502691896257},{"item":"i45","score":0.7886751345948129},{"item":"i47","score":0.7618016810571367},{"item":"i6","score":0.7618016810571367},{"item":"i28","score":0.7618016810571367},{"item":"i9","score":0.7618016810571367},{"item":"i29","score":0.6220084679281463},{"item":"i30","score":0.5386751345948129}]}
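
The filtered queries above differ only in which optional fields they include. As a small sketch, a helper can build the request body and leave out any filter that is not set. The field names items, num, categories, whiteList, and blackList come from the queries above; the function name build_query and its parameter names are hypothetical.

```python
import json


def build_query(items, num, categories=None, white_list=None, black_list=None) -> dict:
    """Build a query body, including only the filters that were provided."""
    query = {"items": list(items), "num": num}
    if categories is not None:
        query["categories"] = list(categories)
    if white_list is not None:
        query["whiteList"] = list(white_list)
    if black_list is not None:
        query["blackList"] = list(black_list)
    return query


# Reproduces the body of the last curl request above.
body = build_query(
    ["i1", "i3"], 10,
    categories=["c4", "c3"],
    black_list=["i21", "i26", "i40"],
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized with json.dumps and sent as the -d payload of the curl commands above.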

This time I used the template as is, but the engine follows an architecture called DASE, and by modifying it you can customize each of the following components:

  • Data - includes Data Source and Data Preparator
  • Algorithm(s)
  • Serving
  • Evaluator
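
To make the division of responsibilities concrete, here is a toy Python sketch of the four DASE roles. It is purely conceptual and is not the PredictionIO API (engines are written against PredictionIO's own framework). As an assumption for illustration, the algorithm is a simple cosine similarity over the sets of users who viewed each item, which is not necessarily what the Similar Product template uses. All function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


# D: data source + preparator. Turn raw (user, item) view events
# into a mapping of item -> set of users who viewed it.
def prepare(events):
    viewers = defaultdict(set)
    for user, item in events:
        viewers[item].add(user)
    return dict(viewers)


# A: algorithm. Cosine similarity between the viewer sets of every item pair.
def train(viewers):
    def cosine(a, b):
        shared = len(viewers[a] & viewers[b])
        return shared / sqrt(len(viewers[a]) * len(viewers[b]))

    return {a: {b: cosine(a, b) for b in viewers if b != a} for a in viewers}


# S: serving. Rank candidates for one query item and return the top `num`.
def serve(model, item, num):
    ranked = sorted(model.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"item": i, "score": s} for i, s in ranked[:num] if s > 0]


# E: evaluator. Fraction of expected items that appear in the served result.
def evaluate(result, expected):
    found = {entry["item"] for entry in result}
    return len(found & set(expected)) / len(expected) if expected else 0.0


events = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"),
          ("u2", "i2"), ("u2", "i3"), ("u3", "i3")]
model = train(prepare(events))
result = serve(model, "i1", num=2)
print(result, evaluate(result, ["i2"]))
```

Swapping out one function, for example replacing train with a different similarity measure, leaves the other three untouched, which is the kind of per-component customization DASE is meant to allow.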

Besides the Similar Product template used here, templates are also provided for:

  • Recommendation
  • Classification
  • Regression
  • NLP
  • Clustering
  • Similarity

Several other templates are available as well, and I would like to try them out.
