0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

【BigQueryML】Bigquery上でBigQueryMLを体験してみた。

Posted at

 GCPのBigquery上で、MLをSQLで実行できるので体験してみました。BigQueryMLは、BigQueryを使用してSQLクエリーを通じて機械学習モデルを作成・実行できる機能です。

テーブル作成

スキーマに相当するデータセット(ga:マルチリージョン)を作成後に下記のSQLでga.kmeans_baseを作成する。
尚、gaのデータセットを作成する際にUSのマルチリージョンにしないと作成できない点に注意。

#standardSQL
CREATE OR REPLACE TABLE ga.kmeans_base AS
SELECT
  fullVisitorId,
  date,
  channelGrouping,
  deviceCategory,
  browser,
  metro,
  CASE
    WHEN hour >= 20 THEN 'Night'
    WHEN hour >= 17 THEN 'Evening'
    WHEN hour >= 12 THEN 'Afternoon'
    WHEN hour >= 6  THEN 'Morning'
    ELSE 'Night'
  END AS time,
  SUM(events) AS events,
  SUM(pageviews) AS pageviews,
  SUM(IF(rowNum = 1, timeOnSite, 0)) AS timeOnSite,
  SUM(IF(rowNum = 1, sales, 0)) AS sales
FROM (
  SELECT
    fullVisitorId,
    date,
    channelGrouping,
    device.deviceCategory AS deviceCategory,
    device.browser AS browser,
    geoNetwork.metro AS metro,
    hits.time,
    hits.hour AS hour,
    IF(hits.type = 'EVENT' AND hits.eventInfo.eventCategory IS NOT NULL, 1, 0) AS events,
    IF(hits.type = 'PAGE', 1, 0) AS pageviews,
    IF(totals.timeOnSite IS NULL, 0, totals.timeOnSite) AS timeOnSite,
    IF(totals.transactionRevenue IS NULL, 0, totals.transactionRevenue / 1000000) AS sales,
    ROW_NUMBER() OVER (PARTITION BY fullVisitorId, visitStartTime ORDER BY hits.time ASC) AS rowNum
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_*` AS ga,
    UNNEST(ga.hits) AS hits
)
GROUP BY
  1, 2, 3, 4, 5, 6, 7
HAVING
  sales > 0

select * from ga.kmeans_base;

SQLでモデル作成

SQLにてモデルを作成します。

CREATE OR REPLACE MODEL ga.kmeans_model
OPTIONS(model_type='kmeans', num_clusters=4, standardize_features = TRUE) AS
SELECT
  * EXCEPT(fullVisitorId, date, sales)
FROM
  ga.kmeans_base
WHERE
  date BETWEEN '20160801' AND '20170731';

モデル予測

SQLを使用してモデルの予測を実行します。

select
* except(nearest_centroids_distance)
from ml.predict(MODEL ga.kmeans_model, 
(
  select * except(fullvisitorId,date,sales)
  from ga.kmeans_base 
  where date between '20170801' and '20170830'
))

予測結果

image.png

Lookerによる可視化

結果の画面から「Looker Studio」による可視化が可能である。
image.png

Looker上から、CentroidIDごとにどのブラウザが使われるかの積み上げを可視化してみた。
image.png

■ 参考書籍
https://amzn.to/44mgXe9

終わりに

GCP上のBigqueryでものすごく簡単にMLの予測までSQLで実行してみました。全体像のイメージがつければ幸いです。
尚、プロジェクトは終わったら削除しておきましょう。

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?