More than 5 years have passed since last update.

Redshiftでウィンドウ関数を使ってスコアに応じたカテゴリごとのランキングを出す

Last updated at 2016-02-07Posted at 2016-02-07

概要

Redshiftではウィンドウ関数を使って便利な集計ができます。
（ウィンドウ関数に関してはPostgreSQLウィンドウ関数のTipsを参考にするといいです。）

その記事のウィンドウ関数に手を加えてちょっと便利なことをしてみます。

やりたい事

以下のようなデータがあったときに、教科(subject)ごとにgradeが高い生徒(name)をランキング表示する。
これを使えば、例えばECのサイトで、カテゴリごとに売上が高いX個のアイテムのみを取り出すということもできます。

    name     | subject | grade | last_seen_in_class 
-------------+---------+-------+--------------------
 Michael     | english |     6 | 2014-08-24
 Emily       | math    |     5 | 2014-08-17
 Christopher | math    |     9 | 2014-08-24
 William     | english |     0 | 2014-07-18
 Tyler       | english |     4 | 2014-08-20
 Alexander   | history |    10 | 2014-08-22
 Benjamin    | history |     1 | 2014-08-24
 Jacob       | english |     9 | 2014-08-16
 Matthew     | english |     7 | 2014-08-24
 Emma        | math    |     8 | 2014-08-17
 Ashley      | math    |    10 | 2014-08-16
 Grace       | english |     3 | 2014-08-21
 Alexis      | history |     4 | 2014-08-24
 Victoria    | history |     4 | 2014-08-24
(14 行)

手順

Redshiftで試しています。

テーブル作成

CREATE TEMP TABLE student_subjects (
   name text,
   subject text,
   grade int DEFAULT NULL,
   last_seen_in_class date
);

データの用意

INSERT INTO student_subjects (name, subject, grade, last_seen_in_class) VALUES 
    ('Jacob', 'english', '9', '2014-08-16'), ('Michael', 'english', '6', '2014-08-24'), 
    ('Matthew', 'english', '7', '2014-08-24'), ('Emily', 'math', '5', '2014-08-17'),
    ('Emma', 'math', '8', '2014-08-17'), ('Christopher', 'math', '9', '2014-08-24'),
    ('Ashley', 'math', '10', '2014-08-16'), ('William', 'english', '0', '2014-07-18'),
    ('Grace', 'english', '3', '2014-08-21'), ('Tyler', 'english', '4', '2014-08-20'),
    ('Alexis', 'history', '4', '2014-08-24'), ('Alexander', 'history', '10', '2014-08-22'),
    ('Victoria', 'history', '4', '2014-08-24'), ('Benjamin', 'history', '1', '2014-08-24');

各サブジェクトごとにgradeでランクを出す

SELECT rank() OVER (PARTITION BY subject ORDER BY grade DESC), name, subject, grade
FROM student_subjects
ORDER BY subject, grade DESC;

結果

 rank |    name     | subject | grade 
------+-------------+---------+-------
    1 | Jacob       | english |     9
    2 | Matthew     | english |     7
    3 | Michael     | english |     6
    4 | Tyler       | english |     4
    5 | Grace       | english |     3
    6 | William     | english |     0
    1 | Alexander   | history |    10
    2 | Alexis      | history |     4
    2 | Victoria    | history |     4
    4 | Benjamin    | history |     1
    1 | Ashley      | math    |    10
    2 | Christopher | math    |     9
    3 | Emma        | math    |     8
    4 | Emily       | math    |     5
(14 行)

各サブジェクトごとにgradeのTop3を出す

ウィンドウ関数を使ったSQL内ではwhere句が使えないので入れ子にしてWHEREで取得するrankを制限します。

SELECT
    *
FROM
    (
        SELECT rank() OVER (PARTITION BY subject ORDER BY grade DESC), name, subject, grade
        FROM student_subjects
        ORDER BY subject, grade DESC
    ) as t
WHERE
    rank <= 3

結果

 rank |    name     | subject | grade 
------+-------------+---------+-------
    1 | Jacob       | english |     9
    2 | Matthew     | english |     7
    3 | Michael     | english |     6
    1 | Alexander   | history |    10
    2 | Alexis      | history |     4
    2 | Victoria    | history |     4
    1 | Ashley      | math    |    10
    2 | Christopher | math    |     9
    3 | Emma        | math    |     8
(9 行)

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up