1
2

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

PostgreSQLのウインドウ関数完全ガイド:全関数一覧と実例

Posted at

PostgreSQLのウインドウ関数完全ガイド:全関数一覧と実例

はじめに

PostgreSQLは、豊富なウインドウ関数を提供しており、データ分析やレポート作成において非常に強力なツールとなっています。この記事では、PostgreSQLで利用可能なすべてのウインドウ関数をカテゴリ別に整理し、詳細な説明と実践的な例を提供します。

ウインドウ関数の基本構文

PostgreSQLのウインドウ関数は以下の構文で使用します:

SELECT 
    列名,
    ウインドウ関数() OVER (
        [PARTITION BY 列名 [, 列名...]]
        [ORDER BY 列名 [ASC|DESC] [, 列名...]]
        [frame_clause]
    ) AS 別名
FROM テーブル名;

frame_clauseの構文

{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]

frame_start, frame_end: {
    UNBOUNDED PRECEDING
  | offset PRECEDING
  | CURRENT ROW
  | offset FOLLOWING
  | UNBOUNDED FOLLOWING
}

frame_exclusion: {
    EXCLUDE CURRENT ROW
  | EXCLUDE GROUP
  | EXCLUDE TIES
  | EXCLUDE NO OTHERS
}

1. 順位付け関数(Ranking Functions)

ROW_NUMBER()

各行に一意の連番を割り当てます。

SELECT 
    name,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;

結果例:

name    | salary | row_num
--------|--------|--------
田中    | 500000 | 1
佐藤    | 450000 | 2
鈴木    | 400000 | 3

RANK()

同順位がある場合、同じ順位を割り当て、次の順位をスキップします。

SELECT 
    name,
    salary,
    RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees;

結果例:

name    | salary | rank
--------|--------|-----
田中    | 500000 | 1
佐藤    | 500000 | 1
鈴木    | 400000 | 3

DENSE_RANK()

同順位がある場合、同じ順位を割り当て、次の順位をスキップしません。

SELECT 
    name,
    salary,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

結果例:

name    | salary | dense_rank
--------|--------|-----------
田中    | 500000 | 1
佐藤    | 500000 | 1
鈴木    | 400000 | 2

PERCENT_RANK()

各行の相対的な順位を0から1の範囲で返します。

SELECT 
    name,
    salary,
    PERCENT_RANK() OVER (ORDER BY salary) AS percent_rank
FROM employees;

CUME_DIST()

累積分布を0から1の範囲で返します。

SELECT 
    name,
    salary,
    CUME_DIST() OVER (ORDER BY salary) AS cume_dist
FROM employees;

NTILE(n)

データをn個のグループに分割します。

SELECT 
    name,
    salary,
    NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;

2. 集約関数(Aggregate Functions)

COUNT()

SELECT 
    department,
    name,
    salary,
    COUNT(*) OVER (PARTITION BY department) AS dept_count
FROM employees;

SUM()

SELECT 
    department,
    name,
    salary,
    SUM(salary) OVER (PARTITION BY department) AS dept_total,
    SUM(salary) OVER (ORDER BY hire_date) AS cumulative_salary
FROM employees;

AVG()

SELECT 
    department,
    name,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg,
    AVG(salary) OVER (
        ORDER BY hire_date 
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3months
FROM employees;

MIN() / MAX()

SELECT 
    department,
    name,
    salary,
    MIN(salary) OVER (PARTITION BY department) AS dept_min,
    MAX(salary) OVER (PARTITION BY department) AS dept_max
FROM employees;

STDDEV() / STDDEV_POP() / STDDEV_SAMP()

SELECT 
    department,
    name,
    salary,
    STDDEV(salary) OVER (PARTITION BY department) AS dept_stddev,
    STDDEV_POP(salary) OVER (PARTITION BY department) AS dept_stddev_pop,
    STDDEV_SAMP(salary) OVER (PARTITION BY department) AS dept_stddev_samp
FROM employees;

VARIANCE() / VAR_POP() / VAR_SAMP()

SELECT 
    department,
    name,
    salary,
    VARIANCE(salary) OVER (PARTITION BY department) AS dept_variance,
    VAR_POP(salary) OVER (PARTITION BY department) AS dept_var_pop,
    VAR_SAMP(salary) OVER (PARTITION BY department) AS dept_var_samp
FROM employees;

3. 値関数(Value Functions)

LAG(expression [, offset [, default]])

前の行の値を取得します。

SELECT 
    month,
    sales,
    LAG(sales, 1) OVER (ORDER BY month) AS prev_month_sales,
    LAG(sales, 1, 0) OVER (ORDER BY month) AS prev_month_sales_default
FROM monthly_sales;

LEAD(expression [, offset [, default]])

次の行の値を取得します。

SELECT 
    month,
    sales,
    LEAD(sales, 1) OVER (ORDER BY month) AS next_month_sales,
    LEAD(sales, 1, 0) OVER (ORDER BY month) AS next_month_sales_default
FROM monthly_sales;

FIRST_VALUE(expression)

ウインドウ内の最初の値を取得します。

SELECT 
    department,
    name,
    salary,
    FIRST_VALUE(salary) OVER (
        PARTITION BY department 
        ORDER BY salary DESC
    ) AS dept_highest_salary
FROM employees;

LAST_VALUE(expression)

ウインドウ内の最後の値を取得します。

SELECT 
    department,
    name,
    salary,
    LAST_VALUE(salary) OVER (
        PARTITION BY department 
        ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS dept_lowest_salary
FROM employees;

NTH_VALUE(expression, n)

ウインドウ内のn番目の値を取得します。

SELECT 
    department,
    name,
    salary,
    NTH_VALUE(salary, 2) OVER (
        PARTITION BY department 
        ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS dept_second_highest
FROM employees;

4. 統計関数(Statistical Functions)

CORR(expression1, expression2)

相関係数を計算します。

SELECT 
    department,
    CORR(salary, years_experience) OVER (PARTITION BY department) AS salary_exp_corr
FROM employees;

COVAR_POP(expression1, expression2) / COVAR_SAMP(expression1, expression2)

共分散を計算します。

SELECT 
    department,
    COVAR_POP(salary, years_experience) OVER (PARTITION BY department) AS salary_exp_covar_pop,
    COVAR_SAMP(salary, years_experience) OVER (PARTITION BY department) AS salary_exp_covar_samp
FROM employees;

REGR_* 関数群

線形回帰に関連する関数群です。

SELECT 
    department,
    REGR_SLOPE(salary, years_experience) OVER (PARTITION BY department) AS slope,
    REGR_INTERCEPT(salary, years_experience) OVER (PARTITION BY department) AS intercept,
    REGR_R2(salary, years_experience) OVER (PARTITION BY department) AS r_squared
FROM employees;

5. 実践的な使用例

例1:部門別の給与分析

SELECT 
    department,
    name,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
    ROUND(
        (salary - AVG(salary) OVER (PARTITION BY department)) / 
        STDDEV(salary) OVER (PARTITION BY department), 2
    ) AS salary_z_score,
    PERCENT_RANK() OVER (PARTITION BY department ORDER BY salary) AS salary_percentile
FROM employees
ORDER BY department, salary DESC;

例2:時系列データの分析

SELECT 
    date,
    sales,
    LAG(sales, 1) OVER (ORDER BY date) AS prev_day_sales,
    LAG(sales, 7) OVER (ORDER BY date) AS prev_week_sales,
    AVG(sales) OVER (
        ORDER BY date 
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7days,
    SUM(sales) OVER (
        ORDER BY date 
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_sales
FROM daily_sales;

例3:パーセンタイル分析

SELECT 
    name,
    salary,
    NTILE(4) OVER (ORDER BY salary) AS quartile,
    NTILE(10) OVER (ORDER BY salary) AS decile,
    PERCENT_RANK() OVER (ORDER BY salary) AS percentile_rank,
    CUME_DIST() OVER (ORDER BY salary) AS cumulative_dist
FROM employees;

例4:複雑なランキング

SELECT 
    department,
    name,
    salary,
    years_experience,
    RANK() OVER (
        PARTITION BY department 
        ORDER BY salary DESC, years_experience DESC
    ) AS salary_exp_rank,
    DENSE_RANK() OVER (
        PARTITION BY department 
        ORDER BY salary DESC, years_experience DESC
    ) AS salary_exp_dense_rank
FROM employees;

6. パフォーマンス最適化のヒント

インデックスの活用

-- ORDER BY句で使用する列にインデックスを作成
CREATE INDEX idx_employees_salary ON employees(salary);
CREATE INDEX idx_employees_dept_salary ON employees(department, salary);

適切なPARTITION BYの使用

-- 効率的な分割
SELECT 
    department,
    name,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

-- 非効率的な分割(避けるべき)
SELECT 
    department,
    name,
    salary,
    AVG(salary) OVER (PARTITION BY department, name) AS name_avg
FROM employees;

フレームサイズの制限

-- 効率的なフレーム指定
SELECT 
    date,
    sales,
    AVG(sales) OVER (
        ORDER BY date 
        ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
    ) AS moving_avg_30days
FROM daily_sales;

7. よくあるエラーと対処法

エラー1:ORDER BY句なしでの使用

-- エラー:ORDER BY句が必要
SELECT 
    name,
    LAG(salary) OVER () AS prev_salary
FROM employees;

-- 修正
SELECT 
    name,
    LAG(salary) OVER (ORDER BY id) AS prev_salary
FROM employees;

エラー2:フレーム指定の問題

-- エラー:LAST_VALUEが期待通りに動作しない
SELECT 
    name,
    LAST_VALUE(salary) OVER (ORDER BY salary) AS last_salary
FROM employees;

-- 修正
SELECT 
    name,
    LAST_VALUE(salary) OVER (
        ORDER BY salary 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_salary
FROM employees;

まとめ

PostgreSQLのウインドウ関数は、データ分析において非常に強力なツールです。この記事で紹介した関数を組み合わせることで、複雑な分析要件も効率的に実現できます。

主要なポイント

  1. 順位付け関数: ROW_NUMBER(), RANK(), DENSE_RANK()の使い分け
  2. 集約関数: パーティション内での計算と移動計算
  3. 値関数: LAG(), LEAD()による時系列分析
  4. 統計関数: 相関分析や回帰分析
  5. パフォーマンス: 適切なインデックスとフレーム指定

ウインドウ関数をマスターすることで、より高度なデータ分析が可能になり、ビジネスインサイトの獲得に大きく貢献できるでしょう。

参考資料

1
2
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
1
2

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?