PostgreSQLのウインドウ関数完全ガイド:全関数一覧と実例
はじめに
PostgreSQLは、豊富なウインドウ関数を提供しており、データ分析やレポート作成において非常に強力なツールとなっています。この記事では、PostgreSQLで利用可能なすべてのウインドウ関数をカテゴリ別に整理し、詳細な説明と実践的な例を提供します。
ウインドウ関数の基本構文
PostgreSQLのウインドウ関数は以下の構文で使用します:
SELECT
列名,
ウインドウ関数() OVER (
[PARTITION BY 列名 [, 列名...]]
[ORDER BY 列名 [ASC|DESC] [, 列名...]]
[frame_clause]
) AS 別名
FROM テーブル名;
frame_clauseの構文
{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]
frame_start, frame_end: {
UNBOUNDED PRECEDING
| offset PRECEDING
| CURRENT ROW
| offset FOLLOWING
| UNBOUNDED FOLLOWING
}
frame_exclusion: {
EXCLUDE CURRENT ROW
| EXCLUDE GROUP
| EXCLUDE TIES
| EXCLUDE NO OTHERS
}
1. 順位付け関数(Ranking Functions)
ROW_NUMBER()
各行に一意の連番を割り当てます。
SELECT
name,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;
結果例:
name | salary | row_num
--------|--------|--------
田中 | 500000 | 1
佐藤 | 450000 | 2
鈴木 | 400000 | 3
RANK()
同順位がある場合、同じ順位を割り当て、次の順位をスキップします。
SELECT
name,
salary,
RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees;
結果例:
name | salary | rank
--------|--------|-----
田中 | 500000 | 1
佐藤 | 500000 | 1
鈴木 | 400000 | 3
DENSE_RANK()
同順位がある場合、同じ順位を割り当て、次の順位をスキップしません。
SELECT
name,
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
結果例:
name | salary | dense_rank
--------|--------|-----------
田中 | 500000 | 1
佐藤 | 500000 | 1
鈴木 | 400000 | 2
PERCENT_RANK()
各行の相対的な順位を0から1の範囲で返します。
SELECT
name,
salary,
PERCENT_RANK() OVER (ORDER BY salary) AS percent_rank
FROM employees;
CUME_DIST()
累積分布を0から1の範囲で返します。
SELECT
name,
salary,
CUME_DIST() OVER (ORDER BY salary) AS cume_dist
FROM employees;
NTILE(n)
データをn個のグループに分割します。
SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;
2. 集約関数(Aggregate Functions)
COUNT()
SELECT
department,
name,
salary,
COUNT(*) OVER (PARTITION BY department) AS dept_count
FROM employees;
SUM()
SELECT
department,
name,
salary,
SUM(salary) OVER (PARTITION BY department) AS dept_total,
SUM(salary) OVER (ORDER BY hire_date) AS cumulative_salary
FROM employees;
AVG()
SELECT
department,
name,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg,
AVG(salary) OVER (
ORDER BY hire_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3months
FROM employees;
MIN() / MAX()
SELECT
department,
name,
salary,
MIN(salary) OVER (PARTITION BY department) AS dept_min,
MAX(salary) OVER (PARTITION BY department) AS dept_max
FROM employees;
STDDEV() / STDDEV_POP() / STDDEV_SAMP()
SELECT
department,
name,
salary,
STDDEV(salary) OVER (PARTITION BY department) AS dept_stddev,
STDDEV_POP(salary) OVER (PARTITION BY department) AS dept_stddev_pop,
STDDEV_SAMP(salary) OVER (PARTITION BY department) AS dept_stddev_samp
FROM employees;
VARIANCE() / VAR_POP() / VAR_SAMP()
SELECT
department,
name,
salary,
VARIANCE(salary) OVER (PARTITION BY department) AS dept_variance,
VAR_POP(salary) OVER (PARTITION BY department) AS dept_var_pop,
VAR_SAMP(salary) OVER (PARTITION BY department) AS dept_var_samp
FROM employees;
3. 値関数(Value Functions)
LAG(expression [, offset [, default]])
前の行の値を取得します。
SELECT
month,
sales,
LAG(sales, 1) OVER (ORDER BY month) AS prev_month_sales,
LAG(sales, 1, 0) OVER (ORDER BY month) AS prev_month_sales_default
FROM monthly_sales;
LEAD(expression [, offset [, default]])
次の行の値を取得します。
SELECT
month,
sales,
LEAD(sales, 1) OVER (ORDER BY month) AS next_month_sales,
LEAD(sales, 1, 0) OVER (ORDER BY month) AS next_month_sales_default
FROM monthly_sales;
FIRST_VALUE(expression)
ウインドウ内の最初の値を取得します。
SELECT
department,
name,
salary,
FIRST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY salary DESC
) AS dept_highest_salary
FROM employees;
LAST_VALUE(expression)
ウインドウ内の最後の値を取得します。
SELECT
department,
name,
salary,
LAST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS dept_lowest_salary
FROM employees;
NTH_VALUE(expression, n)
ウインドウ内のn番目の値を取得します。
SELECT
department,
name,
salary,
NTH_VALUE(salary, 2) OVER (
PARTITION BY department
ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS dept_second_highest
FROM employees;
4. 統計関数(Statistical Functions)
CORR(expression1, expression2)
相関係数を計算します。
SELECT
department,
CORR(salary, years_experience) OVER (PARTITION BY department) AS salary_exp_corr
FROM employees;
COVAR_POP(expression1, expression2) / COVAR_SAMP(expression1, expression2)
共分散を計算します。
SELECT
department,
COVAR_POP(salary, years_experience) OVER (PARTITION BY department) AS salary_exp_covar_pop,
COVAR_SAMP(salary, years_experience) OVER (PARTITION BY department) AS salary_exp_covar_samp
FROM employees;
REGR_* 関数群
線形回帰に関連する関数群です。
SELECT
department,
REGR_SLOPE(salary, years_experience) OVER (PARTITION BY department) AS slope,
REGR_INTERCEPT(salary, years_experience) OVER (PARTITION BY department) AS intercept,
REGR_R2(salary, years_experience) OVER (PARTITION BY department) AS r_squared
FROM employees;
5. 実践的な使用例
例1:部門別の給与分析
SELECT
department,
name,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
ROUND(
(salary - AVG(salary) OVER (PARTITION BY department)) /
STDDEV(salary) OVER (PARTITION BY department), 2
) AS salary_z_score,
PERCENT_RANK() OVER (PARTITION BY department ORDER BY salary) AS salary_percentile
FROM employees
ORDER BY department, salary DESC;
例2:時系列データの分析
SELECT
date,
sales,
LAG(sales, 1) OVER (ORDER BY date) AS prev_day_sales,
LAG(sales, 7) OVER (ORDER BY date) AS prev_week_sales,
AVG(sales) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7days,
SUM(sales) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_sales
FROM daily_sales;
例3:パーセンタイル分析
SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary) AS quartile,
NTILE(10) OVER (ORDER BY salary) AS decile,
PERCENT_RANK() OVER (ORDER BY salary) AS percentile_rank,
CUME_DIST() OVER (ORDER BY salary) AS cumulative_dist
FROM employees;
例4:複雑なランキング
SELECT
department,
name,
salary,
years_experience,
RANK() OVER (
PARTITION BY department
ORDER BY salary DESC, years_experience DESC
) AS salary_exp_rank,
DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC, years_experience DESC
) AS salary_exp_dense_rank
FROM employees;
6. パフォーマンス最適化のヒント
インデックスの活用
-- ORDER BY句で使用する列にインデックスを作成
CREATE INDEX idx_employees_salary ON employees(salary);
CREATE INDEX idx_employees_dept_salary ON employees(department, salary);
適切なPARTITION BYの使用
-- 効率的な分割
SELECT
department,
name,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
-- 非効率的な分割(避けるべき)
SELECT
department,
name,
salary,
AVG(salary) OVER (PARTITION BY department, name) AS name_avg
FROM employees;
フレームサイズの制限
-- 効率的なフレーム指定
SELECT
date,
sales,
AVG(sales) OVER (
ORDER BY date
ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
) AS moving_avg_30days
FROM daily_sales;
7. よくあるエラーと対処法
エラー1:ORDER BY句なしでの使用
-- エラー:ORDER BY句が必要
SELECT
name,
LAG(salary) OVER () AS prev_salary
FROM employees;
-- 修正
SELECT
name,
LAG(salary) OVER (ORDER BY id) AS prev_salary
FROM employees;
エラー2:フレーム指定の問題
-- エラー:LAST_VALUEが期待通りに動作しない
SELECT
name,
LAST_VALUE(salary) OVER (ORDER BY salary) AS last_salary
FROM employees;
-- 修正
SELECT
name,
LAST_VALUE(salary) OVER (
ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_salary
FROM employees;
まとめ
PostgreSQLのウインドウ関数は、データ分析において非常に強力なツールです。この記事で紹介した関数を組み合わせることで、複雑な分析要件も効率的に実現できます。
主要なポイント
- 順位付け関数: ROW_NUMBER(), RANK(), DENSE_RANK()の使い分け
- 集約関数: パーティション内での計算と移動計算
- 値関数: LAG(), LEAD()による時系列分析
- 統計関数: 相関分析や回帰分析
- パフォーマンス: 適切なインデックスとフレーム指定
ウインドウ関数をマスターすることで、より高度なデータ分析が可能になり、ビジネスインサイトの獲得に大きく貢献できるでしょう。