LoginSignup
1
2

More than 5 years have passed since last update.

PostgreSQL で偏差値を計算する

Last updated at Posted at 2018-09-18

PostgreSQL のユーザ定義集約関数の練習に、集約後のグループごとに偏差値計算してみた。

  • サンプルサイズが1しかない場合は分散が0で偏差値出せないのでNULLにしといた。
  • 一様分布の場合は分散が0で偏差値出せないけど、こっちは50にしといた。
  • variance で返ってくるのは不偏分散みたいなので、 stddev も不偏標準偏差になってるので検算するときとかは注意。
    https://www.postgresql.jp/document/9.6/html/functions-aggregate.html

サンプルデータ100件

要件

id_N で group by してグループごとに偏差値を算出してみる。

sample.tsv
id_6    7
id_6    3
id_6    2
id_6    4
id_6    7
id_6    3
id_6    2
id_6    4
id_7    2
id_7    2
id_6    7
id_6    3
id_6    2
id_6    4
id_6    7
id_6    3
id_6    2
id_6    4
id_7    2
id_5    1
id_6    7
id_6    3
id_6    2
id_6    4
id_6    7
id_6    3
id_6    2
id_6    4
id_7    2
id_6    7
id_6    3
id_6    2
id_6    4
id_7    2
id_6    7
id_6    3
id_6    2
id_6    4
id_3    14
id_3    1
id_7    2
id_6    7
id_6    3
id_6    2
id_6    4
id_6    7
id_6    3
id_6    2
id_6    4
id_7    2
id_6    7
id_6    3
id_6    2
id_6    4
id_6    7
id_6    3
id_6    2
id_6    4
id_7    2
id_1    3
id_7    2
id_2    3
id_2    1
id_6    7
id_6    3
id_6    2
id_6    4
id_6    7
id_6    3
id_6    2
id_6    4
id_6    7
id_6    3
id_6    2
id_6    4
id_7    2
id_6    7
id_6    3
id_6    2
id_6    4
id_7    2
id_6    7
id_6    3
id_6    2
id_6    4
id_6    7
id_6    3
id_6    2
id_6    4
id_7    2
id_6    7
id_6    3
id_6    2
id_6    4
id_4    3
id_6    7
id_6    3
id_6    2
id_6    4
id_6    7

投入

create table sample(
    id varchar,
    val integer
);

\copy sample from /tmp/sample.tsv (DELIMITER E'\t', FORMAT CSV, HEADER true);

実行

create or replace function _sigma (numeric[]) returns numeric
    language sql as $$
select stddev(v) from (select unnest($1) v) a;
$$;

create aggregate sigma(numeric)
(
  sfunc     = array_append
 , stype     = numeric[]
 , finalfunc = _sigma
);

create or replace function _mean (numeric[]) returns numeric
    language sql as $$
select avg(v) from (select unnest($1) v) a;
$$;

create aggregate mean(numeric)
(
  sfunc     = array_append
 , stype     = numeric[]
 , finalfunc = _mean
);


with d as (
select
  id,
  count(*) size,
  sigma(val) sigma,
  mean(val) mean
from sample
group by id
)

select
  *
  , CASE sigma
    WHEN 0 THEN 50
    ELSE 50 + 10 * (val - mean) / sigma
  END as 偏差値
from sample as a
join d
  on a.id = d.id
order by a.id
;

参考

https://www.postgresql.jp/document/9.6/html/sql-createaggregate.html
http://d.hatena.ne.jp/EulerDijkstra/20130508/1368003441

1
2
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
1
2