0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 1 year has passed since last update.

自戒:誤った FULL OUTER JOIN の使い方(テーブル間差異を求める)

Last updated at Posted at 2022-12-04

ポイント

 SQLでテーブル間の差異を求める際に FULL OUTER JOIN の使用は適切だが、前処理を怠ると想定と異なる結果へ。
 具体的には FULL OUTER JOIN の結合条件が各テーブルにおいて一意かつNOT NULLを満たすよう集約処理を行う。

使用環境

select version();
version
PostgreSQL 14.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-15), 64-bit

具体例

 レジの入金記録 l_table と外部証憑 r_table を突き合わせ、日別のレジ違算を求める

レジ入金記録 l_table

deposit_date deposit_type amount
2022-01-01 cash 1000
2022-01-01 cash 1000
2022-01-02 card 3000

外部証憑 r_table

deposit_date deposit_type amount
2022-01-01 cash 500
2022-01-01 cash 1000
2022-01-02 card 2000
2022-01-02 cash 1000

テーブル定義

create table public.l_table(
	deposit_date	date,
	deposit_type	text,
	amount			integer);
	
create table public.r_table(
	deposit_date	date,
	deposit_type	text,
	amount			integer);

データ挿入 INSERT文

insert into public.l_table
values
('2022-01-01', 'cash', 1000),
('2022-01-01', 'cash', 1000),
('2022-01-02', 'card', 3000);

insert into public.r_table
values
('2022-01-01', 'cash', 500),
('2022-01-01', 'cash', 1000),
('2022-01-02', 'card', 2000),
('2022-01-02', 'cash', 1000);

なにも考えずに書いたSELECT文

select
	b.deposit_date,
	b.deposit_type,
	sum(a.amount) as l_amount,
	sum(b.amount) as r_amount,
	sum(a.amount) - sum(b.amount) as diff
from public.l_table as a
full outer join public.r_table as b
using(deposit_date, deposit_type)
group by b.deposit_date, b.deposit_type
order by b.deposit_date, b.deposit_type;

出力された結果セット

deposit_date deposit_type l_amount r_amount diff
2022-01-01 cash 4000 3000 1000
2022-01-02 card 3000 2000 1000
2022-01-02 cash NULL 1000 NULL

期待した結果セット

deposit_date deposit_type l_amount r_amount diff
2022-01-01 cash 2000 1500 500
2022-01-02 card 3000 2000 1000
2022-01-02 cash null 1000 -1000

原因調査

select a.*, b.* from public.l_table as a
full outer join public.r_table as b
using(deposit_date, deposit_type);
deposit_date deposit_type amount deposit_date deposit_type amount
2022-01-01 cash 1000 2022-01-01 cash 500 ※重複
2022-01-01 cash 1000 2022-01-01 cash 1000 ※重複
2022-01-01 cash 1000 2022-01-01 cash 500 ※重複
2022-01-01 cash 1000 2022-01-01 cash 1000 ※重複
2022-01-02 card 3000 2022-01-02 card 2000
2022-01-02 cash 1000

 結合条件が一意ではなかった結果、状態としてクロス結合が生じている。
 ただ、少し考えれば当然のことで FULL OUTER JOIN なのだから1対多と多対1の両方が出力される。

反省して改善したSELECT文

select
	coalesce(a.deposit_date, b.deposit_date),
	coalesce(a.deposit_type, b.deposit_type),
	a.amount as l_amount,
	b.amount as r_amount,
	coalesce(a.deposit_amount, 0) - coalesce(b.deposit_amount, 0) as diff
from (
	select
		deposit_date,
		deposit_type,
		sum(amount) as amount
	from public.l_table
    where deposit_date is not null and deposit_type is not null
	group by deposit_date, deposit_type) as a
full outer join (
	select
		deposit_date,
		deposit_type,
		sum(amount) as amount
	from public.r_table
    where deposit_date is not null and deposit_type is not null
	group by deposit_date, deposit_type) as b
using(deposit_date, deposit_type)
order by coalesce(a.deposit_date, b.deposit_date), coalesce(a.deposit_type, b.deposit_type);

最後に

 実のところ、実務でこの問題にぶつかった時は FULL OUTER JOIN を避けようと力業に出てしまった。
 generate_series関数で発生させた入金日と種別を CROSS JOIN した後、サブクエリで集約した各テーブルを LEFT OUTER JOIN するとう。。。
 もっと良い解法や誤り等ございましたら、どうぞご遠慮なくコメントいただけますと幸いです。
 
 

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?