More than 1 year has passed since last update.

SSC R&D

【備忘録】MySQLでGROUP BYとサブクエリを使った話

MySQL

Last updated at 2023-10-10Posted at 2023-10-10

概要

仕事上で一見難しくなさそうな条件でSQLを書くのに手こずったので、備忘録がわりに記事を書きました。自分がこれを使えば想定通りに取得できるかな?と考えた順で書いています。

やりたいこと

以下のテーブルから、type毎にidが一番大きいレコード(id, type, urlの全項目)だけを取得したい。
別個のレコードには、同じurlが含まれない想定。

sample_table

id	type	url
1	1	http://example.com/11
2	1	http://example.com/12
3	2	http://example.com/21
4	2	http://example.com/22
5	2	http://example.com/23
6	3	http://example.com/31

　　　　　　　　↓

id	type	url
2	1	http://example.com/12
5	2	http://example.com/23
6	3	http://example.com/31

試した・考えたこと

❌ DISTINCTを使う

重複を除いてレコードを取得するので、使えるのではと考えた。
→今回の場合はurlも含めて取得したいが、urlがユニークで全てのレコードが重複してしまうためNG

実行SQL:

select distinct * from sample_table st

実行結果:

id	type	url
1	1	http://example.com/11
2	1	http://example.com/12
3	2	http://example.com/21
4	2	http://example.com/22
5	2	http://example.com/23
6	3	http://example.com/31

以下のようにtypeのみなら重複なしで3レコードのみ取得できる。

実行SQL:

select distinct * from sample_table st

実行結果:

type
1
2
3

❌ GROUP BYを使う

type毎にレコードを取得するので、使えないかと考えた。
→GROUP BYで対象とする列以外は、集計関数を利用しないと取得できないためNG。

実行SQL:

select MAX(st.id) as max_id, st.type from sample_table st 
group by st.type

実行結果:
以下のように、type毎にidが最も大きいレコードのid、typeを取得することはできる。
ただし、id、typeと同じ行のurlが取得できない。

id	type
2	1
5	2
6	3

実行SQL:

select MAX(st.id) as max_id, st.type, url from sample_table st 
group by st.type

実行結果:
urlを取得する対象に含めると、以下のようなエラーが発生する。
要約すると、GROUP BYの対象になっていない列はsql_mode=only_full_group_byではそのまま取得できないらしい。

この記事の説明がわかりやすかった

Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'database.st.url' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

⭕️ GROUP BYを利用したサブクエリと自己結合する

GROUP BYを使って取得したいレコードのidは取得できているので、このSQLをサブクエリにしてsample_tableと結合すればurlも含めた必要なレコードが取得できるのではと考えた
→left joinの結合対象が存在しない場合、nullが入ることだけ気をつければOK

実行SQL:

select * from sample_table st 
left join (
  select MAX(st.id) as max_id, st.type
  from sample_table st 
  group by st.type
) as subquery
on st.id = subquery.max_id

実行結果:
sample_tableにGROUP BYを使うで実行したSQLの結果を結合したテーブルを取得する。
以下のように取得したいレコードのみ、結合したmax_id、typeに値が入っている。

id	type	url	max_id	type
1	1	http://example.com/11	null	null
2	1	http://example.com/12	2	1
3	2	http://example.com/21	null	null
4	2	http://example.com/22	null	null
5	2	http://example.com/23	5	2
6	3	http://example.com/31	6	3

実行SQL:

select st.* from sample_table st 
left join (
  select MAX(st.id) as max_id, st.type
  from sample_table st 
  group by st.type
) as subquery
on st.id = subquery.max_id
where not ISNULL(subquery.max_id)

実行結果:
1つ前のSQLから、以下の2点を修正することで想定していたレコードを取得可能

取得対象をsample_tableの列のみ(結合した列は含まない)にする
max_idがnullでない条件を追加する

id	type	url
2	1	http://example.com/12
5	2	http://example.com/23
6	3	http://example.com/31

補足

⭕️ ウィンドウ関数を利用する

先輩とChatGPTからウィンドウ関数なるものでも同じことができると聞いたので、試してみた。

実行SQL:

select *, max(id) over (partition by type) as max_id from sample_table st

実行結果:

id	type	url	max_id
1	1	http://example.com/11	2
2	1	http://example.com/12	2
3	2	http://example.com/21	5
4	2	http://example.com/22	5
5	2	http://example.com/23	5
6	3	http://example.com/31	6

新規列の値 over (partition by 列名)の形で、指定列の値ごとに新規列の値を計算するらしい。今回の場合は、type毎にidの最大値を計算している。

この辺りの説明は、この記事がわかりやすかった。

あとは、上記の表からidとmax_idが同じ元レコード部分だけ表示すればOK

実行SQL:

select id, type, url from (
  select *, max(id) over (partition by type) as max_id 
  from sample_table st 
) as subquery
where id = subquery.max_id

実行結果:

id	type	url
2	1	http://example.com/12
5	2	http://example.com/23
6	3	http://example.com/31

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up