MySQLでグループごとのTop N件取得系問題

MySQL

Last updated at 2020-12-22Posted at 2020-12-09

前置き： MySQL8より古い版を対象としたいので、 window関数使わないでの実装を考えます。

解決したい問題：

クラス毎の成績Top1の生徒情報を取得
クラス毎の成績TopN(N>1)の生徒情報を取得
クラス毎の成績高い順と身長小さい順のTop2の生徒情報を取得

テストテーブルとデータ.sql

/* classNo:クラス番号, name:生徒名, grade:成績, height:身長 
drop table if exists tb;
*/
create table tb
select 1 classNo, 'nm11' name, 11 grade, 31 height
union all select 1, 'nm12', 12, 31
union all select 1, 'nm13', 13, 31
union all select 2, 'nm21', 21, 31
union all select 2, 'nm22', 22, 35
union all select 2, 'nm23', 22, 31
union all select 2, 'nm24', 20, 31
union all select 2, 'nm25', 22, 31
union all select 3, 'nm31', 31, 31
union all select 3, 'nm32', 32, 31
union all select 3, 'nm33', 32, 31
union all select 3, 'nm34', 32, 39
union all select 3, 'nm35', 32, 31

1. クラス毎の成績Top1の生徒情報を取得

方法1_max/minを利用.sql

select tb.*
from tb
  ,(select tb.classNo, max(grade) maxGrade from tb group by classNo) t1
where tb.classNo = t1.classNo
  and t1.maxGrade = tb.grade
/*結果
+---------+------+-------+--------+
| classNo | name | grade | height |
+---------+------+-------+--------+
|       1 | nm13 |    13 |     31 |
|       2 | nm22 |    22 |     35 |
|       2 | nm23 |    22 |     31 |
|       2 | nm25 |    22 |     31 |
|       3 | nm32 |    32 |     31 |
|       3 | nm33 |    32 |     31 |
|       3 | nm34 |    32 |     39 |
|       3 | nm35 |    32 |     31 |
+---------+------+-------+--------+

この 方法1_max/min利用.sqlの問題点としては：

N=1の場合のみ利用可能
TopNとるためのサブ検索 t1とメインデータ取るためのtbと両方が要るので、複数テーブルjoinして検索したりなど、複雑な検索の結果に対してTopN取る場合、利用しづらくなることとパフォーマンス低下することが懸念されます

方法2_orderbyと変数を利用.sql

select classNo, name, grade, height
from (
     select t1.classNo
        ,t1.name
        ,t1.grade
        ,t1.height
        ,(@idxInClass :=
            if(@perClassNo != t1.classNo
               ,1
               ,(@idxInClass + if(@perGrade = t1.grade, 0, 1))
            )
         ) idx
        ,@perClassNo := t1.classNo
        ,@perGrade := t1.grade
     from (
           select tb.*
           from tb
           order by classNo, grade desc
         ) t1
        ,(select @perClassNo := -1, @perGrade := -1, @idxInClass := 0) tmp
     ) tbIdexed
where idx <= 1
/*結果
+---------+------+-------+--------+
| classNo | name | grade | height |
+---------+------+-------+--------+
|       1 | nm13 |    13 |     31 |
|       2 | nm22 |    22 |     35 |
|       2 | nm23 |    22 |     31 |
|       2 | nm25 |    22 |     31 |
|       3 | nm32 |    32 |     31 |
|       3 | nm33 |    32 |     31 |
|       3 | nm34 |    32 |     39 |
|       3 | nm35 |    32 |     31 |
+---------+------+-------+--------+

方法2_orderbyと変数を利用.sqlは、少し雑に見えるんですが、方法1_max/min利用.sqlよりは、拡張性やパフォーマンスがいいと考えられます。

2. クラス毎の成績TopN(N>1)の生徒情報を取得

上記方法2_orderbyと変数を利用.sql があれば、N>1のケースが簡単に解決できます：

select classNo, name, grade, height
from (
     select t1.classNo
        ,t1.name
        ,t1.grade
        ,t1.height
        ,(@idxInClass :=
            if(@perClassNo != t1.classNo
               ,1
               ,(@idxInClass + if(@perGrade = t1.grade, 0, 1))
            )
         ) idx
        ,@perClassNo := t1.classNo
        ,@perGrade := t1.grade
     from (
           select tb.*
           from tb
           order by classNo, grade desc
         ) t1
        ,(select @perClassNo := -1, @perGrade := -1, @idxInClass := 0) tmp
     ) tbIdexed
where idx <= 2 /* ここをNの値に応じて[ where idx <= N ]にするだけ */
/*結果
+---------+------+-------+--------+
| classNo | name | grade | height |
+---------+------+-------+--------+
|       1 | nm13 |    13 |     31 |
|       1 | nm12 |    12 |     31 |
|       2 | nm22 |    22 |     35 |
|       2 | nm23 |    22 |     31 |
|       2 | nm25 |    22 |     31 |
|       2 | nm21 |    21 |     31 |
|       3 | nm32 |    32 |     31 |
|       3 | nm33 |    32 |     31 |
|       3 | nm34 |    32 |     39 |
|       3 | nm35 |    32 |     31 |
|       3 | nm31 |    31 |     31 |
+---------+------+-------+--------+

3. クラス毎の成績高い順と身長小さい順のTop2の生徒情報を取得

上記方法2_orderbyと変数を利用.sqlを少し加工すれば対応可能：

select classNo, name, grade, height
from (
     select t1.classNo
          ,t1.name
          ,t1.grade
          ,t1.height
          ,(@idxInClass :=
              if(@perClassNo != t1.classNo
                 ,1
                 ,(@idxInClass + (@perGrade != t1.grade || @perHeight != t1.height))
              )
          ) idx
          ,@perClassNo := t1.classNo
          ,@perGrade := t1.grade
          ,@perHeight := t1.height
     from (
           select tb.*
           from tb
           order by classNo, grade desc, height
         ) t1
         ,(select @perClassNo := -1, @perGrade := -1, @perHeight := -1, @idxInClass := 0) tmp
 ) tbIdexed
where idx <= 2
/*結果
+---------+------+-------+--------+
| classNo | name | grade | height |
+---------+------+-------+--------+
|       1 | nm13 |    13 |     31 |
|       1 | nm12 |    12 |     31 |
|       2 | nm23 |    22 |     31 |
|       2 | nm25 |    22 |     31 |
|       2 | nm22 |    22 |     35 |
|       3 | nm32 |    32 |     31 |
|       3 | nm33 |    32 |     31 |
|       3 | nm35 |    32 |     31 |
|       3 | nm34 |    32 |     39 |
+---------+------+-------+--------+

使い方を纏め

select col1, col2, ...
from (
    select col1, col2, ...
          ,(@idxInGroup :=
             if( (@preGroupByCol1 != groupByCol1 || @preGroupByCol2 != groupByCol2 || ... )
                 ,1
                 ,(@idxInGroup + (@preCompareCol1 != compareCol1 || @preCompareCol2 != compareCol2 || ... ))
                 )
         ) idxInGroup
        ,@preGroupByCol1 := groupByCol1, @preGroupByCol2 := groupByCol2, ...
        ,@preCompareCol1 := compareCol1, @preCompareCol2 := compareCol2, ...
     from ( /* このスコープに、メインの検索文を記載 */
           select col1, col2, ... from table1, table2, ... where xxxx
           /* ↓ ここの order byがポイントで、グループ分け列、 グループ内比較する列の順で並ぶ必要 */
           order by groupByCol1, groupByCol2, ..., compareCol1, compareCol2, ...
         ) t1
        ,(select @preGroupByCol1 := $initGroupByCol1Value, @preGroupByCol2 := $initGroupByCol2Value, ...
               ,@preCompareCol1 := $initCompareCol1Value, @preCompareCol2 := $initCompareCol2Value, ...
               ,@idxInGroup := 0 /* ここは 0 固定 */
         ) tmp
     ) tbIdexed
where idxInGroup <= $N

項目説明(どんどん見づらくなってきて ...)：

@idxInGroup : それぞれのグループ中の順位。基本触らない
groupByCol1, groupByCol2, ... : グループ分け用列。上記例中のclassNoに該当
@preGroupByCol1, @preGroupByCol2, ... : それぞれの[グループ分け用列]の一つ前の値を記録するための変数。[グループ分け用列]分を定義、初期値設定と動的値設定が必要
compareCol1, compareCol2, ... : グループ内の比較列 。上記例中のgrade、heightに該当(必要に応じてdesc増減)。
@preCompareCol1, @preCompareCol2, ... : それぞれの[グループ内の比較列]の一つ前の値を記録するための変数。[グループ内の比較列]分を定義、初期値設定と動的値設定が必要
$initGroupByCol1Value, $initGroupByCol2Value, ... : 上記@preGroupByCol1, @preGroupByCol2, ...のそれぞれの初期値。初期値はそれぞれの列に存在しない値にすることが理想(order by された結果の一行目中のそれぞれの列の値を避けることが必要最低限)。
$initCompareCol1Value, $initCompareCol2Value, ... : 上記@preCompareCol1, @preCompareCol2, ...のそれぞれの初期値。それぞれの列の値型に沿った値に設定すればいいし、何も考えずに一律nullにしてもいい。

最後

個人的に他にも色んなパターンでテストして、問題なく動くことが確認できたんですが、もし問題点や特にもっと改善すべき点がありましたら、どうぞご指摘とコメントをくださいませ。

==========================
Wanoは積極的にエンジニア採用を行なっています！
まずはオンラインでVPoEとのカジュアル面談から。お好きな入り口からお気軽にお声がけください！
Wano Recruitページ　https://group.wano.co.jp/recruit/
QiitaJobs　https://jobs.qiita.com/employers/wano-inc/postings/1297
Wantedly　https://www.wantedly.com/companies/wano/projects
Findy　https://findy-code.io/companies/522

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up