More than 3 years have passed since last update.

MySQLでフィルタリングする際に、WHEREとONの違いはない

Last updated at 2021-10-30Posted at 2021-10-30

前置き

MySQLでJOIN句を使う場合に、別でフィルタ条件を設定する場合に書き方は2通りあります。

①WHERE句を使う

SELECT 
  * 
FROM 
  employee 
  JOIN department ON employee.department_id = department.id 
WHERE 
  employee.name = 'A'

②AND句を使う

SELECT 
  * 
FROM 
  employee 
  JOIN department ON employee.department_id = department.id 
  AND employee.name = 'A'

この2つは同じ結果になります。JOIN句のドキュメントでも以下のように書かれています。

The search_condition used with ON is any conditional expression of the form that can be used in a WHERE clause. Generally, the ON clause serves for conditions that specify how to join tables, and the WHERE clause restricts which rows to include in the result set.

JOIN Clauseから引用
https://dev.mysql.com/doc/refman/8.0/en/join.html

一般的にONは結合条件、WHEREはフィルタ条件になると書かれています。句の名前の意図とも合致しているので上の例では①のほうが読みやすい気がします。

ではパフォーマンスも同じなのか？これを実験してみました。

準備

実行環境はMySQL8.0.27

以下の2テーブルを作成

employee
- 10万レコード用意
- nameには10種類の名前A〜Jが1万レコードずつ存在する
- department_idはdepartmentのidと一致する
- PRIMARY_KEY以外のインデックスなし
department
- 10レコード用意
- インデックスなし

結合アルゴリズム

MySQLの結合方法には以前からネストループ結合が使われていますが、MySQL8.0.18からはハッシュ結合も追加されました。どちらが使われているかは実行計画を見ればわかりますが、今回の例ではネストループ結合が使われています。
外部表（駆動表、employee側）が内部表（department側）に対してアクセスするのですが、その際に外部表に対してフィルタリングを行います。それがきちんと行われているかも実行計画で確かめることができます。

Nested-Loop Join
https://dev.mysql.com/doc/refman/8.0/en/nested-loop-joins.html

実行計画の結果

EXPLAIN ANALYZEの結果

MySQL8.0.18から導入されているEXPLAIN ANALYZEを見てみます。これは実行計画と実際の実行を比較することができます。

①WHERE句を使う

-> Nested loop inner join  (cost=13866.86 rows=10189) (actual time=0.058..57.726 rows=10000 loops=1)
    -> Filter: ((employee.`name` = 'A') and (employee.department_id is not null))  (cost=10300.85 rows=10189) (actual time=0.046..50.946 rows=10000 loops=1)
        -> Table scan on employee  (cost=10300.85 rows=101886) (actual time=0.041..34.800 rows=100000 loops=1)
    -> Filter: (employee.department_id = department.id)  (cost=0.25 rows=1) (actual time=0.000..0.000 rows=1 loops=10000)
        -> Single-row index lookup on department using PRIMARY (id=employee.department_id)  (cost=0.25 rows=1) (actual time=0.000..0.000 rows=1 loops=10000)

②AND句を使う

-> Nested loop inner join  (cost=13866.86 rows=10189) (actual time=0.056..54.995 rows=10000 loops=1)
    -> Filter: ((employee.`name` = 'A') and (employee.department_id is not null))  (cost=10300.85 rows=10189) (actual time=0.045..48.593 rows=10000 loops=1)
        -> Table scan on employee  (cost=10300.85 rows=101886) (actual time=0.041..33.346 rows=100000 loops=1)
    -> Filter: (employee.department_id = department.id)  (cost=0.25 rows=1) (actual time=0.000..0.000 rows=1 loops=10000)
        -> Single-row index lookup on department using PRIMARY (id=employee.department_id)  (cost=0.25 rows=1) (actual time=0.000..0.000 rows=1 loops=10000)

③フィルタリング条件なし(全レコード取得）

-> Nested loop inner join  (cost=45960.95 rows=101886) (actual time=0.055..174.963 rows=100000 loops=1)
    -> Filter: (employee.department_id is not null)  (cost=10300.85 rows=101886) (actual time=0.044..43.390 rows=100000 loops=1)
        -> Table scan on employee  (cost=10300.85 rows=101886) (actual time=0.043..33.365 rows=100000 loops=1)
    -> Filter: (employee.department_id = department.id)  (cost=0.25 rows=1) (actual time=0.001..0.001 rows=1 loops=100000)
        -> Single-row index lookup on department using PRIMARY (id=employee.department_id)  (cost=0.25 rows=1) (actual time=0.001..0.001 rows=1 loops=100000)

実行時間比較

実行時間はactual time=0.058..57.726など書かれている箇所で、開始時間..終了時間(ミリ秒)です。何度か実行しましたが①と②では50~60ミリ秒周辺で少しずつばらつきがあるのでほぼ大差ないと見て良さそうです。また外側のJOIN処理でrows=10000となっていることからフィルタリングが行われているように見えます。

また③の条件を外した場合は174ミリ秒と遅くなりrows=100000と全レコード対象となりました。ここからも結合前にフィルタリングが行われていることがはっきりと分かります。

EXPLAIN Statement
https://dev.mysql.com/doc/refman/8.0/en/explain.html

EXPLAINの結果

実際の時間がわかったのであとは補足になりますが、EXPLAINで通常の実行結果を表示できます。

①の結果

②はほとんど同じになるため省略

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employee
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 101886
     filtered: 10.00 ##ここに注目
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: department
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.employee.department_id
         rows: 1
     filtered: 100.00
        Extra: Using where

注目する箇所はemployeeテーブルのfilteredです。この値はフィルタリングされた後のレコード数が何％になるか推定したもので、上限は100になります。フィルタリング条件が無いと100となります。実行計画の段階なのであくまで推定になります。
employee.name = 'A'で一致するのは10万件中1万件なので、filteredの値は10となりました。

この情報をもとにオプティマイザによってテーブルの結合順序が効率良くなるように調整してくれます。ただしこれは内部結合の話で、外部結合の場合は順番が問題になってくるのでもう少し複雑になるようです。

Conditioning Filtering
https://dev.mysql.com/doc/refman/8.0/en/condition-filtering.html

Nested Join Optimization
https://dev.mysql.com/doc/refman/8.0/en/nested-join-optimization.html

まとめ

WHEREでもONでもパフォーマンスは変わらないことがわかりました。完全に同じ挙動ならばどちらでも良いということになりそうです。試しているうちにドキュメントに書いてあることに気づいたのですが…それでも実際に試してみてより納得したのは良かったです。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up