2020/04/27 にリリースされた MySQL 8.0.20 ではハッシュジョインが強化され、
- 等結合の
INNER JOIN
(MySQL 8.0.18 で実装済み)
のほかに、
- 等結合以外の
INNER JOIN
- セミジョイン(
IN
句によるサブクエリなど) - アンチジョイン(
NOT EXISTS
によるサブクエリなど) -
LEFT OUTER JOIN
/RIGHT OUTER JOIN
- MySQL の場合、
RIGHT OUTER JOIN
は表の左右を入れ替えたLEFT OUTER JOIN
として実行される
- MySQL の場合、
で利用できるようになりました。
- 8.2.1.4 Hash Join Optimization(MySQL 8.0 Reference Manual)
そこで、これらのケースで実行計画等がどう変わるのかを確認してみます。
テスト用のテーブル・SQL・データ等
-
テーブルと SQL はこちらのワークログを参考に、明らかに間違っている部分を修正
- WL#2241: Hash join
- WL#13377: Add support for hash outer, anti and semi join
- なおこのテーブルには主キーがないので、
sql_require_primary_key
の設定に注意する
-
データは 10 万行× 10 万行
- 乱数で生成
- 各環境個別に生成したので差異あり
- すべてがバッファプールに載った状態で確認
- 等結合以外・
LEFT OUTER JOIN
/RIGHT OUTER JOIN
では数分以内に結果が返らないケースがあったので、実行時間は 10 万行× 1 万行で計測
- 乱数で生成
-
MySQL 8.0.19 と比較
- 実行計画
- 実行時間(但し参考程度に)
-
テスト環境は以下のスペック
- Alibaba Cloud の東京ゾーン A 内 ECS
- ecs.sn2ne.large(2vCPU・Mem 8GB)
**テーブル定義とデータ**
テーブル定義とデータ
mysql> CREATE DATABASE hashjoin_test;
Query OK, 1 row affected (0.01 sec)
mysql> USE hashjoin_test;
Database changed
mysql> CREATE TABLE t1 (col1 INT);
Query OK, 0 rows affected (0.09 sec)
mysql> CREATE TABLE t2 (col1 INT);
Query OK, 0 rows affected (0.03 sec)
mysql> SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
| 100000 |
+----------+
1 row in set (0.01 sec)
mysql> SELECT COUNT(*) FROM t2;
+----------+
| COUNT(*) |
+----------+
| 100000 |
+----------+
1 row in set (0.02 sec)
mysql> SELECT * FROM t1 LIMIT 10;
+------+
| col1 |
+------+
| 895 |
| 944 |
| 36 |
| 350 |
| 640 |
| 153 |
| 846 |
| 772 |
| 321 |
| 291 |
+------+
10 rows in set (0.00 sec)
mysql> SELECT * FROM t2 LIMIT 10;
+------+
| col1 |
+------+
| 964 |
| 237 |
| 296 |
| 768 |
| 953 |
| 460 |
| 442 |
| 830 |
| 825 |
| 633 |
+------+
10 rows in set (0.00 sec)
※1 万行分のt3
テーブルも同じ構造。
■ SQL
実行計画用
# 等結合以外のINNER JOIN
SELECT * FROM t1 JOIN t2 ON t1.col1 < t2.col1;
# セミジョイン
SELECT * FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2);
# アンチジョイン
SELECT * FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1);
# LEFT OUTER JOIN
SELECT * FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col1;
# RIGHT OUTER JOIN
SELECT * FROM t1 RIGHT JOIN t2 ON t1.col1 = t2.col1;
実行時間計測用
# 等結合以外のINNER JOIN
SELECT COUNT(t3.col1) FROM t3 JOIN t2 ON t2.col1 < t3.col1;
# セミジョイン
SELECT COUNT(t1.col1) FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2);
# アンチジョイン
SELECT COUNT(t2.col1) FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1);
# LEFT OUTER JOIN
SELECT COUNT(t1.col1) FROM t1 LEFT JOIN t3 ON t1.col1 = t3.col1;
# RIGHT OUTER JOIN
SELECT COUNT(t1.col1) FROM t1 RIGHT JOIN t3 ON t1.col1 = t3.col1;
結果
実行計画
- セミジョイン・アンチジョイン以外はハッシュジョインが効いていることを確認
-
セミジョイン・アンチジョインは 8.0.19 と差がみられず
- ハッシュジョインの表示なし
- 公式リファレンスマニュアルの
EXPLAIN
結果を見ると、間違ってはいない模様- 8.2.1.4 Hash Join Optimization(MySQL 8.0 Reference Manual)※前掲
記録など
**等結合以外の`INNER JOIN`の実行計画(MySQL 8.0.20)**
非等結合実行計画(8.0.20)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 JOIN t2 ON t1.col1 < t2.col1\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t1.col1 < t2.col1) (cost=1005105800.86 rows=3349983762)
-> Inner hash join (no condition) (cost=1005105800.86 rows=3349983762)
-> Table scan on t2 (cost=0.04 rows=100350)
-> Hash
-> Table scan on t1 (cost=10072.15 rows=100159)
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.col1 < t2.col1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100159
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t2
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100350
filtered: 33.33
Extra: Using where; Using join buffer (hash join)
2 rows in set, 1 warning (0.01 sec)
**同上(MySQL 8.0.19)**
非等結合実行計画(8.0.19)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 JOIN t2 ON t1.col1 < t2.col1\G
*************************** 1. row ***************************
EXPLAIN: <not executable by iterator executor>
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.col1 < t2.col1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 97895
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t2
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100425
filtered: 33.33
Extra: Using where; Using join buffer (Block Nested Loop)
2 rows in set, 1 warning (0.00 sec)
※8.0.20 で**Block Nested Loop
がhash join
**に変わっているのがわかります。
**セミジョイン(MySQL 8.0.20)**
セミジョイン(8.0.20)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join
-> Filter: (t1.col1 is not null) (cost=10072.15 rows=100159)
-> Table scan on t1 (cost=10072.15 rows=100159)
-> Single-row index lookup on <subquery2> using <auto_distinct_key> (col1=t1.col1)
-> Materialize with deduplication
-> Filter: (t2.col1 is not null) (cost=10091.25 rows=100350)
-> Table scan on t2 (cost=10091.25 rows=100350)
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100159
filtered: 100.00
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: <subquery2>
partitions: NULL
type: eq_ref
possible_keys: <auto_distinct_key>
key: <auto_distinct_key>
key_len: 5
ref: hashjoin_test.t1.col1
rows: 1
filtered: 100.00
Extra: NULL
*************************** 3. row ***************************
id: 2
select_type: MATERIALIZED
table: t2
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100350
filtered: 100.00
Extra: NULL
3 rows in set, 1 warning (0.00 sec)
**同上(MySQL 8.0.19)**
セミジョイン(8.0.19)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join
-> Filter: (t1.col1 is not null) (cost=9845.75 rows=97895)
-> Table scan on t1 (cost=9845.75 rows=97895)
-> Single-row index lookup on <subquery2> using <auto_distinct_key> (col1=t1.col1)
-> Materialize with deduplication
-> Filter: (t2.col1 is not null) (cost=10098.75 rows=100425)
-> Table scan on t2 (cost=10098.75 rows=100425)
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 97895
filtered: 100.00
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: <subquery2>
partitions: NULL
type: eq_ref
possible_keys: <auto_distinct_key>
key: <auto_distinct_key>
key_len: 5
ref: hashjoin_test.t1.col1
rows: 1
filtered: 100.00
Extra: NULL
*************************** 3. row ***************************
id: 2
select_type: MATERIALIZED
table: t2
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100425
filtered: 100.00
Extra: NULL
3 rows in set, 1 warning (0.00 sec)
※サンプリングによる推計行数以外の差がわかりません(t2
が実体化される実行計画)。
**アンチジョイン(MySQL 8.0.20)**
アンチジョイン(8.0.20)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop antijoin
-> Table scan on t2 (cost=10091.25 rows=100350)
-> Single-row index lookup on <subquery2> using <auto_distinct_key> (col1=t2.col1)
-> Materialize with deduplication
-> Filter: (t1.col1 is not null) (cost=10072.15 rows=100159)
-> Table scan on t1 (cost=10072.15 rows=100159)
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT * FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t2
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100350
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: <subquery2>
partitions: NULL
type: eq_ref
possible_keys: <auto_distinct_key>
key: <auto_distinct_key>
key_len: 5
ref: hashjoin_test.t2.col1
rows: 1
filtered: 100.00
Extra: Using where; Not exists
*************************** 3. row ***************************
id: 2
select_type: MATERIALIZED
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100159
filtered: 100.00
Extra: NULL
3 rows in set, 2 warnings (0.00 sec)
**同上(MySQL 8.0.19)**
アンチジョイン(8.0.19)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop anti-join
-> Table scan on t2 (cost=10098.75 rows=100425)
-> Single-row index lookup on <subquery2> using <auto_distinct_key> (col1=t2.col1)
-> Materialize with deduplication
-> Filter: (t1.col1 is not null) (cost=9845.75 rows=97895)
-> Table scan on t1 (cost=9845.75 rows=97895)
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT * FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t2
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100425
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: <subquery2>
partitions: NULL
type: eq_ref
possible_keys: <auto_distinct_key>
key: <auto_distinct_key>
key_len: 5
ref: hashjoin_test.t2.col1
rows: 1
filtered: 100.00
Extra: Using where; Not exists
*************************** 3. row ***************************
id: 2
select_type: MATERIALIZED
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 97895
filtered: 100.00
Extra: NULL
3 rows in set, 2 warnings (0.00 sec)
※上に同じ。
**`LEFT OUTER JOIN`(MySQL 8.0.20)**
左外部結合(8.0.20)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t2.col1 = t1.col1) (cost=1005095728.81 rows=10050955650)
-> Table scan on t1 (cost=10072.15 rows=100159)
-> Hash
-> Table scan on t2 (cost=0.10 rows=100350)
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100159
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t2
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100350
filtered: 100.00
Extra: Using where; Using join buffer (hash join)
2 rows in set, 1 warning (0.00 sec)
**同上(MySQL 8.0.19)**
左外部結合(8.0.19)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
EXPLAIN: <not executable by iterator executor>
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 97895
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t2
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100425
filtered: 100.00
Extra: Using where; Using join buffer (Block Nested Loop)
2 rows in set, 1 warning (0.00 sec)
※等結合以外のINNER JOIN
と同様です。
**`RIGHT OUTER JOIN`(MySQL 8.0.20)**
右外部結合(8.0.20)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 RIGHT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t1.col1 = t2.col1) (cost=1005095729.02 rows=10050955650)
-> Table scan on t2 (cost=10091.25 rows=100350)
-> Hash
-> Table scan on t1 (cost=0.10 rows=100159)
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM t1 RIGHT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t2
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100350
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100159
filtered: 100.00
Extra: Using where; Using join buffer (hash join)
2 rows in set, 1 warning (0.00 sec)
**同上(MySQL 8.0.19)**
左外部結合(8.0.19)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 RIGHT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
EXPLAIN: <not executable by iterator executor>
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM t1 RIGHT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t2
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100425
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 97895
filtered: 100.00
Extra: Using where; Using join buffer (Block Nested Loop)
2 rows in set, 1 warning (0.00 sec)
※MySQL 8.0.20 ではLEFT (Hash) JOIN
に変換されているのがわかります。
実行時間
-
セミジョイン・アンチジョインに明確な差は見られず
- 0.6 ~ 0.8 秒程度
- 実行タイミングによって多少のブレあり
- 0.6 ~ 0.8 秒程度
-
等結合以外の
INNER JOIN
では MySQL 8.0.20 がやや高速化- MySQL 8.0.20 : 51 ~ 52 秒程度
- MySQL 8.0.19 : 57 ~ 60 秒程度
-
LEFT OUTER JOIN
/RIGHT OUTER JOIN
では明らかに高速化- MySQL 8.0.20 : 0.1 ~ 0.2 秒程度
- MySQL 8.0.19 : 52 ~ 53 秒程度
記録など
**結果ログの一部抜粋(8.0.20)**
結果ログ一部抜粋(8.0.20)
mysql> SELECT COUNT(t3.col1) FROM t3 JOIN t2 ON t2.col1 < t3.col1;
+----------------+
| COUNT(t3.col1) |
+----------------+
| 499763593 |
+----------------+
1 row in set (51.42 sec)
mysql> SELECT COUNT(t1.col1) FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2);
+----------------+
| COUNT(t1.col1) |
+----------------+
| 100000 |
+----------------+
1 row in set (0.53 sec)
mysql> SELECT COUNT(t2.col1) FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1);
+----------------+
| COUNT(t2.col1) |
+----------------+
| 0 |
+----------------+
1 row in set (0.53 sec)
mysql> SELECT COUNT(t1.col1) FROM t1 LEFT JOIN t3 ON t1.col1 = t3.col1;
+----------------+
| COUNT(t1.col1) |
+----------------+
| 998115 |
+----------------+
1 row in set (0.15 sec)
mysql> SELECT COUNT(t1.col1) FROM t1 RIGHT JOIN t3 ON t1.col1 = t3.col1;
+----------------+
| COUNT(t1.col1) |
+----------------+
| 998115 |
+----------------+
1 row in set (0.14 sec)
**同上(8.0.19)**
結果ログ一部抜粋(8.0.19)
mysql> SELECT COUNT(t3.col1) FROM t3 JOIN t2 ON t2.col1 < t3.col1;
+----------------+
| COUNT(t3.col1) |
+----------------+
| 498956519 |
+----------------+
1 row in set (57.85 sec)
mysql> SELECT COUNT(t1.col1) FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2);
+----------------+
| COUNT(t1.col1) |
+----------------+
| 100000 |
+----------------+
1 row in set (0.80 sec)
mysql> SELECT COUNT(t2.col1) FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1);
+----------------+
| COUNT(t2.col1) |
+----------------+
| 0 |
+----------------+
1 row in set (0.78 sec)
mysql> SELECT COUNT(t1.col1) FROM t1 LEFT JOIN t3 ON t1.col1 = t3.col1;
+----------------+
| COUNT(t1.col1) |
+----------------+
| 1000148 |
+----------------+
1 row in set (52.43 sec)
mysql> SELECT COUNT(t1.col1) FROM t1 RIGHT JOIN t3 ON t1.col1 = t3.col1;
+----------------+
| COUNT(t1.col1) |
+----------------+
| 1000148 |
+----------------+
1 row in set (52.26 sec)
※実際には同じ SQL を何度か実行して時間の平均や外れ値を見ています。
2020/05/12 追記 :
続きの記事を書きました。
- Qiitaに投稿したMySQL 8.0関連記事
- MySQL 8.0 の薄い本(無料で配布中!)