LoginSignup
4
2

More than 3 years have passed since last update.

MySQL 8.0.20 で強化されたハッシュジョイン(Hash Join)を試してみる

Last updated at Posted at 2020-05-04

2020/04/27 にリリースされた MySQL 8.0.20 ではハッシュジョインが強化され、

  • 等結合のINNER JOIN(MySQL 8.0.18 で実装済み)

のほかに、

  • 等結合以外のINNER JOIN
  • セミジョイン(IN句によるサブクエリなど)
  • アンチジョイン(NOT EXISTSによるサブクエリなど)
  • LEFT OUTER JOIN / RIGHT OUTER JOIN
    • MySQL の場合、RIGHT OUTER JOINは表の左右を入れ替えたLEFT OUTER JOINとして実行される

で利用できるようになりました。

そこで、これらのケースで実行計画等がどう変わるのかを確認してみます。

テスト用のテーブル・SQL・データ等

  • テーブルと SQL はこちらのワークログを参考に、明らかに間違っている部分を修正
  • データは 10 万行× 10 万行
    • 乱数で生成
      • 各環境個別に生成したので差異あり
    • すべてがバッファプールに載った状態で確認
    • 等結合以外・LEFT OUTER JOIN / RIGHT OUTER JOINでは数分以内に結果が返らないケースがあったので、実行時間は 10 万行× 1 万行で計測
  • MySQL 8.0.19 と比較
    • 実行計画
    • 実行時間(但し参考程度に)
  • テスト環境は以下のスペック
    • Alibaba Cloud の東京ゾーン A 内 ECS
      • ecs.sn2ne.large(2vCPU・Mem 8GB)


テーブル定義とデータ
テーブル定義とデータ
mysql> CREATE DATABASE hashjoin_test;
Query OK, 1 row affected (0.01 sec)

mysql> USE hashjoin_test;
Database changed
mysql> CREATE TABLE t1 (col1 INT);
Query OK, 0 rows affected (0.09 sec)

mysql> CREATE TABLE t2 (col1 INT);
Query OK, 0 rows affected (0.03 sec)

mysql> SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
|   100000 |
+----------+
1 row in set (0.01 sec)

mysql> SELECT COUNT(*) FROM t2;
+----------+
| COUNT(*) |
+----------+
|   100000 |
+----------+
1 row in set (0.02 sec)

mysql> SELECT * FROM t1 LIMIT 10;
+------+
| col1 |
+------+
|  895 |
|  944 |
|   36 |
|  350 |
|  640 |
|  153 |
|  846 |
|  772 |
|  321 |
|  291 |
+------+
10 rows in set (0.00 sec)

mysql> SELECT * FROM t2 LIMIT 10;
+------+
| col1 |
+------+
|  964 |
|  237 |
|  296 |
|  768 |
|  953 |
|  460 |
|  442 |
|  830 |
|  825 |
|  633 |
+------+
10 rows in set (0.00 sec)

※1 万行分のt3テーブルも同じ構造。


■ SQL

実行計画用
# 等結合以外のINNER JOIN
SELECT * FROM t1 JOIN t2 ON t1.col1 < t2.col1;

# セミジョイン
SELECT * FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2);

# アンチジョイン
SELECT * FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1);

# LEFT OUTER JOIN
SELECT * FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col1;

# RIGHT OUTER JOIN
SELECT * FROM t1 RIGHT JOIN t2 ON t1.col1 = t2.col1;
実行時間計測用
# 等結合以外のINNER JOIN
SELECT COUNT(t3.col1) FROM t3 JOIN t2 ON t2.col1 < t3.col1;

# セミジョイン
SELECT COUNT(t1.col1) FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2);

# アンチジョイン
SELECT COUNT(t2.col1) FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1);

# LEFT OUTER JOIN
SELECT COUNT(t1.col1) FROM t1 LEFT JOIN t3 ON t1.col1 = t3.col1;

# RIGHT OUTER JOIN
SELECT COUNT(t1.col1) FROM t1 RIGHT JOIN t3 ON t1.col1 = t3.col1;

結果

実行計画

  • セミジョイン・アンチジョイン以外はハッシュジョインが効いていることを確認
  • セミジョイン・アンチジョインは 8.0.19 と差がみられず
    • ハッシュジョインの表示なし
    • 公式リファレンスマニュアルのEXPLAIN結果を見ると、間違ってはいない模様

記録など


等結合以外のINNER JOINの実行計画(MySQL 8.0.20)
非等結合実行計画(8.0.20)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 JOIN t2 ON t1.col1 < t2.col1\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t1.col1 < t2.col1)  (cost=1005105800.86 rows=3349983762)
    -> Inner hash join (no condition)  (cost=1005105800.86 rows=3349983762)
        -> Table scan on t2  (cost=0.04 rows=100350)
        -> Hash
            -> Table scan on t1  (cost=10072.15 rows=100159)

1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.col1 < t2.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100159
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100350
     filtered: 33.33
        Extra: Using where; Using join buffer (hash join)
2 rows in set, 1 warning (0.01 sec)




同上(MySQL 8.0.19)
非等結合実行計画(8.0.19)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 JOIN t2 ON t1.col1 < t2.col1\G
*************************** 1. row ***************************
EXPLAIN: <not executable by iterator executor>

1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.col1 < t2.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 97895
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100425
     filtered: 33.33
        Extra: Using where; Using join buffer (Block Nested Loop)
2 rows in set, 1 warning (0.00 sec)


※8.0.20 でBlock Nested Loophash joinに変わっているのがわかります。


セミジョイン(MySQL 8.0.20)
セミジョイン(8.0.20)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join
    -> Filter: (t1.col1 is not null)  (cost=10072.15 rows=100159)
        -> Table scan on t1  (cost=10072.15 rows=100159)
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (col1=t1.col1)
        -> Materialize with deduplication
            -> Filter: (t2.col1 is not null)  (cost=10091.25 rows=100350)
                -> Table scan on t2  (cost=10091.25 rows=100350)

1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100159
     filtered: 100.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: <subquery2>
   partitions: NULL
         type: eq_ref
possible_keys: <auto_distinct_key>
          key: <auto_distinct_key>
      key_len: 5
          ref: hashjoin_test.t1.col1
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 3. row ***************************
           id: 2
  select_type: MATERIALIZED
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100350
     filtered: 100.00
        Extra: NULL
3 rows in set, 1 warning (0.00 sec)




同上(MySQL 8.0.19)
セミジョイン(8.0.19)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join
    -> Filter: (t1.col1 is not null)  (cost=9845.75 rows=97895)
        -> Table scan on t1  (cost=9845.75 rows=97895)
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (col1=t1.col1)
        -> Materialize with deduplication
            -> Filter: (t2.col1 is not null)  (cost=10098.75 rows=100425)
                -> Table scan on t2  (cost=10098.75 rows=100425)

1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 97895
     filtered: 100.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: <subquery2>
   partitions: NULL
         type: eq_ref
possible_keys: <auto_distinct_key>
          key: <auto_distinct_key>
      key_len: 5
          ref: hashjoin_test.t1.col1
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 3. row ***************************
           id: 2
  select_type: MATERIALIZED
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100425
     filtered: 100.00
        Extra: NULL
3 rows in set, 1 warning (0.00 sec)


※サンプリングによる推計行数以外の差がわかりません(t2が実体化される実行計画)。


アンチジョイン(MySQL 8.0.20)
アンチジョイン(8.0.20)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop antijoin
    -> Table scan on t2  (cost=10091.25 rows=100350)
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (col1=t2.col1)
        -> Materialize with deduplication
            -> Filter: (t1.col1 is not null)  (cost=10072.15 rows=100159)
                -> Table scan on t1  (cost=10072.15 rows=100159)

1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT * FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100350
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: <subquery2>
   partitions: NULL
         type: eq_ref
possible_keys: <auto_distinct_key>
          key: <auto_distinct_key>
      key_len: 5
          ref: hashjoin_test.t2.col1
         rows: 1
     filtered: 100.00
        Extra: Using where; Not exists
*************************** 3. row ***************************
           id: 2
  select_type: MATERIALIZED
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100159
     filtered: 100.00
        Extra: NULL
3 rows in set, 2 warnings (0.00 sec)




同上(MySQL 8.0.19)
アンチジョイン(8.0.19)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop anti-join
    -> Table scan on t2  (cost=10098.75 rows=100425)
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (col1=t2.col1)
        -> Materialize with deduplication
            -> Filter: (t1.col1 is not null)  (cost=9845.75 rows=97895)
                -> Table scan on t1  (cost=9845.75 rows=97895)

1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT * FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100425
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: <subquery2>
   partitions: NULL
         type: eq_ref
possible_keys: <auto_distinct_key>
          key: <auto_distinct_key>
      key_len: 5
          ref: hashjoin_test.t2.col1
         rows: 1
     filtered: 100.00
        Extra: Using where; Not exists
*************************** 3. row ***************************
           id: 2
  select_type: MATERIALIZED
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 97895
     filtered: 100.00
        Extra: NULL
3 rows in set, 2 warnings (0.00 sec)


※上に同じ。


LEFT OUTER JOIN(MySQL 8.0.20)
左外部結合(8.0.20)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t2.col1 = t1.col1)  (cost=1005095728.81 rows=10050955650)
    -> Table scan on t1  (cost=10072.15 rows=100159)
    -> Hash
        -> Table scan on t2  (cost=0.10 rows=100350)

1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100159
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100350
     filtered: 100.00
        Extra: Using where; Using join buffer (hash join)
2 rows in set, 1 warning (0.00 sec)




同上(MySQL 8.0.19)
左外部結合(8.0.19)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
EXPLAIN: <not executable by iterator executor>

1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 97895
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100425
     filtered: 100.00
        Extra: Using where; Using join buffer (Block Nested Loop)
2 rows in set, 1 warning (0.00 sec)


※等結合以外のINNER JOINと同様です。


RIGHT OUTER JOIN(MySQL 8.0.20)
右外部結合(8.0.20)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 RIGHT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
EXPLAIN: -> Left hash join (t1.col1 = t2.col1)  (cost=1005095729.02 rows=10050955650)
    -> Table scan on t2  (cost=10091.25 rows=100350)
    -> Hash
        -> Table scan on t1  (cost=0.10 rows=100159)

1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM t1 RIGHT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100350
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100159
     filtered: 100.00
        Extra: Using where; Using join buffer (hash join)
2 rows in set, 1 warning (0.00 sec)




同上(MySQL 8.0.19)
左外部結合(8.0.19)
mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 RIGHT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
EXPLAIN: <not executable by iterator executor>

1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM t1 RIGHT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 100425
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 97895
     filtered: 100.00
        Extra: Using where; Using join buffer (Block Nested Loop)
2 rows in set, 1 warning (0.00 sec)


※MySQL 8.0.20 ではLEFT (Hash) JOINに変換されているのがわかります。

実行時間

  • セミジョイン・アンチジョインに明確な差は見られず
    • 0.6 ~ 0.8 秒程度
      • 実行タイミングによって多少のブレあり
  • 等結合以外のINNER JOINでは MySQL 8.0.20 がやや高速化
    • MySQL 8.0.20 : 51 ~ 52 秒程度
    • MySQL 8.0.19 : 57 ~ 60 秒程度
  • LEFT OUTER JOIN / RIGHT OUTER JOINでは明らかに高速化
    • MySQL 8.0.20 : 0.1 ~ 0.2 秒程度
    • MySQL 8.0.19 : 52 ~ 53 秒程度

記録など


結果ログの一部抜粋(8.0.20)
結果ログ一部抜粋(8.0.20)
mysql> SELECT COUNT(t3.col1) FROM t3 JOIN t2 ON t2.col1 < t3.col1;
+----------------+
| COUNT(t3.col1) |
+----------------+
|      499763593 |
+----------------+
1 row in set (51.42 sec)

mysql> SELECT COUNT(t1.col1) FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2);
+----------------+
| COUNT(t1.col1) |
+----------------+
|         100000 |
+----------------+
1 row in set (0.53 sec)

mysql> SELECT COUNT(t2.col1) FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1);
+----------------+
| COUNT(t2.col1) |
+----------------+
|              0 |
+----------------+
1 row in set (0.53 sec)

mysql> SELECT COUNT(t1.col1) FROM t1 LEFT JOIN t3 ON t1.col1 = t3.col1;
+----------------+
| COUNT(t1.col1) |
+----------------+
|         998115 |
+----------------+
1 row in set (0.15 sec)

mysql> SELECT COUNT(t1.col1) FROM t1 RIGHT JOIN t3 ON t1.col1 = t3.col1;
+----------------+
| COUNT(t1.col1) |
+----------------+
|         998115 |
+----------------+
1 row in set (0.14 sec)




同上(8.0.19)
結果ログ一部抜粋(8.0.19)
mysql> SELECT COUNT(t3.col1) FROM t3 JOIN t2 ON t2.col1 < t3.col1;
+----------------+
| COUNT(t3.col1) |
+----------------+
|      498956519 |
+----------------+
1 row in set (57.85 sec)

mysql> SELECT COUNT(t1.col1) FROM t1 WHERE (t1.col1) IN (SELECT t2.col1 FROM t2);
+----------------+
| COUNT(t1.col1) |
+----------------+
|         100000 |
+----------------+
1 row in set (0.80 sec)

mysql> SELECT COUNT(t2.col1) FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.col1 = t2.col1);
+----------------+
| COUNT(t2.col1) |
+----------------+
|              0 |
+----------------+
1 row in set (0.78 sec)

mysql> SELECT COUNT(t1.col1) FROM t1 LEFT JOIN t3 ON t1.col1 = t3.col1;
+----------------+
| COUNT(t1.col1) |
+----------------+
|        1000148 |
+----------------+
1 row in set (52.43 sec)

mysql> SELECT COUNT(t1.col1) FROM t1 RIGHT JOIN t3 ON t1.col1 = t3.col1;
+----------------+
| COUNT(t1.col1) |
+----------------+
|        1000148 |
+----------------+
1 row in set (52.26 sec)


※実際には同じ SQL を何度か実行して時間の平均や外れ値を見ています。


2020/05/12 追記 :
続きの記事を書きました。


4
2
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
4
2