More than 5 years have passed since last update.

MySQL 8.0.20 のハッシュジョイン（Hash Join）を INDEX があるテーブルで試してみる

Last updated at 2020-05-12Posted at 2020-05-12

これは、こちらの記事の続編です。

MySQL 8.0.20 で強化されたハッシュジョイン（Hash Join）を試してみる

MySQL 8.0.18 で実装された時点で INDEX があるテーブルでは（普通に使うと）無効だったハッシュジョインですが、MySQL 8.0.20 で変化があったのかを確認してみます。

【参考】

MySQL 8.0.18のHASH JOINを試した（tom__bo’s Blog）

準備

先の記事で使った環境で、各テーブルに INDEX をADDします。

ADD_INDEX

mysql> USE hashjoin_test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> ALTER TABLE t1 ADD INDEX (col1);
Query OK, 0 rows affected (0.46 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE t2 ADD INDEX (col1);
Query OK, 0 rows affected (0.51 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE t3 ADD INDEX (col1);
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0

確認

先に結論を書くと、MySQL 8.0.20 でも INDEX があるテーブルではハッシュジョインは（普通に使うと）無効でした。

確認

mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 JOIN t2 ON t1.col1 < t2.col1\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join  (cost=859311907.40 rows=2864055110)
    -> Index scan on t1 using col1  (cost=9274.75 rows=92185)
    -> Filter: (t1.col1 < t2.col1)  (cost=0.04 rows=31069)
        -> Index range scan on t2 (re-planned for each iteration)  (cost=0.04 rows=93215)

1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.col1 < t2.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index
possible_keys: col1
          key: col1
      key_len: 5
          ref: NULL
         rows: 92185
     filtered: 100.00
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: col1
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 93215
     filtered: 33.33
        Extra: Range checked for each record (index map: 0x1)
2 rows in set, 1 warning (0.00 sec)

mysql> EXPLAIN FORMAT=tree SELECT * FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop left join  (cost=964405.10 rows=8516377)
    -> Index scan on t1 using col1  (cost=9274.75 rows=92185)
    -> Index lookup on t2 using col1 (col1=t1.col1)  (cost=1.12 rows=92)

1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON t1.col1 = t2.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index
possible_keys: NULL
          key: col1
      key_len: 5
          ref: NULL
         rows: 92185
     filtered: 100.00
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ref
possible_keys: col1
          key: col1
      key_len: 5
          ref: hashjoin_test.t1.col1
         rows: 92
     filtered: 100.00
        Extra: Using index
2 rows in set, 1 warning (0.00 sec)

hash joinではなく、普通のNested loop (left) joinになりました。

先の記事と同じ SQL の実行時間は、

等結合以外のINNER JOINでは 3 分を超えてしまった
- INDEX 無し・ハッシュジョイン有効時は 51 ～ 52 秒程度
LEFT OUTER JOINでは 0.5 秒前後
- INDEX 無し・ハッシュジョイン有効時は 0.1 ～ 0.2 秒程度
RIGHT OUTER JOINでは 0.4 秒前後
- INDEX 無し・ハッシュジョイン有効時は 0.1 ～ 0.2 秒程度

ということで、遅くなってしまいました。

回避策

LEFT OUTER JOIN / RIGHT OUTER JOINはまだしも、等結合以外のINNER JOINは遅くなりすぎ…ということで、回避策を考えてみます。

MySQL 8.0.20 ではオプティマイザヒント（ヒント句）の数も増えたので、これを使ってみます。

オプティマイザヒント「NO_JOIN_INDEX」を使う

mysql> EXPLAIN FORMAT=tree SELECT /*+ NO_JOIN_INDEX(t1) NO_JOIN_INDEX(t2) */ * F
ROM t1 JOIN t2 ON t1.col1 < t2.col1\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (t1.col1 < t2.col1)  (cost=859311907.40 rows=2864055110)
    -> Inner hash join (no condition)  (cost=859311907.40 rows=2864055110)
        -> Index scan on t2 using col1  (cost=0.04 rows=93215)
        -> Hash
            -> Index scan on t1 using col1  (cost=9274.75 rows=92185)

1 row in set (0.00 sec)

mysql> EXPLAIN SELECT /*+ NO_JOIN_INDEX(t1) NO_JOIN_INDEX(t2) */ * FROM t1 JOIN
t2 ON t1.col1 < t2.col1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index
possible_keys: NULL
          key: col1
      key_len: 5
          ref: NULL
         rows: 92185
     filtered: 100.00
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
   partitions: NULL
         type: index
possible_keys: NULL
          key: col1
      key_len: 5
          ref: NULL
         rows: 93215
     filtered: 33.33
        Extra: Using where; Using index; Using join buffer (hash join)
2 rows in set, 1 warning (0.00 sec)

NO_JOIN_INDEXを各テーブルに適用してみたところ、hash joinが復活しました。

実際に等結合以外のINNER JOINの SQL を実行してみたところ、50 秒前後まで短縮されました。

Qiitaに投稿したMySQL 8.0関連記事
MySQL 8.0 の薄い本（無料で配布中！）

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up