More than 5 years have passed since last update.

MySQL が range じゃなくて ref を選択するのが解せない

MySQL

Last updated at 2014-08-25Posted at 2014-08-25

mysql 5.6.20

次のようなテーブルがあったとします。

drop table if exists t;
create table t (
  id int not null primary key,
  no int not null,
  str text
);

1000 万件ぐらい適当なデータを突っ込みます。

time seq 10000000 |
  mysql test -e "
    load data local infile '/dev/stdin' into table t (@seq)
    set id = @seq,
        no = floor(rand() * 3),
        str = repeat('x', 100)
  "

インデックスを作ります。

create index idx on t (no, id);

analyze しておきます。

analyze table t;

次のクエリでそこそこ時間がかかります。

select * from t where no = 2 and id <= 9000000 order by no desc, id desc limit 1;
/* 1 row in set (1.15 sec) */

実行計画を見るとセカンダリインデックスで type は ref で key_len は 4 になっています。

explain select * from t where no = 2 and id <= 9000000 order by no desc, id desc limit 1 \G
/*
           id: 1
  select_type: SIMPLE
        table: t
         type: ref
possible_keys: PRIMARY,idx
          key: idx
      key_len: 4
          ref: const
         rows: 4957699
        Extra: Using where
*/

セカンダリインデックスの no の部分しか使われていないので、おそらく次のように走査されています。

idx の no = 2 の行の右端から降順に走査
id <= 9000000 である行でフィルタ
最初の１行目が見つかったところで走査終了

これだと 30 万行ぐらいはセカンダリインデックスを読み飛ばす必要があります。これならセカンダリインデックスは使わない方がマシです。

select * from t force index (primary) where no = 2 and id <= 9000000 order by no desc, id desc limit 1;
/* 1 row in set (0.00 sec) */

explain select * from t force index (primary) where no = 2 and id <= 9000000 order by no desc, id desc limit 1 \G
/*
           id: 1
  select_type: SIMPLE
        table: t
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4957699
        Extra: Using where
*/

この実行計画ならおそらく次のように走査されています。

primary の id <= 9000000 の範囲の右端から降順に走査
no = 2 である行でフィルタ
最初の１行目が見つかったところで走査終了

がしかし、次のようなケースだとセカンダリインデックスが無いとやばいです。

insert into t value (0, 9, 'xxx');

primary を強制するととても遅いです。

select * from t force index (primary) where no = 9 and id <= 9000000 order by no desc, id desc limit 1;
/* 1 row in set (20.07 sec) */

実行計画は no = 2 のときと変わりないのですが、

explain select * from t force index (primary) where no = 9 and id <= 9000000 order by no desc, id desc limit 1 \G
/*
           id: 1
  select_type: SIMPLE
        table: t
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4957700
        Extra: Using where
*/

次のように走査すると、最初の１行目が見つかるのは 900 万行近く走査した後の１行目になります。

idx の no = 9 である行の右端から降順に走査
id <= 9000000 である行でフィルタ
最初の１行目が見つかったところで走査終了

primary を強制しなければ速いです。

select * from t where no = 9 and id <= 9000000 order by no desc, id desc limit 1;
/* 1 row in set (0.00 sec) */

このクエリの実行計画を見ると range になっています。

explain select * from t where no = 9 and id <= 9000000 order by no desc, id desc limit 1 \G
/*
           id: 1
  select_type: SIMPLE
        table: t
         type: range
possible_keys: PRIMARY,idx
          key: idx
      key_len: 8
          ref: NULL
         rows: 1
        Extra: Using index condition
*/

おそらく次のように走査されています。

idx の no = 9 かつ id <= 9000000 である行の右端から降順に走査
最初の１行目が見つかったところで走査を停止

no = 9 だと range で no = 2 だと ref になります。

explain select * from t where no = 2 and id <= 9000000 order by no desc, id desc limit 1 \G
/*
           id: 1
  select_type: SIMPLE
        table: t
         type: ref
possible_keys: PRIMARY,idx
          key: idx
      key_len: 4
          ref: const
         rows: 4957700
        Extra: Using where
*/

explain select * from t where no = 9 and id <= 9000000 order by no desc, id desc limit 1 \G
/*
           id: 1
  select_type: SIMPLE
        table: t
         type: range
possible_keys: PRIMARY,idx
          key: idx
      key_len: 8
          ref: NULL
         rows: 1
        Extra: Using index condition
*/

整数リテラルを変えただけで実行計画が変わるって・・・MySQL は no = 2 の行は大量にあって no = 9 の行が少ししか無いことを知っているの？？

そもそも no = 2 のときに range ではなく ref になるメリットって何かある？どう転んでも range の方が良いと思うのだけど・・・

ちなみに id の条件を狭くすると no = 2 でも range になります。

select * from t where no = 2 and id <= 90 order by no desc, id desc limit 1;
/* 1 row in set (0.00 sec) */

explain select * from t where no = 2 and id <= 90 order by no desc, id desc limit 1 \G
/*
           id: 1
  select_type: SIMPLE
        table: t
         type: range
possible_keys: PRIMARY,idx
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 91
        Extra: Using where
*/

no = 2 and id <= 9000000 だと条件に該当する行は 300 万行近くあるため ref でも range でもたいして走査行数は変わらないから ref になる、ということなのかなと思いますが、実行計画を作るにあたって limit 1 が考慮されていないように思えます。

この実験は no のカーディナリティが極端に低い場合にのみ発生します。

最初の 1000 万行作るところで no を適当に分散させれば常に range になります。

time seq 10000000 |
  mysql test -e "
    load data local infile '/dev/stdin' into table t (@seq)
    set id = @seq,
        no = floor(rand() * 30),
        str = repeat('x', 100)
  "

select * from t where no = 2 and id <= 9000000 order by no desc, id desc limit 1;
/* 1 row in set (0.00 sec) */

explain select * from t where no = 2 and id <= 9000000 order by no desc, id desc limit 1 \G
/*
           id: 1
  select_type: SIMPLE
        table: t
         type: range
possible_keys: PRIMARY,idx
          key: idx
      key_len: 8
          ref: NULL
         rows: 490654
        Extra: Using index condition
*/

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up