More than 5 years have passed since last update.

インデックス・ショットガンを現場から補足

Last updated at 2019-02-05Posted at 2019-02-02

1 / 15

第26回中国地方DB勉強会 in 岡山

@masudakz 増田和弘
2019-02-02

名著「SQLアンチパターン」

Bill Karwin　著、和田卓人、和田省二　監訳、児島修　訳
http://www.oreilly.co.jp/books/9784873115894/
2013年1月　初版

12章インデックスショットガン

闇雲インデックス

インデックスを定義しない
インデックスを定義しすぎる
インデックスを活用しないクエリを実行してしまう

ある日のレビュー前DDL

CREATE TABLE c_results (
  c_result_id    bigserial not null PRIMARY KEY
, c_id         varchar(30) not null
, child_id     bigint not null
, study_begin  timestamp not null
, study_end    timestamp not null
, elapsed_time integer not null
, create_on    timestamp not null
, CONSTRAINT child_id_study_begin_c_id_unique UNIQUE (child_id, study_begin, c_id)
) ;

INDEX定義

CREATE INDEX c_results_c_id_idx
  ON c_results USING btree (c_id);

CREATE INDEX c_results_child_id_idx
  ON c_results USING btree (child_id);

CREATE INDEX c_results_study_end_idx
  ON c_results USING btree (study_end);

全部のカラムじゃない
3カラムとも絞込条件に頻出しそう
実際にはもっとカラムがあります。

よさげ？

「うまい手」に見える
3本独立したindexをもっておけば、どう組みあわせたSQLでも、hitsの集合操作で高速
PostgreSQL には Bitmap Index Scan　あるし

複合キーは嫌？

btree(child_id, study_end) だと、この2つのAND条件など、評価順が(child_id, study_end)になるタイプのSQLにしか有効にならない。
c_idも使いたいときは別のindexを追加しないといけない
組みあわせ数の増え方は急速だ！
組みあわせ爆発のおそろしさはみんな知ってる！

サービスの紹介

現実の計算量

子供の数：各学年ざっと100万人＝潜在市場
サービスが軌道にのったら、study_beginの24h範囲で 50万hit, child_idで 2000hit, c_idで 30万hit くらいはすぐ。
PostgreSQL の Bitmap Index Scanではサービス性能要件にちと足りない

類似サービスでの記録

-> Bitmap Index Scan on c_results_study_end_idx 
(actual time=172.871..172.871 rows=1441720 loops=1)
Index Cond: ((study_end >= ‘2018-12-01') AND (study_end < ‘2018-12-02') AND (child_id = xxxxx))
-> Bitmap Index Scan on c_results_child_id_idx 
(actual time=0.163..0.163 rows=1340 loops=1)
Index Cond: (child_id = xxxxx)
Rows Removed by Filter: 1
Total runtime: 180.127 ms

rows=144万!

複数キーindexで分解能を上げる

btree(child_id, study_end) を使って直近resultsをとるなら、index scan 1回
btree(child_id, c_id, study_begin) を使って、IN句で指定した数件のc_id別の直近一覧をとるなら、IN句の指定と同じ回数の index scan。

複数キーindexを使って、それが使えるSQLだけでサービス構成しないと、平均応答時間10-50msecのサービスはできないです。

改修後

CONSTRAINT childid_c_id_study_begin_unique UNIQUE (child_id,c_id,study_begin)

CREATE INDEX c_results_child_id_study_end_idx
  ON c_results USING btree (child_id, study_end));

indexを1本にして、UNIQUE制約の順番も、サービスで使いそうなカラム順序になりました。

MENTORの原則

Measure（測定）
Explain（解析）
Nominate（指名）
Test（テスト）
Optimize（最適化）
Rebuild（再構築）

SQLアンチパターン P.141より

「実際には、すべてのアプリケーションに当てはまる答えはありません。」
「あなたこそが、誰よりもデータと運用要件について知っている人間だからです。」
「最も重要なルールは、推測のみに基づいて、闇雲にインデックスをつけてはならない」

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up