あいまい検索でもIndexを活用したい！GINインデックスの実力を実験してみた

Posted at 2025-12-07

はじめに

RDBのパフォーマンス改善をするために、Indexを貼っていますよね。ただ、ユーザーのあいまいな検索の場合だとIndexが効かなくなるので、どうしようかと悩むのは一度はある気がします！GIN Indexを張れると効いたので、実際に試して見ないと気が済まないので、今回実験していこうと思います！！

普通のIndexからパフォーマンス、キャパシティの評価を実施してみる。

GIN Indexの前に普段貼っているB-tree Indexがどれくらい効果があるものなのかを見ていきたいと思います！！
Indexがないことで、CPUやメモリにどの程度負荷がかかっているのか？そもそもどれくらいパフォーマンスが低下しているのか？を計測していきます。

評価用のTBLを準備

評価TBLの作成

-- 1. テーブル作成
CREATE TABLE cpu_mem_lab (
    id SERIAL PRIMARY KEY,
    score INT,             -- 計算用（CPU負荷）
    memo TEXT,             -- ソート用（メモリ負荷）
    created_at TIMESTAMP
);

-- 2. 100万件データ投入（ランダムな数値と文字列）
INSERT INTO cpu_mem_lab (score, memo, created_at)
SELECT
    (random() * 10000)::INT,
    md5(random()::text),
    NOW()
FROM generate_series(1, 1000000);

-- 3. 統計情報の更新
ANALYZE cpu_mem_lab;

Indexなしの評価

評価内容の確認

index_lab=# EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM cpu_mem_lab WHERE score = 5000;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..16528.33 rows=100 width=49) (actual time=2.083..43.739 rows=91 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=10310
   ->  Parallel Seq Scan on cpu_mem_lab  (cost=0.00..15518.33 rows=42 width=49) (actual time=1.681..33.539 rows=30 loops=3)
         Filter: (score = 5000)
         Rows Removed by Filter: 333303
         Buffers: shared hit=10310
 Planning Time: 0.201 ms
 Execution Time: 43.796 ms
(10 rows)

→実行計画を見ていきます。1行づつ丁寧に見ていこうと思います！

 Gather  (cost=1000.00..16528.33 rows=100 width=49) (actual time=2.083..43.739 rows=91 loops=1)

→GatherとはSQL実行時のまとめ役です。
cost=1000.00..16528.33はPostgresの考える（仕事量）を定義しています。最初の1件目を取得するのに1000.00、作業完了するのに16528.33
rows=100 width=49はPostgresの予測では100件ヒットし、1件あたりのデータサイズは49バイトである。
actual time=2.083..43.739 rows=91 loops=1は実際に実行した結果です。最初の1件を取得するまでに約2.083ms、作業完了までに43.739msかかり、取得件数は91件、作業回数は1回。

Workers Planned: 2
Workers Launched: 2

計画段階で2つのWorker(リソース)を計画し、実際に2つのWorker(リソース)を取得できた事を示します。

Parallel Seq Scan on cpu_mem_lab  (cost=0.00..15518.33 rows=42 width=49) (actual time=1.681..33.539 rows=30 loops=3)
         Filter: (score = 5000)
         Rows Removed by Filter: 333303
         Buffers: shared hit=10310
 Planning Time: 0.201 ms
 Execution Time: 43.796 ms

→Parallel Seq Scan on cpu_mem_labは並列でcpu_mem_labのフルスキャンを実施し、端から端まで全部実行した。
(cost=0.00..15518.33 rows=42 width=49) (actual time=1.681..33.539 rows=30 loops=3)。
計画では1つのWorkerが開始するまでに0.00ms、Workerのタスクが完了するまで15518.33msである。1つのWorkerは42件、1件あたり49バイト。
実際の実行結果は、開始するまでに1.681ms、完了までに33.539ms。1つのWorkerは30件取得し、3つのWorkerで実施した。

CPUとメモリへの影響確認
docker stats pg-labでCPUとメモリへの影響を確認。

→SQL実行前のCPUは0、メモリは187.8MiB。

→SQL実行後のCPUは13.92%、メモリは187.8MiB。
ちなみに、メモリは「データを読み込んでも使い終わったらすぐ捨てているから、メモリに溜まらない」ようです。

では、次にIndexを貼ってどうなるかを見て見ましょう！

Indexあり

CREATE INDEX idx_score ON cpu_mem_lab(score);

評価内容の確認

index_lab=# EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM cpu_mem_lab WHERE score = 5000;
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on cpu_mem_lab  (cost=5.20..376.90 rows=100 width=49) (actual time=0.132..0.589 rows=91 loops=1)
   Recheck Cond: (score = 5000)
   Heap Blocks: exact=91
   Buffers: shared hit=91 read=3
   ->  Bitmap Index Scan on idx_score  (cost=0.00..5.17 rows=100 width=0) (actual time=0.105..0.105 rows=91 loops=1)
         Index Cond: (score = 5000)
         Buffers: shared read=3
 Planning:
   Buffers: shared hit=15 read=1
 Planning Time: 1.007 ms
 Execution Time: 0.638 ms
(11 rows)

 Bitmap Heap Scan on cpu_mem_lab  (cost=5.20..376.90 rows=100 width=49) (actual time=0.132..0.589 rows=91 loops=1)
   Recheck Cond: (score = 5000)
   Heap Blocks: exact=91
   Buffers: shared hit=91 read=3

→Bitmap Heap Scan on cpu_mem_labは、ビットマップ（整理されたメモ）を使って、Heap（データ本体）を見にいった。
cost=5.20..376.90 rows=100 width=49は、Postgresの実行予測では開始までに5.20、完了までに376.90、ヒット件数は100、1件あたり49バイト。
actual time=0.132..0.589 rows=91 loops=1は実際にデータ取得開始は0.132ms、作業が完了は0.589msである。
Heap Blocks: exact=91はデータを取り行くのに91個の異なるページにアクセス。
Buffers: shared hit=91 read=3は共有メモリに91ページあり、共有メモリになかったディスクの3ページアクセスし、合計94ページ分走査した。

CPUとメモリへの影響確認

→CPUは0.56%になっていますね。メモリーは206.5MBですが、Indexのキャッシュが効いているので若干増えています。

100万件のデータに対する SELECT * FROM table WHERE score = 5000 の実行結果比較。

比較項目	インデックスなし (Before)	インデックスあり (After)	改善・変化の度合い
作戦名 (Plan)	`Parallel Seq Scan` (人海戦術・全件走査)	`Bitmap Heap Scan` (カタログ＆地図検索)	スマート化力技から効率的な探索へ変更
実行時間 (Actual Time)	43.796 ms	0.638 ms	約 69倍高速化体感速度は一瞬
予測コスト (Total Cost)	`16,528.33` ポイント	`376.90` ポイント	約 1/44 に減少 Postgresの見積もり労力が激減
読むページ数 (Buffers)	10,310 ページ	94 ページ	約 1/110 に圧縮ディスクI/O（一番遅い処理）がほぼ消滅
CPU使用率 (docker stats)	13.92% (実測値)	0.56% (実測値)	約 1/25 に低下サーバー負荷がほぼゼロに
メモリ使用量 (Mem Usage)	187.8 MiB	206.5 MiB	+18.7 MiB 増加インデックスをキャッシュ(暗記)した証拠

CPUの負荷がかかりやすいポイント！

CPUは『判断』と『計算』をする場所なので、その回数や難易度が上がった時に負荷が高くなります。代表的なケースは以下3点です。

単純に「量」が多い時（塵も積もれば山となる）
→これはイメージしやすいですね。単純に量が増えたから負荷も高くなるということです。

2.「複雑な計算」をさせた時
→WHERE text LIKE '%あいうえお%'やWHERE md5(text) = '...'などの複雑な計算実施時。

3.「並び替え・集計」をした時
→ソート（ORDER BY）やグループ化（GROUP BY）は、データを比較しまくる必要があります。
さて、ようやく今回のテーマであるあいまい検索に触れていこうと思います！

あいまい検索(Like検索)のIndexを貼らない場合

普段よく使う前方一致、後方一致、部分一致の3パターンで検証していきます！

テーブルの準備

-- 1. 拡張機能の有効化（あとで使います）
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- 2. 既存テーブルをリセット
DROP TABLE IF EXISTS cpu_mem_lab;

-- 3. テーブル作成
CREATE TABLE cpu_mem_lab (
    id SERIAL PRIMARY KEY,
    memo TEXT,             -- 実験用：32文字のランダム文字列が入ります
    created_at TIMESTAMP
);

-- 4. 100万件データ投入（数秒〜数十秒かかります）
-- md5(random()::text) で、"8ad7..." のようなランダムな文字列を作ります
INSERT INTO cpu_mem_lab (memo, created_at)
SELECT
    md5(random()::text),
    NOW()
FROM generate_series(1, 1000000);

-- 5. 統計情報の更新
ANALYZE cpu_mem_lab;

Indexなしの検索

前方一致検索の場合。

前方一致での検索

index_lab=# EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM cpu_mem_lab WHERE memo LIKE 'abc%';
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=15554.65..15554.66 rows=1 width=8) (actual time=48.956..52.207 rows=1 loops=1)
   Buffers: shared hit=9346
   ->  Gather  (cost=15554.44..15554.65 rows=2 width=8) (actual time=48.821..52.201 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=9346
         ->  Partial Aggregate  (cost=14554.44..14554.45 rows=1 width=8) (actual time=40.905..40.906 rows=1 loops=3)
               Buffers: shared hit=9346
               ->  Parallel Seq Scan on cpu_mem_lab  (cost=0.00..14554.33 rows=42 width=0) (actual time=1.635..40.864 rows=89 loops=3)
                     Filter: (memo ~~ 'abc%'::text)
                     Rows Removed by Filter: 333245
                     Buffers: shared hit=9346
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.152 ms
 Execution Time: 52.242 ms
(16 rows)

2.後方一致検索の場合

SELECT count(*) FROM cpu_mem_lab WHERE memo LIKE '%abc';
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=15554.65..15554.66 rows=1 width=8) (actual time=72.515..74.749 rows=1 loops=1)
   Buffers: shared hit=9346
   ->  Gather  (cost=15554.44..15554.65 rows=2 width=8) (actual time=72.418..74.744 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=9346
         ->  Partial Aggregate  (cost=14554.44..14554.45 rows=1 width=8) (actual time=65.847..65.848 rows=1 loops=3)
               Buffers: shared hit=9346
               ->  Parallel Seq Scan on cpu_mem_lab  (cost=0.00..14554.33 rows=42 width=0) (actual time=1.186..65.812 rows=88 loops=3)
                     Filter: (memo ~~ '%abc'::text)
                     Rows Removed by Filter: 333245
                     Buffers: shared hit=9346
 Planning Time: 0.311 ms
 Execution Time: 74.792 ms
(14 rows)

3.部分一致検索

Indexなしの場合は下記のような結果になりました。Indexを貼っていないので全て全件走査になる。

検索タイプ	SQL条件	ヒット件数 (Rows) ※概算	実行時間 (Actual Time)	スキャン方式	読み込みページ数 (Buffers)	CPU負荷 (実測値)
前方一致	`LIKE 'abc%'`	267 件	52.242 ms	Parallel Seq Scan (全件走査)	9,346	中 (約15%)
後方一致	`LIKE '%abc'`	264 件	74.792 ms	Parallel Seq Scan (全件走査)	9,346	高 (約22%)
部分一致	`LIKE '%abc%'`	7,479 件	76.813 ms	Parallel Seq Scan (全件走査)	9,346	高 (約22%)

※ヒット件数の算出根拠:
ログの Parallel Seq Scan における rows (1作業員あたりの発見数) × loops (作業人数 3人) で算出。

前方一致: 89 × 3 = 267

後方一致: 88 × 3 = 264

部分一致: 2493 × 3 = 7,479

「前方一致」と「それ以外」のCPU負荷の差は？

前方一致 (52ms / CPU 15%)
- - 文字列の先頭が不一致なら即座に「違う」と判断して次へ行けるため、計算コストが比較的低く済みました。
後方・部分一致 (約76ms / CPU 22%)
- 文字列の最後まで（あるいは中身すべてを） スキャンしないと合致判定ができないため、CPUリソースをより多く消費し、時間も約1.5倍かかりました。
- 特に部分一致はヒット件数が多いため、その分だけ処理コストも微増しています。

B-Treeインデックスを作成し、評価する

B-Treeインデックス

-- 通常のインデックス（B-Tree）を作成
-- ※ text_pattern_ops は、LIKE 'abc%' を高速化するための必須オプションです
CREATE INDEX idx_memo_btree ON cpu_mem_lab(memo text_pattern_ops);

前方一致検索

index_lab=# EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM cpu_mem_lab WHERE memo LIKE 'abc%';
                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4.70..4.71 rows=1 width=8) (actual time=0.699..0.700 rows=1 loops=1)
   Buffers: shared hit=6
   ->  Index Only Scan using idx_memo_btree on cpu_mem_lab  (cost=0.42..4.45 rows=100 width=0) (actual time=0.076..0.226 rows=266 loops=1)
         Index Cond: ((memo ~>=~ 'abc'::text) AND (memo ~<~ 'abd'::text))
         Filter: (memo ~~ 'abc%'::text)
         Heap Fetches: 0
         Buffers: shared hit=6
 Planning Time: 0.324 ms
 Execution Time: 0.766 ms
(9 rows)

2.後方一致検索

index_lab=# EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM cpu_mem_lab WHERE memo LIKE '%abc';
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=15554.65..15554.66 rows=1 width=8) (actual time=73.874..76.265 rows=1 loops=1)
   Buffers: shared hit=9346
   ->  Gather  (cost=15554.44..15554.65 rows=2 width=8) (actual time=73.790..76.259 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=9346
         ->  Partial Aggregate  (cost=14554.44..14554.45 rows=1 width=8) (actual time=65.118..65.118 rows=1 loops=3)
               Buffers: shared hit=9346
               ->  Parallel Seq Scan on cpu_mem_lab  (cost=0.00..14554.33 rows=42 width=0) (actual time=1.871..65.083 rows=88 loops=3)
                     Filter: (memo ~~ '%abc'::text)
                     Rows Removed by Filter: 333245
                     Buffers: shared hit=9346
 Planning Time: 0.243 ms
 Execution Time: 76.313 ms
(14 rows)

3.部分一致検索

index_lab=# EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM cpu_mem_lab WHERE memo LIKE '%abc%';

                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=15554.65..15554.66 rows=1 width=8) (actual time=61.081..63.054 rows=1 loops=1)
   Buffers: shared hit=9346
   ->  Gather  (cost=15554.44..15554.65 rows=2 width=8) (actual time=61.009..63.049 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=9346
         ->  Partial Aggregate  (cost=14554.44..14554.45 rows=1 width=8) (actual time=55.614..55.614 rows=1 loops=3)
               Buffers: shared hit=9346
               ->  Parallel Seq Scan on cpu_mem_lab  (cost=0.00..14554.33 rows=42 width=0) (actual time=0.114..55.375 rows=2493 loops=3)
                     Filter: (memo ~~ '%abc%'::text)
                     Rows Removed by Filter: 330840
                     Buffers: shared hit=9346
 Planning Time: 0.339 ms
 Execution Time: 63.102 ms
(14 rows)

Index有無の実行結果の比較

測定結果：前方一致の「圧勝」と他2つの「敗北」

検索タイプ	SQL条件	実行時間 (Actual Time)	スキャン方式	読み込みページ数 (Buffers)	判定
1. 前方一致	`LIKE 'abc%'`	0.766 ms	Index Only Scan (インデックスのみ)	6	大勝利 (約 70倍高速化)
2. 後方一致	`LIKE '%abc'`	76.313 ms	Parallel Seq Scan (全件走査)	9,346	効果なし (インデックス無視)
3. 部分一致	`LIKE '%abc%'`	63.102 ms	Parallel Seq Scan (全件走査)	9,346	効果なし (インデックス無視)

1. 前方一致の勝因 (`abc%`)

挙動: Index Only Scan が選択されました。テーブル本体（Heap）へのアクセスすら発生せず、インデックスだけで回答が完了しています（Heap Fetches: 0）。
仕組み: PostgreSQLは検索条件を内部的に 「abc 以上、かつ abd 未満の範囲検索」 に変換しました。
- ログ: Index Cond: ((memo ~>=~ 'abc') AND (memo ~<~ 'abd'))
リソース: 読み込みが 6ページ（数KB）で済んだため、CPU負荷も発生せず一瞬で完了しました。

2. 後方・部分一致の敗因 (`%abc`, `%abc%`)

挙動: インデックスが存在するにもかかわらず、オプティマイザは 「使えない」と判断して全件走査 (Seq Scan) を選択しました。
無駄: 100万件すべてのデータを読み込み、フィルタリングした結果、99%以上のデータを捨てています。
- ログ: Rows Removed by Filter: 333,245 (×3並列)
リソース: 9,346ページ（約75MB）を読み込み、文字列比較のためにCPUリソースを浪費しました。

GINインデックスをはる。

では、B-treeIndexでは後方一致や部分一致ではIndexが効きませんでした。GINインデックスを貼って早くしようと思います！

GINインデックスを貼る

index_lab=# CREATE INDEX idx_memo_gin ON cpu_mem_lab USING GIN (memo gin_trgm_ops);
CREATE INDEX

1.前方一致

index_lab=# EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM cpu_mem_lab WHERE memo LIKE 'abc%';
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=446.95..446.96 rows=1 width=8) (actual time=1.927..1.928 rows=1 loops=1)
   Buffers: shared hit=319
   ->  Bitmap Heap Scan on cpu_mem_lab  (cost=76.48..446.70 rows=100 width=0) (actual time=1.433..1.901 rows=266 loops=1)
         Recheck Cond: (memo ~~ 'abc%'::text)
         Rows Removed by Index Recheck: 22
         Heap Blocks: exact=284
         Buffers: shared hit=319
         ->  Bitmap Index Scan on idx_memo_gin  (cost=0.00..76.45 rows=100 width=0) (actual time=1.373..1.374 rows=288 loops=1)
               Index Cond: (memo ~~ 'abc%'::text)
               Buffers: shared hit=35
 Planning:
   Buffers: shared hit=1
 Planning Time: 0.145 ms
 Execution Time: 1.970 ms
(14 rows)

2.後方一致

index_lab=# EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM cpu_mem_lab WHERE memo LIKE '%abc';
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=421.63..421.64 rows=1 width=8) (actual time=2.401..2.402 rows=1 loops=1)
   Buffers: shared hit=308
   ->  Bitmap Heap Scan on cpu_mem_lab  (cost=51.16..421.38 rows=100 width=0) (actual time=1.302..2.358 rows=265 loops=1)
         Recheck Cond: (memo ~~ '%abc'::text)
         Rows Removed by Index Recheck: 34
         Heap Blocks: exact=296
         Buffers: shared hit=308
         ->  Bitmap Index Scan on idx_memo_gin  (cost=0.00..51.13 rows=100 width=0) (actual time=1.206..1.206 rows=299 loops=1)
               Index Cond: (memo ~~ '%abc'::text)
               Buffers: shared hit=12
 Planning:
   Buffers: shared hit=1
 Planning Time: 1.417 ms
 Execution Time: 2.491 ms
(14 rows)

3.部分一致

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM cpu_mem_lab WHERE memo LIKE '%abc%';
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=396.31..396.32 rows=1 width=8) (actual time=20.858..20.860 rows=1 loops=1)
   Buffers: shared hit=5159
   ->  Bitmap Heap Scan on cpu_mem_lab  (cost=25.84..396.06 rows=100 width=0) (actual time=3.563..20.126 rows=7479 loops=1)
         Recheck Cond: (memo ~~ '%abc%'::text)
         Heap Blocks: exact=5152
         Buffers: shared hit=5159
         ->  Bitmap Index Scan on idx_memo_gin  (cost=0.00..25.82 rows=100 width=0) (actual time=1.712..1.712 rows=7479 loops=1)
               Index Cond: (memo ~~ '%abc%'::text)
               Buffers: shared hit=7
 Planning:
   Buffers: shared hit=1
 Planning Time: 0.306 ms
 Execution Time: 20.991 ms
(13 rows)

1. 実行時間の比較結果 (Actual Time)

検索タイプ	SQL条件	① インデックスなし	② B-Tree (通常)	③ GIN (n-gram)	判定・解説
前方一致	LIKE 'abc%'	52.2 ms	`0.76 ms`	`1.97 ms`	B-Treeが最速 GINも十分高速だが、構造がシンプルなB-Treeには僅かに及ばない。
後方一致	LIKE '%abc'	74.8 ms	76.3 ms (効果なし)	`2.49 ms`	GINの圧勝全件走査と比較して約30倍の高速化。
部分一致	LIKE '%abc%'	76.8 ms	63.1 ms (効果なし)	`20.9 ms`	GINの勝利全件走査の約4倍高速。 ※ヒット件数が多いためデータ回収に時間はかかるが、CPU負荷は低い。

GINインデックスでも十分処理速度が早くなっていますね！！

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up