再検証した
この記事だと各々が本気を出せてなかったので、再検証しました!
記録としてこの記事も残しますが、↑の方をご覧ください。
概要
pg_bigmとpg_trgmのPostgreSQL全文検索インデックスの性能比較 という記事を拝見しました。
私はPGroongaというPostgreSQLで高速に日本語の全文検索ができる拡張機能の開発をしているので、↑の記事を参考にPGroongaの実力を確認してみる記事です。
参考: https://pgroonga.github.io/ja/
- pg_bigm
- pg_trgm
- PGroonga
それぞれの性能をざっくり比較します。
厳密に計測したわけではないので、参考程度にご覧ください。
さっそく結論: 測定結果
インデックスの作成時間とサイズ
名前 | create index | index size |
---|---|---|
PGroonga | 7145.405 ms | 約600 MB |
pg_bigm | 13768.143 ms | 139 MB |
pg_trgm | 16320.229 ms | 223 MB |
PGroongaはインデックスの作成がはやいけど、サイズはでかい。
日本語の検索
名前 | 象 | 東京 | 技術者 | ロケット | データベース |
---|---|---|---|---|---|
PGroonga | 36.841 ms | 42.143 ms | 29.019 ms | 31.921 ms | 30.707 ms |
pg_bigm | 10.033 ms | 16.049 ms | 3.594 ms | 6.473 ms | 3.854 ms |
pg_trgm | 235.664 ms | 239.054 ms | 3.679 ms | 6.137 ms | 1.997 ms |
PGroongaも頑張ったけど、めちゃくちゃはやくはなかった…。
アルファベットで検索
名前 | DB | test | PostgreSQL |
---|---|---|---|
PGroonga | 26.913 ms | 28.423 ms | 24.398 ms |
pg_bigm | 5.841 ms | 11.625 ms | 1.768 ms |
pg_trgm | 236.912 ms | 6.506 ms | 1.363 ms |
PGroongaも頑張ったけど、めちゃくちゃはやくはなかった…。
追記
PGroongaにとってちょっと残念な感じだったので、長めのテキスト(= Wikipedia本文)でもざっくり比較しました。
続・PostgreSQL全文検索のざっくり性能比較 - pg_bigmとpg_trgmとPGroonga(Wikipediaの本文を検索する)
長めのテキストだとPGroongaがはやかったので、ぜひこちらもご覧ください!
以降にどのように確認したか掲載します。
環境
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=24.04
DISTRIB_CODENAME=noble
DISTRIB_DESCRIPTION="Ubuntu 24.04.2 LTS"
$ free -h
total used free shared buff/cache available
Mem: 62Gi 215Mi 61Gi 139Mi 1.3Gi 62Gi
Swap: 2.0Gi 0B 2.0Gi
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 183
model name : 13th Gen Intel(R) Core(TM) i9-13900
stepping : 1
microcode : 0x12f
cpu MHz : 800.000
cache size : 36864 KB
physical id : 0
siblings : 32
core id : 0
cpu cores : 24
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 32
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs ept_mode_based_exec tsc_scaling usr_wait_pause
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs eibrs_pbrsb rfds bhi
bogomips : 3993.60
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
...(略)
準備
PostgreSQLとcontrib(pg_trgm)
sudo apt install -y -V ca-certificates lsb-release wget
sudo wget -O /usr/share/keyrings/pgdg.asc https://www.postgresql.org/media/keys/ACCC4CF8.asc
(echo "Types: deb"; \
echo "URIs: http://apt.postgresql.org/pub/repos/apt"; \
echo "Suites: $(lsb_release --codename --short)-pgdg"; \
echo "Components: main"; \
echo "Signed-By: /usr/share/keyrings/pgdg.asc") | \
sudo tee /etc/apt/sources.list.d/pgdg.sources
sudo apt update
sudo apt install -y -V postgresql-17 postgresql-contrib
pg_bigm
sudo apt install -y -V gcc make postgresql-server-dev-17 wget
wget https://github.com/pgbigm/pg_bigm/archive/refs/tags/v1.2-20240606.tar.gz
tar zxf ./v1.2-20240606.tar.gz
cd pg_bigm-1.2-20240606
make USE_PGXS=1 PG_CONFIG=/usr/bin/pg_config
sudo make USE_PGXS=1 PG_CONFIG=/usr/bin/pg_config install
設定ファイル(/etc/postgresql/17/main/postgresql.conf
)の更新:
shared_preload_libraries = 'pg_bigm'
参考: https://github.com/pgbigm/pg_bigm/blob/07269d7e72dd8bace781887bc94316da70da6a2d/docs/pg_bigm.md
PGroonga
sudo apt install -y -V ca-certificates lsb-release wget
wget https://packages.groonga.org/ubuntu/groonga-apt-source-latest-$(lsb_release --codename --short).deb
sudo apt install -y -V ./groonga-apt-source-latest-$(lsb_release --codename --short).deb
rm -f ./groonga-apt-source-latest-$(lsb_release --codename --short).deb
$ sudo apt update
$ sudo apt install -y -V postgresql-17-pgdg-pgroonga
参考: https://pgroonga.github.io/ja/install/ubuntu.html
PostgreSQL起動
sudo systemctl start postgresql
各拡張機能のバージョン
$ sudo -u postgres psql \
--command 'drop database if exists test' \
--command 'create database test'
DROP DATABASE
CREATE DATABASE
$ sudo -u postgres psql -d test \
--command "create extension if not exists pg_bigm;" \
--command "create extension if not exists pg_trgm;" \
--command "create extension if not exists pgroonga;" \
--command "select * from pg_extension;"
CREATE EXTENSION
CREATE EXTENSION
CREATE EXTENSION
oid | extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
-------+----------+----------+--------------+----------------+------------+-----------+--------------
13649 | plpgsql | 10 | 11 | f | 1.0 | |
16608 | pg_bigm | 10 | 2200 | t | 1.2 | |
16631 | pg_trgm | 10 | 2200 | t | 1.6 | |
16712 | pgroonga | 10 | 2200 | f | 4.0.1 | |
(4 rows)
$ dpkg -l | grep libgroonga
ii libgroonga0:amd64 15.1.1-1 amd64 Library files for Groonga
使うデータのダウンロードと加工
wget https://dumps.wikimedia.org/jawiki/latest/jawiki-latest-all-titles.gz -P /tmp
gunzip /tmp/jawiki-latest-all-titles.gz
sed -i 's/\\/\\\\/g' /tmp/jawiki-latest-all-titles
置換しておかないと COPY
でロードするときにエラーになる。
ざっくり測定
実行したコマンドと結果を記載します。
PGroonga
#!/usr/bin/bash
function run_select() {
query=$1
result=$(sudo -u postgres psql -d test \
--command "set max_parallel_workers_per_gather = 0" \
--command "\timing" \
--command "\pset pager off" \
--command "explain analyze select * from wikipedia_title where page_title &@ '${query}';")
echo "${result}" 1>&2
time=$(echo "${result}" | grep '^Time:')
echo "Query: ${query} ${time}"
}
sudo systemctl restart postgresql
sudo -u postgres psql \
--command 'drop database if exists test' \
--command 'create database test'
sudo -u postgres psql -d test \
--command "create table wikipedia_title (page_namespace int, page_title text);" \
--command "copy wikipedia_title FROM '/tmp/jawiki-latest-all-titles' with header;" \
--command "create extension if not exists pgroonga;" 1>&2
sudo -u postgres psql -d test \
--command "\timing" \
--command "create index wikipedia_title_index on wikipedia_title using pgroonga (page_title);"
run_select "象"
run_select "東京"
run_select "技術者"
run_select "ロケット"
run_select "データベース"
run_select "DB"
run_select "test"
run_select "PostgreSQL"
sudo -u postgres psql -d test --command "select pg_size_pretty(pg_relation_size('wikipedia_title_index'));"
db_oid=$(sudo -u postgres psql -d test \
--no-align \
--tuples-only \
--command "SELECT oid FROM pg_database where datname = 'test';")
ls -lh /var/lib/postgresql/17/main/base/${db_oid}/pgrn*
$ ./run-pgroonga.sh 2> log
DROP DATABASE
CREATE DATABASE
Timing is on.
CREATE INDEX
Time: 7145.405 ms (00:07.145)
Query: 象 Time: 36.841 ms
Query: 東京 Time: 42.143 ms
Query: 技術者 Time: 29.019 ms
Query: ロケット Time: 31.921 ms
Query: データベース Time: 30.707 ms
Query: DB Time: 26.913 ms
Query: test Time: 28.423 ms
Query: PostgreSQL Time: 24.398 ms
pg_size_pretty
----------------
0 bytes
(1 row)
-rw------- 1 postgres postgres 4.0K Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn
-rw------- 1 postgres postgres 17M Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn.0000000
-rw------- 1 postgres postgres 17M Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn.0000101
-rw------- 1 postgres postgres 8.3M Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn.0000102
-rw------- 1 postgres postgres 52K Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn.0000103
-rw------- 1 postgres postgres 4.0K Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn.0000104
-rw------- 1 postgres postgres 4.0K Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn.0000105
-rw------- 1 postgres postgres 169M Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn.0000107
-rw------- 1 postgres postgres 25M Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn.0000108
-rw------- 1 postgres postgres 229M Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn.0000109
-rw------- 1 postgres postgres 24M Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn.000010A
-rw------- 1 postgres postgres 89M Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn.000010A.c
-rw------- 1 postgres postgres 1.0M Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn.001
-rw------- 1 postgres postgres 25M Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn.conf
-rw------- 1 postgres postgres 4.3M Jul 8 02:34 /var/lib/postgresql/17/main/base/1024145/pgrn.options
PGroongaの場合は pg_relation_size()
でインデックスのサイズはわかりません。
PGDATAにあるpgrnで始まるファイルのサイズ = インデックスのサイズと考えてください。
今回の場合はだいたい600MBほどです。
log
特にアルファベットでの検索について、pg_bigmやpg_trgmと扱い方が違っており、ヒット件数に差がありますが、ざっくり性能比較ということでご容赦ください。
CREATE TABLE
COPY 4304153
CREATE EXTENSION
SET
Timing is on.
Pager usage is off.
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using wikipedia_title_index on wikipedia_title (cost=0.00..44.11 rows=4304 width=36) (actual time=9.346..16.903 rows=2676 loops=1)
Index Cond: (page_title &@ '象'::text)
Planning Time: 18.584 ms
Execution Time: 17.746 ms
(4 rows)
Time: 36.841 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on wikipedia_title (cost=1.08..37920.04 rows=4304 width=36) (actual time=13.929..23.946 rows=15514 loops=1)
Recheck Cond: (page_title &@ '東京'::text)
Heap Blocks: exact=2548
-> Bitmap Index Scan on wikipedia_title_index (cost=0.00..0.00 rows=15516 width=0) (actual time=13.703..13.703 rows=15514 loops=1)
Index Cond: (page_title &@ '東京'::text)
Planning Time: 16.753 ms
Execution Time: 24.996 ms
(7 rows)
Time: 42.143 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using wikipedia_title_index on wikipedia_title (cost=0.00..1736.36 rows=4304 width=36) (actual time=9.825..11.748 rows=483 loops=1)
Index Cond: (page_title &@ '技術者'::text)
Planning Time: 15.988 ms
Execution Time: 12.660 ms
(4 rows)
Time: 29.019 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using wikipedia_title_index on wikipedia_title (cost=0.00..5169.16 rows=4304 width=36) (actual time=9.011..12.299 rows=1145 loops=1)
Index Cond: (page_title &@ 'ロケット'::text)
Planning Time: 18.395 ms
Execution Time: 13.169 ms
(4 rows)
Time: 31.921 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using wikipedia_title_index on wikipedia_title (cost=0.00..1519.81 rows=4304 width=36) (actual time=9.920..11.294 rows=360 loops=1)
Index Cond: (page_title &@ 'データベース'::text)
Planning Time: 18.016 ms
Execution Time: 12.254 ms
(4 rows)
Time: 30.707 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using wikipedia_title_index on wikipedia_title (cost=0.00..3344.45 rows=4304 width=36) (actual time=8.834..10.256 rows=843 loops=1)
Index Cond: (page_title &@ 'DB'::text)
Planning Time: 15.490 ms
Execution Time: 11.018 ms
(4 rows)
Time: 26.913 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using wikipedia_title_index on wikipedia_title (cost=0.00..4591.66 rows=4304 width=36) (actual time=8.687..10.373 rows=1164 loops=1)
Index Cond: (page_title &@ 'test'::text)
Planning Time: 16.880 ms
Execution Time: 11.261 ms
(4 rows)
Time: 28.423 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using wikipedia_title_index on wikipedia_title (cost=0.00..24.06 rows=4304 width=36) (actual time=7.900..7.932 rows=4 loops=1)
Index Cond: (page_title &@ 'PostgreSQL'::text)
Planning Time: 15.237 ms
Execution Time: 8.729 ms
(4 rows)
Time: 24.398 ms
pg_bigm
#!/usr/bin/bash
function run_select() {
query=$1
result=$(sudo -u postgres psql -d test \
--command "set max_parallel_workers_per_gather = 0" \
--command "\timing" \
--command "\pset pager off" \
--command "explain analyze select * from wikipedia_title where page_title like '%${query}%';")
echo "${result}" 1>&2
time=$(echo "${result}" | grep '^Time:')
echo "Query: ${query} ${time}"
}
sudo systemctl restart postgresql
sudo -u postgres psql \
--command 'drop database if exists test' \
--command 'create database test'
sudo -u postgres psql -d test \
--command "create table wikipedia_title (page_namespace int, page_title text);" \
--command "copy wikipedia_title FROM '/tmp/jawiki-latest-all-titles' with header;" \
--command "create extension if not exists pg_bigm;" 1>&2
sudo -u postgres psql -d test \
--command "\timing" \
--command "create index wikipedia_title_index on wikipedia_title using gin (page_title gin_bigm_ops);"
run_select "象"
run_select "東京"
run_select "技術者"
run_select "ロケット"
run_select "データベース"
run_select "DB"
run_select "test"
run_select "PostgreSQL"
sudo -u postgres psql -d test --command "select pg_size_pretty(pg_relation_size('wikipedia_title_index'));"
$ ./run-pg_bigm.sh 2> log
DROP DATABASE
CREATE DATABASE
Timing is on.
CREATE INDEX
Time: 13768.143 ms (00:13.768)
Query: 象 Time: 10.033 ms
Query: 東京 Time: 16.049 ms
Query: 技術者 Time: 3.594 ms
Query: ロケット Time: 6.473 ms
Query: データベース Time: 3.854 ms
Query: DB Time: 5.841 ms
Query: test Time: 11.625 ms
Query: PostgreSQL Time: 1.768 ms
pg_size_pretty
----------------
139 MB
(1 row)
log
CREATE TABLE
COPY 4304153
CREATE EXTENSION
SET
Timing is on.
Pager usage is off.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on wikipedia_title (cost=1201.85..34441.92 rows=172166 width=36) (actual time=0.625..9.131 rows=2676 loops=1)
Recheck Cond: (page_title ~~ '%象%'::text)
Heap Blocks: exact=1672
-> Bitmap Index Scan on wikipedia_title_index (cost=0.00..1158.81 rows=172166 width=0) (actual time=0.500..0.501 rows=2676 loops=1)
Index Cond: (page_title ~~ '%象%'::text)
Planning Time: 0.324 ms
Execution Time: 9.245 ms
(7 rows)
Time: 10.033 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on wikipedia_title (cost=28.58..4595.70 rows=1377 width=36) (actual time=1.563..15.012 rows=15514 loops=1)
Recheck Cond: (page_title ~~ '%東京%'::text)
Heap Blocks: exact=2548
-> Bitmap Index Scan on wikipedia_title_index (cost=0.00..28.24 rows=1377 width=0) (actual time=1.294..1.294 rows=15514 loops=1)
Index Cond: (page_title ~~ '%東京%'::text)
Planning Time: 0.364 ms
Execution Time: 15.368 ms
(7 rows)
Time: 16.049 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on wikipedia_title (cost=40.84..1607.56 rows=430 width=36) (actual time=0.209..2.787 rows=483 loops=1)
Recheck Cond: (page_title ~~ '%技術者%'::text)
Heap Blocks: exact=375
-> Bitmap Index Scan on wikipedia_title_index (cost=0.00..40.73 rows=430 width=0) (actual time=0.150..0.150 rows=483 loops=1)
Index Cond: (page_title ~~ '%技術者%'::text)
Planning Time: 0.299 ms
Execution Time: 2.893 ms
(7 rows)
Time: 3.594 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on wikipedia_title (cost=58.07..1624.78 rows=430 width=36) (actual time=1.125..5.720 rows=1145 loops=1)
Recheck Cond: (page_title ~~ '%ロケット%'::text)
Rows Removed by Index Recheck: 1
Heap Blocks: exact=652
-> Bitmap Index Scan on wikipedia_title_index (cost=0.00..57.96 rows=430 width=0) (actual time=1.058..1.058 rows=1146 loops=1)
Index Cond: (page_title ~~ '%ロケット%'::text)
Planning Time: 0.335 ms
Execution Time: 5.876 ms
(8 rows)
Time: 6.473 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on wikipedia_title (cost=92.52..1659.24 rows=430 width=36) (actual time=1.236..3.076 rows=360 loops=1)
Recheck Cond: (page_title ~~ '%データベース%'::text)
Heap Blocks: exact=203
-> Bitmap Index Scan on wikipedia_title_index (cost=0.00..92.41 rows=430 width=0) (actual time=1.206..1.207 rows=360 loops=1)
Index Cond: (page_title ~~ '%データベース%'::text)
Planning Time: 0.294 ms
Execution Time: 3.180 ms
(7 rows)
Time: 3.854 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on wikipedia_title (cost=5840.09..47688.48 rows=860831 width=36) (actual time=0.290..5.050 rows=1689 loops=1)
Recheck Cond: (page_title ~~ '%DB%'::text)
Heap Blocks: exact=483
-> Bitmap Index Scan on wikipedia_title_index (cost=0.00..5624.88 rows=860831 width=0) (actual time=0.200..0.200 rows=1689 loops=1)
Index Cond: (page_title ~~ '%DB%'::text)
Planning Time: 0.323 ms
Execution Time: 5.212 ms
(7 rows)
Time: 5.841 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on wikipedia_title (cost=286.08..33228.70 rows=34433 width=36) (actual time=2.971..11.008 rows=3110 loops=1)
Recheck Cond: (page_title ~~ '%test%'::text)
Rows Removed by Index Recheck: 1215
Heap Blocks: exact=1426
-> Bitmap Index Scan on wikipedia_title_index (cost=0.00..277.47 rows=34433 width=0) (actual time=2.815..2.815 rows=4325 loops=1)
Index Cond: (page_title ~~ '%test%'::text)
Planning Time: 0.237 ms
Execution Time: 11.153 ms
(8 rows)
Time: 11.625 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on wikipedia_title (cost=161.43..1728.15 rows=430 width=36) (actual time=0.951..0.965 rows=4 loops=1)
Recheck Cond: (page_title ~~ '%PostgreSQL%'::text)
Heap Blocks: exact=4
-> Bitmap Index Scan on wikipedia_title_index (cost=0.00..161.32 rows=430 width=0) (actual time=0.939..0.939 rows=4 loops=1)
Index Cond: (page_title ~~ '%PostgreSQL%'::text)
Planning Time: 0.316 ms
Execution Time: 1.089 ms
(7 rows)
Time: 1.768 ms
pg_trgm
#!/usr/bin/bash
function run_select() {
query=$1
result=$(sudo -u postgres psql -d test \
--command "set max_parallel_workers_per_gather = 0" \
--command "\timing" \
--command "\pset pager off" \
--command "explain analyze select * from wikipedia_title where page_title like '%${query}%';")
echo "${result}" 1>&2
time=$(echo "${result}" | grep '^Time:')
echo "Query: ${query} ${time}"
}
sudo systemctl restart postgresql
sudo -u postgres psql \
--command 'drop database if exists test' \
--command 'create database test'
sudo -u postgres psql -d test \
--command "create table wikipedia_title (page_namespace int, page_title text);" \
--command "copy wikipedia_title FROM '/tmp/jawiki-latest-all-titles' with header;" \
--command "create extension if not exists pg_trgm;" 1>&2
sudo -u postgres psql -d test \
--command "\timing" \
--command "create index wikipedia_title_index on wikipedia_title using gin (page_title gin_trgm_ops);"
run_select "象"
run_select "東京"
run_select "技術者"
run_select "ロケット"
run_select "データベース"
run_select "DB"
run_select "test"
run_select "PostgreSQL"
sudo -u postgres psql -d test --command "select pg_size_pretty(pg_relation_size('wikipedia_title_index'));"
./run-pg_trgm.sh 2> log
DROP DATABASE
CREATE DATABASE
Timing is on.
CREATE INDEX
Time: 16320.229 ms (00:16.320)
Query: 象 Time: 235.664 ms
Query: 東京 Time: 239.054 ms
Query: 技術者 Time: 3.679 ms
Query: ロケット Time: 6.137 ms
Query: データベース Time: 1.997 ms
Query: DB Time: 236.912 ms
Query: test Time: 6.506 ms
Query: PostgreSQL Time: 1.363 ms
pg_size_pretty
----------------
223 MB
(1 row)
log
CREATE TABLE
COPY 4304153
CREATE EXTENSION
SET
Timing is on.
Pager usage is off.
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on wikipedia_title (cost=0.00..84889.91 rows=172166 width=36) (actual time=0.954..234.372 rows=2676 loops=1)
Filter: (page_title ~~ '%象%'::text)
Rows Removed by Filter: 4301477
Planning Time: 0.646 ms
Execution Time: 234.438 ms
(5 rows)
Time: 235.664 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on wikipedia_title (cost=0.00..84889.91 rows=1377 width=36) (actual time=0.047..237.904 rows=15514 loops=1)
Filter: (page_title ~~ '%東京%'::text)
Rows Removed by Filter: 4288639
Planning Time: 0.526 ms
Execution Time: 238.145 ms
(5 rows)
Time: 239.054 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on wikipedia_title (cost=23.62..1590.34 rows=430 width=36) (actual time=0.122..2.739 rows=483 loops=1)
Recheck Cond: (page_title ~~ '%技術者%'::text)
Heap Blocks: exact=375
-> Bitmap Index Scan on wikipedia_title_index (cost=0.00..23.51 rows=430 width=0) (actual time=0.084..0.084 rows=483 loops=1)
Index Cond: (page_title ~~ '%技術者%'::text)
Planning Time: 0.545 ms
Execution Time: 2.832 ms
(7 rows)
Time: 3.679 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on wikipedia_title (cost=40.86..1607.58 rows=430 width=36) (actual time=0.456..5.103 rows=1145 loops=1)
Recheck Cond: (page_title ~~ '%ロケット%'::text)
Heap Blocks: exact=651
-> Bitmap Index Scan on wikipedia_title_index (cost=0.00..40.75 rows=430 width=0) (actual time=0.377..0.377 rows=1145 loops=1)
Index Cond: (page_title ~~ '%ロケット%'::text)
Planning Time: 0.588 ms
Execution Time: 5.229 ms
(7 rows)
Time: 6.137 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on wikipedia_title (cost=75.33..1642.05 rows=430 width=36) (actual time=0.213..1.332 rows=360 loops=1)
Recheck Cond: (page_title ~~ '%データベース%'::text)
Heap Blocks: exact=203
-> Bitmap Index Scan on wikipedia_title_index (cost=0.00..75.23 rows=430 width=0) (actual time=0.191..0.191 rows=360 loops=1)
Index Cond: (page_title ~~ '%データベース%'::text)
Planning Time: 0.407 ms
Execution Time: 1.382 ms
(7 rows)
Time: 1.997 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on wikipedia_title (cost=0.00..84889.91 rows=860831 width=36) (actual time=1.648..236.203 rows=1689 loops=1)
Filter: (page_title ~~ '%DB%'::text)
Rows Removed by Filter: 4302464
Planning Time: 0.394 ms
Execution Time: 236.251 ms
(5 rows)
Time: 236.912 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on wikipedia_title (cost=268.87..33211.50 rows=34433 width=36) (actual time=0.533..5.830 rows=3110 loops=1)
Recheck Cond: (page_title ~~ '%test%'::text)
Rows Removed by Index Recheck: 634
Heap Blocks: exact=1068
-> Bitmap Index Scan on wikipedia_title_index (cost=0.00..260.27 rows=34433 width=0) (actual time=0.449..0.450 rows=3744 loops=1)
Index Cond: (page_title ~~ '%test%'::text)
Planning Time: 0.336 ms
Execution Time: 5.920 ms
(8 rows)
Time: 6.506 ms
SET
Timing is on.
Pager usage is off.
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on wikipedia_title (cost=144.28..1711.00 rows=430 width=36) (actual time=0.374..0.399 rows=4 loops=1)
Recheck Cond: (page_title ~~ '%PostgreSQL%'::text)
Heap Blocks: exact=4
-> Bitmap Index Scan on wikipedia_title_index (cost=0.00..144.18 rows=430 width=0) (actual time=0.357..0.357 rows=4 loops=1)
Index Cond: (page_title ~~ '%PostgreSQL%'::text)
Planning Time: 0.573 ms
Execution Time: 0.472 ms
(7 rows)
Time: 1.363 ms