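How to Use Groonga Full-Text Search Commands (groonga Command Edition)
This article tries out full-text search commands on the Groonga database built in the previous article.
It is based on the official tutorial.
For full-text search via the HTTP server, see the separate article.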
Preparation
Prepare the tables needed for full-text search.
First, create a table to store the data.
# Open the test.grn database
$ groonga /tmp/test.grn
# Enable the MySQL-compatible normalizers
> register normalizers/mysql
# For now, create a table keyed by track_id (UInt32)
> table_create SearchTrack TABLE_PAT_KEY UInt32 --default_tokenizer TokenBigram --normalizer NormalizerMySQLGeneralCI
# Confirm that the table was created
> table_list
[
[0,1410341854.51164,9.44137573242188e-05],
[
# (omitted)
[257,"SearchTrack","/tmp/test.grn.0000101","TABLE_PAT_KEY|PERSISTENT","UInt32",null,"TokenBigram","NormalizerMySQLGeneralCI"]
]
]
# Add the columns
> column_create SearchTrack name --type ShortText
> column_create SearchTrack name_kana --type ShortText
> column_create SearchTrack name_en --type ShortText
> column_create SearchTrack artist_name --type ShortText
> column_create SearchTrack artist_name_kana --type ShortText
> column_create SearchTrack artist_name_en --type ShortText
> column_create SearchTrack album_name --type ShortText
> column_create SearchTrack album_name_kana --type ShortText
> column_create SearchTrack album_name_en --type ShortText
# Check the columns that were created
> column_list SearchTrack
[
[0,1410342025.90874,0.000189781188964844],
[
# (omitted)
[257,"_key","","","COLUMN_SCALAR","SearchTrack","UInt32",[]],
[266,"album_name","/tmp/test.grn.000010A","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
[268,"album_name_en","/tmp/test.grn.000010C","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
[267,"album_name_kana","/tmp/test.grn.000010B","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
[263,"artist_name","/tmp/test.grn.0000107","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
[265,"artist_name_en","/tmp/test.grn.0000109","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
[264,"artist_name_kana","/tmp/test.grn.0000108","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
[261,"name","/tmp/test.grn.0000105","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
[262,"name_en","/tmp/test.grn.0000106","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
[256,"name_kana","/tmp/test.grn.0000100","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]]
]
]
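Next, create the lexicon and the inverted indexes used for full-text search.
For background on lexicons and inverted indexes, see the linked page.
# Create the lexicon table for full-text search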
> table_create SearchTrackTerms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerMySQLGeneralCI
# Create one full-text index column per source column
> column_create SearchTrackTerms name --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source name
> column_create SearchTrackTerms name_kana --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source name_kana
> column_create SearchTrackTerms name_en --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source name_en
> column_create SearchTrackTerms artist_name --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source artist_name
> column_create SearchTrackTerms artist_name_kana --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source artist_name_kana
> column_create SearchTrackTerms artist_name_en --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source artist_name_en
> column_create SearchTrackTerms album_name --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source album_name
> column_create SearchTrackTerms album_name_kana --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source album_name_kana
> column_create SearchTrackTerms album_name_en --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source album_name_en
# Check the index columns created on the lexicon table
> column_list SearchTrackTerms
[
[0,1410343882.38171,0.000171184539794922],
[
[
# (omitted)
],
[271,"_key","","","COLUMN_SCALAR","SearchTrackTerms","ShortText",[]],
[278,"album_name","/tmp/test.grn.0000116","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.album_name"]],
[280,"album_name_en","/tmp/test.grn.0000118","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.album_name_en"]],
[279,"album_name_kana","/tmp/test.grn.0000117","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.album_name_kana"]],
[275,"artist_name","/tmp/test.grn.0000113","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.artist_name"]],
[277,"artist_name_en","/tmp/test.grn.0000115","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.artist_name_en"]],
[276,"artist_name_kana","/tmp/test.grn.0000114","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.artist_name_kana"]],
[272,"name","/tmp/test.grn.0000110","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.name"]],
[274,"name_en","/tmp/test.grn.0000112","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.name_en"]],
[273,"name_kana","/tmp/test.grn.0000111","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.name_kana"]]
]
]
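Typing one column_create per column by hand is repetitive. As a minimal sketch, assuming the table and column names used above, the same commands could be generated with a short Python script. The helper name build_column_commands is hypothetical and is not part of Groonga.

```python
# Generate the column_create commands used above for SearchTrack and its lexicon.
# build_column_commands is a hypothetical helper, not a Groonga API.

COLUMNS = [
    "name", "name_kana", "name_en",
    "artist_name", "artist_name_kana", "artist_name_en",
    "album_name", "album_name_kana", "album_name_en",
]


def build_column_commands(table, lexicon, columns):
    """Return the data-column commands followed by the index-column commands."""
    commands = []
    # Data columns first, so every --source column exists before it is indexed.
    for column in columns:
        commands.append(f"column_create {table} {column} --type ShortText")
    # One index column on the lexicon per data column.
    for column in columns:
        commands.append(
            f"column_create {lexicon} {column} "
            f"--flags COLUMN_INDEX|WITH_POSITION --type {table} --source {column}"
        )
    return commands


if __name__ == "__main__":
    for command in build_column_commands("SearchTrack", "SearchTrackTerms", COLUMNS):
        print(command)
```

The printed lines can be pasted into the groonga prompt after both tables have been created.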
As an aside, the tutorial specifies KEY_NORMALIZE in table_create, but the reference says that flag is deprecated, so specifying the normalizer with --normalizer as above seems preferable.
Finally, load the test data.
> load --table SearchTrack
[
{
"_key": 1,
"name": "曲名1",
"name_kana": "きょくめいいち",
"name_en": "kyokumei ichi",
"artist_name": "音楽家1",
"artist_name_kana": "おんがくかいち",
"artist_name_en": "ongakuka ichi",
"album_name": "原盤1",
"album_name_kana": "げんばんいち",
"album_name_en": "genban ichi"
},
{
"_key": 2,
"name": "曲名2",
"name_kana": "きょくめいに",
"name_en": "kyokumei ni",
"artist_name": "音楽家1",
"artist_name_kana": "おんがくかいち",
"artist_name_en": "ongakuka ichi",
"album_name": "原盤1",
"album_name_kana": "げんばんいち",
"album_name_en": "genban ichi"
},
{
"_key": 3,
"name": "曲名3",
"name_kana": "きょくめいさん",
"name_en": "kyokumei san",
"artist_name": "音楽家2",
"artist_name_kana": "おんがくかに",
"artist_name_en": "ongakuka ni",
"album_name": "原盤2",
"album_name_kana": "げんばんに",
"album_name_en": "genban ni"
}
]
Incidentally, loading a record whose _key already exists is treated as an UPDATE.
Entering records by hand like this is tedious, so a way to load data in bulk is being considered separately.
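One possible approach to bulk loading, shown only as a sketch and not necessarily the method the author has in mind, is to convert an existing data file into a single load command. The Python example below assumes a CSV file whose header row matches the SearchTrack column names, including _key. That CSV layout and the helper name build_load_command are assumptions made for illustration.

```python
import csv
import io
import json


def build_load_command(table, csv_text):
    """Turn CSV text (header row = column names) into a Groonga load command.

    Hypothetical helper: assumes the CSV has a _key column holding integers.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = dict(row)
        # The table key is UInt32, so send it as a JSON number, not a string.
        record["_key"] = int(record["_key"])
        records.append(record)
    # ensure_ascii=False keeps Japanese text readable in the generated file.
    body = json.dumps(records, ensure_ascii=False, indent=1)
    return f"load --table {table}\n{body}\n"


if __name__ == "__main__":
    sample = "_key,name,artist_name\n1,曲名1,音楽家1\n2,曲名2,音楽家1\n"
    print(build_load_command("SearchTrack", sample), end="")
```

The generated text has the same shape as the load command above, so it can be pasted into the groonga prompt or saved to a file and fed to groonga on standard input.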
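Running searches from the command
This section shows how to search with the groonga command directly, without going through the HTTP server.
# Ordinary search (output _key, name, album_name, artist_name for records whose artist_name equals "音楽家1")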
> select SearchTrack --query 'artist_name:"音楽家1"' --output_columns '_key, name, album_name, artist_name'
[
[0,1410360576.77508,0.00126838684082031],
[
[
[2], # number of hits
[
["_key","UInt32"],
["name","ShortText"],
["album_name","ShortText"],
["artist_name","ShortText"]
],
[1,"曲名1","原盤1","音楽家1"],
[2,"曲名2","原盤1","音楽家1"]
]
]
]
# Full-text search (output _key, name, album_name, artist_name for records whose artist_name contains "家1")
> select SearchTrack --match_columns artist_name --query '家1' --output_columns '_key, name, album_name, artist_name'
[
[0,1410361014.40699,0.000328302383422852],
[
[
[2],
[
["_key","UInt32"],
["name","ShortText"],
["album_name","ShortText"],
["artist_name","ShortText"]
],
[1,"曲名1","原盤1","音楽家1"],
[2,"曲名2","原盤1","音楽家1"]
]
]
]
# Full-text search (output _key, name, album_name, artist_name for records whose album_name contains both "原盤" and "2")
> select SearchTrack --match_columns album_name --query '原盤 2' --output_columns '_key, name, album_name, artist_name'
[
[0,1410361336.0281,0.000438690185546875],
[
[
[1],
[
["_key","UInt32"],
["name","ShortText"],
["album_name","ShortText"],
["artist_name","ShortText"]
],
[3,"曲名3","原盤2","音楽家2"]
]
]
]
# Full-text search (output _key, name, album_name, artist_name for records where any of artist_name, artist_name_kana, or artist_name_en contains "ち")
> select SearchTrack --match_columns artist_name||artist_name_kana||artist_name_en --query 'ち' --output_columns '_key, name, album_name, artist_name'
[
[0,1410362018.65404,0.00106215476989746],
[
[
[2],
[
["_key","UInt32"],
["name","ShortText"],
["album_name","ShortText"],
["artist_name","ShortText"]
],
[1,"曲名1","原盤1","音楽家1"],
[2,"曲名2","原盤1","音楽家1"]
]
]
]
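All of the select responses above share one shape: a header of [return code, start time, elapsed time], followed by a body whose first result set holds the hit count, the column definitions, and then the records. As a rough sketch of how a script might consume that output, the Python function below turns a response into a list of dictionaries. The name parse_select_response is hypothetical, and the sketch only handles the single result set shown in this article.

```python
import json


def parse_select_response(text):
    """Parse a Groonga select response into (hit count, list of record dicts).

    Hypothetical helper covering only the response layout shown above.
    """
    response = json.loads(text)
    header = response[0]
    if header[0] != 0:
        # A non-zero return code means the command failed.
        raise RuntimeError(f"groonga returned status {header[0]}")
    result_set = response[1][0]
    n_hits = result_set[0][0]
    # Each column definition is a [name, type] pair.
    column_names = [definition[0] for definition in result_set[1]]
    records = [dict(zip(column_names, row)) for row in result_set[2:]]
    return n_hits, records


if __name__ == "__main__":
    sample = """
    [[0,1410361014.40699,0.000328302383422852],
     [[[2],
       [["_key","UInt32"],["name","ShortText"],
        ["album_name","ShortText"],["artist_name","ShortText"]],
       [1,"曲名1","原盤1","音楽家1"],
       [2,"曲名2","原盤1","音楽家1"]]]]
    """
    hits, rows = parse_select_response(sample)
    print(hits, rows)
```

Note that the "# number of hits" annotation in the first example is a comment added for this article and is not valid JSON, so it must not appear in text passed to the parser.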
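Note that select also has a --filter argument, which can express more complex conditions than --query, but the examples here use --query because it is easier to write.
The examples above do not use them, but --sortby, --offset, and --limit are also available.
See the linked page for details.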
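To give a feel for how these arguments combine, here is a small sketch in Python that assembles a groonga invocation as an argument list. It assumes the groonga DB_PATH COMMAND [ARGS] calling form and the script-syntax match operator @ inside --filter. The helper build_select_argv and the specific filter and sort values are illustrative assumptions, and nothing here has been run against the database from this article.

```python
def build_select_argv(db_path, table, filter_expr=None, sortby=None,
                      offset=None, limit=None, output_columns=None):
    """Build an argument list for a one-shot groonga select call.

    Hypothetical helper: it only assembles arguments and does not run groonga.
    """
    argv = ["groonga", db_path, "select", table]
    if filter_expr is not None:
        argv += ["--filter", filter_expr]
    if sortby is not None:
        argv += ["--sortby", sortby]
    # Compare against None so that an explicit offset of 0 is still emitted.
    if offset is not None:
        argv += ["--offset", str(offset)]
    if limit is not None:
        argv += ["--limit", str(limit)]
    if output_columns is not None:
        argv += ["--output_columns", output_columns]
    return argv


if __name__ == "__main__":
    argv = build_select_argv(
        "/tmp/test.grn",
        "SearchTrack",
        filter_expr='artist_name @ "家1"',  # full-text match in script syntax
        sortby="-_key",                      # leading minus = descending order
        offset=0,
        limit=10,
        output_columns="_key, name, album_name, artist_name",
    )
    print(argv)
```

Passing the list to subprocess.run avoids shell quoting problems with the quotes and spaces inside the filter expression.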