How to Use Groonga's Full-Text Search Commands (groonga Command Edition)

More than 5 years have passed since last update.

In this article, I try out the full-text search commands on the Groonga setup built in the previous article.

This article is based on the official tutorial.

For full-text search through the HTTP server, see the separate article.

Preparation

First, prepare the tables that full-text search needs.

Start by creating the table that stores the data.

Creating the table
# Open the database /tmp/test.grn in the interactive groonga shell
$ groonga /tmp/test.grn

# Enable the MySQL-compatible normalizers (registers the normalizers/mysql plugin)
> register normalizers/mysql

# Create a table keyed by track_id (UInt32) for now
> table_create SearchTrack TABLE_PAT_KEY UInt32 --default_tokenizer TokenBigram --normalizer NormalizerMySQLGeneralCI

# Confirm that the table was created
> table_list
[
  [0,1410341854.51164,9.44137573242188e-05],
  [
    # (omitted)
    [257,"SearchTrack","/tmp/test.grn.0000101","TABLE_PAT_KEY|PERSISTENT","UInt32",null,"TokenBigram","NormalizerMySQLGeneralCI"]
  ]
]

# Add the columns
> column_create SearchTrack name --type ShortText
> column_create SearchTrack name_kana --type ShortText
> column_create SearchTrack name_en --type ShortText
> column_create SearchTrack artist_name --type ShortText
> column_create SearchTrack artist_name_kana --type ShortText
> column_create SearchTrack artist_name_en --type ShortText
> column_create SearchTrack album_name --type ShortText
> column_create SearchTrack album_name_kana --type ShortText
> column_create SearchTrack album_name_en --type ShortText

# Check the columns of the new table
> column_list SearchTrack
[
  [0,1410342025.90874,0.000189781188964844],
  [
    # (omitted)
    [257,"_key","","","COLUMN_SCALAR","SearchTrack","UInt32",[]],
    [266,"album_name","/tmp/test.grn.000010A","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
    [268,"album_name_en","/tmp/test.grn.000010C","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
    [267,"album_name_kana","/tmp/test.grn.000010B","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
    [263,"artist_name","/tmp/test.grn.0000107","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
    [265,"artist_name_en","/tmp/test.grn.0000109","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
    [264,"artist_name_kana","/tmp/test.grn.0000108","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
    [261,"name","/tmp/test.grn.0000105","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
    [262,"name_en","/tmp/test.grn.0000106","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]],
    [256,"name_kana","/tmp/test.grn.0000100","var","COLUMN_SCALAR|PERSISTENT","SearchTrack","ShortText",[]]
  ]
]

Next, create the lexicon and the inverted indexes used for full-text search.
For an explanation of lexicons and inverted indexes, see here.
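To give a rough picture of what a lexicon and a positional inverted index hold, here is a toy Python sketch. It is not Groonga's implementation, and every function name in it is hypothetical. It splits text into naive two-character bigrams, which is a simplified stand-in for TokenBigram because TokenBigram also treats runs of ASCII letters and digits specially. It records each token with its position, which is the kind of information WITH_POSITION keeps. It then answers phrase queries by checking that consecutive bigrams occur at consecutive positions.

```python
from collections import defaultdict


def bigrams(text: str) -> list[str]:
    """Split text into overlapping two-character tokens (naive bigram tokenizer)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_index(records: dict[int, str]) -> dict[str, set[tuple[int, int]]]:
    """Lexicon -> postings: each token maps to (record key, token position) pairs."""
    index: dict[str, set[tuple[int, int]]] = defaultdict(set)
    for key, text in records.items():
        for pos, token in enumerate(bigrams(text.lower())):
            index[token].add((key, pos))
    return index


def phrase_search(index, phrase: str) -> set[int]:
    """Keys of records containing `phrase` (must be at least 2 characters)."""
    tokens = bigrams(phrase.lower())
    if not tokens:
        return set()  # this toy tokenizer cannot look up single characters
    # Candidate start positions come from the first token's postings.
    starts = set(index.get(tokens[0], set()))
    for offset, token in enumerate(tokens[1:], start=1):
        postings = index.get(token, set())
        # Keep a start only if this token sits exactly `offset` positions later.
        starts = {(key, pos) for key, pos in starts if (key, pos + offset) in postings}
    return {key for key, _ in starts}


def search(index, query: str) -> set[int]:
    """Whitespace-separated words are ANDed, like the --query examples later on."""
    results = None
    for word in query.split():
        hits = phrase_search(index, word)
        results = hits if results is None else results & hits
    return results or set()


artist_names = {1: "音楽家1", 2: "音楽家1", 3: "音楽家2"}
index = build_index(artist_names)
print(sorted(search(index, "家1")))       # [1, 2]
print(sorted(search(index, "音楽 家2")))  # [3]
```

Groonga keeps one index column per source column in the shared lexicon, as in the commands that follow. The sketch indexes a single column only to stay short.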

Preparing for full-text search
# Create the lexicon for full-text search
> table_create SearchTrackTerms TABLE_PAT_KEY ShortText --default_tokenizer TokenBigram --normalizer NormalizerMySQLGeneralCI

# Create one full-text index column for each data column
> column_create SearchTrackTerms name --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source name
> column_create SearchTrackTerms name_kana --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source name_kana
> column_create SearchTrackTerms name_en --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source name_en
> column_create SearchTrackTerms artist_name --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source artist_name
> column_create SearchTrackTerms artist_name_kana --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source artist_name_kana
> column_create SearchTrackTerms artist_name_en --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source artist_name_en
> column_create SearchTrackTerms album_name --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source album_name
> column_create SearchTrackTerms album_name_kana --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source album_name_kana
> column_create SearchTrackTerms album_name_en --flags COLUMN_INDEX|WITH_POSITION --type SearchTrack --source album_name_en

# Check the index columns of the new lexicon
> column_list SearchTrackTerms
[
  [0,1410343882.38171,0.000171184539794922],
  [
    [
      # (omitted)
    ],
    [271,"_key","","","COLUMN_SCALAR","SearchTrackTerms","ShortText",[]],
    [278,"album_name","/tmp/test.grn.0000116","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.album_name"]],
    [280,"album_name_en","/tmp/test.grn.0000118","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.album_name_en"]],
    [279,"album_name_kana","/tmp/test.grn.0000117","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.album_name_kana"]],
    [275,"artist_name","/tmp/test.grn.0000113","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.artist_name"]],
    [277,"artist_name_en","/tmp/test.grn.0000115","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.artist_name_en"]],
    [276,"artist_name_kana","/tmp/test.grn.0000114","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.artist_name_kana"]],
    [272,"name","/tmp/test.grn.0000110","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.name"]],
    [274,"name_en","/tmp/test.grn.0000112","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.name_en"]],
    [273,"name_kana","/tmp/test.grn.0000111","index","COLUMN_INDEX|WITH_POSITION|PERSISTENT","SearchTrackTerms","SearchTrack",["SearchTrack.name_kana"]]
  ]
]
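The nine data columns and the nine index columns follow one pattern: three base fields, each with a plain, _kana, and _en variant. A small script can therefore generate the commands instead of typing them by hand. The following is a hypothetical Python helper, not part of Groonga. It assumes the SearchTrack table itself has already been created with the table_create shown earlier.

```python
# Hypothetical helper that generates the repetitive schema commands.
BASE_FIELDS = ["name", "artist_name", "album_name"]
VARIANTS = ["", "_kana", "_en"]


def searchable_columns() -> list[str]:
    """name, name_kana, name_en, artist_name, ... in the article's order."""
    return [field + variant for field in BASE_FIELDS for variant in VARIANTS]


def schema_commands(table: str = "SearchTrack", lexicon: str = "SearchTrackTerms") -> list[str]:
    columns = searchable_columns()
    # Data columns that hold the text itself.
    commands = [f"column_create {table} {col} --type ShortText" for col in columns]
    # The lexicon shared by all index columns.
    commands.append(
        f"table_create {lexicon} TABLE_PAT_KEY ShortText "
        "--default_tokenizer TokenBigram --normalizer NormalizerMySQLGeneralCI"
    )
    # One index column per data column, storing positions (WITH_POSITION).
    commands += [
        f"column_create {lexicon} {col} --flags COLUMN_INDEX|WITH_POSITION "
        f"--type {table} --source {col}"
        for col in columns
    ]
    return commands


if __name__ == "__main__":
    print("\n".join(schema_commands()))
```

You could save the output to a file and feed it to groonga on standard input, for example `python gen_schema.py > schema.grn` followed by `groonga /tmp/test.grn < schema.grn`. This assumes groonga reads commands from stdin when given only a database path, as it does when restoring a dump. The script name is hypothetical.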

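As an aside, the tutorial specifies KEY_NORMALIZE when calling table_create. According to the reference, that flag is deprecated, so I think it is better to specify the normalizer with --normalizer as above.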
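A toy function can illustrate what a normalizer does in general. The sketch below uses Unicode NFKC followed by case folding from Python's standard library, so that full-width and upper-case spellings map to the same string before tokenizing. This is an illustration under my own assumptions only. It is not the rule set of NormalizerMySQLGeneralCI, which mimics MySQL's utf8_general_ci collation, and `toy_normalize` is a hypothetical name.

```python
import unicodedata


def toy_normalize(text: str) -> str:
    """Fold width and case variants together: NFKC first, then casefold."""
    return unicodedata.normalize("NFKC", text).casefold()


# Full-width, upper-case and lower-case spellings collapse to one form.
print(toy_normalize("ＧＥＮＢＡＮ ichi"))                            # genban ichi
print(toy_normalize("Genban Ichi") == toy_normalize("genban ichi"))  # True
```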

Finally, load some test data.

Loading data
> load --table SearchTrack
[
  {
    "_key": 1, 
    "name": "曲名1",
    "name_kana": "きょくめいいち",
    "name_en": "kyokumei ichi",
    "artist_name": "音楽家1",
    "artist_name_kana": "おんがくかいち",
    "artist_name_en": "ongakuka ichi",
    "album_name": "原盤1",
    "album_name_kana": "げんばんいち",
    "album_name_en": "genban ichi"
  },
  {
    "_key": 2, 
    "name": "曲名2",
    "name_kana": "きょくめいに",
    "name_en": "kyokumei ni",
    "artist_name": "音楽家1",
    "artist_name_kana": "おんがくかいち",
    "artist_name_en": "ongakuka ichi",
    "album_name": "原盤1",
    "album_name_kana": "げんばんいち",
    "album_name_en": "genban ichi"
  },
  {
    "_key": 3, 
    "name": "曲名3",
    "name_kana": "きょくめいさん",
    "name_en": "kyokumei san",
    "artist_name": "音楽家2",
    "artist_name_kana": "おんがくかに",
    "artist_name_en": "ongakuka ni",
    "album_name": "原盤2",
    "album_name_kana": "げんばんに",
    "album_name_en": "genban ni"
  }
]

Note that loading a record whose _key already exists is treated as an update.
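This update behavior can be modelled as an upsert keyed by _key. The sketch below is a hypothetical in-memory model, not Groonga's API. It also assumes, based on my understanding of load's default behavior, that columns omitted from the new record keep their previous values.

```python
def load(table: dict[int, dict], records: list[dict]) -> None:
    """Insert new keys; for an existing key, overwrite only the given columns."""
    for record in records:
        values = dict(record)
        key = values.pop("_key")
        table.setdefault(key, {}).update(values)


search_track: dict[int, dict] = {}
load(search_track, [{"_key": 1, "name": "曲名1", "artist_name": "音楽家1"}])
load(search_track, [{"_key": 1, "name": "曲名1 (remaster)"}])  # same _key, so update
print(len(search_track))  # 1
print(search_track[1])    # {'name': '曲名1 (remaster)', 'artist_name': '音楽家1'}
```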

Writing out every record by hand like this is tedious, so I am separately looking into a way to load data in bulk.
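One possible approach, sketched here under my own assumptions, is to keep the tracks in a CSV file and let a small Python script turn it into a single load command. The CSV layout and the function name are hypothetical. Passing `ensure_ascii=False` to `json.dumps` keeps the Japanese text readable.

```python
import csv
import io
import json


def csv_to_load_command(csv_text: str, table: str = "SearchTrack") -> str:
    """Turn CSV rows (header row = column names) into one groonga load command."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["_key"] = int(row["_key"])  # SearchTrack is keyed by UInt32
    body = json.dumps(rows, ensure_ascii=False, indent=2)
    return f"load --table {table}\n{body}\n"


sample = "_key,name,artist_name\n1,曲名1,音楽家1\n2,曲名2,音楽家1\n"
print(csv_to_load_command(sample))
```

Writing the output to a file such as tracks.grn and running `groonga /tmp/test.grn < tracks.grn` should then load everything at once. This assumes groonga reads commands from standard input, as it does when restoring a dump.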

Running searches from the command line

This section searches with the groonga command directly, without going through the HTTP server.

Running with the groonga command
# Exact-match search: output _key, name, album_name, and artist_name of records whose artist_name equals "音楽家1"
> select SearchTrack --query 'artist_name:"音楽家1"' --output_columns '_key, name, album_name, artist_name'
[
  [0,1410360576.77508,0.00126838684082031],
  [
    [
      [2],   # number of hits
      [
        ["_key","UInt32"],
        ["name","ShortText"],
        ["album_name","ShortText"],
        ["artist_name","ShortText"]
      ],
      [1,"曲名1","原盤1","音楽家1"],
      [2,"曲名2","原盤1","音楽家1"]
    ]
  ]
]

# Full-text search: output _key, name, album_name, and artist_name of records whose artist_name contains "家1"
> select SearchTrack --match_columns artist_name --query '家1' --output_columns '_key, name, album_name, artist_name'
[
  [0,1410361014.40699,0.000328302383422852],
  [
    [
      [2],
      [
        ["_key","UInt32"],
        ["name","ShortText"],
        ["album_name","ShortText"],
        ["artist_name","ShortText"]
      ],
      [1,"曲名1","原盤1","音楽家1"],
      [2,"曲名2","原盤1","音楽家1"]
    ]
  ]
]

# Full-text search: output _key, name, album_name, and artist_name of records whose album_name contains both "原盤" and "2" (space-separated terms are ANDed)
> select SearchTrack --match_columns album_name --query '原盤 2' --output_columns '_key, name, album_name, artist_name'
[
  [0,1410361336.0281,0.000438690185546875],
  [
    [
      [1],
      [
        ["_key","UInt32"],
        ["name","ShortText"],
        ["album_name","ShortText"],
        ["artist_name","ShortText"]
      ],
      [3,"曲名3","原盤2","音楽家2"]
    ]
  ]
]

# Full-text search: output _key, name, album_name, and artist_name of records where any of artist_name, artist_name_kana, or artist_name_en contains "ち"
> select SearchTrack --match_columns artist_name||artist_name_kana||artist_name_en --query 'ち' --output_columns '_key, name, album_name, artist_name'
[
  [0,1410362018.65404,0.00106215476989746],
  [
    [
      [2],
      [
        ["_key","UInt32"],
        ["name","ShortText"],
        ["album_name","ShortText"],
        ["artist_name","ShortText"]
      ],
      [1,"曲名1","原盤1","音楽家1"],
      [2,"曲名2","原盤1","音楽家1"]
    ]
  ]
]
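All of the select responses above share one layout:

- A header: [return code, start time, elapsed seconds].
- A result set, whose elements are, in order:
  - the hit count,
  - the column names and types,
  - the records.

Below is a minimal Python sketch that turns such a response into dictionaries. It assumes it receives the raw JSON text without the explanatory `#` comments added in this article. `parse_select` is a hypothetical name.

```python
import json


def parse_select(response_text: str) -> tuple[int, list[dict]]:
    """Return (number of hits, records as dicts) from a select response."""
    response = json.loads(response_text)
    header = response[0]
    return_code = header[0]
    if return_code != 0:
        # On errors the header carries extra fields; show the message if present.
        detail = header[3] if len(header) > 3 else ""
        raise RuntimeError(f"groonga error {return_code}: {detail}")
    result_set = response[1][0]   # first (and here only) result set
    n_hits = result_set[0][0]     # e.g. [2] -> 2
    names = [name for name, _type in result_set[1]]
    records = [dict(zip(names, row)) for row in result_set[2:]]
    return n_hits, records


raw = ('[[0,1410360576.77508,0.00126838684082031],[[[2],'
       '[["_key","UInt32"],["name","ShortText"],["album_name","ShortText"],'
       '["artist_name","ShortText"]],'
       '[1,"曲名1","原盤1","音楽家1"],[2,"曲名2","原盤1","音楽家1"]]]]')
hits, rows = parse_select(raw)
print(hits, rows[0]["name"])  # 2 曲名1
```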

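Besides --query, select also accepts --filter, which can express more complex conditions. I use --query here because it is easier to write.
The examples above also do not use --sortby, --offset, or --limit, which are available as well.
For details, see here.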
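The following is a hypothetical Python model of how --sortby, --offset, and --limit shape a result list. Sort keys are assumed here to be comma-separated column names, and a leading `-` means descending. After sorting, `offset` records are skipped and at most `limit` are returned; select's defaults are offset 0 and limit 10. The hit count is the number of matches before paging, which is why it can exceed the number of rows returned. Negative offsets and limits, which Groonga treats specially, are deliberately not modelled.

```python
def page(records: list[dict], sortby: str = "", offset: int = 0, limit: int = 10):
    """Model of --sortby / --offset / --limit (non-negative offset and limit only)."""
    rows = list(records)
    keys = [k.strip() for k in sortby.split(",") if k.strip()]
    # Apply keys from last to first; Python's stable sort yields a multi-key order.
    for key in reversed(keys):
        column = key.lstrip("-")
        rows.sort(key=lambda row: row[column], reverse=key.startswith("-"))
    return len(rows), rows[offset:offset + limit]


tracks = [
    {"_key": 1, "name": "曲名1", "album_name": "原盤1"},
    {"_key": 2, "name": "曲名2", "album_name": "原盤1"},
    {"_key": 3, "name": "曲名3", "album_name": "原盤2"},
]
hits, first_page = page(tracks, sortby="-_key", offset=0, limit=2)
print(hits, [row["_key"] for row in first_page])  # 3 [3, 2]
```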

tamano
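I write notes from study sessions and a lab-notebook-style record of my trial and error. That was the idea, anyway; in practice, many posts have turned into notes on what I teach the junior staff at work.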
https://github.com/tamano/