More than 5 years have passed since last update.

Checking how pgroonga splits strings into tokens

Overview

While using pgroonga, I wanted to see how the tokenizer breaks strings apart, including whether the MeCab dictionary is working correctly.

How to

Run a SELECT on the pgroonga_tokenize function. As the reference explains, the function is named "tokenize", but it also lets you check how the normalizer behaves.
(DataGrip was used as the SQL console.)
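To get a rough feel for what a normalizer does before tokenization, you can approximate one outside the database. This is an assumption, not pgroonga's code. Groonga's normalizers (for example NormalizerAuto) are based on Unicode NFKC normalization plus extra rules such as lowercasing. So Python's `unicodedata.normalize("NFKC", ...)` followed by `lower()` is only a loose stand-in. `approximate_normalize` is a hypothetical helper, and pgroonga_tokenize itself is still the way to confirm the real behavior.

```python
import unicodedata


def approximate_normalize(text: str) -> str:
    """Hypothetical, rough stand-in for a Groonga normalizer.

    Assumption: NFKC + lowercasing only. Real normalizers apply more rules.
    """
    # NFKC folds full-width ASCII to half-width, half-width katakana to
    # full-width, and the ideographic space (U+3000) to a regular space.
    folded = unicodedata.normalize("NFKC", text)
    # Lowercase so that "PGroonga" and "pgroonga" compare equal.
    return folded.lower()


print(approximate_normalize("ＰＧｒｏｏｎｇａ\u3000ﾃｽﾄ"))  # pgroonga テスト
```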

To check MeCab tokenization on a pgroonga setup that has the neologd dictionary loaded:

select pgroonga_tokenize('特急はくたか車内ではく', 'tokenizer', 'TokenMecab')

Result:

{"{\"value\":\"特急\",\"position\":0,\"force_prefix_search\":false}","{\"value\":\"はくたか\",\"position\":1,\"force_prefix_search\":true}","{\"value\":\"車内\",\"position\":2,\"force_prefix_search\":true}","{\"value\":\"\",\"position\":3,\"force_prefix_search\":true}","{\"value\":\"はく\",\"position\":4,\"force_prefix_search\":true}"}
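The result comes back as an array literal of JSON strings, which is hard to read. Here is a minimal Python sketch that turns it into (position, value) pairs. It assumes you paste the literal from the SQL console exactly in the format shown above. `parse_tokenize_result` is a hypothetical helper, not part of pgroonga.

```python
import json
import re

# One array element: a double-quoted string in which inner double quotes
# and backslashes are escaped with a backslash.
_ELEMENT = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_tokenize_result(literal: str) -> list[tuple[int, str]]:
    """Convert the array literal copied from the SQL console into
    (position, value) pairs. Hypothetical helper, not part of pgroonga."""
    text = literal.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("expected an array literal such as {...}")
    tokens = []
    for match in _ELEMENT.finditer(text[1:-1]):
        # Undo the array-literal escaping (\" -> ", \\ -> \), leaving plain JSON.
        element = re.sub(r"\\(.)", r"\1", match.group(1))
        token = json.loads(element)
        tokens.append((token["position"], token["value"]))
    return tokens


# First two elements of the TokenMecab result above, pasted as a raw string.
result = r'{"{\"value\":\"特急\",\"position\":0,\"force_prefix_search\":false}","{\"value\":\"はくたか\",\"position\":1,\"force_prefix_search\":true}"}'
for position, value in parse_tokenize_result(result):
    print(position, value)
```

Matching each quoted element with a regex keeps the sketch dependency-free. In practice you could also unnest the array in SQL and read each element's "value" field there.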

With TokenBigram, the same string is split like this:

select pgroonga_tokenize('特急はくたか車内ではく', 'tokenizer', 'TokenBigram')

Result with bigrams:

{"{\"value\":\"特急\",\"position\":0,\"force_prefix_search\":false}","{\"value\":\"急は\",\"position\":1,\"force_prefix_search\":true}","{\"value\":\"はく\",\"position\":2,\"force_prefix_search\":true}","{\"value\":\"くた\",\"position\":3,\"force_prefix_search\":true}","{\"value\":\"たか\",\"position\":4,\"force_prefix_search\":true}","{\"value\":\"か車\",\"position\":5,\"force_prefix_search\":true}","{\"value\":\"車内\",\"position\":6,\"force_prefix_search\":true}","{\"value\":\"内で\",\"position\":7,\"force_prefix_search\":true}","{\"value\":\"では\",\"position\":8,\"force_prefix_search\":true}","{\"value\":\"はく\",\"position\":9,\"force_prefix_search\":true}","{\"value\":\"\",\"position\":10,\"force_prefix_search\":true}"}
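To see where the TokenBigram positions come from, here is a naive Python sketch that takes up to two characters starting at each position. It is not Groonga's implementation. `naive_bigrams` is a hypothetical name, and TokenBigram applies extra rules that this ignores (for example, how runs of ASCII letters and digits are handled when a normalizer is used). For this input it reproduces positions 0 to 9 above. Its last token (position 10) is the trailing single character, because only one character remains there.

```python
def naive_bigrams(text: str) -> list[tuple[int, str]]:
    """Hypothetical sketch: overlapping two-character tokens, one per position.

    Ignores TokenBigram's special handling of ASCII letters, digits and symbols.
    """
    # At each position take the character and its successor; at the last
    # position only one character is left, so a single character is emitted.
    return [(i, text[i:i + 2]) for i in range(len(text))]


for position, value in naive_bigrams("特急はくたか車内ではく"):
    print(position, value)
```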