

Full-text search with elasticsearch + python

Posted at 2015-04-28

1. download & install

bin/elasticsearch

A must-have plugin for elasticsearch

bin/plugin install mobz/elasticsearch-head

A plugin that performs Japanese morphological analysis

bin/plugin install elasticsearch/elasticsearch-analysis-kuromoji/2.5.0

2. setting kuromoji

The quick way, if you don't want to bother (restart elasticsearch afterwards)

config/elasticsearch.yml
index.analysis.analyzer.default.type: custom
index.analysis.analyzer.default.tokenizer: kuromoji_tokenizer

To configure it per index instead

curl -XPUT http://localhost:9200/index1/ -d '
{
  "index": {
    "analysis": {
      "tokenizer": {
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        }
      },
      "analyzer": {
        "analyzer": {
          "type": "custom",
          "tokenizer":"kuromoji"
        }
      }
    }
  }
}'
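
For reference, the same settings body can be built in Python before sending it to elasticsearch. This is only a minimal sketch: the function name `build_kuromoji_settings` and its parameters are hypothetical, and the default names ("analyzer", "kuromoji") simply mirror the curl command above.

```python
import json


def build_kuromoji_settings(analyzer_name="analyzer", tokenizer_name="kuromoji"):
    """Return the index settings body used in the curl command above.

    The function name and parameters are illustrative, not part of any library.
    """
    return {
        "index": {
            "analysis": {
                # Register kuromoji_tokenizer under a local name.
                "tokenizer": {tokenizer_name: {"type": "kuromoji_tokenizer"}},
                # Define a custom analyzer that uses that tokenizer.
                "analyzer": {
                    analyzer_name: {"type": "custom", "tokenizer": tokenizer_name}
                },
            }
        }
    }


# Serialize to the JSON text that would go after -d in the curl command.
settings_json = json.dumps(build_kuromoji_settings(), indent=2)
print(settings_json)
```

Building the body as a dict keeps the analyzer name in one place, which matters because the same name is passed to _analyze in the next step.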

3. check

Checking the analyzer

curl -XPOST 'http://localhost:9200/index1/_analyze?analyzer=analyzer&pretty' -d 'これはペンです'

{
  "tokens": [
    {
      "token": "これ",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "は",
      "start_offset": 2,
      "end_offset": 3,
      "type": "word",
      "position": 2
    },
    {
      "token": "ペン",
      "start_offset": 3,
      "end_offset": 5,
      "type": "word",
      "position": 3
    },
    {
      "token": "です",
      "start_offset": 5,
      "end_offset": 7,
      "type": "word",
      "position": 4
    }
  ]
}

Good. It seems to be working properly.
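
If you want to check the result programmatically rather than by eye, a small sketch like the following may help. It parses a copy of the response above (abbreviated to one token per line) with the standard library only; the helper names `tokens_from_analyze` and `offsets_match` are hypothetical.

```python
import json

# Copy of the _analyze response shown above, one token per line.
SAMPLE_RESPONSE = """
{"tokens": [
  {"token": "これ", "start_offset": 0, "end_offset": 2, "type": "word", "position": 1},
  {"token": "は", "start_offset": 2, "end_offset": 3, "type": "word", "position": 2},
  {"token": "ペン", "start_offset": 3, "end_offset": 5, "type": "word", "position": 3},
  {"token": "です", "start_offset": 5, "end_offset": 7, "type": "word", "position": 4}
]}
"""


def tokens_from_analyze(response_text):
    """Return just the token strings from an _analyze response."""
    return [t["token"] for t in json.loads(response_text)["tokens"]]


def offsets_match(text, response_text):
    """Check that each token equals the slice of text given by its offsets."""
    tokens = json.loads(response_text)["tokens"]
    return all(text[t["start_offset"]:t["end_offset"]] == t["token"] for t in tokens)


print(tokens_from_analyze(SAMPLE_RESPONSE))  # ['これ', 'は', 'ペン', 'です']
print(offsets_match("これはペンです", SAMPLE_RESPONSE))  # True
```

The offset check only holds when the analyzer emits surface forms unchanged, as it does here with a bare kuromoji tokenizer; adding token filters could change that.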

Register sample 1

curl -XPUT http://localhost:9200/index1/type1/1 -d '{"text":"これはパンです"}'

Register sample 2

curl -XPUT http://localhost:9200/index1/type1/2 -d '{"text":"これはペンです"}'

Search!

curl -XGET http://localhost:9200/index1/type1/_search -d '{"query": {"match": {"text": "ペン"}}}'
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [
      {
        "_index": "index1",
        "_type": "type1",
        "_id": "2",
        "_score": 0.15342641,
        "_source": {
          "text": "これはペンです"
        }
      }
    ]
  }
}
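
The query body and the interesting parts of the response can also be handled as plain Python dicts. The sketch below is an illustration under assumptions: `build_match_query` and `extract_hits` are hypothetical helper names, and `SAMPLE_RESULT` is a trimmed copy of the response above.

```python
def build_match_query(field, text):
    """Build the same match query body that is passed to _search above."""
    return {"query": {"match": {field: text}}}


def extract_hits(response):
    """Return (id, score, text) for every hit in a _search response."""
    return [
        (hit["_id"], hit["_score"], hit["_source"]["text"])
        for hit in response["hits"]["hits"]
    ]


# Trimmed copy of the search response shown above.
SAMPLE_RESULT = {
    "hits": {
        "total": 1,
        "max_score": 0.15342641,
        "hits": [
            {
                "_index": "index1",
                "_type": "type1",
                "_id": "2",
                "_score": 0.15342641,
                "_source": {"text": "これはペンです"},
            }
        ],
    }
}

print(build_match_query("text", "ペン"))
print(extract_hits(SAMPLE_RESULT))
```

Only document 2 is returned because kuromoji splits the text into words, so "ペン" does not match the "パン" in document 1.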

4. python client

$ pip install elasticsearch
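
The article stops at the install step, so the following is only a rough sketch of how the curl steps above might look with the Python client. It assumes a client version from the same era as the server used here, where `index` and `search` accept `doc_type` and `body`; newer client releases changed these parameters. The function names `index_samples`, `search_text`, and `main` are hypothetical, and the client is passed in as an argument so the helpers stay independent of how the connection is created.

```python
def index_samples(es, index="index1", doc_type="type1"):
    """Register the two sample documents from step 3."""
    docs = {1: "これはパンです", 2: "これはペンです"}
    for doc_id, text in docs.items():
        es.index(index=index, doc_type=doc_type, id=doc_id, body={"text": text})
    # Refresh so the new documents are visible to the next search.
    es.indices.refresh(index=index)


def search_text(es, word, index="index1", doc_type="type1"):
    """Run the same match query as the curl example and return matching texts."""
    body = {"query": {"match": {"text": word}}}
    result = es.search(index=index, doc_type=doc_type, body=body)
    return [hit["_source"]["text"] for hit in result["hits"]["hits"]]


def main():
    # Imported here so the helpers above can be loaded without the package.
    from elasticsearch import Elasticsearch

    # Assumption: older clients connect to localhost:9200 when called with no arguments.
    es = Elasticsearch()
    index_samples(es)
    print(search_text(es, "ペン"))
```

With elasticsearch running and index1 created as in step 2, calling `main()` should register both samples and print the texts that match "ペン".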