About Elasticsearch's ingest feature

Posted at 2019-01-19, last updated at 2019-01-21

I tried out Elasticsearch's ingest feature, so here is what I found.

What is the ingest feature?

Before a document is stored, it is passed through a pipeline (a series of processors) that reshapes the data.
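As a rough mental model, a pipeline can be pictured as an ordered list of functions applied to a document before it is stored. The sketch below is plain Ruby and only illustrates the idea; it is not how Elasticsearch implements ingest, and the field names (`name`, `greeting`) are made up for the example.

```ruby
# Conceptual model only: this is NOT Elasticsearch's implementation.
# A "processor" is modeled as a lambda that takes a document (Hash)
# and returns a new, modified document.
downcase_name = ->(doc) { doc.merge('name' => doc['name'].downcase) }
set_greeting  = ->(doc) { doc.merge('greeting' => "hello #{doc['name']}") }

# A "pipeline" applies its processors in order, feeding each result
# into the next processor.
def run_pipeline(processors, doc)
  processors.reduce(doc) { |current, processor| processor.call(current) }
end

stored = run_pipeline([downcase_name, set_greeting], { 'name' => 'ALICE' })
puts stored
```

The order matters: `set_greeting` sees the already lowercased name because it runs second.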

Types of processors

There are many processors; see the Elasticsearch documentation for the full list. For example:

set processor
append processor
date processor
lowercase processor

and so on.
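To give a feel for how these processors are written, here is a sketch of a pipeline body that uses all four of them. The field names (`title`, `tags`, `created_at`, `created_on`, `category`) and the values are hypothetical and chosen only for illustration; the body is printed as JSON rather than sent to a cluster.

```ruby
require 'json'

# Hypothetical pipeline body; every field name and value is an assumption.
PIPELINE_BODY = {
  description: 'sketch combining four processors',
  processors: [
    # set: write a fixed value into a field
    { set: { field: 'title', value: 'untitled' } },
    # append: add values to an array field
    { append: { field: 'tags', value: ['ingested'] } },
    # date: parse a date string and store the result in target_field
    { date: { field: 'created_at', formats: ['yyyy-MM-dd'], target_field: 'created_on' } },
    # lowercase: convert a string field to lowercase
    { lowercase: { field: 'category' } }
  ]
}.freeze

puts JSON.pretty_generate(PIPELINE_BODY)
```

A hash like this is the kind of value passed as `body:` to `client.ingest.put_pipeline`.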

Experiment

The experiment below uses the elasticsearch-rails gem.

gem 'elasticsearch-rails', '~> 6'
gem 'elasticsearch-model', '~> 6'

Creating the pipeline

To use ingest, you first create a pipeline out of the processors listed above.
Here I create the pipeline with code like the following.

The code that connects to Elasticsearch looks like this. Some parts are omitted, but this is roughly it.

class ElasticsearchClient
  class << self
    def client
      if ENV['RAILS_ENV'] == 'production'
        return connection_to_bonsai
      end

      connection_to_local
    end

    private

    def hosts
      [{ host: ElasticsearchConfig::CONFIG[:host], port: ElasticsearchConfig::CONFIG[:port] }]
    end

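    def url
      ENV['BONSAI_URL']
    end

    # Omitted in the original article; assumed here to be a plain
    # URL-based connection so that `client` works in production.
    def connection_to_bonsai
      Elasticsearch::Client.new(url: url, log: false)
    end

    def connection_to_local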
      Elasticsearch::Client.new(
        hosts: hosts,
        randomize_hosts: true,
        request_timeout: 10,
        reload_connections: 500,
        sniffer_timeout: 3,
        reload_on_failure: false,
        log: false
      )
    end
  end
end

Code that creates the pipeline and checks its definition

     def create_pipeline!
       client = ElasticsearchClient.client
       client.ingest.put_pipeline(
         id: 'test_pipeline',
         body: {
           processors: [
             { set: { field: "search_test", value: "{{description}} {{tel}}" } }
           ]
         }
       )
     end

     def get_pipeline!
       client = ElasticsearchClient.client
       client.ingest.get_pipeline(
         id: 'test_pipeline'
       )
     end

This time I use the set processor. It defines a field named search_test; when a document is indexed, the values of the description and tel fields are joined with a space and stored in search_test.
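Before wiring the pipeline into an index, it can be tried out with the simulate API, which runs sample documents through a pipeline without storing them. The sketch below assumes that `client` is an `Elasticsearch::Client` (for example `ElasticsearchClient.client`) and that `test_pipeline` already exists; the helper names `simulate_body` and `simulate_test_pipeline` are made up for this example.

```ruby
# Builds the request body for the simulate API:
# each sample document is wrapped in "_source".
def simulate_body(docs)
  { docs: docs.map { |doc| { _source: doc } } }
end

# Hypothetical helper: runs the sample documents through 'test_pipeline'.
# `client` is assumed to be an Elasticsearch::Client.
def simulate_test_pipeline(client, docs)
  client.ingest.simulate(id: 'test_pipeline', body: simulate_body(docs))
end

# Sample values are placeholders.
sample = [{ description: 'This is a restaurant', tel: '0451112222' }]
p simulate_body(sample)
```

With a running cluster, `simulate_test_pipeline(ElasticsearchClient.client, sample)` should return the transformed documents, so search_test can be checked without touching any index.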

Next, register the pipeline on the index.
All it takes is adding the following line to the index settings:

number_of_shards:   5,
number_of_replicas: 1,
default_pipeline: 'test_pipeline', # <- add this line
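For context, here is a sketch of where these settings could sit when the index is created directly through the client. It assumes the index name `es_index_name_place` that appears in the search output in this article, and a `client` that is an `Elasticsearch::Client`; the helper `create_index!` is hypothetical and not part of the original code.

```ruby
require 'json'

INDEX_NAME = 'es_index_name_place'

# Index creation body with the pipeline set as the index default.
INDEX_BODY = {
  settings: {
    index: {
      number_of_shards: 5,
      number_of_replicas: 1,
      default_pipeline: 'test_pipeline'
    }
  }
}.freeze

# Hypothetical helper; `client` is assumed to be an Elasticsearch::Client.
def create_index!(client)
  client.indices.create(index: INDEX_NAME, body: INDEX_BODY)
end

puts JSON.pretty_generate(INDEX_BODY)
```

With this setting in place, documents indexed without an explicit pipeline go through test_pipeline.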

Then create the index and register a document. As the output below shows, the result of the pipeline is reflected in the stored document.

$ curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": {}
    }
}
'
~~~~~
~~~~~~~~~
     {
        "_index" : "es_index_name_place",
        "_type" : "_doc",
        "_id" : "296",
        "_score" : 1.0,
        "_source" : {
          "description" : "これはテスト。これはレストランです",
          "search_test" : "これはテスト。これはレストランです 0451112222",
          "tel" : "0451112222",
          "id" : 296
        }
     }

One caveat.
I first used Elasticsearch 6.5.0, but with that version I could not register the pipeline on the index. There appear to have been bug reports about this, so I used 6.5.4 for this experiment.
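As a possible alternative to the index-level default, the index API also accepts a `pipeline` parameter per request. The sketch below is an assumption-laden illustration: `index_with_pipeline` is a hypothetical helper, `client` is assumed to be an `Elasticsearch::Client`, and it has not been checked whether this avoids the 6.5.0 problem described above.

```ruby
# Hypothetical helper: indexes one document and names the pipeline
# explicitly on the request instead of relying on default_pipeline.
# `client` is assumed to be an Elasticsearch::Client.
def index_with_pipeline(client, doc)
  client.index(
    index: 'es_index_name_place',
    type: '_doc',
    id: doc[:id],
    body: doc,
    pipeline: 'test_pipeline'
  )
end
```

A call would look like `index_with_pipeline(ElasticsearchClient.client, { id: 296, description: '...', tel: '0451112222' })`.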
