
Elasticsearch 1.6.0 - Installing on Windows 7

Elasticsearch

Posted at 2015-07-07

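Overview

This article installs Elasticsearch 1.6.0 on Windows 7 for development and testing purposes and applies a basic configuration.
It then indexes sample data and walks through basic search methods.

Environment

The contents of this article were verified with the following versions.

Windows 7 (64-bit)
Java 1.8.0_45
[Elasticsearch](https://www.elastic.co/) 1.6.0

Windows does not include the curl command, so [cURL](http://curl.haxx.se/) was used.

References

The following sites were used as references.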

Elasticsearch

[Elasticsearch](https://www.elastic.co/products/elasticsearch)
[Elasticsearch Reference 1.6](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/index.html)

Slide

[ElasticsearchとKibanaではじめる検索＆アナリティクス](https://speakerdeck.com/johtani/elasticsearchtokibanadehazimerujian-suo-anariteikusu)
[Terms of endearment - the ElasticSearch Query DSL explained](http://www.slideshare.net/clintongormley/terms-of-endearment-the-elasticsearch-query-dsl-explained)

Blog

[Elasticsearchチュートリアル - 不可視点](http://code46.hatenablog.com/entry/2014/01/21/115620)
[実践！Elasticsearch - Wantedly Engineer Blog](http://engineer.wantedly.com/2014/02/25/elasticsearch-at-wantedly-1.html)
[Elasticsearchとkuromojiでちゃんとした日本語全文検索をやるメモ - GMOメディアエンジニアブログ](http://tech.gmo-media.jp/post/70245090007/elasticsearch-kuromoji-japanese-fulltext-search)
[elasticsearch - DRYな備忘録](http://otiai10.hatenablog.com/archive/category/elasticsearch)
[勉強会メモ - 第8回elasticsearch勉強会 - よしだのブログ](http://blog.yoslab.com/entry/2015/02/13/203251)
[All About Analyzers, Part One](https://www.found.no/foundation/text-analysis-part-1/)

Qiita

[Kibana 4.1.0 + ElasticSearch 1.6.0 でデータビジュアライズ](http://qiita.com/hiyuzawa/items/bad1a7e29fc8d1820bea)
[Kibana+Elasticsearchで文字列の完全一致と部分一致検索の両方を実現する](http://qiita.com/harukasan/items/4ec517d8d96f557367e1)
[Elasticsearch CheatSheet](http://qiita.com/ikawaha/items/228ee3f481e9636b3065)

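Installation

Elasticsearch and several plugins are installed on Windows 7.

Elasticsearch

Download the archive file from the download page and extract it to a suitable location.
The downloaded archive file is elasticsearch-1.6.0.zip.

Extraction

Installation only requires extracting the archive file to a suitable location.
Here it was extracted to D:\dev\elasticsearch-1.6.0.

Configuration

Because this is for development and testing, the node is configured to start with minimal resources.
The configuration file is config/elasticsearch.yml in the extracted directory.

Only the changed settings are excerpted below.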

# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
cluster.name: elasticsearch

# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
node.name: master

# Every node can be configured to allow or deny being eligible as the master,
# and to allow or deny to store the data.
#
# Allow this node to be eligible as a master node (enabled by default):
#
node.master: true
#
# Allow this node to store data (enabled by default):
#
node.data: true

node.name: If no node name is specified, one is assigned automatically when the Elasticsearch instance starts.

# Note, that for development on a local machine, with small indices, it usually
# makes sense to "disable" the distributed features:
#
index.number_of_shards: 1
index.number_of_replicas: 0

Because this is a development setup, the number of shards is set to 1 and replication is disabled.

# Set this property to true to lock the memory:
#
bootstrap.mlockall: true

# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.
#
# 1. Disable multicast discovery (enabled by default):
#
discovery.zen.ping.multicast.enabled: false
#
# 2. Configure an initial list of master nodes in the cluster
#    to perform discovery when new nodes (master or data) are started:
#
discovery.zen.ping.unicast.hosts: ["localhost"]

Startup

Move to the extracted directory and run the command below.
The memory size to use can be specified with options.

> bin\elasticsearch.bat -Xmx256m -Xms256m

Verification

Access the URL below with curl or a browser and confirm that a response with status 200 is returned.

GET

> curl -XGET "localhost:9200/"

response

{
  "status" : 200,
  "name" : "master",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.6.0",
    "build_hash" : "cdd3ac4dde4f69524ec0a14de3828cb95bbb86d0",
    "build_timestamp" : "2015-06-09T13:36:34Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
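
The same check can be scripted. The following is a minimal Python sketch (standard library only) that parses the body returned by the root endpoint and confirms the status and version number. The function name `check_root_response` and the abbreviated sample text are illustrations, not part of Elasticsearch.

```python
import json


def check_root_response(body, expected_version="1.6.0"):
    """Return True when the root endpoint reports status 200 and the expected version."""
    info = json.loads(body)
    version = info.get("version", {}).get("number")
    return info.get("status") == 200 and version == expected_version


# Abbreviated copy of the response shown above (only the checked fields are kept).
sample = '{"status": 200, "name": "master", "version": {"number": "1.6.0"}}'
print(check_root_response(sample))
```

In practice the body would come from an HTTP GET against localhost:9200 rather than from a literal string.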

plugin

elasticsearch-head

URL: http://mobz.github.io/elasticsearch-head/

Installation

install

> bin\plugin -install mobz/elasticsearch-head

Verification

Open the plugin's URL in a browser and confirm that the head page is displayed.

elasticsearch-analysis-kuromoji

URL: https://github.com/elastic/elasticsearch-analysis-kuromoji

Installation

install

> bin\plugin -install elasticsearch/elasticsearch-analysis-kuromoji/2.6.0

elasticsearch-inquisitor

URL: https://github.com/polyfractal/elasticsearch-inquisitor

Installation

install

> bin\plugin -install polyfractal/elasticsearch-inquisitor

Verification

Open the plugin's URL in a browser and confirm that the Inquisitor page is displayed.

Listing the installed plugins

> bin\plugin -l

response

Installed plugins:
    - analysis-kuromoji
    - head
    - inquisitor

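Basic search methods

Sample data is indexed and then searched in several ways.

Preparing the sample data

Information about a TV drama series is used as the sample data.

|field |data type |description |
|:--------------------|:-------------|:--------------------------|
|title |string |Original title |
|original_air_date |string |Original air date |
|runtime |integer |Running time (minutes) |
|guest_staring |string |Guest star |
|guest_staring_role |string |Guest star's role |
|directed_by |string |Director |
|written_by |string array |Writer |
|teleplay |string array |Teleplay |
|season |integer |Season |
|no_in_season |integer |Episode number in season |
|no_in_series |integer |Episode number in series |
|japanese_title |string |Japanese title |
|japanese_air_date |date |Air date in Japan |

mapping

index: tvfile
type: columbo

Mapping for the sample data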

columbo_mapping.json

{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "analysis": {
      "filter": {
        "greek_lowercase_filter": {
          "type": "lowercase",
          "language": "greek"
        },
        "kuromoji_pos_filter": {
          "type": "kuromoji_part_of_speech"
        }
      },
      "tokenizer": {
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        },
        "ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "3",
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_baseform", "kuromoji_pos_filter", "greek_lowercase_filter", "cjk_width"
          ]
        },
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer",
          "filter": [
            "standard"
          ]
        },
        "letter_lower_analyzer": {
          "type": "custom",
          "tokenizer": "letter",
          "filter": [
            "lowercase"
          ]
        },
        "letter_upper_analyzer": {
          "type": "custom",
          "tokenizer": "letter",
          "filter": [
            "uppercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "columbo": {
      "_source": {
        "enabled": true
      },
      "_all": {
        "enabled": true
      },
      "_timestamp": {
        "enabled": true
      },
      "dynamic": "strict",
      "properties": {
        "title": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "original_air_date": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "runtime": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "guest_staring": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "guest_staring_role": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "directed_by": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "written_by": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "teleplay": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "season": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "no_in_season": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "no_in_series": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "japanese_title": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "kuromoji_analyzer",
          "store": true,
          "include_in_all": true
        },
        "japanese_air_date": {
          "type": "date",
          "format": "dateHourMinuteSecond",
          "store": true,
          "include_in_all": false
        }
      }
    }
  }
}
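
Before sending a file like this, it can help to confirm that every analyzer referenced by a field is actually defined in the settings. The following Python sketch is one way to do that check; the function name `undefined_analyzers` and the short list of built-in analyzer names are assumptions made for illustration, and the example body is a cut-down stand-in for the full mapping.

```python
# Built-in analyzers that need no definition (a partial list, assumed sufficient here).
BUILTIN_ANALYZERS = {"standard", "simple", "whitespace", "stop", "keyword"}


def undefined_analyzers(index_body):
    """Return analyzer names used by fields but not defined under settings.analysis.analyzer."""
    analysis = index_body.get("settings", {}).get("analysis", {})
    defined = set(analysis.get("analyzer", {})) | BUILTIN_ANALYZERS
    missing = set()
    for type_mapping in index_body.get("mappings", {}).values():
        for field in type_mapping.get("properties", {}).values():
            name = field.get("analyzer")
            if name is not None and name not in defined:
                missing.add(name)
    return sorted(missing)


# Cut-down example: kuromoji_analyzer is referenced but deliberately left undefined.
example = {
    "settings": {"analysis": {"analyzer": {
        "letter_lower_analyzer": {"type": "custom", "tokenizer": "letter", "filter": ["lowercase"]}
    }}},
    "mappings": {"columbo": {"properties": {
        "title": {"type": "string", "analyzer": "letter_lower_analyzer"},
        "japanese_title": {"type": "string", "analyzer": "kuromoji_analyzer"},
    }}},
}
print(undefined_analyzers(example))
```

For the real file, the body could be loaded with `json.load` from columbo_mapping.json and passed to the same function.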

Creating the index

Use the JSON file above to create the index and set the mapping.

POST

> curl -XPOST "localhost:9200/tvfile?pretty" -d @columbo_mapping.json

Checking the mapping

GET

> curl -XGET "localhost:9200/tvfile/_settings,_mappings?pretty"

Deleting the index

DELETE

> curl -XDELETE "localhost:9200/tvfile"

Sample data

Only a portion is shown here because the data is long. The full sample data is available on [this page](http://qiita.com/rubytomato@github/private/700be487ddb7221c29cc).

columbo_data.json

{"index": {}}
{"title": "Prescription: Murder",                "original_air_date": "February 20, 1968",   "runtime": 98, "guest_staring": "Gene Barry",        "guest_staring_role": "Dr. Ray Fleming (Gene Barry), a psychiatrist",                                              "directed_by": "Richard Irving",       "written_by": ["Richard Levinson & William Link"],                  "teleplay": [""],                                                                     "season": 0, "no_in_season": 1, "no_in_series": 1,  "japanese_title": "殺人処方箋",                   "japanese_air_date": "1972-08-27T00:00:00"}
{"index": {}}
{"title": "Ransom for a Dead Man",               "original_air_date": "March 1, 1971",       "runtime": 98, "guest_staring": "Lee Grant",         "guest_staring_role": "Leslie Williams, a brilliant lawyer and pilot",                                             "directed_by": "Richard Irving",       "written_by": ["Richard Levinson & William Link"],                  "teleplay": ["Dean Hargrove"],                                                        "season": 0, "no_in_season": 2, "no_in_series": 2,  "japanese_title": "死者の身代金",                 "japanese_air_date": "1973-04-22T00:00:00"}
{"index": {}}
{"title": "Murder by the Book",                  "original_air_date": "September 15, 1971",  "runtime": 73, "guest_staring": "Jack Cassidy",      "guest_staring_role": "Ken Franklin is one half of a mystery writing team",                                        "directed_by": "Steven Spielberg",     "written_by": ["Steven Bochco"],                                    "teleplay": [""],                                                                     "season": 1, "no_in_season": 1, "no_in_series": 3,  "japanese_title": "構想の死角",                   "japanese_air_date": "1972-11-26T00:00:00"}

Indexing the documents

Use the JSON file above to index the documents.
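
A data file in this format could also be generated with a script. The following Python sketch builds a bulk body in which each document is preceded by an action line and the whole body ends with a newline. The function name `build_bulk_body` is hypothetical, and the two documents are shortened versions of the sample data.

```python
import json


def build_bulk_body(docs):
    """Serialize documents into the newline-delimited body expected by the _bulk endpoint."""
    lines = []
    for doc in docs:
        # Action line: index and type are taken from the request URL, so it stays empty.
        lines.append(json.dumps({"index": {}}))
        # Source line: the whole document must stay on a single line.
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the last line must also end with a newline


docs = [
    {"title": "Prescription: Murder", "season": 0, "no_in_season": 1, "japanese_title": "殺人処方箋"},
    {"title": "Ransom for a Dead Man", "season": 0, "no_in_season": 2, "japanese_title": "死者の身代金"},
]
body = build_bulk_body(docs)
print(body)
```

The line structure is also why the curl command uses --data-binary: with -d, curl strips the newlines when it reads the file.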

POST

> curl -XPOST "localhost:9200/tvfile/columbo/_bulk?pretty" --data-binary @columbo_data.json

Deleting all documents

DELETE

> curl -XDELETE "localhost:9200/tvfile/columbo?pretty"

Counting the documents

GET

> curl -XGET "localhost:9200/tvfile/columbo/_count?pretty" -d "{
  \"query\": {
      \"matchAll\": {}
  }
}"

response

{
  "count" : 45,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  }
}

Searching the documents

Searches use the Search APIs. There are two ways to search: URI Search and Request Body Search.

Syntax

[host name][:port]/[index name]/[type name]/_search

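Common fields in search results

|field |description |
|:-------------|:-----------------------------------------------------------------------------|
|`took` |Time taken by the search (milliseconds). |
|`timed_out` |Boolean indicating whether the search timed out. |
|`_shards` |Number of shards that were searched successfully and number that failed. |
|`hits` |Holds the search results. |
|`hits.total` |Number of documents matching the search criteria. |
|`hits.hits` |Array of documents matching the search criteria (10 by default). |
|`_score` |Score of the document. |
|`max_score` |Maximum score. |

[The Search API](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/_the_search_api.html)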

URI Search

URI Search specifies the search criteria as request parameters.

Searching without conditions

The search criteria are specified with the q parameter.

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?q=*&from=0&size=10&pretty"

Searching with conditions

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?q=September%20Patrick&df=_all&default_operator=OR&from=0&size=10&_source=false&fields=title,original_air_date,runtime,guest_staring,directed_by,written_by,season,no_in_season&sort=season:asc,no_in_season:asc&track_scores=true&pretty"

Parameters

|name |default / description |
|:----------------------------|:--------------------------------------------------------------------------------------------|
|q |The query string. |
|df |The default field to use when no field prefix is defined within the query. |
|analyzer |The analyzer name. |
|lowercase_expanded_terms |Defaults to true. |
|analyze_wildcard |Defaults to false. |
|default_operator |can be AND or OR. Defaults to OR. |
|lenient |Defaults to false. |
|explain |For each hit, contain an explanation of how scoring of the hits was computed. |
|_source |Set to false to disable retrieval of the _source field. |
|fields |The selective stored fields of the document to return for each hit, comma delimited. |
|sort |Sorting to perform. Can either be in the form of fieldName, or fieldName:asc / fieldName:desc. |
|track_scores |When sorting, set to true in order to still track scores and return them as part of each hit.|
|timeout |Defaults to no timeout. |
|terminate_after |The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early.|
|from |Defaults to 0. |
|size |Defaults to 10. |
|search_type |Defaults to query_then_fetch. |

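q specifies the keywords to search for.
df specifies the field to search. The default is _all.
Setting _source to false excludes the _source field from the search results.
fields specifies, as a comma-separated list, the fields to include in the search results.
Setting track_scores to true computes scores even when sorting. (By default, scores are not computed when a sort is applied.)
from and size specify which range of matching documents to return.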
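
Long URI Search URLs are easy to mistype, so they can be assembled from a parameter dictionary instead. The following Python sketch uses only `urllib.parse`; the helper name `build_uri_search` is hypothetical, and the parameter values are taken from the request above.

```python
from urllib.parse import quote, urlencode


def build_uri_search(base, index, doc_type, params):
    """Build a URI Search URL; spaces become %20 while ':' ',' and '*' stay readable."""
    query = urlencode(params, safe=":,*", quote_via=quote)
    return "{}/{}/{}/_search?{}".format(base, index, doc_type, query)


url = build_uri_search(
    "http://localhost:9200", "tvfile", "columbo",
    {
        "q": "September Patrick",
        "df": "_all",
        "default_operator": "OR",
        "from": 0,
        "size": 10,
        "sort": "season:asc,no_in_season:asc",
    },
)
print(url)
```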

[Elasticsearch Reference - URI Search](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-uri-request.html)

Request Body Search

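Request Body Search specifies the search criteria in the request body.
There are two kinds of search clauses: Queries and Filters.

They differ as follows:

Queries support both full-text search and term search, whereas Filters support term search only.
Queries compute a score; Filters do not.
Queries are more expensive than Filters.
Queries do not cache their results, whereas Filters do.

Queries and Filters can also be used together.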
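
As one illustration of combining the two, Elasticsearch 1.x provides the `filtered` query, which wraps a query and a filter in a single clause so that only documents passing the filter are scored. The following Python sketch builds such a request body; the helper name `filtered_search` is hypothetical, and the query and filter values are only examples based on the sample data in this article.

```python
import json


def filtered_search(query, filter_clause):
    """Wrap a query and a filter in a 'filtered' query inside a search request body."""
    return {"query": {"filtered": {"query": query, "filter": filter_clause}}}


body = filtered_search(
    {"match": {"_all": {"query": "September Patrick", "operator": "OR"}}},
    {"range": {"no_in_series": {"gte": 30, "lte": 39}}},
)
print(json.dumps(body, indent=2))
```

The printed JSON could be saved to a file and sent with curl in the same way as the other request bodies in this article.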

Queries

Match All Query

Specifying matchAll as the query searches without conditions.

columbo_match_all_query.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "matchAll": {}
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "season": {"order": "asc"}
    },
    {
      "no_in_season": {"order": "asc"}
    }
  ]
}

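Because _source is set to false, the _source field is not included in the search results.
Because from is 0 and size is 100, up to 100 documents are returned from the beginning. (size defaults to 10.)
fields specifies the fields to include in the search results.
sort specifies the order of the documents.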
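
from and size together select one page of results. As a small illustration, the following Python sketch converts a 1-based page number into that pair; the helper name `page_window` is hypothetical.

```python
def page_window(page, per_page=10):
    """Translate a 1-based page number into the from/size pair of a search request."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {"from": (page - 1) * per_page, "size": per_page}


# match_all is the spelling used in the Elasticsearch reference for this query.
body = {"query": {"match_all": {}}}
body.update(page_window(2, per_page=3))  # second page of three hits each
print(body)
```

With page 2 and three hits per page, the body ends up with from 3 and size 3.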

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_all_query.json

[Elasticsearch Reference - Match All Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-match-all-query.html)

Match Query

Specifying match as the query searches the given field (original_air_date in this example) for the keywords given in query.

columbo_match_query.json

{
  "_source": false,
  "from": 3,
  "size": 3,
  "query": {
    "match": {
      "original_air_date": {
        "query": "September December",
        "operator": "OR"
      }
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_query.json

response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 7,
    "max_score" : 1.4385337,
    "hits" : [ {
      "_index" : "tvfile",
      "_type" : "columbo",
      "_id" : "AU5lJlhGw7_5S8xhhpHw",
      "_score" : 0.9509891,
      "fields" : {
        "directed_by" : [ "Nicholas Colasanto" ],
        "no_in_season" : [ 1 ],
        "guest_staring" : [ "John Cassavetes" ],
        "original_air_date" : [ "September 17, 1972" ],
        "no_in_series" : [ 10 ],
        "runtime" : [ 98 ],
        "season" : [ 2 ],
        "title" : [ "Étude in Black" ],
        "written_by" : [ "Richard Levinson & William Link" ]
      }
    }, {
      "_index" : "tvfile",
      "_type" : "columbo",
      "_id" : "AU5lJlhGw7_5S8xhhpH4",
      "_score" : 0.9509891,
      "fields" : {
        "directed_by" : [ "Jeannot Szwarc" ],
        "no_in_season" : [ 1 ],
        "guest_staring" : [ "Vera Miles" ],
        "original_air_date" : [ "September 23, 1973" ],
        "no_in_series" : [ 18 ],
        "runtime" : [ 73 ],
        "season" : [ 3 ],
        "title" : [ "Lovely But Lethal" ],
        "written_by" : [ "Myrna Bercovici" ]
      }
    }, {
      "_index" : "tvfile",
      "_type" : "columbo",
      "_id" : "AU5lJlhHw7_5S8xhhpIA",
      "_score" : 0.9509891,
      "fields" : {
        "directed_by" : [ "Bernard L. Kowalski" ],
        "no_in_season" : [ 1 ],
        "guest_staring" : [ "Robert Conrad" ],
        "original_air_date" : [ "September 15, 1974" ],
        "no_in_series" : [ 26 ],
        "runtime" : [ 98 ],
        "season" : [ 4 ],
        "title" : [ "An Exercise in Fatality" ],
        "written_by" : [ "Larry Cohen" ]
      }
    } ]
  }
}

[Elasticsearch Reference - Match Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-match-query.html)
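
When the fields option is used, each field value in the response is wrapped in a list, as the response above shows. The following Python sketch reads hits.total and unwraps single-valued fields into plain rows; the function name `summarize_hits` is hypothetical, and the sample dictionary is a shortened copy of the response above.

```python
def summarize_hits(response):
    """Return (total, rows); each row holds _id, _score and single-valued fields unwrapped."""
    hits = response["hits"]
    rows = []
    for hit in hits["hits"]:
        row = {"_id": hit["_id"], "_score": hit.get("_score")}
        for name, values in hit.get("fields", {}).items():
            # Keep multi-valued fields as lists, unwrap lists with exactly one element.
            row[name] = values[0] if len(values) == 1 else values
        rows.append(row)
    return hits["total"], rows


sample = {
    "hits": {
        "total": 7,
        "max_score": 1.4385337,
        "hits": [
            {"_id": "AU5lJlhGw7_5S8xhhpH4", "_score": 0.9509891,
             "fields": {"title": ["Lovely But Lethal"], "season": [3]}},
        ],
    }
}
total, rows = summarize_hits(sample)
print(total, rows[0]["title"])
```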

Multi Match Query

Specifying multi_match as the query searches the multiple fields listed in fields for the keywords given in query.

columbo_multi_match_query.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "multi_match": {
      "query": "October Patrick",
      "type": "cross_fields",
      "fields": ["original_air_date", "guest_staring"],
      "operator": "AND"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_multi_match_query.json

The values that can be specified for type, and their meanings, are as follows.

Types of multi_match query

|type |description |
|:----------------------|:------------------------------------------------------|
|best_fields |default. Finds documents which match any field, but uses the _score from the best field.|
|most_fields |Finds documents which match any field and combines the _score from each field. |
|cross_fields |Treats fields with the same analyzer as though they were one big field. Looks for each word in any field. |
|phrase |Runs a match_phrase query on each field and combines the _score from each field. |
|phrase_prefix |Runs a match_phrase_prefix query on each field and combines the _score from each field.|

[Elasticsearch Reference - Multi Match Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-multi-match-query.html)

Query String Query

Specifying query_string as the query allows more complex conditions than the other queries.

columbo_query_string.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "query_string": {
      "fields" : ["_all"],
      "query": "(September OR Patrick) AND (season:5)"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_query_string.json

default_field

This is the field used when the query does not explicitly specify a field to search.
The default is the _all field.

To use a different field, set it in the index settings:

example

{
  "settings": {
    "index": {
      "query": {
        "default_field": "_all"
      }
    }
  }
}

[Elasticsearch Reference - Query String Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-query-string-query.html)

Simple Query String Query

simple_query_string is a simplified version of query_string.

columbo_simple_query_string.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query" : {
    "simple_query_string" : {
      "query": "(September | October | November) +(McGoohan)",
      "fields": ["_all"]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_simple_query_string.json

Available flags

|flag |description |
|:-----------|:-----------|
|ALL | |
|NONE | |
|AND |+ |
|OR |\| |
|NOT |- |
|PREFIX |* |
|PHRASE |" |
|PRECEDENCE |( and ) |
|ESCAPE | |
|WHITESPACE | |
|FUZZY |~N after a word |
|NEAR | |
|SLOP |~N after a phrase |

[Elasticsearch Reference - Simple Query String Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-simple-query-string-query.html)

Term Query

Specifying term as the query searches for documents whose field value exactly matches the value given in term.

columbo_term_query.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "term": {
      "japanese_air_date": "1973-02-25T00:00:00"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_term_query.json

[Elasticsearch Reference - Term Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-term-query.html)
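
Exact matching deserves care on analyzed string fields, because a term query compares its value with the indexed tokens without analyzing it first. The following Python sketch only approximates what a letter tokenizer followed by a lowercase filter (as in letter_lower_analyzer) would index; it is a rough simulation for illustration, not Elasticsearch's actual implementation, and the function name `letter_lower_tokens` is hypothetical.

```python
import re


def letter_lower_tokens(text):
    """Approximate a letter tokenizer plus lowercase filter: split on non-letters, lowercase."""
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


indexed = letter_lower_tokens("September 17, 1972")
print(indexed)
# A term value is compared with the indexed tokens as-is.
print("September" in indexed, "september" in indexed)
```

Under this approximation, a term query for "September" on original_air_date would find nothing, while "september" would match.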

Bool Query

Specifying bool as the query allows multiple queries to be combined in a search.

columbo_bool_query.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": {
              "query": "September Patrick",
              "operator": "OR"
            }
          }
        },
        {
          "term": {
            "season": {"value": 5}
          }
        }
      ]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "no_in_series": {"order": "asc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_bool_query.json

The occurrence types

|occur |description |
|:-------------|:-------------------------------------------------------------------|
|must |The clause (query) must appear in matching documents. |
|should |The clause (query) should appear in the matching document. |
|must_not |The clause (query) must not appear in the matching documents. |

When should is specified, the minimum_should_match parameter sets the minimum number of should clauses that must match.
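
As an illustration of the occurrence types and minimum_should_match, the following Python sketch assembles a bool query in which each occurrence type takes a list of clauses. The helper name `bool_query` is hypothetical, and the clauses are examples based on the sample data.

```python
import json


def bool_query(must=(), should=(), must_not=(), minimum_should_match=None):
    """Assemble a bool query; each occurrence type takes a list of clauses."""
    body = {}
    if must:
        body["must"] = list(must)
    if should:
        body["should"] = list(should)
    if must_not:
        body["must_not"] = list(must_not)
    if minimum_should_match is not None:
        body["minimum_should_match"] = minimum_should_match
    return {"bool": body}


query = bool_query(
    must=[{"term": {"season": 5}}],
    should=[
        {"match": {"_all": "September"}},
        {"match": {"_all": "Patrick"}},
    ],
    minimum_should_match=1,  # at least one of the two should clauses has to match
)
print(json.dumps({"query": query}, indent=2))
```

Passing the clauses as lists also avoids repeating the same key inside one JSON object, which many JSON parsers collapse to the last occurrence.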

[Elasticsearch Reference - Bool Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-bool-query.html)

Range Query

Specifying range as the query allows a range search on the value of the given field.

columbo_range_query.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query" : {
    "range" : {
      "no_in_series":{"gte": 20, "lte": 24}
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "no_in_series": {"order": "asc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_range_query.json

[Elasticsearch Reference - Range Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-range-query.html)

Ids Query

Specifying ids as the query allows a search by the value of the _id field.

columbo_ids_query.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "ids": {
      "values": ["AU5YtptcIueIPY5pgX5J","AU5YtptcIueIPY5pgX5K","AU5YtptcIueIPY5pgX5L"]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_ids_query.json

[Elasticsearch Reference - Ids Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-ids-query.html)

Filters

Match All Filter

columbo_match_all_filter.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "matchAll": {}
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "season": {"order": "asc"}
    },
    {
      "no_in_season": {"order": "asc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_all_filter.json

[Elasticsearch Reference - Match All Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-match-all-filter.html)

Query Filter

columbo_query_filter.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "query": {
      "query_string" : {
        "fields" : ["_all"],
        "query": "(September OR Patrick) AND (season:5)"
      }
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_query_filter.json

[Elasticsearch Reference - Query Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-query-filter.html)

Term Filter

columbo_term_filter.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "term": {
      "japanese_air_date": "1973-02-25T00:00:00"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_term_filter.json

[Elasticsearch Reference - Term Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-term-filter.html)

Bool Filter

columbo_bool_filter.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "bool": {
      "must": [
        {
          "term": {
            "no_in_season": {"value": 1}
          }
        },
        {
          "term": {
            "season": {"value": 5}
          }
        }
      ]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_bool_filter.json

[Elasticsearch Reference - Bool Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-bool-filter.html)

Range Filter

columbo_range_filter.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter" : {
    "range" : {
      "no_in_series":{"gte": 20, "lte": 24}
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "no_in_series": {"order": "asc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_range_filter.json

[Elasticsearch Reference - Range Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-range-filter.html)

Ids Filter

columbo_ids_filter.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "ids": {
      "values": ["AU5YtptcIueIPY5pgX5J","AU5YtptcIueIPY5pgX5K","AU5YtptcIueIPY5pgX5L"]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ]
}

When _id values are auto-generated, they change every time the documents are indexed, so running the JSON above as-is will not return any results.

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_ids_filter.json

[Elasticsearch Reference - Ids Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-ids-filter.html)
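
Because auto-generated _id values differ from one indexing run to the next, an ids request is easier to reproduce when the values are taken from an earlier search response instead of being hard-coded. The following Python sketch shows the idea; the function name `ids_filter_from_response` is hypothetical, and the `previous` dictionary stands in for a real response.

```python
def ids_filter_from_response(response, limit=3):
    """Build an ids filter body from the _id values of an earlier search response."""
    ids = [hit["_id"] for hit in response["hits"]["hits"][:limit]]
    return {"filter": {"ids": {"values": ids}}}


# Stand-in for a response obtained from a previous search on the same index.
previous = {"hits": {"hits": [{"_id": "AU5lJlhGw7_5S8xhhpHw"}, {"_id": "AU5lJlhGw7_5S8xhhpH4"}]}}
print(ids_filter_from_response(previous))
```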

Combining a Query and a Filter

columbo_match_query_range_filter.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "match": {
      "_all": {
        "query": "September Patrick",
        "operator": "OR"
      }
    }
  },
  "filter" : {
    "range" : {
      "no_in_series":{"gte": 30, "lte": 39}
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_query_range_filter.json

Query parameters

query parameters

|parameter |default / description |
|:---------------------|:--------------------------------------------------------|
|timeout |Defaults to no timeout. |
|from |Defaults to 0. |
|size |Defaults to 10. |
|search_type |Defaults to query_then_fetch. |
|query_cache |Set to true or false to enable or disable the caching of search results.|
|terminate_after |Defaults to no terminate_after. [experimental] |

timeout specifies the timeout as a string. The available units are listed under Time units below.
search_type and query_cache are specified as URL query-string parameters.

Time units

|unit |description |
|:-----|:-----------|
|`y` |Year |
|`M` |Month |
|`w` |Week |
|`d` |Day |
|`h` |Hour |
|`m` |Minute |
|`s` |Second |

[Elasticsearch Reference - Request Body Search](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-request-body.html)

Notes on Elasticsearch specifications

mapping

Fields

Fields available in a document mapping

|field |default / description |
|:--------------|:------------------------------------------------------------------------------------------|
|_uid |Each document indexed is associated with an id and a type, the internal _uid field is the unique identifier of a document within an index and is composed of the type and the id.|
|_id |By default it is not indexed and not stored (thus, not created). |
|_type |By default, the _type field is indexed (but not analyzed) and not stored. |
|_source |The _source field is an automatically generated field that stores the actual JSON that was used as the indexed document. |
|_all |The idea of the _all field is that it includes the text of one or more other fields within the document indexed. |
|_analyzer |Deprecated in 1.5.0. |
|_boost |Deprecated in 1.0.0.RC1. |
|_parent |The parent field mapping is defined on a child mapping, and points to the parent type this child relates to.|
|_field_names |The _field_names field indexes the field names of a document, which can later be used to search for documents based on the fields that they contain typically using the exists and missing filters.|
|_routing |The routing field allows to control the _routing aspect when indexing data and explicit routing control is required.|
|_index |By default it is disabled.|
|_size |By default it is disabled.|
|_timestamp |By default it is disabled.|
|_ttl |By default it is disabled.|

[Elasticsearch Reference - Fields] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/mapping-fields.html)

Types

Data types that can be used in a document mapping

[Elasticsearch Reference - Types] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/mapping-types.html)

Core Types

string

attributes

|attribute |default / description |
|:----------------------------|:----------------------------------------------------------------------------------------|
|index_name |Defaults to the property/field name. |
|store |Defaults to false. |
|index |Defaults to analyzed. not_analyzed, no |
|doc_values |Set to true to store field values in a column-stride fashion. |
|term_vector |Defaults to no. |
|boost |Defaults to 1.0. |
|null_value |Defaults to not adding the field at all. |
|norms: {enabled: <value>} |Defaults to true for analyzed fields, and to false for not_analyzed fields. |
|norms: {loading: <value>} |possible values are eager and lazy (default). |
|index_options |Defaults to positions for analyzed fields, and to docs for not_analyzed fields. |
|analyzer |Defaults to the globally configured analyzer. |
|index_analyzer |The analyzer used to analyze the text contents when analyzed during indexing. |
|search_analyzer |The analyzer used to analyze the field when part of a query string. |
|include_in_all |If index is set to no this defaults to false, otherwise, defaults to true or to the parent object type setting. |
|ignore_above |The analyzer will ignore strings larger than this size. |
|position_offset_gap |Defaults to 0. |

copy_to

copy_to copies the value of a field into another field.

example

{
  "properties": {
    "title": {
      "type": "string",
      "index": "analyzed",
      "copy_to": "contents"
    },
    "contents": {
      "type": "string"
    }
  }
}
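
To make the effect concrete, here is a small Python sketch that imitates what copy_to does at index time. It only illustrates the idea and is not how Elasticsearch is implemented; the helper name `indexed_values` and the sample document are hypothetical.

```python
# Mapping from the example above, reduced to the parts that matter here.
properties = {
    "title": {"type": "string", "copy_to": "contents"},
    "contents": {"type": "string"},
}

def indexed_values(doc, properties):
    """Return the values each field would be indexed with (hypothetical helper)."""
    result = {name: [] for name in properties}
    for name, value in doc.items():
        result[name].append(value)
        target = properties[name].get("copy_to")
        if target is not None:
            # The value is also indexed under the copy_to target field.
            result[target].append(value)
    return result

doc = {"title": "Elasticsearch on Windows"}
print(indexed_values(doc, properties))
```

Only the indexed values of the target field change; the original document stored in _source is left as it was sent.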

fields

The multi_field type was [removed from the Core Types] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/_multi_fields.html) in version 1.0.
With fields, a single JSON source field can be mapped to multiple fields.

example

{
  "properties": {
    "title": {
      "type": "string",
      "index": "analyzed",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
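
A typical reason to define such a sub-field is to search the analyzed field while sorting on the exact, unanalyzed value. The Python sketch below builds a request body of that kind; the search text is an assumption, and the sub-field is addressed as `title.raw`.

```python
import json

search_body = {
    # Full-text search against the analyzed main field.
    "query": {"match": {"title": "elasticsearch"}},
    # Sort on the not_analyzed sub-field, which keeps the whole value as one term.
    "sort": [{"title.raw": {"order": "asc"}}],
}

print(json.dumps(search_body, indent=2))
```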

Number

The numeric types are float, double, byte, short, integer, and long.

attributes

|attribute |default / description |
|:----------------------------|:-----------------------------------------------|
|type |float, double, integer, long, short, byte. Required.|
|index_name |Defaults to the property/field name. |
|store |Defaults to false. |
|index |Set to no if the value should not be indexed. Setting to no disables include_in_all. |
|doc_values |Set to true to store field values in a column-stride fashion. |
|precision_step |Defaults to 16 for long, double, 8 for short, integer, float, 2147483647 for byte.|
|boost |Defaults to 1.0. |
|null_value |Defaults to not adding the field at all. |
|include_in_all |If index is set to no this defaults to false, otherwise, defaults to true or to the parent object type setting. |
|ignore_malformed |Defaults to false. |
|coerce |Defaults to true. |

Date

attributes

|attribute |description |
|:----------------------------|:-----------------------------------------------|
|index_name |Defaults to the property/field name. |
|format |Defaults to dateOptionalTime. |
|store |Defaults to false. |
|index |Set to no if the value should not be indexed. Setting to no disables include_in_all. |
|doc_values |Set to true to store field values in a column-stride fashion. |
|precision_step |Defaults to 16. |
|boost |Defaults to 1.0. |
|null_value |Defaults to not adding the field at all. |
|include_in_all |If index is set to no this defaults to false, otherwise, defaults to true or to the parent object type setting.|
|ignore_malformed |Defaults to false. |
|numeric_resolution |Possible values include seconds and milliseconds (default). |

Boolean

attributes

|attribute |default / description |
|:----------------------------|:-----------------------------------------------|
|index_name |Defaults to the property/field name. |
|store |Defaults to false. |
|index |Set to no if the value should not be indexed. Setting to no disables include_in_all. |
|boost |Defaults to 1.0. |
|null_value |Defaults to not adding the field at all. |

Binary

attributes

|attribute |default / description |
|:----------------------------|:-----------------------------------------------|
|index_name |Defaults to the property/field name. |
|store |Defaults to false. |
|doc_values |Set to true to store field values in a column-stride fashion. |
|compress |Set to true to compress the stored binary value. |
|compress_threshold |Defaults to -1 |

Root Object Type

|type |default / description |
|:-----------------------|:-------------------------------------------------------------|
|dynamic_date_formats |dynamic_date_formats is the ability to set one or more date formats that will be used to detect date fields. |
|date_detection |Allows to disable automatic date type detection. |
|numeric_detection |Sometimes, even though json has support for native numeric types, numeric values are still provided as strings. |
|dynamic_templates |Dynamic templates allow to define mapping templates that will be applied when dynamic introduction of fields / objects happens. |

[Elasticsearch Reference - Root Object Type] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/mapping-root-object-type.html)

Date Format

Excerpt from the built-in formats

|format |pattern |expected |
|:--------------------------------|:-----------------------------|:------------------------------|
|basic_date |yyyyMMdd |20060102 |
|basic_date_time |yyyyMMdd'T'HHmmss.SSSZ |20060102T150405.999+0900 |
|basic_date_time_no_millis |yyyyMMdd'T'HHmmssZ |20060102T150405+0900 |
| | | |
|date |yyyy-MM-dd |2006-01-02 |
|date_time |yyyy-MM-dd'T'HH:mm:ss.SSSZZ |2006-01-02T15:04:05.999+09:00 |
|date_time_no_millis |yyyy-MM-dd'T'HH:mm:ssZZ |2006-01-02T15:04:05+09:00 |
|date_optional_time |yyyy-MM-dd |2006-01-02 |
|date_optional_time |yyyy-MM-dd'T'HH:mm:ss |2006-01-02T15:04:05 |
| | | |
|date_hour_minute_second |yyyy-MM-dd'T'HH:mm:ss |2006-01-02T15:04:05 |
|date_hour_minute_second_millis |yyyy-MM-dd'T'HH:mm:ss.SSS |2006-01-02T15:04:05.999 |

[Elasticsearch Reference - Date Format] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/mapping-date-format.html)
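
To check what a value in one of these formats looks like, the following Python sketch reproduces a few rows of the table with the standard `datetime` module. The strftime patterns are approximate equivalents chosen for this example, not the patterns Elasticsearch itself uses, and the sample instant is the one shown in the expected column.

```python
from datetime import datetime, timedelta, timezone

jst = timezone(timedelta(hours=9))   # +09:00, as in the table
moment = datetime(2006, 1, 2, 15, 4, 5, tzinfo=jst)

# Approximate Python equivalents for a few of the built-in formats.
formatted = {
    "basic_date": moment.strftime("%Y%m%d"),
    "basic_date_time_no_millis": moment.strftime("%Y%m%dT%H%M%S%z"),
    "date": moment.strftime("%Y-%m-%d"),
    "date_hour_minute_second": moment.strftime("%Y-%m-%dT%H:%M:%S"),
    "date_time_no_millis": moment.isoformat(),
}

for name, value in formatted.items():
    print(name, value)
```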

Analysis

An analyzer is a combination of one tokenizer and zero or more token filters. Zero or more char filters can also be applied before the tokenizer.

example

{
  "settings": {
    "analysis": {
      "analyzer": {
        "{analyzer logical name}": {
          "type": "type of the analyzer to use",
          "settings specific to that analyzer"
        },
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_baseform",
            "kuromoji_pos_filter"
          ]
        },
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer",
          "filter": [
            "my_filter"
          ],
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "tokenizer": {
        "{tokenizer logical name}": {
          "type": "type of the tokenizer to use",
          "settings specific to that tokenizer"
        },
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        },
        "my_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "3",
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      },
      "filter": {
        "{filter logical name}": {
          "type": "type of the filter to use",
          "settings specific to that filter"
        },
        "kuromoji_pos_filter": {
          "type": "kuromoji_part_of_speech"
        },
        "my_filter": {
          "type": "stop",
          "stopwords": ["NGWORD_A", "NGWORD_B", "NGWORD_C"]
        }
      },
      "char_filter": {
        "{char_filter logical name}": {
          "type": "type of the char_filter to use",
          "settings specific to that char_filter"
        },
        "my_char_filter": {
          "type": "mapping",
          "mappings" : ["kb=>kilobyte","mb=>megabyte","gb=>gigabyte"]
        }
      }
    },
    "index": {
      "index settings"
    }
  },
  "mappings": {
    "{type name}": {
      "type settings"
    },
    "{type name}": {
      "type settings"
    }
  }
}

Analyzers

Built in Analyzers

|Analyzers |type |description |
|:---------|:----|:-----------|
|[Standard Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-standard-analyzer.html) |`standard` |Analyzer built from the `Standard Tokenizer`, `Standard Token Filter`, `Lower Case Token Filter`, and `Stop Token Filter`. |
|[Simple Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-simple-analyzer.html) |`simple` |Analyzer built from the `Lower Case Tokenizer`. |
|[Whitespace Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-whitespace-analyzer.html) |`whitespace` |Analyzer built from the `Whitespace Tokenizer`. |
|[Stop Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-stop-analyzer.html) |`stop` |Analyzer built from the `Lower Case Tokenizer` and the `Stop Token Filter`. |
|[Keyword Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-keyword-analyzer.html) |`keyword` |Analyzer that treats the entire input as a single token. |
|[Pattern Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-pattern-analyzer.html) |`pattern` |Analyzer that splits text using a regular expression. |
|[Language Analyzers] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-lang-analyzer.html) |See the table below |Analyzers for specific languages. |
|[Snowball Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-snowball-analyzer.html) |`snowball` |Analyzer built from the `standard tokenizer`, `standard filter`, `lowercase filter`, `stop filter`, and `snowball filter`. |
|[Custom Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-custom-analyzer.html) |`custom` |Analyzer assembled from one tokenizer of your choice, zero or more token filters, and zero or more char filters. |

The following types are supported

|type |language |
|:----|:--------|
|`arabic` |Arabic |
|`armenian` |Armenian |
|`basque` |Basque |
|`brazilian` |Portuguese (Brazil) |
|`bulgarian` |Bulgarian |
|`catalan` |Catalan |
|`chinese` |Chinese |
|`cjk` |CJK (Chinese, Japanese, Korean) |
|`czech` |Czech |
|`danish` |Danish |
|`dutch` |Dutch |
|`english` |English |
|`finnish` |Finnish |
|`french` |French |
|`galician` |Galician |
|`german` |German |
|`greek` |Greek |
|`hindi` |Hindi |
|`hungarian` |Hungarian |
|`indonesian` |Indonesian |
|`irish` |Irish |
|`italian` |Italian |
|`latvian` |Latvian |
|`norwegian` |Norwegian |
|`persian` |Persian |
|`portuguese` |Portuguese |
|`romanian` |Romanian |
|`russian` |Russian |
|`sorani` |Kurdish (Sorani) |
|`spanish` |Spanish |
|`swedish` |Swedish |
|`turkish` |Turkish |
|`thai` |Thai |

Sample Custom Analyzer settings

A sample Custom Analyzer configuration, using kuromoji as the example.

example

{
  "settings": {
    "analysis": {
      "tokenizer": {
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        }
      },
      "filter": {
        "greek_lowercase_filter": {
          "type": "lowercase",
          "language": "greek"
        },
        "kuromoji_pos_filter": {
          "type": "kuromoji_part_of_speech"
        }
      },
      "analyzer": {
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_baseform", "kuromoji_pos_filter", "greek_lowercase_filter", "cjk_width"
          ]
        }
      }
    }
  }
}

kuromoji_tokenizer is a built-in tokenizer provided by kuromoji.
kuromoji_baseform and kuromoji_part_of_speech are built-in token filters provided by kuromoji.

|Setting |Description |
|:-------|:-----------|
|`tokenizer` |Name of the tokenizer to use. |
|`filter` |Optional. List of names of the token filters to use. |
|`char_filter` |Optional. List of names of the char filters to use. |
|`position_offset_gap` |An optional number of positions to increment between each field value of a field using this analyzer. |

[Elasticsearch Reference - Analyzers] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-analyzers.html)

Tokenizers

Built in Tokenizers

|Tokenizer |type |description |
|:---------|:----|:-----------|
|[Standard Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-standard-tokenizer.html) |`standard` |Tokenizer suited to European languages. |
|[Edge NGram Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-edgengram-tokenizer.html) |`edgeNGram` |Splits text into n-gram tokens that are anchored at the start of each word. |
|[Keyword Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-keyword-tokenizer.html) |`keyword` |Treats the whole text as a single token. |
|[Letter Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-letter-tokenizer.html) |`letter` |Splits text into tokens at non-letters. |
|[Lowercase Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-lowercase-tokenizer.html) |`lowercase` |Equivalent to using the `Letter Tokenizer` together with the `Lower Case Token Filter`. |
|[NGram Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-ngram-tokenizer.html) |`nGram` |Splits text into n-gram tokens. |
|[Whitespace Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-whitespace-tokenizer.html) |`whitespace` |Splits text into tokens on whitespace. |
|[Pattern Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-pattern-tokenizer.html) |`pattern` |Splits text into tokens using a regular expression. |
|[UAX Email URL Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-uaxurlemail-tokenizer.html) |`uax_url_email` |Works like the Standard Tokenizer but keeps URLs and email addresses as single tokens. |
|[Path Hierarchy Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-pathhierarchy-tokenizer.html) |`path_hierarchy` |Turns the hierarchy of a path into tokens (it does not simply split on the path separator). |
|[Classic Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-classic-tokenizer.html) |`classic` |Tokenizer for English text. |
|[Thai Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-thai-tokenizer.html) |`thai` |Tokenizer for Thai text. |

edgeNGram and nGram

Result of tokenizing the text elastic with nGram and edgeNGram

min_gram: 2
max_gram: 3
token_chars: letter,digit

nGram

position	type	token
1	word	`el`
2	word	`ela`
3	word	`la`
4	word	`las`
5	word	`as`
6	word	`ast`
7	word	`st`
8	word	`sti`
9	word	`ti`
10	word	`tic`
11	word	`ic`

edgeNGram

position	type	token
1	word	`el`
2	word	`ela`
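
The two results can be reproduced with a short Python sketch. This is a simplified imitation for understanding the difference, not the Lucene implementation: the function names are hypothetical, and the regular expression is only a rough stand-in for token_chars: letter,digit.

```python
import re

def ngram_tokens(text, min_gram=2, max_gram=3):
    """n-grams of every alphanumeric run, ordered by start position, then length."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    tokens.append(word[start:start + size])
    return tokens

def edge_ngram_tokens(text, min_gram=2, max_gram=3):
    """n-grams anchored at the start of every alphanumeric run."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        for size in range(min_gram, max_gram + 1):
            if size <= len(word):
                tokens.append(word[:size])
    return tokens

print(ngram_tokens("elastic"))
print(edge_ngram_tokens("elastic"))
```

nGram slides over every start position in the word, while edgeNGram only emits the grams that begin at the first character, which is why it suits prefix-style matching.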

uax_url_email

Result of tokenizing the following text: Elasticsearch reference <a href =\"https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-uaxurlemail-tokenizer.html\">UAX Email URL Tokenizer</a>

position	type	token
1	ALPHANUM	`Elasticsearch`
2	ALPHANUM	`reference`
3	ALPHANUM	`a`
4	ALPHANUM	`href`
5	URL	`https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-uaxurlemail-tokenizer.html`
6	ALPHANUM	`UAX`
7	ALPHANUM	`Email`
8	ALPHANUM	`URL`
9	ALPHANUM	`Tokenizer`
10	ALPHANUM	`a`

path_hierarchy

Result of tokenizing the text C:/Windows/System32/drivers/etc

position	type	token
1	word	`C:`
1	word	`C:/Windows`
1	word	`C:/Windows/System32`
1	word	`C:/Windows/System32/drivers`
1	word	`C:/Windows/System32/drivers/etc`
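
The way each token contains the path up to that level can be sketched in a few lines of Python. The function name is hypothetical, and options such as reverse and skip are not modelled.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Emit one token per level; each token is the path from the root to that level."""
    tokens = []
    current = ""
    for i, part in enumerate(path.split(delimiter)):
        current = part if i == 0 else current + delimiter + part
        tokens.append(current)
    return tokens

for token in path_hierarchy_tokens("C:/Windows/System32/drivers/etc"):
    print(token)
```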

[Elasticsearch Reference - Tokenizers] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-tokenizers.html)

Token Filters

Token Filter	type
[Standard Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-standard-tokenfilter.html)	`standard`
[ASCII Folding Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-asciifolding-tokenfilter.html)	`asciifolding`
[Length Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-length-tokenfilter.html)	`length`
[Lowercase Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-lowercase-tokenfilter.html)	`lowercase`
[Uppercase Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-uppercase-tokenfilter.html)	`uppercase`
[NGram Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-ngram-tokenfilter.html)	`nGram`
[Edge NGram Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-edgengram-tokenfilter.html)	`edgeNGram`
[Porter Stem Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-porterstem-tokenfilter.html)	`porter_stem`
[Shingle Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-shingle-tokenfilter.html)	`shingle`
[Stop Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-stop-tokenfilter.html)	`stop`
[Word Delimiter Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-word-delimiter-tokenfilter.html)	`word_delimiter`
[Stemmer Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-stemmer-tokenfilter.html)	`stemmer`
[Stemmer Override Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-stemmer-override-tokenfilter.html)	`stemmer_override`
[Keyword Marker Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-keyword-marker-tokenfilter.html)	`keyword_marker`
[Keyword Repeat Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-keyword-repeat-tokenfilter.html)	`keyword_repeat`
[KStem Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-kstem-tokenfilter.html)	`kstem`
[Snowball Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-snowball-tokenfilter.html)	`snowball`
[Phonetic Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-phonetic-tokenfilter.html)	`phonetic`
[Synonym Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-synonym-tokenfilter.html)	`synonym`
[Compound Word Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-compound-word-tokenfilter.html)	`dictionary_decompounder`, `hyphenation_decompounder`
[Reverse Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-reverse-tokenfilter.html)	`reverse`
[Elision Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-elision-tokenfilter.html)	`elision`
[Truncate Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-truncate-tokenfilter.html)	`truncate`
[Unique Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-unique-tokenfilter.html)	`unique`
[Pattern Capture Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-pattern-capture-tokenfilter.html)	`pattern_capture`
[Pattern Replace Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-pattern_replace-tokenfilter.html)	`pattern_replace`
[Trim Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-trim-tokenfilter.html)	`trim`
[Limit Token Count Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-limit-token-count-tokenfilter.html)	`limit`
[Hunspell Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-hunspell-tokenfilter.html)	`hunspell`
[Common Grams Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-common-grams-tokenfilter.html)	`common_grams`
[Normalization Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-normalization-tokenfilter.html)	See the table below
[CJK Width Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-cjk-width-tokenfilter.html)	`cjk_width`
[CJK Bigram Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-cjk-bigram-tokenfilter.html)	`cjk_bigram`
[Delimited Payload Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-delimited-payload-tokenfilter.html)	`delimited_payload_filter`
[Keep Words Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-keep-words-tokenfilter.html)	`keep`
[Keep Types Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-keep-types-tokenfilter.html)	`keep_types`
[Classic Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-classic-tokenfilter.html)	`classic`
[Apostrophe Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-apostrophe-tokenfilter.html)	`apostrophe`

Normalization Token Filter

language	type
Arabic	`arabic_normalization`
German	`german_normalization`
Hindi	`hindi_normalization`
Indic	`indic_normalization`
Kurdish (Sorani)	`sorani_normalization`
Persian	`persian_normalization`
Scandinavian	`scandinavian_normalization`, `scandinavian_folding`

[Elasticsearch Reference - Token Filters] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-tokenfilters.html)

Character Filters

Character Filter	type
[Mapping Char Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-mapping-charfilter.html)	`mapping`
[HTML Strip Char Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-htmlstrip-charfilter.html)	`html_strip`
[Pattern Replace Char Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-pattern-replace-charfilter.html)	`pattern_replace`

[Elasticsearch Reference - Character Filters] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-charfilters.html)

Analyze

The _analyze endpoint of the Indices APIs lets you check the output of an analyzer. For Elasticsearch's built-in analyzers there is no need to specify an index.

Syntax

> curl -XGET "[host name][:port]/[index name]/_analyze?analyzer={analyzer name}&tokenizer={tokenizer name}&token_filters={}&char_filters={}" -d "The Bye-Bye Sky High IQ Murder Case"

standard

example

> curl -XGET "localhost:9200/_analyze?analyzer=standard" -d "The Bye-Bye Sky High IQ Murder Case"

To examine an analyzer's behavior in detail, it seems better to define it on an index.
The example below defines the analyzer to be checked on an index named test and then uses that analyzer.

PUT

curl -XPUT "localhost:9200/test?pretty" -d "{
  \"settings\": {
    \"analysis\": {
      \"analyzer\": {
        \"my_analyzer\": {
          \"type\": \"custom\",
          \"tokenizer\": \"my_tokenizer\"
        }
      },
      \"tokenizer\": {
        \"my_tokenizer\": {
          \"type\": \"path_hierarchy\",
          \"reverse\": false,
          \"skip\": 0
        }
      }
    }
  }
}"

GET

curl -XGET "localhost:9200/test/_analyze?analyzer=my_analyzer&pretty" -d "C:/Windows/System32/drivers/etc"
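
For reference, the same two calls can be sketched in Python with only the standard library. The helper names are hypothetical, the host is assumed to be `localhost:9200`, and because urllib sends a request that has a body as POST, the sketch assumes that _analyze accepts POST as well as GET. The call that contacts the server is left commented out since it needs a running node.

```python
import urllib.request
from urllib.parse import urlencode

def analyze_url(analyzer, index=None, host="localhost:9200"):
    """Build the _analyze URL, with or without an index name."""
    path = "/_analyze" if index is None else "/" + index + "/_analyze"
    query = urlencode({"analyzer": analyzer, "pretty": "true"})
    return "http://" + host + path + "?" + query

def analyze(text, analyzer, index=None):
    """Send the text to _analyze and return the raw JSON response as a string."""
    request = urllib.request.Request(
        analyze_url(analyzer, index), data=text.encode("utf-8")
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")

print(analyze_url("standard"))                    # built-in analyzer, no index
print(analyze_url("my_analyzer", index="test"))   # analyzer defined on the test index
# print(analyze("C:/Windows/System32/drivers/etc", "my_analyzer", index="test"))
```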

[Elasticsearch Reference - Analyze] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/indices-analyze.html)
