Overview

Install Elasticsearch 1.6.0 on Windows 7 for development and testing, and apply a basic configuration.
Then register sample data and try out basic search methods.

Environment

The contents of this article were verified with the following versions.

Windows7 (64bit)
Java 1.8.0_45
Elasticsearch 1.6.0

Windows has no curl command, so I used cURL.

Installation

Install Elasticsearch and several plugins on Windows 7.

Elasticsearch

Download the archive file from the download page and extract it to a location of your choice.
The archive I downloaded was elasticsearch-1.6.0.zip.

Extracting the archive

Installation consists only of extracting the archive to a suitable location.
I extracted it to D:\dev\elasticsearch-1.6.0.

Configuration

Since this is for development and testing, configure it to start with minimal resources.
The configuration file is config/elasticsearch.yml in the extracted directory.

Only the changed settings are excerpted below.

# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
cluster.name: elasticsearch

# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
node.name: master

# Every node can be configured to allow or deny being eligible as the master,
# and to allow or deny to store the data.
#
# Allow this node to be eligible as a master node (enabled by default):
#
node.master: true
#
# Allow this node to store data (enabled by default):
#
node.data: true

node.name : If no node name is specified, a name is generated automatically when the Elasticsearch instance starts.

# Note, that for development on a local machine, with small indices, it usually
# makes sense to "disable" the distributed features:
#
index.number_of_shards: 1
index.number_of_replicas: 0

Because this is a development setup, use a single shard and no replicas.

# Set this property to true to lock the memory:
#
bootstrap.mlockall: true

# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.
#
# 1. Disable multicast discovery (enabled by default):
#
discovery.zen.ping.multicast.enabled: false
#
# 2. Configure an initial list of master nodes in the cluster
#    to perform discovery when new nodes (master or data) are started:
#
discovery.zen.ping.unicast.hosts: ["localhost"]

Starting

Move to the extracted directory and run the following command.
You can optionally specify the JVM heap size to use.

> bin\elasticsearch.bat -Xmx256m -Xms256m

Checking that it works

Access the following URL with curl or a browser and confirm that a response with status 200 is returned.

GET

> curl -XGET "localhost:9200/"

response

{
  "status" : 200,
  "name" : "master",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.6.0",
    "build_hash" : "cdd3ac4dde4f69524ec0a14de3828cb95bbb86d0",
    "build_timestamp" : "2015-06-09T13:36:34Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
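If you want to script this check instead of reading the JSON by eye, the following Python sketch parses the GET / response and pulls out the fields shown above. It uses only the standard library; parse_root_response and fetch_root are hypothetical helper names, and fetch_root only works against a running node at localhost:9200.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200/"  # node address used throughout this article


def parse_root_response(body: str) -> dict:
    """Extract the fields checked above from the body of GET /."""
    doc = json.loads(body)
    return {
        "status": doc.get("status"),
        "name": doc.get("name"),
        "cluster_name": doc.get("cluster_name"),
        "version": doc.get("version", {}).get("number"),
    }


def fetch_root(url: str = ES_URL) -> dict:
    """Send GET / to a running node and parse the reply (needs Elasticsearch up)."""
    with urllib.request.urlopen(url) as resp:
        return parse_root_response(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline example using part of the response shown above.
    sample = ('{"status": 200, "name": "master", "cluster_name": "elasticsearch",'
              ' "version": {"number": "1.6.0", "lucene_version": "4.10.4"}}')
    info = parse_root_response(sample)
    print(info["status"], info["name"], info["version"])
```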

plugin

elasticsearch-head

URL: http://mobz.github.io/elasticsearch-head/

Installation

install

> bin\plugin -install mobz/elasticsearch-head

Verification

Access the following URL and confirm that the head page is displayed.

elasticsearch-analysis-kuromoji

URL: https://github.com/elastic/elasticsearch-analysis-kuromoji

Installation

install

> bin\plugin -install elasticsearch/elasticsearch-analysis-kuromoji/2.6.0

elasticsearch-inquisitor

URL: https://github.com/polyfractal/elasticsearch-inquisitor

Installation

install

> bin\plugin -install polyfractal/elasticsearch-inquisitor

Verification

Access the following URL and confirm that the Inquisitor page is displayed.

Checking the installed plugins

> bin\plugin -l

response

Installed plugins:
    - analysis-kuromoji
    - head
    - inquisitor

Checking basic search methods

Index the sample data and search it in several ways.

Preparing the sample data

The sample data is episode information from the TV drama Columbo.

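field	data type	description
title	string	Original title
original_air_date	string	Original air date
runtime	integer	Runtime (minutes)
guest_staring	string	Guest star
guest_staring_role	string	Guest star's role
directed_by	string	Director
written_by	string array	Writer(s)
teleplay	string array	Teleplay writer(s)
season	integer	Season
no_in_season	integer	Episode number within the season
no_in_series	integer	Episode number within the series
japanese_title	string	Japanese title
japanese_air_date	date	Japanese air date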

mapping

index: tvfile
type: columbo

Mapping for the sample data

columbo_mapping.json

{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "analysis": {
      "filter": {
        "greek_lowercase_filter": {
          "type": "lowercase",
          "language": "greek"
        },
        "kuromoji_pos_filter": {
          "type": "kuromoji_part_of_speech"
        }
      },
      "tokenizer": {
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        },
        "ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "3",
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_baseform", "kuromoji_pos_filter", "greek_lowercase_filter", "cjk_width"
          ]
        },
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer",
          "filter": [
            "standard"
          ]
        },
        "letter_lower_analyzer": {
          "type": "custom",
          "tokenizer": "letter",
          "filter": [
            "lowercase"
          ]
        },
        "letter_upper_analyzer": {
          "type": "custom",
          "tokenizer": "letter",
          "filter": [
            "uppercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "columbo": {
      "_source": {
        "enabled": true
      },
      "_all": {
        "enabled": true
      },
      "_timestamp": {
        "enabled": true
      },
      "dynamic": "strict",
      "properties": {
        "title": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "original_air_date": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "runtime": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "guest_staring": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "guest_staring_role": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "directed_by": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "written_by": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "teleplay": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "season": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "no_in_season": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "no_in_series": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "japanese_title": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "kuromoji_analyzer",
          "store": true,
          "include_in_all": true
        },
        "japanese_air_date": {
          "type": "date",
          "format": "dateHourMinuteSecond",
          "store": true,
          "include_in_all": false
        }
      }
    }
  }
}

Creating the index

Use the JSON file above to create the index and set its mapping.

POST

> curl -XPOST "localhost:9200/tvfile?pretty" -d @columbo_mapping.json
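The same index-creation request can be built from Python with urllib.request. This is a minimal sketch: build_create_index_request is a hypothetical helper, the body is a trimmed stand-in for columbo_mapping.json, and the request is only constructed here, not sent. The Content-Type header is an addition of this sketch; the curl command above works without a JSON content type.

```python
import json
import urllib.request


def build_create_index_request(host: str, index: str, body: dict) -> urllib.request.Request:
    """Build (without sending) the equivalent of
    curl -XPOST "localhost:9200/tvfile?pretty" -d @columbo_mapping.json
    """
    return urllib.request.Request(
        f"http://{host}/{index}?pretty",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Trimmed stand-in; normally: body = json.load(open("columbo_mapping.json", encoding="utf-8"))
    body = {"settings": {"index": {"number_of_shards": 1, "number_of_replicas": 0}}}
    req = build_create_index_request("localhost:9200", "tvfile", body)
    print(req.get_method(), req.full_url)
    # Against a running node you would send it with urllib.request.urlopen(req).
```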

Checking the mapping

GET

> curl -XGET "localhost:9200/tvfile/_settings,_mappings?pretty"

To delete the index

DELETE

> curl -XDELETE "localhost:9200/tvfile"

Sample data

Since the full data set is long, only part of it is shown here. The complete sample data is available on a separate page.

columbo_data.json

{"index": {}}
{"title": "Prescription: Murder",                "original_air_date": "February 20, 1968",   "runtime": 98, "guest_staring": "Gene Barry",        "guest_staring_role": "Dr. Ray Fleming (Gene Barry), a psychiatrist",                                              "directed_by": "Richard Irving",       "written_by": ["Richard Levinson & William Link"],                  "teleplay": [""],                                                                     "season": 0, "no_in_season": 1, "no_in_series": 1,  "japanese_title": "殺人処方箋",                   "japanese_air_date": "1972-08-27T00:00:00"}
{"index": {}}
{"title": "Ransom for a Dead Man",               "original_air_date": "March 1, 1971",       "runtime": 98, "guest_staring": "Lee Grant",         "guest_staring_role": "Leslie Williams, a brilliant lawyer and pilot",                                             "directed_by": "Richard Irving",       "written_by": ["Richard Levinson & William Link"],                  "teleplay": ["Dean Hargrove"],                                                        "season": 0, "no_in_season": 2, "no_in_series": 2,  "japanese_title": "死者の身代金",                 "japanese_air_date": "1973-04-22T00:00:00"}
{"index": {}}
{"title": "Murder by the Book",                  "original_air_date": "September 15, 1971",  "runtime": 73, "guest_staring": "Jack Cassidy",      "guest_staring_role": "Ken Franklin is one half of a mystery writing team",                                        "directed_by": "Steven Spielberg",     "written_by": ["Steven Bochco"],                                    "teleplay": [""],                                                                     "season": 1, "no_in_season": 1, "no_in_series": 3,  "japanese_title": "構想の死角",                   "japanese_air_date": "1972-11-26T00:00:00"}

Registering the documents

Index the documents using the JSON file above.

POST

> curl -XPOST "localhost:9200/tvfile/columbo/_bulk?pretty" --data-binary @columbo_data.json
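The bulk file alternates an action line and a document line, and every line, including the last one, must end with a newline. That is why the command uses --data-binary: curl's -d option strips newlines from data read with @file. The sketch below generates a body in the same shape; build_bulk_body is a hypothetical helper, and the empty {"index": {}} action relies on the index and type already being given in the URL (tvfile/columbo), leaving _id to be auto-generated.

```python
import json


def build_bulk_body(docs) -> str:
    """Build a _bulk body: an {"index": {}} action line before every document.

    The index and type come from the URL (/tvfile/columbo/_bulk), so the action
    metadata is left empty and Elasticsearch generates each _id.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc, ensure_ascii=False))  # keep Japanese text unescaped
    return "\n".join(lines) + "\n"  # the final line must also end with a newline


if __name__ == "__main__":
    body = build_bulk_body([
        {"title": "Prescription: Murder", "season": 0, "no_in_season": 1, "no_in_series": 1},
        {"title": "Ransom for a Dead Man", "season": 0, "no_in_season": 2, "no_in_series": 2},
    ])
    print(body, end="")
```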

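To delete all documents (in Elasticsearch 1.x this request deletes the columbo type mapping together with its documents)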

DELETE

> curl -XDELETE "localhost:9200/tvfile/columbo?pretty"

Counting the documents

GET

> curl -XGET "localhost:9200/tvfile/columbo/_count?pretty" -d "{\"query\": {\"matchAll\": {}}}"

response

{
  "count" : 45,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  }
}

Searching documents

Searching uses the Search APIs. There are two ways to search: URI Search and Request Body Search.

Syntax

[host name][:port]/[index name]/[type name]/_search

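Common fields in search results

field	description
`took`	Time taken by the search (milliseconds).
`timed_out`	Whether the search timed out (boolean).
`_shards`	Number of shards searched successfully and number of shards that failed.
`hits`	Holds the search results.
`hits.total`	Number of documents matching the search criteria.
`hits.hits`	Array of the matching documents (10 by default).
`_score`	Score of the document.
`max_score`	Highest score.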
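As a sketch of how these fields fit together, the following Python function summarizes a search response. summarize_hits is a hypothetical helper; it assumes the request used the fields parameter, in which case each returned field value is an array, as in the Match Query response in this article.

```python
import json


def summarize_hits(body: str) -> dict:
    """Pick out the common fields of a search response."""
    resp = json.loads(body)
    hits = resp["hits"]
    titles = []
    for hit in hits["hits"]:
        # With the "fields" parameter, every value comes back as an array.
        titles.append(hit.get("fields", {}).get("title", [None])[0])
    return {
        "took_ms": resp["took"],
        "timed_out": resp["timed_out"],
        "failed_shards": resp["_shards"]["failed"],
        "total": hits["total"],        # all matching documents, not just this page
        "max_score": hits["max_score"],
        "titles": titles,              # only the hits returned in this response
    }


if __name__ == "__main__":
    sample = """{"took": 1, "timed_out": false,
      "_shards": {"total": 1, "successful": 1, "failed": 0},
      "hits": {"total": 7, "max_score": 1.4385337, "hits": [
        {"_id": "AU5lJlhGw7_5S8xhhpH4", "_score": 0.9509891,
         "fields": {"title": ["Lovely But Lethal"], "season": [3]}}]}}"""
    print(summarize_hits(sample))
```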

The Search API

URI Search

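URI Search specifies the search conditions as request parameters.

Searching without conditions

The search condition is given in the q parameter; q=* matches all documents.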

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?q=*&from=0&size=10&pretty"

Searching with conditions

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?q=September%20Patrick&df=_all&default_operator=OR&from=0&size=10&_source=false&fields=title,original_air_date,runtime,guest_staring,directed_by,written_by,season,no_in_season&sort=season:asc,no_in_season:asc&track_scores=true&pretty"

Parameters

name	default / description
`q`	The query string.
`df`	The default field to use when no field prefix is defined within the query.
`analyzer`	The analyzer name.
`lowercase_expanded_terms`	Defaults to `true`.
`analyze_wildcard`	Defaults to `false`.
`default_operator`	can be `AND` or `OR`. Defaults to `OR`.
`lenient`	Defaults to `false`.
`explain`	For each hit, contain an explanation of how scoring of the hits was computed.
`_source`	Set to `false` to disable retrieval of the `_source` field.
`fields`	The selective stored fields of the document to return for each hit, comma delimited.
`sort`	Sorting to perform. Can either be in the form of `fieldName`, or `fieldName:asc` / `fieldName:desc`.
`track_scores`	When sorting, set to `true` in order to still track scores and return them as part of each hit.
`timeout`	Defaults to no timeout.
`terminate_after`	The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early.
`from`	Defaults to `0`.
`size`	Defaults to `10`.
`search_type`	Defaults to `query_then_fetch`.

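q specifies the keywords to search for.
df specifies the field to search when the query has no field prefix. The default is _all.
Setting _source to false excludes the _source field from the results.
fields specifies, comma-separated, the names of the fields to include in the results.
Setting track_scores to true computes scores even when sorting. (By default, scores are not computed when results are sorted by a field.)
from and size specify the offset of the first document to return and the number of documents to return.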

Elasticsearch Reference - URI Search
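Long parameter lists like the one above are easy to mistype by hand. A small Python sketch that assembles a URI Search URL with urllib.parse.urlencode follows; build_uri_search is a hypothetical helper. It encodes spaces as %20 (as in the example above), writes Python booleans as true/false, and leaves ',', ':' and '*' unescaped so that sort, fields and q=* stay readable.

```python
from urllib.parse import quote, urlencode


def build_uri_search(host: str, index: str, doc_type: str, **params) -> str:
    """Build [host][:port]/[index]/[type]/_search?... from keyword arguments."""
    # Elasticsearch expects lowercase true/false, not Python's True/False.
    normalized = {k: (str(v).lower() if isinstance(v, bool) else v) for k, v in params.items()}
    # quote (instead of the default quote_plus) turns spaces into %20.
    query = urlencode(normalized, quote_via=quote, safe=",:*")
    return f"http://{host}/{index}/{doc_type}/_search?{query}"


if __name__ == "__main__":
    print(build_uri_search(
        "localhost:9200", "tvfile", "columbo",
        q="September Patrick", df="_all", default_operator="OR",
        _source=False, fields="title,original_air_date,season,no_in_season",
        sort="season:asc,no_in_season:asc", track_scores=True, pretty=True,
    ))
```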

Request Body Search

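Request Body Search specifies the search conditions in the request body.
There are two kinds of search conditions: Queries and Filters.

The differences between them are:

Queries support both full-text search and exact term search, whereas Filters support exact term search only.
Queries compute relevance scores; Filters do not.
Queries are more expensive than Filters.
Query results are not cached, whereas Filters can be cached (many are cached by default).

Queries and Filters can also be combined.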

Queries

Match All Query

Specifying matchAll in query matches all documents, that is, a search with no conditions.

columbo_match_all_query.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "matchAll": {}
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "season": {"order": "asc"}
    },
    {
      "no_in_season": {"order": "asc"}
    }
  ]
}

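Because _source is false, the results do not include the _source field.
With from set to 0 and size set to 100, up to 100 documents are retrieved starting from the first one. (The default size is 10.)
fields lists the names of the fields to include in the results.
sort specifies the order of the documents.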

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_all_query.json

Elasticsearch Reference - Match All Query
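Note that from is an offset, not a page number: the Match Query example in this article uses from 3 and size 3, which is the second page of three hits. A minimal Python sketch that turns a 1-based page number into from/size and builds a Match All body; page_params and match_all_body are hypothetical helpers, and the body uses match_all, the snake_case spelling of matchAll that Elasticsearch 1.x also accepts.

```python
import json


def page_params(page: int, size: int = 10) -> dict:
    """Return from/size for a 1-based page number (size 10 mirrors the default)."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    return {"from": (page - 1) * size, "size": size}


def match_all_body(page: int, size: int = 10, fields=()) -> dict:
    """Build a body shaped like columbo_match_all_query.json for one page."""
    body = {"_source": False, "query": {"match_all": {}}}
    body.update(page_params(page, size))
    if fields:
        body["fields"] = list(fields)
    body["track_scores"] = True
    body["sort"] = [{"season": {"order": "asc"}}, {"no_in_season": {"order": "asc"}}]
    return body


if __name__ == "__main__":
    print(json.dumps(match_all_body(2, 3, ["title", "season"]), indent=2))
```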

Match Query

Specifying match in query searches the given field (original_air_date in this example) for the keywords specified in query.

columbo_match_query.json

{
  "_source": false,
  "from": 3,
  "size": 3,
  "query": {
    "match": {
      "original_air_date": {
        "query": "September December",
        "operator": "OR"
      }
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_query.json

response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 7,
    "max_score" : 1.4385337,
    "hits" : [ {
      "_index" : "tvfile",
      "_type" : "columbo",
      "_id" : "AU5lJlhGw7_5S8xhhpHw",
      "_score" : 0.9509891,
      "fields" : {
        "directed_by" : [ "Nicholas Colasanto" ],
        "no_in_season" : [ 1 ],
        "guest_staring" : [ "John Cassavetes" ],
        "original_air_date" : [ "September 17, 1972" ],
        "no_in_series" : [ 10 ],
        "runtime" : [ 98 ],
        "season" : [ 2 ],
        "title" : [ "Étude in Black" ],
        "written_by" : [ "Richard Levinson & William Link" ]
      }
    }, {
      "_index" : "tvfile",
      "_type" : "columbo",
      "_id" : "AU5lJlhGw7_5S8xhhpH4",
      "_score" : 0.9509891,
      "fields" : {
        "directed_by" : [ "Jeannot Szwarc" ],
        "no_in_season" : [ 1 ],
        "guest_staring" : [ "Vera Miles" ],
        "original_air_date" : [ "September 23, 1973" ],
        "no_in_series" : [ 18 ],
        "runtime" : [ 73 ],
        "season" : [ 3 ],
        "title" : [ "Lovely But Lethal" ],
        "written_by" : [ "Myrna Bercovici" ]
      }
    }, {
      "_index" : "tvfile",
      "_type" : "columbo",
      "_id" : "AU5lJlhHw7_5S8xhhpIA",
      "_score" : 0.9509891,
      "fields" : {
        "directed_by" : [ "Bernard L. Kowalski" ],
        "no_in_season" : [ 1 ],
        "guest_staring" : [ "Robert Conrad" ],
        "original_air_date" : [ "September 15, 1974" ],
        "no_in_series" : [ 26 ],
        "runtime" : [ 98 ],
        "season" : [ 4 ],
        "title" : [ "An Exercise in Fatality" ],
        "written_by" : [ "Larry Cohen" ]
      }
    } ]
  }
}

Elasticsearch Reference - Match Query

Multi Match Query

Specifying multi_match in query searches the multiple fields listed in fields for the keywords specified in query.

columbo_multi_match_query.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "multi_match": {
      "query": "October Patrick",
      "type": "cross_fields",
      "fields": ["original_air_date", "guest_staring"],
      "operator": "AND"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_multi_match_query.json

The values that can be specified for type, and their meanings, are as follows.

Types of multi_match query

type	description
`best_fields`	default. Finds documents which match any field, but uses the `_score` from the best field.
`most_fields`	Finds documents which match any field and combines the `_score` from each field.
`cross_fields`	Treats fields with the same analyzer as though they were one big field. Looks for each word in any field.
`phrase`	Runs a match_phrase query on each field and combines the `_score` from each field.
`phrase_prefix`	Runs a match_phrase_prefix query on each field and combines the `_score` from each field.

Elasticsearch Reference - Multi Match Query
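When query bodies are built in code, checking the type value up front catches typos before Elasticsearch rejects the request. A minimal Python sketch; multi_match_query is a hypothetical helper, and the allowed types are exactly those in the table above.

```python
MULTI_MATCH_TYPES = ("best_fields", "most_fields", "cross_fields", "phrase", "phrase_prefix")


def multi_match_query(query: str, fields, type_: str = "best_fields", operator: str = "OR") -> dict:
    """Build a multi_match clause, rejecting unknown types and operators early."""
    if type_ not in MULTI_MATCH_TYPES:
        raise ValueError(f"unknown multi_match type: {type_!r}")
    op = operator.upper()
    if op not in ("AND", "OR"):
        raise ValueError(f"operator must be AND or OR, got {operator!r}")
    return {"multi_match": {"query": query, "type": type_, "fields": list(fields), "operator": op}}


if __name__ == "__main__":
    print(multi_match_query("October Patrick", ["original_air_date", "guest_staring"],
                            "cross_fields", "and"))
```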

Query String Query

Specifying query_string in query allows more complex conditions than the other queries, such as boolean operators and field:value terms.

columbo_query_string.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "query_string": {
      "fields" : ["_all"],
      "query": "(September OR Patrick) AND (season:5)"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_query_string.json

default_field

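The field searched when the query does not explicitly name a field.
The default is the _all field.

To use a different field, set index.query.default_field in the index settings (replace _all with the field name):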

example

{
  "settings": {
    "index": {
      "query": {
        "default_field": "_all"
      }
    }
  }
}

Elasticsearch Reference - Query String Query
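Because query_string parses its own syntax, a value containing characters such as ':' is not searched literally: Prescription: Murder would be read as a search on a field named Prescription. The sketch below backslash-escapes such characters. escape_query_string is a hypothetical helper, and the character set is an assumption based on the reserved-character list in the query_string documentation (&& and || are handled by escaping each & and | individually), so verify it against your version. When a body is written by hand rather than with json.dumps, each backslash must itself be written as \\ in JSON.

```python
import json

# Assumed set of characters that query_string treats specially (see lead-in).
RESERVED = set('+-!(){}[]^"~*?:\\/&|')


def escape_query_string(text: str) -> str:
    """Prefix every reserved character with a backslash so it is matched literally."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


if __name__ == "__main__":
    q = escape_query_string("Prescription: Murder")
    body = {"query": {"query_string": {"fields": ["title"], "query": q}}}
    # json.dumps doubles the backslash, as the JSON format requires.
    print(json.dumps(body))
```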

Simple Query String Query

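simple_query_string is a simplified version of query_string. Unlike query_string, it does not return an error for invalid syntax; invalid parts of the query are discarded.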

columbo_simple_query_string.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query" : {
    "simple_query_string" : {
      "query": "(September | October | November) +(McGoohan)",
      "fields": ["_all"]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_simple_query_string.json

Available flags

flag	description
`ALL`
`NONE`
`AND`	`+`
`OR`	`|`
`NOT`	`-`
`PREFIX`	`*`
`PHRASE`	`"`
`PRECEDENCE`	`(` and `)`
`ESCAPE`
`WHITESPACE`
`FUZZY`	`~N` after a word
`NEAR`
`SLOP`	`~N` after a phrase

Elasticsearch Reference - Simple Query String Query
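The flags in the table above are passed as one pipe-separated string in the flags parameter of simple_query_string, for example "flags": "OR|AND|PREFIX", so that only the listed operators are enabled. A minimal Python sketch that validates flag names against the table and builds the clause; simple_query_string_query is a hypothetical helper.

```python
# Flag names from the table above.
SIMPLE_QUERY_FLAGS = {"ALL", "NONE", "AND", "OR", "NOT", "PREFIX", "PHRASE",
                      "PRECEDENCE", "ESCAPE", "WHITESPACE", "FUZZY", "NEAR", "SLOP"}


def simple_query_string_query(query: str, fields, flags=()) -> dict:
    """Build a simple_query_string clause; flags are joined into "A|B|C"."""
    body = {"query": query, "fields": list(fields)}
    if flags:
        names = [f.upper() for f in flags]
        unknown = [n for n in names if n not in SIMPLE_QUERY_FLAGS]
        if unknown:
            raise ValueError(f"unknown flags: {unknown}")
        body["flags"] = "|".join(names)
    return {"simple_query_string": body}


if __name__ == "__main__":
    print(simple_query_string_query(
        "(September | October | November) +(McGoohan)", ["_all"],
        ["or", "and", "precedence"]))
```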

Term Query

Specifying term in query searches for documents in which the specified field contains exactly the given value. The term itself is not analyzed.

columbo_term_query.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "term": {
      "japanese_air_date": "1973-02-25T00:00:00"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_term_query.json

Elasticsearch Reference - Term Query
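The example above targets a date field, but on the text fields of this mapping the difference between term and match matters. The Python sketch below approximates letter_lower_analyzer from columbo_mapping.json (letter tokenizer plus lowercase filter) to show why a term query for September finds nothing in original_air_date while september or a match query does. letter_lower_analyze, term_matches and match_matches are hypothetical helpers, and str.isalpha() is only a stand-in for Lucene's letter test, so edge cases may differ.

```python
def letter_lower_analyze(text: str) -> list:
    """Approximate letter_lower_analyzer: split on non-letters, then lowercase."""
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():
            current.append(ch)
        elif current:  # any non-letter (space, digit, punctuation) ends a token
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return [t.lower() for t in tokens]


def term_matches(indexed_text: str, term: str) -> bool:
    """term: the search value is compared to the indexed tokens as-is."""
    return term in letter_lower_analyze(indexed_text)


def match_matches(indexed_text: str, query: str) -> bool:
    """match (operator OR): the query text goes through the same analyzer first."""
    indexed = set(letter_lower_analyze(indexed_text))
    return any(tok in indexed for tok in letter_lower_analyze(query))


if __name__ == "__main__":
    air_date = "September 15, 1971"
    print(letter_lower_analyze(air_date))
    print(term_matches(air_date, "September"), term_matches(air_date, "september"))
    print(match_matches(air_date, "September"))
```

The simulation also shows that the letter tokenizer drops digits, so in this sketch a search for 1971 against original_air_date finds nothing; check the real behavior with the _analyze API before relying on it.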

Bool Query

Specifying bool in query combines multiple queries.

columbo_bool_query.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": {
              "query": "September Patrick",
              "operator": "OR"
            }
          }
        },
        {
          "term": {
            "season": {"value": 5}
          }
        }
      ]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "no_in_series": {"order": "asc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_bool_query.json

The occurrence types

occur	description
`must`	The clause (query) must appear in matching documents.
`should`	The clause (query) should appear in the matching document.
`must_not`	The clause (query) must not appear in the matching documents.

When should is used, the minimum_should_match parameter sets the minimum number of should clauses that must match.

Elasticsearch Reference - Bool Query
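Each occurrence type can hold several clauses as a JSON array. Writing the same key twice in one object (for example two must keys) is best avoided, because many JSON parsers, Python's json module included, keep only the last value. A minimal Python sketch; bool_query is a hypothetical helper that writes one array per occurrence type.

```python
import json


def bool_query(must=(), should=(), must_not=(), minimum_should_match=None) -> dict:
    """Build a bool query with one JSON array per occurrence type."""
    body = {}
    for name, clauses in (("must", must), ("should", should), ("must_not", must_not)):
        if clauses:
            body[name] = list(clauses)
    if minimum_should_match is not None:
        body["minimum_should_match"] = minimum_should_match
    return {"bool": body}


if __name__ == "__main__":
    query = bool_query(must=[
        {"match": {"_all": {"query": "September Patrick", "operator": "OR"}}},
        {"term": {"season": {"value": 5}}},
    ])
    print(json.dumps(query, indent=2))

    # A repeated key is silently collapsed: only the second "must" survives here.
    print(json.loads('{"must": {"a": 1}, "must": {"b": 2}}'))  # {'must': {'b': 2}}
```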

Range Query

Specifying range in query searches for documents whose value in the specified field falls within the given range.

columbo_range_query.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query" : {
    "range" : {
      "no_in_series":{"gte": 20, "lte": 24}
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "no_in_series": {"order": "asc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_range_query.json

Elasticsearch Reference - Range Query
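To make the bound semantics concrete: gte and lte include the boundary value, gt and lt exclude it, and every bound given must hold. The Python sketch below evaluates a range spec locally; in_range is a hypothetical helper that mimics the meaning of these keys and is not Elasticsearch's implementation.

```python
import operator

_RANGE_OPS = {"gte": operator.ge, "gt": operator.gt, "lte": operator.le, "lt": operator.lt}


def in_range(value, spec: dict) -> bool:
    """Return True if value satisfies every bound in spec, e.g. {"gte": 20, "lte": 24}."""
    for key, bound in spec.items():
        if key not in _RANGE_OPS:
            raise ValueError(f"unsupported range key: {key}")
        if not _RANGE_OPS[key](value, bound):
            return False
    return True


if __name__ == "__main__":
    spec = {"gte": 20, "lte": 24}  # same bounds as columbo_range_query.json
    print([n for n in range(15, 30) if in_range(n, spec)])                   # [20, 21, 22, 23, 24]
    print([n for n in range(15, 30) if in_range(n, {"gt": 20, "lt": 24})])   # [21, 22, 23]
```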

Ids Query

Specifying ids in query searches by _id value.

columbo_ids_query.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "ids": {
      "values": ["AU5YtptcIueIPY5pgX5J","AU5YtptcIueIPY5pgX5K","AU5YtptcIueIPY5pgX5L"]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_ids_query.json

Elasticsearch Reference - Ids Query

Filters

Match All Filter

columbo_match_all_filter.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "matchAll": {}
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "season": {"order": "asc"}
    },
    {
      "no_in_season": {"order": "asc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_all_filter.json

Elasticsearch Reference - Match All Filter

Query Filter

columbo_query_filter.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "query": {
      "query_string" : {
        "fields" : ["_all"],
        "query": "(September OR Patrick) AND (season:5)"
      }
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_query_filter.json

Elasticsearch Reference - Query Filter

Term Filter

columbo_term_filter.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "term": {
      "japanese_air_date": "1973-02-25T00:00:00"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_term_filter.json

Elasticsearch Reference - Term Filter

Bool Filter

columbo_bool_filter.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "bool": {
      "must": [
        {
          "term": {
            "no_in_season": {"value": 1}
          }
        },
        {
          "term": {
            "season": {"value": 5}
          }
        }
      ]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_bool_filter.json

Elasticsearch Reference - Bool Filter

Range Filter

columbo_range_filter.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter" : {
    "range" : {
      "no_in_series":{"gte": 20, "lte": 24}
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "no_in_series": {"order": "asc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_range_filter.json

Elasticsearch Reference - Range Filter

Ids Filter

columbo_ids_filter.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "ids": {
      "values": ["AU5YtptcIueIPY5pgX5J","AU5YtptcIueIPY5pgX5K","AU5YtptcIueIPY5pgX5L"]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ]
}

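When _id values are auto-generated, they change every time the documents are registered, so running the JSON above as-is will not return any results (the same applies to the Ids Query example).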

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_ids_filter.json

Elasticsearch Reference - Ids Filter
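Since auto-generated _id values differ in every environment, one workaround is to take them from a search you have just run and build the ids body from those. A minimal Python sketch; extract_ids and ids_filter_body are hypothetical helpers, and the sample response is abbreviated.

```python
import json


def extract_ids(search_response: str) -> list:
    """Collect the _id of every hit in a search response."""
    return [hit["_id"] for hit in json.loads(search_response)["hits"]["hits"]]


def ids_filter_body(ids, fields=()) -> dict:
    """Build a body shaped like columbo_ids_filter.json for the given ids."""
    ids = list(ids)
    body = {"_source": False, "from": 0, "size": len(ids),
            "filter": {"ids": {"values": ids}}}
    if fields:
        body["fields"] = list(fields)
    return body


if __name__ == "__main__":
    sample = """{"hits": {"total": 2, "hits": [
      {"_id": "AU5lJlhGw7_5S8xhhpHw"}, {"_id": "AU5lJlhGw7_5S8xhhpH4"}]}}"""
    print(json.dumps(ids_filter_body(extract_ids(sample), ["title"]), indent=2))
```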

Combining a Query and a Filter

columbo_match_query_range_filter.json

{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "match": {
      "_all": {
        "query": "September Patrick",
        "operator": "OR"
      }
    }
  },
  "filter" : {
    "range" : {
      "no_in_series":{"gte": 30, "lte": 39}
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}

GET

> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_query_range_filter.json

Query parameters

query parameters

parameter	default / description
`timeout`	Defaults to no timeout.
`from`	Defaults to `0`.
`size`	Defaults to `10`.
`search_type`	Defaults to `query_then_fetch`.
`query_cache`	Set to `true` or `false` to enable or disable the caching of search results.
`terminate_after`	Defaults to no terminate_after. `[experimental]`

timeout specifies the timeout as a string. The available units are listed in Time units below.
search_type and query_cache are specified as URL query parameters.

Time units

unit	description
`y`	Year
`M`	Month
`w`	Week
`d`	Day
`h`	Hour
`m`	Minute
`s`	Second
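The sketch below converts a time value string into seconds using the units in the table above; parse_time_value is a hypothetical helper. Two assumptions are labeled in the code: y and M are rejected because years and months have no fixed length, and a bare number is read as milliseconds, which is my understanding of how Elasticsearch treats a value without a unit. Note that units are case-sensitive: m is minutes, M is months.

```python
import re

# Fixed-length units from the table, in seconds. y (year) and M (month) are
# deliberately left out because their length varies (assumption of this sketch).
_UNIT_SECONDS = {"w": 7 * 24 * 3600, "d": 24 * 3600, "h": 3600, "m": 60, "s": 1}


def parse_time_value(value: str) -> float:
    """Convert a time value such as "10s" or "2d" into seconds."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", value.strip())
    if match is None:
        raise ValueError(f"invalid time value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if unit == "":
        # Assumption: a bare number means milliseconds.
        return amount / 1000
    if unit not in _UNIT_SECONDS:  # case-sensitive: "m" is minutes, "M" is rejected
        raise ValueError(f"unsupported unit: {unit!r}")
    return float(amount * _UNIT_SECONDS[unit])


if __name__ == "__main__":
    for v in ("500", "10s", "1m", "2d"):
        print(v, parse_time_value(v))
```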

Elasticsearch Reference - Request Body Search

Elasticsearch specification notes

mapping

Fields

Built-in fields that can be used in document mappings

field	default / description
`_uid`	Each document indexed is associated with an id and a type, the internal `_uid` field is the unique identifier of a document within an index and is composed of the type and the id.
`_id`	By default it is not indexed and not stored (thus, not created).
`_type`	By default, the _type field is indexed (but not analyzed) and not stored.
`_source`	The `_source` field is an automatically generated field that stores the actual JSON that was used as the indexed document.
`_all`	The idea of the `_all` field is that it includes the text of one or more other fields within the document indexed.
`_analyzer`	Deprecated in 1.5.0.
`_boost`	Deprecated in 1.0.0.RC1.
`_parent`	The parent field mapping is defined on a child mapping, and points to the parent type this child relates to.
`_field_names`	The _field_names field indexes the field names of a document, which can later be used to search for documents based on the fields that they contain typically using the exists and missing filters.
`_routing`	The routing field allows to control the _routing aspect when indexing data and explicit routing control is required.
`_index`	By default it is disabled.
`_size`	By default it is disabled.
`_timestamp`	By default it is disabled.
`_ttl`	By default it is disabled.

Elasticsearch Reference - Fields

Types

Data types that can be used in document mappings

Elasticsearch Reference - Types

Core Types

string

attributes

attribute	default / description
`index_name`	Defaults to the property/field name.
`store`	Defaults to `false`.
`index`	Defaults to `analyzed`. Set to `not_analyzed` to index the value as a single unanalyzed term, or to `no` to not index the field at all.
`doc_values`	Set to true to store field values in a column-stride fashion.
`term_vector`	Defaults to `no`.
`boost`	Defaults to `1.0`.
`null_value`	Defaults to not adding the field at all.
`norms: {enabled: <value>}`	Defaults to true for analyzed fields, and to false for not_analyzed fields.
`norms: {loading: <value>}`	Possible values are `eager` and `lazy` (default).
`index_options`	Defaults to positions for analyzed fields, and to docs for not_analyzed fields.
`analyzer`	Defaults to the globally configured analyzer.
`index_analyzer`	The analyzer used to analyze the text contents when analyzed during indexing.
`search_analyzer`	The analyzer used to analyze the field when part of a query string.
`include_in_all`	If index is set to `no` this defaults to `false`, otherwise, defaults to `true` or to the parent object type setting.
`ignore_above`	The analyzer will ignore strings larger than this size.
`position_offset_gap`	Defaults to `0`.

copy_to

copy_to lets you copy the value of a field into another field.

example

{
  "properties": {
    "title": {
      "type": "string",
      "index": "analyzed",
      "copy_to": "contents"
    },
    "contents": {
      "type": "string"
    }
  }
}
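The effect of copy_to can be illustrated with a small local model. This is a sketch under stated assumptions, not Elasticsearch code: the helper indexed_values is hypothetical, it ignores analysis entirely, and it only shows that the value of title is also indexed under contents while the document itself (the _source) is left unchanged.

```python
# Properties from the copy_to example above
MAPPING = {
    "properties": {
        "title": {"type": "string", "index": "analyzed", "copy_to": "contents"},
        "contents": {"type": "string"},
    }
}


def indexed_values(mapping, source):
    """Simplified model: map each field name to the values indexed under it.

    Hypothetical helper for illustration. It models copy_to only (no
    analysis) and never modifies `source`, just as copy_to leaves _source
    unchanged.
    """
    result = {}
    for field, value in source.items():
        result.setdefault(field, []).append(value)
        targets = mapping["properties"].get(field, {}).get("copy_to", [])
        if isinstance(targets, str):  # copy_to accepts a name or a list of names
            targets = [targets]
        for target in targets:
            result.setdefault(target, []).append(value)
    return result


print(indexed_values(MAPPING, {"title": "Elasticsearch 1.6.0"}))
# {'title': ['Elasticsearch 1.6.0'], 'contents': ['Elasticsearch 1.6.0']}
```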

fields

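The multi_field type was removed from the Core Types in version 1.0.
With fields, a single JSON source field can be mapped to multiple fields, for example indexing the same value both analyzed and not_analyzed, as below.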

example

{
  "properties": {
    "title": {
      "type": "string",
      "index": "analyzed",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
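The sketch below contrasts what ends up in title and title.raw for the mapping above. simple_standard is a hypothetical, heavily simplified stand-in for the standard analyzer (lowercasing plus splitting on runs of word characters); the real analyzer follows Unicode word-boundary rules.

```python
import re


def simple_standard(text):
    """Rough stand-in for the standard analyzer: split on non-word chars, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def not_analyzed(text):
    """not_analyzed: the whole value becomes one exact-match term."""
    return [text]


def index_terms(title):
    # "title" uses the analyzed mapping, "title.raw" the not_analyzed sub-field
    return {"title": simple_standard(title), "title.raw": not_analyzed(title)}


print(index_terms("The Bye-Bye Sky High"))
# {'title': ['the', 'bye', 'bye', 'sky', 'high'], 'title.raw': ['The Bye-Bye Sky High']}
```

In this model a full-text search for sky can match through title, while title.raw only matches the exact original string, which is why a not_analyzed sub-field is commonly used for sorting and aggregations.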

Number

The numeric types are float, double, byte, short, integer, and long.

attributes

attribute	default / description
`type`	`float`, `double`, `integer`, `long`, `short`, `byte`. Required.
`index_name`	Defaults to the property/field name.
`store`	Defaults to `false`.
`index`	Set to `no` if the value should not be indexed. Setting it to `no` also disables `include_in_all`.
`doc_values`	Set to `true` to store field values in a column-stride fashion.
`precision_step`	Defaults to 16 for `long`, `double`, 8 for `short`, `integer`, `float`, 2147483647 for `byte`.
`boost`	Defaults to `1.0`.
`null_value`	Defaults to not adding the field at all.
`include_in_all`	If index is set to `no` this defaults to `false`, otherwise, defaults to `true` or to the parent object type setting.
`ignore_malformed`	Defaults to `false`.
`coerce`	Defaults to `true`.
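The coerce and ignore_malformed attributes interact when a value is not a clean integer. The following is a simplified model for an integer field, assuming the behavior described in the Elasticsearch reference (coerce converts strings to numbers and truncates fractions for integer types; ignore_malformed skips a bad value instead of rejecting the document). Range checks and other mapper details are not modeled, and index_integer is a hypothetical name.

```python
import math


def index_integer(value, coerce=True, ignore_malformed=False):
    """Simplified model of indexing `value` into an `integer` field.

    Returns the int that would be indexed, or None when a malformed value
    is skipped because ignore_malformed is true. Raises ValueError otherwise.
    """
    try:
        if isinstance(value, str):
            if not coerce:
                raise ValueError(f"string {value!r} rejected (coerce: false)")
            value = float(value)  # coerce: convert the string to a number
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"not a number: {value!r}")
        if isinstance(value, float) and not value.is_integer():
            if not coerce:
                raise ValueError(f"fraction {value!r} rejected (coerce: false)")
            value = math.trunc(value)  # coerce: truncate the fraction
        return int(value)
    except (ValueError, OverflowError):
        if ignore_malformed:
            return None  # only this field is skipped; the document is still indexed
        raise


print(index_integer("42"))                          # 42
print(index_integer(5.7))                           # 5
print(index_integer("abc", ignore_malformed=True))  # None
```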

Date

attributes

attribute	description
`index_name`	Defaults to the property/field name.
`format`	Defaults to `dateOptionalTime`.
`store`	Defaults to `false`.
`index`	Set to `no` if the value should not be indexed. Setting it to `no` also disables `include_in_all`.
`doc_values`	Set to `true` to store field values in a column-stride fashion.
`precision_step`	Defaults to `16`.
`boost`	Defaults to `1.0`.
`null_value`	Defaults to not adding the field at all.
`include_in_all`	If index is set to `no` this defaults to `false`, otherwise, defaults to `true` or to the parent object type setting.
`ignore_malformed`	Defaults to `false`.
`numeric_resolution`	Possible values include seconds and milliseconds (default).
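numeric_resolution only affects values sent as JSON numbers. A minimal sketch of the two interpretations, assuming plain Unix epoch values; epoch_to_datetime is a hypothetical helper, and both calls below describe 2006-01-02T15:04:05+09:00.

```python
from datetime import datetime, timezone


def epoch_to_datetime(value, numeric_resolution="milliseconds"):
    """Interpret a numeric date value according to numeric_resolution."""
    if numeric_resolution == "milliseconds":
        seconds = value / 1000
    elif numeric_resolution == "seconds":
        seconds = value
    else:
        raise ValueError(f"unknown numeric_resolution: {numeric_resolution}")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(epoch_to_datetime(1136181845000))          # 2006-01-02 06:04:05+00:00
print(epoch_to_datetime(1136181845, "seconds"))  # 2006-01-02 06:04:05+00:00
```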

Boolean

attributes

attribute	default / description
`index_name`	Defaults to the property/field name.
`store`	Defaults to `false`.
`index`	Set to `no` if the value should not be indexed. Setting it to `no` also disables `include_in_all`.
`boost`	Defaults to `1.0`.
`null_value`	Defaults to not adding the field at all.

Binary

attributes

attribute	default / description
`index_name`	Defaults to the property/field name.
`store`	Defaults to `false`.
`doc_values`	Set to `true` to store field values in a column-stride fashion.
`compress`	Set to `true` to compress the stored binary value.
`compress_threshold`	Defaults to `-1`

Root Object Type

attribute	default / description
`dynamic_date_formats`	One or more date formats used to detect date fields in string values.
`date_detection`	Set to `false` to disable automatic date type detection (enabled by default).
`numeric_detection`	Set to `true` to detect numeric values sent as JSON strings and map them as numbers (disabled by default).
`dynamic_templates`	Mapping templates that are applied when new fields or objects are introduced dynamically.
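For string values in newly seen fields, the sketch below models how date_detection and numeric_detection decide a type, assuming date detection is tried before numeric detection. The regular expressions are rough, assumed stand-ins (the real detection uses the configured dynamic_date_formats and Java number parsing), naming the results long and double is this sketch's choice, and detect_string_type is hypothetical.

```python
import re

# Rough stand-in for dateOptionalTime: a date, optionally followed by a time.
DATE_OPTIONAL_TIME = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d{1,3})?)?)?")
INTEGER = re.compile(r"[+-]?\d+")
FLOAT = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def detect_string_type(value, date_detection=True, numeric_detection=False):
    """Guess the dynamic mapping type for a JSON *string* value (simplified)."""
    if date_detection and DATE_OPTIONAL_TIME.fullmatch(value):
        return "date"
    if numeric_detection:
        if INTEGER.fullmatch(value):
            return "long"
        if FLOAT.fullmatch(value):
            return "double"
    return "string"


for sample in ("2006-01-02", "42", "1.5", "elastic"):
    print(sample, detect_string_type(sample), detect_string_type(sample, numeric_detection=True))
```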

Elasticsearch Reference - Root Object Type

Date Format

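Excerpt of the built-in formats (example values use 2006-01-02 15:04:05.999, with a +09:00 offset where the format includes one)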

format	pattern	expected
`basic_date`	`yyyyMMdd`	20060102
`basic_date_time`	`yyyyMMdd'T'HHmmss.SSSZ`	20060102T150405.999+0900
`basic_date_time_no_millis`	`yyyyMMdd'T'HHmmssZ`	20060102T150405+0900
`date`	`yyyy-MM-dd`	2006-01-02
`date_time`	`yyyy-MM-dd'T'HH:mm:ss.SSSZZ`	2006-01-02T15:04:05.999+09:00
`date_time_no_millis`	`yyyy-MM-dd'T'HH:mm:ssZZ`	2006-01-02T15:04:05+09:00
`date_optional_time`	`yyyy-MM-dd`	2006-01-02
`date_optional_time`	`yyyy-MM-dd'T'HH:mm:ss`	2006-01-02T15:04:05
`date_hour_minute_second`	`yyyy-MM-dd'T'HH:mm:ss`	2006-01-02T15:04:05
`date_hour_minute_second_millis`	`yyyy-MM-dd'T'HH:mm:ss.SSS`	2006-01-02T15:04:05.999
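To sanity-check the expected column, the patterns can be translated by hand into Python strptime directives and each example parsed. The translation is an assumption for illustration (Joda-Time and strptime syntax differ in details), and date_optional_time is left out because it is a flexible parser rather than a single pattern. Parsing %z with a colon requires Python 3.7 or later.

```python
from datetime import datetime

# Hand translation of the Joda patterns into strptime directives (an assumption:
# %f accepts 1-6 fraction digits, whereas SSS means exactly milliseconds).
STRPTIME_FORMATS = {
    "basic_date": "%Y%m%d",
    "basic_date_time": "%Y%m%dT%H%M%S.%f%z",
    "basic_date_time_no_millis": "%Y%m%dT%H%M%S%z",
    "date": "%Y-%m-%d",
    "date_time": "%Y-%m-%dT%H:%M:%S.%f%z",
    "date_time_no_millis": "%Y-%m-%dT%H:%M:%S%z",
    "date_hour_minute_second": "%Y-%m-%dT%H:%M:%S",
    "date_hour_minute_second_millis": "%Y-%m-%dT%H:%M:%S.%f",
}

# The "expected" column of the table
EXAMPLES = {
    "basic_date": "20060102",
    "basic_date_time": "20060102T150405.999+0900",
    "basic_date_time_no_millis": "20060102T150405+0900",
    "date": "2006-01-02",
    "date_time": "2006-01-02T15:04:05.999+09:00",
    "date_time_no_millis": "2006-01-02T15:04:05+09:00",
    "date_hour_minute_second": "2006-01-02T15:04:05",
    "date_hour_minute_second_millis": "2006-01-02T15:04:05.999",
}


def parse(format_name, text):
    return datetime.strptime(text, STRPTIME_FORMATS[format_name])


for name, text in EXAMPLES.items():
    print(f"{name:32} {parse(name, text).isoformat()}")
```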

Elasticsearch Reference - mapping-date-format

Analysis

An analyzer is a combination of exactly one tokenizer and zero or more token filters, optionally preceded by zero or more char filters.

example

{
  "settings": {
    "analysis": {
      "analyzer": {
        "{analyzer logical name}": {
          "type": "analyzer type to use",
          "settings specific to that analyzer"
        },
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_baseform",
            "kuromoji_pos_filter"
          ]
        },
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer",
          "filter": [
            "my_filter"
          ],
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "tokenizer": {
        "{tokenizer logical name}": {
          "type": "tokenizer type to use",
          "settings specific to that tokenizer"
        },
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        },
        "my_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "3",
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      },
      "filter": {
        "{filter logical name}": {
          "type": "filter type to use",
          "settings specific to that filter"
        },
        "kuromoji_pos_filter": {
          "type": "kuromoji_part_of_speech"
        },
        "my_filter": {
          "type": "stop",
          "stopwords": ["NGWORD_A", "NGWORD_B", "NGWORD_C"]
        }
      },
      "char_filter": {
        "{char_filter logical name}": {
          "type": "char_filter type to use",
          "settings specific to that char_filter"
        },
        "my_char_filter": {
          "type": "mapping",
          "mappings": ["kb=>kilobyte", "mb=>megabyte", "gb=>gigabyte"]
        }
      }
    },
    "index": {
      "index settings"
    }
  },
  "mappings": {
    "{type name}": {
      "type settings"
    },
    "{type name}": {
      "type settings"
    }
  }
}
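The order in which the parts of my_analyzer run can be modeled locally: char filters first, then the tokenizer, then the token filters. This is a simplified sketch using the values from the template above, not the Lucene implementation; the function names are hypothetical.

```python
import re

# Values taken from "my_analyzer" in the template above
MY_CHAR_FILTER_MAPPINGS = ["kb=>kilobyte", "mb=>megabyte", "gb=>gigabyte"]
MY_STOPWORDS = ["NGWORD_A", "NGWORD_B", "NGWORD_C"]


def mapping_char_filter(text, mappings):
    """Replace each key with its value, preferring the longest key at each position."""
    rules = dict(m.split("=>", 1) for m in mappings)
    keys = sorted(rules, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def ngram_tokenizer(text, min_gram, max_gram):
    """nGram tokenizer with token_chars letter/digit: split words, then emit grams."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text):  # runs of letters and digits
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    tokens.append(word[start:start + size])
    return tokens


def stop_filter(tokens, stopwords):
    stop = set(stopwords)
    return [token for token in tokens if token not in stop]


def my_analyzer(text):
    # char filters -> tokenizer -> token filters
    text = mapping_char_filter(text, MY_CHAR_FILTER_MAPPINGS)
    tokens = ngram_tokenizer(text, min_gram=2, max_gram=3)
    return stop_filter(tokens, MY_STOPWORDS)


print(mapping_char_filter("5gb", MY_CHAR_FILTER_MAPPINGS))  # 5gigabyte
print(my_analyzer("5gb")[:3])                              # ['5g', '5gi', 'gi']
```

Two things this model makes visible should also hold for the real components: a mapping char filter works on raw characters, so "number" becomes "numegabyteer", and because my_filter runs after a 2 to 3 character nGram tokenizer, stopwords longer than three characters such as NGWORD_A can never match.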

Analyzers

Built in Analyzers

Analyzer	type	description
Standard Analyzer	`standard`	Built from the `Standard Tokenizer`, `Standard Token Filter`, `Lower Case Token Filter`, and `Stop Token Filter`.
Simple Analyzer	`simple`	Built from the `Lower Case Tokenizer`.
Whitespace Analyzer	`whitespace`	Built from the `Whitespace Tokenizer`.
Stop Analyzer	`stop`	Built from the `Lower Case Tokenizer` and the `Stop Token Filter`.
Keyword Analyzer	`keyword`	Treats the entire input as a single token.
Pattern Analyzer	`pattern`	Splits text into terms using a regular expression.
Language Analyzers	see the table below	Analyzers for specific languages.
Snowball Analyzer	`snowball`	Built from the `standard tokenizer`, `standard filter`, `lowercase filter`, `stop filter`, and `snowball filter`.
Custom Analyzer	`custom`	Combines one tokenizer of your choice with zero or more token filters and zero or more char filters.

The following language analyzer types are supported:

type	language
`arabic`	Arabic
`armenian`	Armenian
`basque`	Basque
`brazilian`	Portuguese (Brazil)
`bulgarian`	Bulgarian
`catalan`	Catalan
`chinese`	Chinese
`cjk`	Chinese, Japanese, Korean (CJK)
`czech`	Czech
`danish`	Danish
`dutch`	Dutch
`english`	English
`finnish`	Finnish
`french`	French
`galician`	Galician
`german`	German
`greek`	Greek
`hindi`	Hindi
`hungarian`	Hungarian
`indonesian`	Indonesian
`irish`	Irish
`italian`	Italian
`latvian`	Latvian
`norwegian`	Norwegian
`persian`	Persian
`portuguese`	Portuguese
`romanian`	Romanian
`russian`	Russian
`sorani`	Kurdish (Sorani)
`spanish`	Spanish
`swedish`	Swedish
`turkish`	Turkish
`thai`	Thai

Custom Analyzer configuration sample

A sample Custom Analyzer configuration, using kuromoji as the example.

example

{
  "settings": {
    "analysis": {
      "tokenizer": {
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        }
      },
      "filter": {
        "greek_lowercase_filter": {
          "type": "lowercase",
          "language": "greek"
        },
        "kuromoji_pos_filter": {
          "type": "kuromoji_part_of_speech"
        }
      },
      "analyzer": {
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_baseform", "kuromoji_pos_filter", "greek_lowercase_filter", "cjk_width"
          ]
        }
      }
    }
  }
}

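kuromoji_tokenizer is kuromoji's built-in tokenizer (from the elasticsearch-analysis-kuromoji plugin).
kuromoji_baseform and kuromoji_part_of_speech are kuromoji's built-in token filters; lowercase and cjk_width are built into Elasticsearch itself.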

Setting	Description
`tokenizer`	The name of the tokenizer to use.
`filter`	Optional. A list of names of token filters to use.
`char_filter`	Optional. A list of names of char filters to use.
`position_offset_gap`	An optional number of positions to increment between each field value of a field using this analyzer.
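A common mistake with custom analyzers is referencing a tokenizer or filter name that is neither defined under analysis nor available out of the box. The hypothetical helper below checks the kuromoji sample for that; the PREDEFINED set is an assumed, deliberately incomplete list, so extend it for your own setup.

```python
# The kuromoji sample above, as a Python dict
SETTINGS = {
    "settings": {
        "analysis": {
            "tokenizer": {"kuromoji": {"type": "kuromoji_tokenizer"}},
            "filter": {
                "greek_lowercase_filter": {"type": "lowercase", "language": "greek"},
                "kuromoji_pos_filter": {"type": "kuromoji_part_of_speech"},
            },
            "analyzer": {
                "kuromoji_analyzer": {
                    "type": "custom",
                    "tokenizer": "kuromoji",
                    "filter": ["kuromoji_baseform", "kuromoji_pos_filter",
                               "greek_lowercase_filter", "cjk_width"],
                }
            },
        }
    }
}

# Assumption: a small, incomplete set of names usable without a local definition
PREDEFINED = {
    "tokenizer": {"standard", "keyword", "whitespace", "kuromoji_tokenizer"},
    "filter": {"lowercase", "stop", "cjk_width", "kuromoji_baseform"},
    "char_filter": {"html_strip"},
}


def undefined_references(settings, predefined=PREDEFINED):
    """List (analyzer, kind, name) for references that resolve to nothing."""
    analysis = settings["settings"]["analysis"]
    missing = []
    for analyzer_name, analyzer in analysis.get("analyzer", {}).items():
        if analyzer.get("type") != "custom":
            continue
        refs = [("tokenizer", analyzer["tokenizer"])]
        refs += [("filter", name) for name in analyzer.get("filter", [])]
        refs += [("char_filter", name) for name in analyzer.get("char_filter", [])]
        for kind, name in refs:
            if name not in analysis.get(kind, {}) and name not in predefined[kind]:
                missing.append((analyzer_name, kind, name))
    return missing


print(undefined_references(SETTINGS))  # []
```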

Elasticsearch Reference - Analyzers

Tokenizers

Built in Tokenizers

Tokenizer	type	description
Standard Tokenizer	`standard`	A grammar-based tokenizer that works well for most European-language documents.
Edge NGram Tokenizer	`edgeNGram`	Like the NGram Tokenizer, but emits only the n-grams anchored at the start of each word.
Keyword Tokenizer	`keyword`	Emits the entire input as a single token.
Letter Tokenizer	`letter`	Splits text at characters that are not letters.
Lowercase Tokenizer	`lowercase`	Same effect as using the `Letter Tokenizer` together with the `Lower Case Token Filter`.
NGram Tokenizer	`nGram`	Splits text into n-grams.
Whitespace Tokenizer	`whitespace`	Splits text on whitespace.
Pattern Tokenizer	`pattern`	Splits text using a regular expression.
UAX Email URL Tokenizer	`uax_url_email`	Like the `Standard Tokenizer`, but keeps URLs and email addresses as single tokens.
Path Hierarchy Tokenizer	`path_hierarchy`	Emits one token per level of a path hierarchy (it does not simply split on the path separator).
Classic Tokenizer	`classic`	A grammar-based tokenizer for English-language documents.
Thai Tokenizer	`thai`	Segments Thai text into words.
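Several of the simpler tokenizers can be approximated with the Python standard library, which helps when comparing their outputs side by side. These are rough models (for example, the real letter tokenizer relies on Java's Character.isLetter), and the function names are hypothetical.

```python
import re

TEXT = "The Bye-Bye Sky High IQ Murder Case"


def whitespace_tokenizer(text):
    return text.split()  # split on runs of whitespace


def letter_tokenizer(text):
    return re.findall(r"[^\W\d_]+", text)  # runs of letters; anything else splits


def lowercase_tokenizer(text):
    # Same effect as letter tokenizer + lowercase token filter
    return [token.lower() for token in letter_tokenizer(text)]


def keyword_tokenizer(text):
    return [text]  # the whole input is one token


for tokenize in (whitespace_tokenizer, letter_tokenizer, lowercase_tokenizer, keyword_tokenizer):
    print(f"{tokenize.__name__:20} {tokenize(TEXT)}")
```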

edgeNGram and nGram

Result of tokenizing the text elastic with nGram and with edgeNGram, using the following settings:

min_gram: 2
max_gram: 3
token_chars: letter,digit

nGram

position	type	token
1	word	`el`
2	word	`ela`
3	word	`la`
4	word	`las`
5	word	`as`
6	word	`ast`
7	word	`st`
8	word	`sti`
9	word	`ti`
10	word	`tic`
11	word	`ic`

edgeNGram

position	type	token
1	word	`el`
2	word	`ela`
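Both tables can be reproduced with a few lines, assuming the settings above (min_gram 2, max_gram 3, token_chars letter and digit). The ordering (by start offset, then by length) matches the nGram table; ngram and edge_ngram are hypothetical names, not Elasticsearch APIs.

```python
import re


def ngram(text, min_gram=2, max_gram=3):
    tokens = []
    for word in re.findall(r"[^\W_]+", text):  # token_chars: letter, digit
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    tokens.append(word[start:start + size])
    return tokens


def edge_ngram(text, min_gram=2, max_gram=3):
    tokens = []
    for word in re.findall(r"[^\W_]+", text):
        for size in range(min_gram, max_gram + 1):
            if size <= len(word):
                tokens.append(word[:size])  # grams anchored at the word start only
    return tokens


print(ngram("elastic"))
print(edge_ngram("elastic"))  # ['el', 'ela']
```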

uax_url_email

Result of tokenizing the following text:
Elasticsearch reference <a href ="https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-uaxurlemail-tokenizer.html">UAX Email URL Tokenizer</a>

position	type	token
1	ALPHANUM	`Elasticsearch`
2	ALPHANUM	`reference`
3	ALPHANUM	`a`
4	ALPHANUM	`href`
5	URL	`https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-uaxurlemail-tokenizer.html`
6	ALPHANUM	`UAX`
7	ALPHANUM	`Email`
8	ALPHANUM	`URL`
9	ALPHANUM	`Tokenizer`
10	ALPHANUM	`a`
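A rough regex approximation reproduces the token list above. It is only a sketch: the real tokenizer implements the Unicode UAX #29 word-break rules plus URL and email recognition, and the patterns below are assumptions that handle simple cases only; uax_url_email_approx is a hypothetical name.

```python
import re

TOKEN_RE = re.compile(
    r'(?P<URL>https?://[^\s"<>]+)'
    r"|(?P<EMAIL>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
    r"|(?P<ALPHANUM>\w+)"
)

SAMPLE = ('Elasticsearch reference <a href ="https://www.elastic.co/guide/en/elasticsearch/'
          'reference/1.6/analysis-uaxurlemail-tokenizer.html">UAX Email URL Tokenizer</a>')


def uax_url_email_approx(text):
    """Return (type, token) pairs; URLs and emails are kept whole."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


for position, (kind, token) in enumerate(uax_url_email_approx(SAMPLE), start=1):
    print(position, kind, token)
```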

path_hierarchy

Result of tokenizing the text C:/Windows/System32/drivers/etc

position	type	token
1	word	`C:`
1	word	`C:/Windows`
1	word	`C:/Windows/System32`
1	word	`C:/Windows/System32/drivers`
1	word	`C:/Windows/System32/drivers/etc`
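The path_hierarchy output corresponds to taking every prefix of the path. A minimal sketch with the default / delimiter; the reverse and skip options are not modeled, and path_hierarchy here is a hypothetical function.

```python
def path_hierarchy(path, delimiter="/"):
    """Emit every prefix of the path; in Elasticsearch all share one position."""
    parts = path.split(delimiter)
    prefixes = (delimiter.join(parts[:i]) for i in range(1, len(parts) + 1))
    return [p for p in prefixes if p]  # drop the empty prefix of a leading "/"


for token in path_hierarchy("C:/Windows/System32/drivers/etc"):
    print(1, "word", token)
```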

Elasticsearch Reference - Tokenizers

Token Filters

Token Filter	type
Standard Token Filter	`standard`
ASCII Folding Token Filter	`asciifolding`
Length Token Filter	`length`
Lowercase Token Filter	`lowercase`
Uppercase Token Filter	`uppercase`
NGram Token Filter	`nGram`
Edge NGram Token Filter	`edgeNGram`
Porter Stem Token Filter	`porter_stem`
Shingle Token Filter	`shingle`
Stop Token Filter	`stop`
Word Delimiter Token Filter	`word_delimiter`
Stemmer Token Filter	`stemmer`
Stemmer Override Token Filter	`stemmer_override`
Keyword Marker Token Filter	`keyword_marker`
Keyword Repeat Token Filter	`keyword_repeat`
KStem Token Filter	`kstem`
Snowball Token Filter	`snowball`
Phonetic Token Filter	`phonetic`
Synonym Token Filter	`synonym`
Compound Word Token Filter	`dictionary_decompounder`, `hyphenation_decompounder`
Reverse Token Filter	`reverse`
Elision Token Filter	`elision`
Truncate Token Filter	`truncate`
Unique Token Filter	`unique`
Pattern Capture Token Filter	`pattern_capture`
Pattern Replace Token Filter	`pattern_replace`
Trim Token Filter	`trim`
Limit Token Count Token Filter	`limit`
Hunspell Token Filter	`hunspell`
Common Grams Token Filter	`common_grams`
Normalization Token Filter	see the table below
CJK Width Token Filter	`cjk_width`
CJK Bigram Token Filter	`cjk_bigram`
Delimited Payload Token Filter	`delimited_payload_filter`
Keep Words Token Filter	`keep`
Keep Types Token Filter	`keep_types`
Classic Token Filter	`classic`
Apostrophe Token Filter	`apostrophe`

Normalization Token Filter

language	type
Arabic	`arabic_normalization`
German	`german_normalization`
Hindi	`hindi_normalization`
Indic	`indic_normalization`
Kurdish (Sorani)	`sorani_normalization`
Persian	`persian_normalization`
Scandinavian	`scandinavian_normalization`, `scandinavian_folding`
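The cjk_width filter is relevant to the kuromoji sample above, which lists it after the kuromoji filters. As an approximation (an assumption, not the Lucene code), it can be modeled by applying Unicode NFKC normalization only to fullwidth ASCII variants and halfwidth katakana, which folds Ｅ to E and ｴ to エ; cjk_width_approx is a hypothetical name.

```python
import re
import unicodedata

# Fullwidth ASCII variants (U+FF01-U+FF5E) and halfwidth katakana (U+FF65-U+FF9F)
_WIDTH_VARIANTS = re.compile("[\uff01-\uff5e\uff65-\uff9f]+")


def cjk_width_approx(text):
    """Approximate cjk_width: NFKC-normalize only the width-variant runs.

    Whole runs are normalized together so that adjacent characters can combine.
    """
    return _WIDTH_VARIANTS.sub(lambda m: unicodedata.normalize("NFKC", m.group()), text)


print(cjk_width_approx("ＥＬＡＳＴＩＣ"))  # ELASTIC
print(cjk_width_approx("ｴﾗｽﾃｨｯｸ"))        # エラスティック
```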

Elasticsearch Reference - Token Filters

Character Filters

Character Filter	type
Mapping Char Filter	`mapping`
HTML Strip Char Filter	`html_strip`
Pattern Replace Char Filter	`pattern_replace`
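The html_strip char filter can be approximated with Python's html.parser. This sketch keeps only text content and decodes entities; unlike the real filter it does not insert line breaks for block-level tags, and html_strip_approx is a hypothetical name.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes; entities are decoded because convert_charrefs=True."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_strip_approx(text):
    parser = _TextCollector()
    parser.feed(text)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(html_strip_approx("<p>I&apos;m so <b>happy</b>!</p>"))  # I'm so happy!
```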

Elasticsearch Reference - Character Filters

Analyze

The _analyze endpoint of the Indices APIs shows how an analyzer tokenizes a piece of text. For Elasticsearch's built-in analyzers you do not need to specify an index.

Syntax

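> curl -XGET "[host name][:port]/[index name]/_analyze?analyzer={analyzer name}&tokenizer={tokenizer name}&token_filters={token filter names}&char_filters={char filter names}" -d "The Bye-Bye Sky High IQ Murder Case"

Specify either analyzer, or tokenizer together with the optional token_filters and char_filters (comma-separated lists).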

standard

example

> curl -XGET "localhost:9200/_analyze?analyzer=standard" -d "The Bye-Bye Sky High IQ Murder Case"
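When calling _analyze from a script, building the query string programmatically avoids quoting mistakes. A minimal sketch; analyze_url is a hypothetical helper that only builds the URL, and the text to analyze would still be sent as the request body, as in the curl examples.

```python
from urllib.parse import urlencode


def analyze_url(host, index=None, **params):
    """Build an _analyze URL; list values are joined with commas."""
    path = f"/{index}/_analyze" if index else "/_analyze"
    query = {
        key: ",".join(value) if isinstance(value, (list, tuple)) else value
        for key, value in params.items()
    }
    url = f"http://{host}{path}"
    return f"{url}?{urlencode(query, safe=',')}" if query else url


print(analyze_url("localhost:9200", analyzer="standard"))
print(analyze_url("localhost:9200", tokenizer="keyword", token_filters=["lowercase", "stop"]))
```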

To examine an analyzer's behavior in more detail, it seems better to define it in an index.
The example below defines the analyzer under test in an index named test and then uses that analyzer.

PUT

curl -XPUT "localhost:9200/test?pretty" -d "{
  \"settings\": {
    \"analysis\": {
      \"analyzer\": {
        \"my_analyzer\": {
          \"type\": \"custom\",
          \"tokenizer\": \"my_tokenizer\"
        }
      },
      \"tokenizer\": {
        \"my_tokenizer\": {
          \"type\": \"path_hierarchy\",
          \"reverse\": false,
          \"skip\": 0
        }
      }
    }
  }
}"

GET

curl -XGET "localhost:9200/test/_analyze?analyzer=my_analyzer&pretty" -d "C:/Windows/System32/drivers/etc"
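Hand-escaping JSON for the Windows command line is error-prone. One option is to build the body as a Python dict and let json.dumps produce it. The sketch below assumes the backslash-quote escaping used in the commands above is what your shell needs; analyzer_settings is a hypothetical name.

```python
import json


def analyzer_settings():
    """Settings for the test index above, as a plain dict."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {"type": "custom", "tokenizer": "my_tokenizer"}
                },
                "tokenizer": {
                    "my_tokenizer": {
                        "type": "path_hierarchy",
                        "reverse": False,
                        "skip": 0,
                    }
                },
            }
        }
    }


body = json.dumps(analyzer_settings())
# Escape every double quote so the body survives inside curl -d "..." on cmd.exe
cmd_body = body.replace('"', '\\"')
print(f'curl -XPUT "localhost:9200/test?pretty" -d "{cmd_body}"')
```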

Elasticsearch Reference - Analyze

Elasticsearch 1.6.0 - Installing on Windows 7
