Overview
This article installs Elasticsearch 1.6.0 on Windows 7 for development and testing, and applies a basic configuration.
It then indexes sample data and tries out basic search methods.
Environment
The contents of this article were verified with the following versions.
- Windows7 (64bit)
- Java 1.8.0_45
- [Elasticsearch](https://www.elastic.co/) 1.6.0
Windows 7 does not include a curl command, so I used [cURL](http://curl.haxx.se/).
References
I referred to the following sites.
Elasticsearch
- [Elasticsearch](https://www.elastic.co/products/elasticsearch)
- [Elasticsearch Reference 1.6](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/index.html)
Slide
- [ElasticsearchとKibanaではじめる検索&アナリティクス](https://speakerdeck.com/johtani/elasticsearchtokibanadehazimerujian-suo-anariteikusu)
- [Terms of endearment - the ElasticSearch Query DSL explained](http://www.slideshare.net/clintongormley/terms-of-endearment-the-elasticsearch-query-dsl-explained)
Blog
- [Elasticsearchチュートリアル - 不可視点](http://code46.hatenablog.com/entry/2014/01/21/115620)
- [実践!Elasticsearch - Wantedly Engineer Blog](http://engineer.wantedly.com/2014/02/25/elasticsearch-at-wantedly-1.html)
- [Elasticsearchとkuromojiでちゃんとした日本語全文検索をやるメモ - GMOメディア エンジニアブログ](http://tech.gmo-media.jp/post/70245090007/elasticsearch-kuromoji-japanese-fulltext-search)
- [elasticsearch - DRYな備忘録](http://otiai10.hatenablog.com/archive/category/elasticsearch)
- [勉強会メモ - 第8回elasticsearch勉強会 - よしだのブログ](http://blog.yoslab.com/entry/2015/02/13/203251)
- [All About Analyzers, Part One](https://www.found.no/foundation/text-analysis-part-1/)
Qiita
- [Kibana 4.1.0 + ElasticSearch 1.6.0 でデータビジュアライズ](http://qiita.com/hiyuzawa/items/bad1a7e29fc8d1820bea)
- [Kibana+Elasticsearchで文字列の完全一致と部分一致検索の両方を実現する](http://qiita.com/harukasan/items/4ec517d8d96f557367e1)
- [Elasticsearch CheatSheet](http://qiita.com/ikawaha/items/228ee3f481e9636b3065)
Installation
Install Elasticsearch and several plugins on Windows 7.
Elasticsearch
Download the archive file from the download page. The downloaded archive is `elasticsearch-1.6.0.zip`.
Extraction
Installation only requires extracting the archive to a location of your choice.
I extracted it to `D:\dev\elasticsearch-1.6.0`.
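Configuration
Since this is for development and testing, configure Elasticsearch to start with minimal resources.
The configuration file is `config/elasticsearch.yml` in the extracted directory.
Only the changed settings are shown below.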
# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
cluster.name: elasticsearch
# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
node.name: master
# Every node can be configured to allow or deny being eligible as the master,
# and to allow or deny to store the data.
#
# Allow this node to be eligible as a master node (enabled by default):
#
node.master: true
#
# Allow this node to store data (enabled by default):
#
node.data: true
- `node.name`: if no node name is specified, a name is assigned automatically when the Elasticsearch instance starts.
# Note, that for development on a local machine, with small indices, it usually
# makes sense to "disable" the distributed features:
#
index.number_of_shards: 1
index.number_of_replicas: 0
- Since this is for development, use a single shard and no replicas.
# Set this property to true to lock the memory:
#
bootstrap.mlockall: true
# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.
#
# 1. Disable multicast discovery (enabled by default):
#
discovery.zen.ping.multicast.enabled: false
#
# 2. Configure an initial list of master nodes in the cluster
# to perform discovery when new nodes (master or data) are started:
#
discovery.zen.ping.unicast.hosts: ["localhost"]
Startup
Move to the extracted directory and run the following command.
The heap size can be specified with the JVM options shown.
> bin/elasticsearch.bat -Xmx256m -Xms256m
Verification
Access the following URL with curl or a browser and confirm that a response with status 200 is returned.
> curl -XGET "localhost:9200/"
{
"status" : 200,
"name" : "master",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "1.6.0",
"build_hash" : "cdd3ac4dde4f69524ec0a14de3828cb95bbb86d0",
"build_timestamp" : "2015-06-09T13:36:34Z",
"build_snapshot" : false,
"lucene_version" : "4.10.4"
},
"tagline" : "You Know, for Search"
}
plugin
elasticsearch-head
URL: http://mobz.github.io/elasticsearch-head/
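Installation
> bin/plugin -install mobz/elasticsearch-head
Check
Open the head page in a browser and confirm that it is displayed. Site plugins are served under `http://localhost:9200/_plugin/[plugin name]/`, so head is at `http://localhost:9200/_plugin/head/`.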
elasticsearch-analysis-kuromoji
URL: https://github.com/elastic/elasticsearch-analysis-kuromoji
Installation
> bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji/2.6.0
elasticsearch-inquisitor
URL: https://github.com/polyfractal/elasticsearch-inquisitor
Installation
> bin/plugin -install polyfractal/elasticsearch-inquisitor
Check
Open the Inquisitor page (`http://localhost:9200/_plugin/inquisitor/`) in a browser and confirm that it is displayed.
Listing the installed plugins
> bin\plugin -l
Installed plugins:
- analysis-kuromoji
- head
- inquisitor
Basic search methods
Index sample data and search it in several ways.
Preparing the sample data
The sample data is information about the TV drama Columbo.
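field | data type | description |
---|---|---|
title | string | Original title |
original_air_date | string | Original air date |
runtime | integer | Runtime (minutes) |
guest_staring | string | Guest star |
guest_staring_role | string | Guest star's role |
directed_by | string | Director |
written_by | string array | Writers |
teleplay | string array | Teleplay |
season | integer | Season |
no_in_season | integer | Episode number within the season |
no_in_series | integer | Episode number within the series |
japanese_title | string | Japanese title |
japanese_air_date | date | Japanese air date |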
mapping
index: tvfile
type: columbo
Mapping for the sample data (save it as `columbo_mapping.json`)
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"analysis": {
"filter": {
"greek_lowercase_filter": {
"type": "lowercase",
"language": "greek"
},
"kuromoji_pos_filter": {
"type": "kuromoji_part_of_speech"
}
},
"tokenizer": {
"kuromoji": {
"type": "kuromoji_tokenizer"
},
"ngram_tokenizer": {
"type": "nGram",
"min_gram": "2",
"max_gram": "3",
"token_chars": ["letter", "digit"]
}
},
"analyzer": {
"kuromoji_analyzer": {
"type": "custom",
"tokenizer": "kuromoji",
"filter": [
"kuromoji_baseform", "kuromoji_pos_filter", "greek_lowercase_filter", "cjk_width"
]
},
"ngram_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"standard"
]
},
"letter_lower_analyzer": {
"type": "custom",
"tokenizer": "letter",
"filter": [
"lowercase"
]
},
"letter_upper_analyzer": {
"type": "custom",
"tokenizer": "letter",
"filter": [
"uppercase"
]
}
}
}
},
"mappings": {
"columbo": {
"_source": {
"enabled": true
},
"_all": {
"enabled": true
},
"_timestamp": {
"enabled": true
},
"dynamic": "strict",
"properties": {
"title": {
"type": "string",
"index": "analyzed",
"analyzer": "letter_lower_analyzer",
"store": true,
"include_in_all": true
},
"original_air_date": {
"type": "string",
"index": "analyzed",
"analyzer": "letter_lower_analyzer",
"store": true,
"include_in_all": true
},
"runtime": {
"type": "integer",
"store": true,
"include_in_all": false
},
"guest_staring": {
"type": "string",
"index": "analyzed",
"analyzer": "letter_lower_analyzer",
"store": true,
"include_in_all": true
},
"guest_staring_role": {
"type": "string",
"index": "analyzed",
"analyzer": "letter_lower_analyzer",
"store": true,
"include_in_all": true
},
"directed_by": {
"type": "string",
"index": "analyzed",
"analyzer": "letter_lower_analyzer",
"store": true,
"include_in_all": true
},
"written_by": {
"type": "string",
"index": "analyzed",
"analyzer": "letter_lower_analyzer",
"store": true,
"include_in_all": true
},
"teleplay": {
"type": "string",
"index": "analyzed",
"analyzer": "letter_lower_analyzer",
"store": true,
"include_in_all": true
},
"season": {
"type": "integer",
"store": true,
"include_in_all": false
},
"no_in_season": {
"type": "integer",
"store": true,
"include_in_all": false
},
"no_in_series": {
"type": "integer",
"store": true,
"include_in_all": false
},
"japanese_title": {
"type": "string",
"index": "analyzed",
"analyzer": "kuromoji_analyzer",
"store": true,
"include_in_all": true
},
"japanese_air_date": {
"type": "date",
"format": "dateHourMinuteSecond",
"store": true,
"include_in_all": false
}
}
}
}
}
Creating the index
Create the index and apply the mapping using the JSON file above.
> curl -XPOST "localhost:9200/tvfile?pretty" -d @columbo_mapping.json
Checking the mapping
> curl -XGET "localhost:9200/tvfile/_settings,_mappings?pretty"
To delete the index
> curl -XDELETE "localhost:9200/tvfile"
Sample data
The data is long, so only part of it is shown here. The full sample data is available on [this page](http://qiita.com/rubytomato@github/private/700be487ddb7221c29cc).
{"index": {}}
{"title": "Prescription: Murder", "original_air_date": "February 20, 1968", "runtime": 98, "guest_staring": "Gene Barry", "guest_staring_role": "Dr. Ray Fleming (Gene Barry), a psychiatrist", "directed_by": "Richard Irving", "written_by": ["Richard Levinson & William Link"], "teleplay": [""], "season": 0, "no_in_season": 1, "no_in_series": 1, "japanese_title": "殺人処方箋", "japanese_air_date": "1972-08-27T00:00:00"}
{"index": {}}
{"title": "Ransom for a Dead Man", "original_air_date": "March 1, 1971", "runtime": 98, "guest_staring": "Lee Grant", "guest_staring_role": "Leslie Williams, a brilliant lawyer and pilot", "directed_by": "Richard Irving", "written_by": ["Richard Levinson & William Link"], "teleplay": ["Dean Hargrove"], "season": 0, "no_in_season": 2, "no_in_series": 2, "japanese_title": "死者の身代金", "japanese_air_date": "1973-04-22T00:00:00"}
{"index": {}}
{"title": "Murder by the Book", "original_air_date": "September 15, 1971", "runtime": 73, "guest_staring": "Jack Cassidy", "guest_staring_role": "Ken Franklin is one half of a mystery writing team", "directed_by": "Steven Spielberg", "written_by": ["Steven Bochco"], "teleplay": [""], "season": 1, "no_in_season": 1, "no_in_series": 3, "japanese_title": "構想の死角", "japanese_air_date": "1972-11-26T00:00:00"}
Indexing documents
Index the documents using the JSON file above (saved as `columbo_data.json`).
> curl -XPOST "localhost:9200/tvfile/columbo/_bulk?pretty" --data-binary @columbo_data.json
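As a minimal sketch (Python 3, standard library only), the bulk file above can also be generated from a list of documents. Each document needs an action line (`{"index": {}}`; the empty object makes Elasticsearch take the index and type from the URL and generate the `_id`) followed by its source line, and the body must end with a newline. That newline handling is also why the command uses `--data-binary`: with `-d @file`, curl strips the newlines from the file. The function name `to_bulk_body` is hypothetical.

```python
import json


def to_bulk_body(docs):
    """Build a Bulk API body: an action line plus a source line for every document."""
    lines = []
    for doc in docs:
        # Empty "index" action: index/type come from the URL
        # (/tvfile/columbo/_bulk) and the _id is generated automatically.
        lines.append(json.dumps({"index": {}}))
        # ensure_ascii=False keeps Japanese titles readable in the file.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The Bulk API requires the last line to end with a newline.
    return "\n".join(lines) + "\n"
```

For example, writing `to_bulk_body(docs)` to `columbo_data.json` with `encoding="utf-8"` produces a file that the curl command above can send.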
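To delete all documents (this deletes the `columbo` type together with its mapping, so re-apply the mapping, for example by deleting and recreating the index, before indexing again)
> curl -XDELETE "localhost:9200/tvfile/columbo?pretty"
Counting documents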
> curl -XGET "localhost:9200/tvfile/columbo/_count?pretty" -d "{
\"query\": {
\"matchAll\": {}
}
}"
{
"count" : 45,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
}
}
Searching documents
Searching uses the Search APIs. There are two ways to pass the search conditions: URI Search and Request Body Search.
The endpoint has the following form:
[host name][:port]/[index name]/[type name]/_search
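Common fields in search results
field | description |
---|---|
took | Time the search took (milliseconds). |
timed_out | Boolean indicating whether the search timed out. |
_shards | Total number of shards searched, and how many succeeded and failed. |
hits | Holds the search results. |
hits.total | Number of documents matching the search condition. |
hits.hits | Array of matching documents (10 by default). |
_score | Score of the document. |
max_score | Highest score. |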
[The Search API] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/_the_search_api.html)
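To make the nesting of these fields concrete, here is a small Python sketch that reads them from a parsed response (for example, the curl output passed through `json.loads`). It assumes the request used `fields`, in which case each stored field is returned as a list, as in the Match Query response later in this article. `summarize_hits` is a hypothetical helper.

```python
def summarize_hits(response, field="title"):
    """Return (hits.total, [(_id, _score, first value of field), ...])."""
    hits = response["hits"]
    rows = []
    for hit in hits["hits"]:
        # Requested stored fields are returned as lists under "fields".
        values = hit.get("fields", {}).get(field, [])
        # _score may be missing or null, e.g. when sorting without track_scores.
        rows.append((hit["_id"], hit.get("_score"), values[0] if values else None))
    return hits["total"], rows
```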
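URI Search
URI Search specifies the search conditions as request parameters.
Search without conditions
The search condition is given in the `q` parameter; `q=*` matches all documents.
> curl -XGET "localhost:9200/tvfile/columbo/_search?q=*&from=0&size=10&pretty"
Search with conditions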
> curl -XGET "localhost:9200/tvfile/columbo/_search?q=September%20Patrick&df=_all&default_operator=OR&from=0&size=10&_source=false&fields=title,original_air_date,runtime,guest_staring,directed_by,written_by,season,no_in_season&sort=season:asc,no_in_season:asc&track_scores=true&pretty"
Parameters
|name |default / description |
|:----------------------------|:--------------------------------------------------------------------------------------------|
|`q` |The query string. |
|`df` |The default field to use when no field prefix is defined within the query. |
|`analyzer` |The analyzer name. |
|`lowercase_expanded_terms` |Defaults to `true`. |
|`analyze_wildcard` |Defaults to `false`. |
|`default_operator` |Can be `AND` or `OR`. Defaults to `OR`. |
|`lenient` |Defaults to `false`. |
|`explain` |For each hit, contain an explanation of how scoring of the hits was computed. |
|`_source` |Set to `false` to disable retrieval of the `_source` field. |
|`fields` |The selective stored fields of the document to return for each hit, comma delimited. |
|`sort` |Sorting to perform. Can either be in the form of `fieldName`, or `fieldName:asc`/`fieldName:desc`. |
|`track_scores` |When sorting, set to `true` in order to still track scores and return them as part of each hit.|
|`timeout` |Defaults to no timeout. |
|`terminate_after` |The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early.|
|`from` |Defaults to `0`. |
|`size` |Defaults to `10`. |
|`search_type` |Defaults to `query_then_fetch`. |
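- `q` specifies the keywords to search for.
- `df` specifies the field to search. The default is `_all`.
- Setting `_source` to `false` omits the `_source` field from the results.
- `fields` lists, comma-separated, the fields to include in the results.
- Setting `track_scores` to `true` computes scores even when sorting (by default, scores are not computed when sorting on a field).
- `from` and `size` specify the offset of the first hit to return and the number of hits.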
[Elasticsearch Reference - URI Search] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-uri-request.html)
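Long URLs like the one above are easy to mistype. A minimal Python sketch that assembles a URI Search URL from keyword arguments: it percent-encodes spaces as `%20` (as in the curl example), leaves `,`, `:` and `*` readable for `fields`, `sort` and `q=*`, and converts Python booleans to `true`/`false`. The helper name `uri_search_url` is hypothetical.

```python
from urllib.parse import quote, urlencode


def uri_search_url(host, index, doc_type, **params):
    """Build http://host/index/type/_search?... for a URI Search (parameter order is kept)."""
    # Elasticsearch expects lowercase true/false, not Python's True/False.
    params = {k: (str(v).lower() if isinstance(v, bool) else v) for k, v in params.items()}
    # quote (not quote_plus) encodes a space as %20; keep , : * unencoded.
    query = urlencode(params, quote_via=quote, safe=",:*")
    return "http://{}/{}/{}/_search?{}".format(host, index, doc_type, query)
```

For example, `uri_search_url("localhost:9200", "tvfile", "columbo", q="September Patrick", df="_all", _source=False, size=10)` yields a URL that can be passed to curl or a browser.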
Request Body Search
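Request Body Search specifies the search conditions in the request body.
There are two kinds of search: Queries and Filters.
They differ as follows:
- Queries support both full-text search and exact term search; Filters support exact term search only.
- Queries compute scores; Filters do not.
- Queries are more expensive than Filters.
- Queries do not cache their results; Filters can.
Queries and Filters can also be used in combination.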
Queries
Match All Query
Specifying `matchAll` (also written `match_all`) in `query` searches without any condition, matching all documents.
{
"_source": false,
"from": 0,
"size": 100,
"query": {
"matchAll": {}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
],
"track_scores": true,
"sort": [
{
"season": {"order": "asc"}
},
{
"no_in_season": {"order": "asc"}
}
]
}
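- Because `_source` is `false`, the `_source` field is not included in the results.
- With `from` set to 0 and `size` set to 100, up to 100 hits are returned starting from the first. (The default `size` is 10.)
- `fields` lists the fields to include in the results.
- `sort` specifies the order of the documents.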
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_all_query.json
[Elasticsearch Reference - Match All Query] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-match-all-query.html)
Match Query
Specifying `match` in `query` searches the given field (`original_air_date` in this example) for the keywords given in `query`.
{
"_source": false,
"from": 3,
"size": 3,
"query": {
"match": {
"original_air_date": {
"query": "September December",
"operator": "OR"
}
}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
],
"track_scores": true,
"sort": [
{
"_score": {"order": "desc"}
}
]
}
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_query.json
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.4385337,
"hits" : [ {
"_index" : "tvfile",
"_type" : "columbo",
"_id" : "AU5lJlhGw7_5S8xhhpHw",
"_score" : 0.9509891,
"fields" : {
"directed_by" : [ "Nicholas Colasanto" ],
"no_in_season" : [ 1 ],
"guest_staring" : [ "John Cassavetes" ],
"original_air_date" : [ "September 17, 1972" ],
"no_in_series" : [ 10 ],
"runtime" : [ 98 ],
"season" : [ 2 ],
"title" : [ "Étude in Black" ],
"written_by" : [ "Richard Levinson & William Link" ]
}
}, {
"_index" : "tvfile",
"_type" : "columbo",
"_id" : "AU5lJlhGw7_5S8xhhpH4",
"_score" : 0.9509891,
"fields" : {
"directed_by" : [ "Jeannot Szwarc" ],
"no_in_season" : [ 1 ],
"guest_staring" : [ "Vera Miles" ],
"original_air_date" : [ "September 23, 1973" ],
"no_in_series" : [ 18 ],
"runtime" : [ 73 ],
"season" : [ 3 ],
"title" : [ "Lovely But Lethal" ],
"written_by" : [ "Myrna Bercovici" ]
}
}, {
"_index" : "tvfile",
"_type" : "columbo",
"_id" : "AU5lJlhHw7_5S8xhhpIA",
"_score" : 0.9509891,
"fields" : {
"directed_by" : [ "Bernard L. Kowalski" ],
"no_in_season" : [ 1 ],
"guest_staring" : [ "Robert Conrad" ],
"original_air_date" : [ "September 15, 1974" ],
"no_in_series" : [ 26 ],
"runtime" : [ 98 ],
"season" : [ 4 ],
"title" : [ "An Exercise in Fatality" ],
"written_by" : [ "Larry Cohen" ]
}
} ]
}
}
[Elasticsearch Reference - Match Query] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-match-query.html)
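The request bodies in this section are saved as files and sent with curl. As an alternative, a minimal Python sketch that builds the Match Query body above as a dict and posts it with the standard library (`urllib.request`). The default URL assumes the `tvfile`/`columbo` setup of this article on `localhost:9200`; `match_query` and `search` are hypothetical helper names.

```python
import json
import urllib.request


def match_query(field, text, operator="OR", from_=0, size=10, fields=None):
    """Build a Request Body Search with a match query on a single field."""
    body = {
        "from": from_,
        "size": size,
        "query": {"match": {field: {"query": text, "operator": operator}}},
        "sort": [{"_score": {"order": "desc"}}],
    }
    if fields is not None:
        # Return only these stored fields instead of _source.
        body["_source"] = False
        body["fields"] = list(fields)
    return body


def search(body, url="http://localhost:9200/tvfile/columbo/_search"):
    """POST the body to the Search API and return the parsed JSON response."""
    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the node running, `search(match_query("original_air_date", "September December", from_=3, size=3, fields=["title"]))` sends essentially the same request as the curl command above.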
Multi Match Query
Specifying `multi_match` in `query` searches the fields listed in `fields` for the keywords given in `query`.
{
"_source": false,
"from": 0,
"size": 100,
"query": {
"multi_match": {
"query": "October Patrick",
"type": "cross_fields",
"fields": ["original_air_date", "guest_staring"],
"operator": "AND"
}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
],
"track_scores": true,
"sort": [
{
"_score": {"order": "desc"}
}
]
}
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_multi_match_query.json
The values that can be specified for `type` and their meanings are as follows.
Types of multi_match query
|type |description |
|:----------------------|:------------------------------------------------------|
|`best_fields` |Default. Finds documents which match any field, but uses the `_score` from the best field.|
|`most_fields` |Finds documents which match any field and combines the `_score` from each field. |
|`cross_fields` |Treats fields with the same analyzer as though they were one big field. Looks for each word in any field. |
|`phrase` |Runs a `match_phrase` query on each field and combines the `_score` from each field. |
|`phrase_prefix` |Runs a `match_phrase_prefix` query on each field and combines the `_score` from each field.|
[Elasticsearch Reference - Multi Match Query] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-multi-match-query.html)
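A small Python sketch that builds the `multi_match` clause and rejects a `type` outside the table above (the set of names is copied from that table; `multi_match_query` is a hypothetical helper).

```python
MULTI_MATCH_TYPES = {"best_fields", "most_fields", "cross_fields", "phrase", "phrase_prefix"}


def multi_match_query(text, fields, match_type="best_fields", operator=None):
    """Build a multi_match query over several fields."""
    if match_type not in MULTI_MATCH_TYPES:
        raise ValueError("unknown multi_match type: " + match_type)
    clause = {"query": text, "type": match_type, "fields": list(fields)}
    if operator is not None:
        # e.g. "AND": with cross_fields, every term must appear in some field.
        clause["operator"] = operator
    return {"query": {"multi_match": clause}}
```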
Query String Query
Specifying `query_string` in `query` allows more complex conditions than the other queries.
{
"_source": false,
"from": 0,
"size": 100,
"query": {
"query_string": {
"fields" : ["_all"],
"query": "(September OR Patrick) AND (season:5)"
}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
],
"track_scores": true,
"sort": [
{
"_score": {"order": "desc"}
}
]
}
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_query_string.json
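default_field
The field that is searched when the query does not explicitly name one.
It defaults to the `_all` field.
To change it for an index, set `index.query.default_field` in the index settings (replace `_all` below with the field you want):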
{
"settings": {
"index": {
"query": {
"default_field": "_all"
}
}
}
}
[Elasticsearch Reference - Query String Query] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-query-string-query.html)
Simple Query String Query
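`simple_query_string` is a simplified version of `query_string`. Unlike `query_string`, it does not throw an exception on invalid syntax; it discards the invalid parts of the query.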
{
"_source": false,
"from": 0,
"size": 100,
"query" : {
"simple_query_string" : {
"query": "(September | October | November) +(McGoohan)",
"fields": ["_all"]
}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
],
"track_scores": true,
"sort": [
{
"_score": {"order": "desc"}
}
]
}
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_simple_query_string.json
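Available flags
The `flags` parameter enables only the listed operators, separated by `|` (for example `"flags": "OR|AND|PREFIX"`).
|flag |description |
|:-----------|:-----------|
|`ALL` | |
|`NONE` | |
|`AND` |`+` |
|`OR` |`\|` |
|`NOT` |`-` |
|`PREFIX` |`*` |
|`PHRASE` |`"` |
|`PRECEDENCE` |`(` and `)` |
|`ESCAPE` | |
|`WHITESPACE` | |
|`FUZZY` |`~N` after a word |
|`NEAR` | |
|`SLOP` |`~N` after a phrase|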
[Elasticsearch Reference - Simple Query String Query] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-simple-query-string-query.html)
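A minimal Python sketch that adds the optional `flags` parameter to a `simple_query_string` query, joining the flag names with `|` and checking them against the table above. `simple_query_string` is used here as the name of a hypothetical helper function, not an Elasticsearch API.

```python
KNOWN_FLAGS = ("ALL", "NONE", "AND", "OR", "NOT", "PREFIX", "PHRASE",
               "PRECEDENCE", "ESCAPE", "WHITESPACE", "FUZZY", "NEAR", "SLOP")


def simple_query_string(text, fields, flags=None):
    """Build a simple_query_string query; flags (if given) are joined with '|'."""
    clause = {"query": text, "fields": list(fields)}
    if flags:
        unknown = [f for f in flags if f not in KNOWN_FLAGS]
        if unknown:
            raise ValueError("unknown flags: " + ", ".join(unknown))
        clause["flags"] = "|".join(flags)
    return {"query": {"simple_query_string": clause}}
```

When `flags` is omitted, the key is left out entirely, so Elasticsearch uses its default behavior.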
Term Query
Specifying `term` in `query` searches for documents whose field value exactly matches the value given in `term` (the value is not analyzed).
{
"_source": false,
"from": 0,
"size": 100,
"query": {
"term": {
"japanese_air_date": "1973-02-25T00:00:00"
}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
],
"track_scores": true,
"sort": [
{
"_score": {"order": "desc"}
}
]
}
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_term_query.json
[Elasticsearch Reference - Term Query] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-term-query.html)
Bool Query
Specifying `bool` in `query` combines multiple queries.
{
"_source": false,
"from": 0,
"size": 100,
"query": {
"bool": {
"must": [
{
"match": {
"_all": {
"query": "September Patrick",
"operator": "OR"
}
}
},
{
"term": {
"season": {"value": 5}
}
}
]
}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
],
"track_scores": true,
"sort": [
{
"no_in_series": {"order": "asc"}
}
]
}
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_bool_query.json
The occurrence types
|occur |description |
|:-------------|:-------------------------------------------------------------------|
|`must` |The clause (query) must appear in matching documents. |
|`should` |The clause (query) should appear in the matching document. |
|`must_not` |The clause (query) must not appear in the matching documents. |
- When `should` is used, the `minimum_should_match` parameter specifies the minimum number of `should` clauses that must match.
[Elasticsearch Reference - Bool Query] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-bool-query.html)
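If `must` is written twice in one JSON object, the request relies on duplicate keys, and many JSON libraries keep only the last one (Python's `json` module does, as shown below). Listing the clauses in an array is the safer form. A minimal Python sketch of a `bool` query builder that always emits arrays; `bool_query` is a hypothetical helper.

```python
import json


def bool_query(must=(), should=(), must_not=(), minimum_should_match=None):
    """Build a bool query; each occurrence type holds a list of clauses."""
    clauses = {}
    for occur, items in (("must", must), ("should", should), ("must_not", must_not)):
        if items:
            clauses[occur] = list(items)
    if minimum_should_match is not None:
        clauses["minimum_should_match"] = minimum_should_match
    return {"bool": clauses}


# Duplicate keys are silently collapsed by Python's json module:
# only the last "must" survives.
collapsed = json.loads('{"must": {"term": {"season": 4}}, "must": {"term": {"season": 5}}}')
```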
Range Query
Specifying `range` in `query` searches by a range of values of the field given in `range`.
{
"_source": false,
"from": 0,
"size": 100,
"query" : {
"range" : {
"no_in_series":{"gte": 20, "lte": 24}
}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
],
"track_scores": true,
"sort": [
{
"no_in_series": {"order": "asc"}
}
]
}
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_range_query.json
[Elasticsearch Reference - Range Query] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-range-query.html)
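A minimal Python sketch of a range clause builder that includes only the bounds actually given (`range_query` is a hypothetical helper). The same shape works for the `japanese_air_date` date field, assuming the bounds are written in that field's mapped format (`dateHourMinuteSecond`), e.g. `range_query("japanese_air_date", gte="1973-01-01T00:00:00", lt="1974-01-01T00:00:00")`.

```python
def range_query(field, gte=None, gt=None, lte=None, lt=None):
    """Build a range clause; only the bounds that are given are included."""
    bounds = {name: value for name, value in
              (("gte", gte), ("gt", gt), ("lte", lte), ("lt", lt)) if value is not None}
    if not bounds:
        raise ValueError("at least one bound is required")
    return {"range": {field: bounds}}
```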
Ids Query
Specifying `ids` in `query` searches by values of the `_id` field.
{
"_source": false,
"from": 0,
"size": 100,
"query": {
"ids": {
"values": ["AU5YtptcIueIPY5pgX5J","AU5YtptcIueIPY5pgX5K","AU5YtptcIueIPY5pgX5L"]
}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
],
"track_scores": true,
"sort": [
{
"_score": {"order": "desc"}
}
]
}
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_ids_query.json
[Elasticsearch Reference - Ids Query] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-ids-query.html)
Filters
Match All Filter
{
"_source": false,
"from": 0,
"size": 100,
"filter": {
"matchAll": {}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
],
"track_scores": true,
"sort": [
{
"season": {"order": "asc"}
},
{
"no_in_season": {"order": "asc"}
}
]
}
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_all_filter.json
[Elasticsearch Reference - Match All Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-match-all-filter.html)
Query Filter
{
"_source": false,
"from": 0,
"size": 100,
"filter": {
"query": {
"query_string" : {
"fields" : ["_all"],
"query": "(September OR Patrick) AND (season:5)"
}
}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
],
"track_scores": true,
"sort": [
{
"_score": {"order": "desc"}
}
]
}
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_query_filter.json
[Elasticsearch Reference - Query Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-query-filter.html)
Term Filter
{
"_source": false,
"from": 0,
"size": 100,
"filter": {
"term": {
"japanese_air_date": "1973-02-25T00:00:00"
}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
],
"track_scores": true,
"sort": [
{
"_score": {"order": "desc"}
}
]
}
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_term_filter.json
[Elasticsearch Reference - Term Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-term-filter.html)
Bool Filter
{
"_source": false,
"from": 0,
"size": 100,
"filter": {
"bool": {
"must": [
{
"term": {
"no_in_season": {"value": 1}
}
},
{
"term": {
"season": {"value": 5}
}
}
]
}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
]
}
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_bool_filter.json
[Elasticsearch Reference - Bool Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-bool-filter.html)
Range Filter
{
"_source": false,
"from": 0,
"size": 100,
"filter" : {
"range" : {
"no_in_series":{"gte": 20, "lte": 24}
}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
],
"track_scores": true,
"sort": [
{
"no_in_series": {"order": "asc"}
}
]
}
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_range_filter.json
[Elasticsearch Reference - Range Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-range-filter.html)
Ids Filter
{
"_source": false,
"from": 0,
"size": 100,
"filter": {
"ids": {
"values": ["AU5YtptcIueIPY5pgX5J","AU5YtptcIueIPY5pgX5K","AU5YtptcIueIPY5pgX5L"]
}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
]
}
- When `_id` values are generated automatically, they change every time the documents are indexed, so the JSON above will not return any results if used as-is.
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_ids_filter.json
[Elasticsearch Reference - Ids Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-ids-filter.html)
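Because generated `_id` values differ from one load to the next, one way to try the Ids Query or Ids Filter is to take the IDs from a previous search response. A minimal Python sketch; `ids_filter_from_response` is a hypothetical helper, and `response` is assumed to be a parsed search response like the Match Query output shown earlier.

```python
def ids_filter_from_response(response, limit=3):
    """Take the first `limit` _id values from a search response and build an ids filter body."""
    ids = [hit["_id"] for hit in response["hits"]["hits"][:limit]]
    return {"_source": False, "filter": {"ids": {"values": ids}}}
```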
Combining a query and a filter
{
"_source": false,
"from": 0,
"size": 100,
"query": {
"match": {
"_all": {
"query": "September Patrick",
"operator": "OR"
}
}
},
"filter" : {
"range" : {
"no_in_series":{"gte": 30, "lte": 39}
}
},
"fields": [
"title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
],
"track_scores": true,
"sort": [
{
"_score": {"order": "desc"}
}
]
}
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_query_range_filter.json
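In 1.x, a top-level `filter` in the request body is handled as a post filter (the parameter was renamed `post_filter` in 1.0): it narrows the hits after the query has run. The usual 1.x way to run a query and a filter together is the `filtered` query. A minimal Python sketch that expresses the same condition as a `filtered` query. This is an alternative formulation, not the request used above, and `filtered_query` is a hypothetical helper.

```python
def filtered_query(query, filter_clause):
    """Wrap a query and a filter in a 1.x filtered query."""
    return {"query": {"filtered": {"query": query, "filter": filter_clause}}}


body = filtered_query(
    {"match": {"_all": {"query": "September Patrick", "operator": "OR"}}},
    {"range": {"no_in_series": {"gte": 30, "lte": 39}}},
)
# Same paging/field options as the JSON above.
body.update({"_source": False, "from": 0, "size": 100})
```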
Parameters
query parameters
|parameter |default / description |
|:---------------------|:--------------------------------------------------------|
|`timeout` |Defaults to no timeout. |
|`from` |Defaults to `0`. |
|`size` |Defaults to `10`. |
|`search_type` |Defaults to `query_then_fetch`. |
|`query_cache` |Set to `true` or `false` to enable or disable the caching of search results.|
|`terminate_after` |Defaults to no terminate_after. [experimental] |
- `timeout` specifies the timeout as a string. The available units are listed in Time units below.
- `search_type` and `query_cache` are specified as URL query parameters.
Time units
unit | description |
---|---|
y | Year |
M | Month |
w | Week |
d | Day |
h | Hour |
m | Minute |
s | Second |
[Elasticsearch Reference - Request Body Search] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-request-body.html)
Elasticsearch specification notes
mapping
Fields
Meta-fields that can be used in a document mapping
|field |default / description |
|:--------------|:------------------------------------------------------------------------------------------|
|`_uid` |Each document indexed is associated with an id and a type, the internal `_uid` field is the unique identifier of a document within an index and is composed of the type and the id.|
|`_id` |By default it is not indexed and not stored (thus, not created). |
|`_type` |By default, the `_type` field is indexed (but not analyzed) and not stored. |
|`_source` |The `_source` field is an automatically generated field that stores the actual JSON that was used as the indexed document. |
|`_all` |The idea of the `_all` field is that it includes the text of one or more other fields within the document indexed. |
|`_analyzer` |Deprecated in 1.5.0. |
|`_boost` |Deprecated in 1.0.0.RC1. |
|`_parent` |The parent field mapping is defined on a child mapping, and points to the parent type this child relates to.|
|`_field_names` |The `_field_names` field indexes the field names of a document, which can later be used to search for documents based on the fields that they contain typically using the `exists` and `missing` filters.|
|`_routing` |The routing field allows to control the `_routing` aspect when indexing data and explicit routing control is required.|
|`_index` |By default it is disabled.|
|`_size` |By default it is disabled.|
|`_timestamp` |By default it is disabled.|
|`_ttl` |By default it is disabled.|
[Elasticsearch Reference - Fields] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/mapping-fields.html)
Types
Data types that can be used in a document mapping
[Elasticsearch Reference - Types] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/mapping-types.html)
Core Types
string
attributes
|attribute |default / description |
|:----------------------------|:----------------------------------------------------------------------------------------|
|`index_name` |Defaults to the property/field name. |
|`store` |Defaults to `false`. |
|`index` |Defaults to `analyzed`. Can also be set to `not_analyzed` or `no`. |
|`doc_values` |Set to `true` to store field values in a column-stride fashion. |
|`term_vector` |Defaults to `no`. |
|`boost` |Defaults to `1.0`. |
|`null_value` |Defaults to not adding the field at all. |
|`norms: {enabled: <value>}` |Defaults to `true` for analyzed fields, and to `false` for not_analyzed fields. |
|`norms: {loading: <value>}` |Possible values are `eager` and `lazy` (default). |
|`index_options` |Defaults to `positions` for analyzed fields, and to `docs` for not_analyzed fields. |
|`analyzer` |Defaults to the globally configured analyzer. |
|`index_analyzer` |The analyzer used to analyze the text contents when analyzed during indexing. |
|`search_analyzer` |The analyzer used to analyze the field when part of a query string. |
|`include_in_all` |If `index` is set to `no` this defaults to `false`, otherwise, defaults to `true` or to the parent object type setting. |
|`ignore_above` |The analyzer will ignore strings larger than this size. |
|`position_offset_gap` |Defaults to `0`. |
copy_to
With `copy_to`, the value of a field can be copied into another field.
{
"properties": {
"title": {
"type": "string",
"index": "analyzed",
"copy_to": "contents"
},
"contents": {
"type": "string"
}
}
}
fields
The `multi_field` type was [removed from the core types](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/_multi_fields.html) in version 1.0.
With `fields`, a single JSON source field can be mapped to multiple fields.
{
"properties": {
"title": {
"type": "string",
"index": "analyzed",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Number
The numeric types are `float`, `double`, `byte`, `short`, `integer`, and `long`.
attributes
|attribute |default / description |
|:----------------------------|:-----------------------------------------------|
|`type` |`float`, `double`, `integer`, `long`, `short`, `byte`. Required.|
|`index_name` |Defaults to the property/field name. |
|`store` |Defaults to `false`. |
|`index` |Set to `no` if the value should not be indexed. Setting to `no` disables `include_in_all`. |
|`doc_values` |Set to `true` to store field values in a column-stride fashion. |
|`precision_step` |Defaults to 16 for `long`, `double`, 8 for `short`, `integer`, `float`, 2147483647 for `byte`.|
|`boost` |Defaults to `1.0`. |
|`null_value` |Defaults to not adding the field at all. |
|`include_in_all` |If `index` is set to `no` this defaults to `false`, otherwise, defaults to `true` or to the parent object type setting. |
|`ignore_malformed` |Defaults to `false`. |
|`coerce` |Defaults to `true`. |
Date
attributes
|attribute |description |
|:----------------------------|:-----------------------------------------------|
|`index_name` |Defaults to the property/field name. |
|`format` |Defaults to `dateOptionalTime`. |
|`store` |Defaults to `false`. |
|`index` |Set to `no` if the value should not be indexed. Setting to `no` disables `include_in_all`. |
|`doc_values` |Set to `true` to store field values in a column-stride fashion. |
|`precision_step` |Defaults to `16`. |
|`boost` |Defaults to `1.0`. |
|`null_value` |Defaults to not adding the field at all. |
|`include_in_all` |If `index` is set to `no` this defaults to `false`, otherwise, defaults to `true` or to the parent object type setting.|
|`ignore_malformed` |Defaults to `false`. |
|`numeric_resolution` |Possible values include `seconds` and `milliseconds` (default). |
Boolean
attributes
|attribute |default / description |
|:----------------------------|:-----------------------------------------------|
|`index_name` |Defaults to the property/field name. |
|`store` |Defaults to `false`. |
|`index` |Set to `no` if the value should not be indexed. Setting to `no` disables `include_in_all`. |
|`boost` |Defaults to `1.0`. |
|`null_value` |Defaults to not adding the field at all. |
Binary
attributes
|attribute |default / description |
|:----------------------------|:-----------------------------------------------|
|`index_name` |Defaults to the property/field name. |
|`store` |Defaults to `false`. |
|`doc_values` |Set to `true` to store field values in a column-stride fashion. |
|`compress` |Set to `true` to compress the stored binary value. |
|`compress_threshold` |Defaults to `-1`. |
Root Object Type
|setting |default / description |
|:-----------------------|:-------------------------------------------------------------|
|`dynamic_date_formats` |Sets one or more date formats used to detect `date` fields among dynamically added string values. |
|`date_detection` |Set to `false` to disable automatic date type detection. |
|`numeric_detection` |Set to `true` to map numeric values sent as JSON strings as numbers. JSON has native numeric types, but numbers are sometimes still provided as strings. |
|`dynamic_templates` |Defines mapping templates that are applied when new fields / objects are introduced dynamically. |
[Elasticsearch Reference - Root Object Type](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/mapping-root-object-type.html)
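The Python sketch below roughly shows how `date_detection` and `numeric_detection` influence the type chosen for a new field. It assumes the 1.x defaults as I understand them (`date_detection` on, `numeric_detection` off, JSON integers mapped to `long`, floating-point numbers to `double`). It also uses a simplified regex in place of the real `dateOptionalTime` parser. `detect_type` is a hypothetical helper, not Elasticsearch code.

```python
import re

# Simplified stand-in for the default "dateOptionalTime" format:
# yyyy-MM-dd with an optional 'T'HH:mm[:ss] part (fractions and zones ignored).
DATE_OPTIONAL_TIME = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?)?")


def detect_type(value, date_detection=True, numeric_detection=False):
    """Guess the type dynamic mapping would assign to a new field (sketch)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        if date_detection and DATE_OPTIONAL_TIME.fullmatch(value):
            return "date"
        if numeric_detection:
            try:
                int(value)
                return "long"
            except ValueError:
                pass
            try:
                float(value)
                return "double"
            except ValueError:
                pass
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


print(detect_type("2015-06-09"))                     # date
print(detect_type("42"))                             # string
print(detect_type("42", numeric_detection=True))     # long
```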
Date Format
Excerpt from the built-in formats
|format |pattern |example |
|:--------------------------------|:-----------------------------|:------------------------------|
|`basic_date` |`yyyyMMdd` |20060102 |
|`basic_date_time` |`yyyyMMdd'T'HHmmss.SSSZ` |20060102T150405.999+0900 |
|`basic_date_time_no_millis` |`yyyyMMdd'T'HHmmssZ` |20060102T150405+0900 |
| | | |
|`date` |`yyyy-MM-dd` |2006-01-02 |
|`date_time` |`yyyy-MM-dd'T'HH:mm:ss.SSSZZ` |2006-01-02T15:04:05.999+09:00 |
|`date_time_no_millis` |`yyyy-MM-dd'T'HH:mm:ssZZ` |2006-01-02T15:04:05+09:00 |
|`date_optional_time` |`yyyy-MM-dd` |2006-01-02 |
|`date_optional_time` |`yyyy-MM-dd'T'HH:mm:ss` |2006-01-02T15:04:05 |
| | | |
|`date_hour_minute_second` |`yyyy-MM-dd'T'HH:mm:ss` |2006-01-02T15:04:05 |
|`date_hour_minute_second_millis` |`yyyy-MM-dd'T'HH:mm:ss.SSS` |2006-01-02T15:04:05.999 |
[Elasticsearch Reference - Date Format](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/mapping-date-format.html)
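To try the example values from the table offline, the Joda-style patterns can be approximated with Python `strptime` codes. The dictionary below is my own approximation, not an official mapping: `%f` stands in for `SSS` (Python accepts 1 to 6 fractional digits) and `%z` for `Z`/`ZZ` (Python 3.7+ accepts both `+0900` and `+09:00`). Elasticsearch stores dates internally as milliseconds since the epoch in UTC, so the sketch also converts to that; treating a value without a zone as UTC is an assumption made here.

```python
from datetime import datetime, timedelta, timezone

# Approximate strptime equivalents of some built-in Elasticsearch format names.
STRPTIME_FORMATS = {
    "basic_date": "%Y%m%d",
    "basic_date_time": "%Y%m%dT%H%M%S.%f%z",
    "basic_date_time_no_millis": "%Y%m%dT%H%M%S%z",
    "date": "%Y-%m-%d",
    "date_time": "%Y-%m-%dT%H:%M:%S.%f%z",
    "date_time_no_millis": "%Y-%m-%dT%H:%M:%S%z",
    "date_hour_minute_second": "%Y-%m-%dT%H:%M:%S",
    "date_hour_minute_second_millis": "%Y-%m-%dT%H:%M:%S.%f",
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse(format_name, text):
    """Parse text with the strptime approximation of a format name."""
    return datetime.strptime(text, STRPTIME_FORMATS[format_name])


def to_epoch_millis(dt):
    """Milliseconds since the epoch; naive datetimes are treated as UTC here."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(milliseconds=1)  # exact integer division


dt = parse("date_time", "2006-01-02T15:04:05.999+09:00")
print(dt.isoformat())       # 2006-01-02T15:04:05.999000+09:00
print(to_epoch_millis(dt))  # 1136181845999
```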
Analysis
An analyzer combines exactly one tokenizer with zero or more token filters; the tokenizer can optionally be preceded by one or more char filters.
{
  "settings": {
    "analysis": {
      "analyzer": {
        "{analyzer name}": {
          "type": "{analyzer type}",
          "{analyzer-specific settings}"
        },
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_baseform",
            "kuromoji_pos_filter"
          ]
        },
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer",
          "filter": [
            "my_filter"
          ],
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "tokenizer": {
        "{tokenizer name}": {
          "type": "{tokenizer type}",
          "{tokenizer-specific settings}"
        },
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        },
        "my_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "3",
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      },
      "filter": {
        "{filter name}": {
          "type": "{filter type}",
          "{filter-specific settings}"
        },
        "kuromoji_pos_filter": {
          "type": "kuromoji_part_of_speech"
        },
        "my_filter": {
          "type": "stop",
          "stopwords": ["NGWORD_A", "NGWORD_B", "NGWORD_C"]
        }
      },
      "char_filter": {
        "{char_filter name}": {
          "type": "{char_filter type}",
          "{char_filter-specific settings}"
        },
        "my_char_filter": {
          "type": "mapping",
          "mappings" : ["kb=>kilobyte","mb=>megabyte","gb=>gigabyte"]
        }
      }
    },
    "index": {
      "{index settings}"
    }
  },
  "mappings": {
    "{type name}": {
      "{type settings}"
    },
    "{type name}": {
      "{type settings}"
    }
  }
}
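The order in which the pieces of `my_analyzer` run (char filters, then the single tokenizer, then the token filters) can be simulated in a few lines of Python. This is a simplified model, not Elasticsearch's implementation. A whitespace tokenizer stands in for the template's `nGram` tokenizer to keep the output readable. The mapping char filter here replaces rules one after another, whereas the real filter matches the longest rule, so results can differ when rules overlap. All function names are hypothetical.

```python
from typing import Callable, Iterable, List


def mapping_char_filter(mappings: Iterable[str]) -> Callable[[str], str]:
    """Build a char filter from "from=>to" rules (applied one after another)."""
    rules = [rule.split("=>", 1) for rule in mappings]

    def apply(text: str) -> str:
        for source, target in rules:
            text = text.replace(source, target)
        return text

    return apply


def whitespace_tokenizer(text: str) -> List[str]:
    """Stand-in tokenizer: split on whitespace."""
    return text.split()


def stop_filter(stopwords: Iterable[str]) -> Callable[[List[str]], List[str]]:
    """Drop tokens that exactly match a stopword (case-sensitive)."""
    stop = set(stopwords)
    return lambda tokens: [t for t in tokens if t not in stop]


def lowercase_filter(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def analyze(text, tokenizer, char_filters=(), token_filters=()):
    """Char filters first, then exactly one tokenizer, then token filters in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "Size 10kb NGWORD_A 2gb",
    whitespace_tokenizer,
    char_filters=[mapping_char_filter(["kb=>kilobyte", "mb=>megabyte", "gb=>gigabyte"])],
    token_filters=[stop_filter(["NGWORD_A", "NGWORD_B", "NGWORD_C"]), lowercase_filter],
)
print(tokens)  # ['size', '10kilobyte', '2gigabyte']
```

Because token filters run in list order, placing `lowercase_filter` before the case-sensitive stop filter would stop `NGWORD_A` from being removed.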
Analyzers
Built-in Analyzers
The following language analyzer types are supported:
type | language |
---|---|
arabic | Arabic |
armenian | Armenian |
basque | Basque |
brazilian | Portuguese (Brazil) |
bulgarian | Bulgarian |
catalan | Catalan |
chinese | Chinese |
cjk | CJK (Chinese, Japanese, Korean) |
czech | Czech |
danish | Danish |
dutch | Dutch |
english | English |
finnish | Finnish |
french | French |
galician | Galician |
german | German |
greek | Greek |
hindi | Hindi |
hungarian | Hungarian |
indonesian | Indonesian |
irish | Irish |
italian | Italian |
latvian | Latvian |
norwegian | Norwegian |
persian | Persian |
portuguese | Portuguese |
romanian | Romanian |
russian | Russian |
sorani | Kurdish (Sorani) |
spanish | Spanish |
swedish | Swedish |
turkish | Turkish |
thai | Thai |
Custom analyzer configuration example
An example custom analyzer configuration based on kuromoji (requires the kuromoji analysis plugin).
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        }
      },
      "filter": {
        "greek_lowercase_filter": {
          "type": "lowercase",
          "language": "greek"
        },
        "kuromoji_pos_filter": {
          "type": "kuromoji_part_of_speech"
        }
      },
      "analyzer": {
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_baseform", "kuromoji_pos_filter", "greek_lowercase_filter", "cjk_width"
          ]
        }
      }
    }
  }
}
- `kuromoji_tokenizer` is a tokenizer built into the kuromoji plugin.
- `kuromoji_baseform` and `kuromoji_part_of_speech` are token filters built into the kuromoji plugin.
Settings of the `custom` analyzer type:
Setting | Description |
---|---|
tokenizer | Name of the tokenizer to use. |
filter | Optional. List of token filter names to use. |
char_filter | Optional. List of char filter names to use. |
position_offset_gap | Optional. Number of positions to insert between the values of a multi-valued field that uses this analyzer. |
[Elasticsearch Reference - Analyzers](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-analyzers.html)
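A `custom` analyzer only refers to its tokenizer, filters and char filters by name. A typo therefore surfaces only when the index is created. The Python sketch below checks those references before sending the settings. `KNOWN_BUILTINS` is a deliberately partial, hypothetical list of names treated as available without being defined (built in, or provided by the kuromoji plugin), and `undefined_references` is a hypothetical helper.

```python
# Partial, hypothetical list of names usable without defining them in "analysis".
KNOWN_BUILTINS = {
    "tokenizer": {"standard", "whitespace", "keyword", "kuromoji_tokenizer"},
    "filter": {"lowercase", "stop", "cjk_width",
               "kuromoji_baseform", "kuromoji_part_of_speech"},
    "char_filter": {"html_strip"},
}


def undefined_references(analysis, builtins=KNOWN_BUILTINS):
    """List (analyzer, section, name) triples that point at nothing defined."""
    missing = []
    for analyzer_name, analyzer in analysis.get("analyzer", {}).items():
        if analyzer.get("type") != "custom":
            continue  # only custom analyzers reference other components
        references = [("tokenizer", analyzer["tokenizer"])]
        references += [("filter", name) for name in analyzer.get("filter", [])]
        references += [("char_filter", name) for name in analyzer.get("char_filter", [])]
        for section, name in references:
            defined = analysis.get(section, {})
            if name not in defined and name not in builtins[section]:
                missing.append((analyzer_name, section, name))
    return missing


analysis = {
    "tokenizer": {"kuromoji": {"type": "kuromoji_tokenizer"}},
    "filter": {"kuromoji_pos_filter": {"type": "kuromoji_part_of_speech"}},
    "analyzer": {"kuromoji_analyzer": {
        "type": "custom", "tokenizer": "kuromoji",
        "filter": ["kuromoji_baseform", "kuromoji_pos", "cjk_width"],
    }},
}
print(undefined_references(analysis))
# [('kuromoji_analyzer', 'filter', 'kuromoji_pos')]
```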
Tokenizers
Built-in Tokenizers
edgeNGram and nGram
Results of tokenizing the text `elastic` with `nGram` and `edgeNGram` using the following settings:
- min_gram: 2
- max_gram: 3
- token_chars: `letter`, `digit`
nGram
position | type | token |
---|---|---|
1 | word | el |
2 | word | ela |
3 | word | la |
4 | word | las |
5 | word | as |
6 | word | ast |
7 | word | st |
8 | word | sti |
9 | word | ti |
10 | word | tic |
11 | word | ic |
edgeNGram
position | type | token |
---|---|---|
1 | word | el |
2 | word | ela |
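Both tables can be reproduced with a small Python simulation, sketched below. It assumes that `token_chars: letter, digit` splits the text into runs of letters and digits, approximated here with `str.isalnum`. It also assumes that grams are ordered by start offset and then by length, which matches the `nGram` table above. `ngrams` and `edge_ngrams` are hypothetical helpers, not Lucene's implementation.

```python
def char_runs(text):
    """Split text into runs of letters/digits (approximates token_chars)."""
    runs, current = [], ""
    for ch in text:
        if ch.isalnum():
            current += ch
        else:
            if current:
                runs.append(current)
            current = ""
    if current:
        runs.append(current)
    return runs


def ngrams(text, min_gram=2, max_gram=3):
    """nGram: every gram of length min..max, by start offset then length."""
    tokens = []
    for run in char_runs(text):
        for start in range(len(run)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(run):
                    tokens.append(run[start:start + size])
    return tokens


def edge_ngrams(text, min_gram=2, max_gram=3):
    """edgeNGram: only the grams anchored at the start of each run."""
    return [run[:size] for run in char_runs(text)
            for size in range(min_gram, min(max_gram, len(run)) + 1)]


print(ngrams("elastic"))
# ['el', 'ela', 'la', 'las', 'as', 'ast', 'st', 'sti', 'ti', 'tic', 'ic']
print(edge_ngrams("elastic"))
# ['el', 'ela']
```

`edgeNGram` emits far fewer tokens, which is why it is commonly used for prefix (search-as-you-type) matching while `nGram` supports matching anywhere inside a word.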
uax_url_email
Result of tokenizing the text `Elasticsearch reference <a href =\"https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-uaxurlemail-tokenizer.html\">UAX Email URL Tokenizer</a>`:
position | type | token |
---|---|---|
1 | ALPHANUM | Elasticsearch |
2 | ALPHANUM | reference |
3 | ALPHANUM | a |
4 | ALPHANUM | href |
5 | URL | https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-uaxurlemail-tokenizer.html |
6 | ALPHANUM | UAX |
7 | ALPHANUM | Email |
8 | ALPHANUM | URL |
9 | ALPHANUM | Tokenizer |
10 | ALPHANUM | a |
path_hierarchy
Result of tokenizing the text `C:/Windows/System32/drivers/etc`:
position | type | token |
---|---|---|
1 | word | C: |
1 | word | C:/Windows |
1 | word | C:/Windows/System32 |
1 | word | C:/Windows/System32/drivers |
1 | word | C:/Windows/System32/drivers/etc |
[Elasticsearch Reference - Tokenizers](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-tokenizers.html)
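With its defaults (`delimiter` `/`, `reverse: false`, `skip: 0`), `path_hierarchy` emits every ancestor prefix of the path, all at the same position. The Python sketch below covers only that default case. `path_hierarchy` is a hypothetical helper, and the `reverse` and `skip` options are not modeled.

```python
def path_hierarchy(text, delimiter="/"):
    """Emit each ancestor path prefix (default settings only; sketch)."""
    parts = text.split(delimiter)
    prefixes = [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]
    return [p for p in prefixes if p]  # a leading delimiter yields an empty first prefix


for token in path_hierarchy("C:/Windows/System32/drivers/etc"):
    print(token)
# C:
# C:/Windows
# C:/Windows/System32
# C:/Windows/System32/drivers
# C:/Windows/System32/drivers/etc
```

Indexing every prefix is what allows a plain term query for `C:/Windows/System32` to match all files below that directory.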
Token Filters
Normalization Token Filter
language | type |
---|---|
Arabic | arabic_normalization |
German | german_normalization |
Hindi | hindi_normalization |
Indic | indic_normalization |
Kurdish (Sorani) | sorani_normalization |
Persian | persian_normalization |
Scandinavian | scandinavian_normalization, scandinavian_folding |
[Elasticsearch Reference - Token Filters](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-tokenfilters.html)
Character Filters
Character Filter | type |
---|---|
[Mapping Char Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-mapping-charfilter.html) | mapping |
[HTML Strip Char Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-htmlstrip-charfilter.html) | html_strip |
[Pattern Replace Char Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-pattern-replace-charfilter.html) | pattern_replace |
[Elasticsearch Reference - Character Filters](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-charfilters.html)
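Char filters rewrite the raw text before the tokenizer sees it. The Python sketch below roughly approximates `html_strip` and `pattern_replace` using only the standard library. It is not Lucene's behavior: for example, Lucene's HTML strip filter turns block-level tags into line breaks, which this sketch does not do. Note also that Elasticsearch's `pattern_replace` uses Java regex syntax, so group references are written `$1` there instead of Python's `\1`. The helper names are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_strip(text):
    """Drop tags and decode entities (simplified stand-in for html_strip)."""
    collector = _TextCollector()
    collector.feed(text)
    collector.close()  # flush text still buffered by the parser
    return "".join(collector.parts)


def pattern_replace(text, pattern, replacement):
    """Regex substitution on the raw text, as pattern_replace does before tokenizing."""
    return re.sub(pattern, replacement, text)


print(html_strip("<b>Hello</b> &amp; bye"))                       # Hello & bye
print(pattern_replace("123-456-789", r"(\d+)-(?=\d)", r"\1_"))   # 123_456_789
```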
Analyze
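The `_analyze` endpoint of the Indices APIs shows the tokens an analyzer produces. For Elasticsearch's built-in analyzers, no index needs to be specified. Pass either `analyzer`, or a `tokenizer` with optional `token_filters` and `char_filters` (comma-separated names).
> curl -XGET "[host name][:port]/[index name]/_analyze?analyzer={analyzer name}&tokenizer={tokenizer name}&token_filters={token filter names}&char_filters={char filter names}" -d "The Bye-Bye Sky High IQ Murder Case"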
standard
> curl -XGET "localhost:9200/_analyze?analyzer=standard" -d "The Bye-Bye Sky High IQ Murder Case"
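When scripting these checks, building the query string programmatically avoids hand-escaping `&` and `?` on the Windows command line. The Python sketch below only builds the URL. It uses the parameter names from the command template above (`analyzer`, `tokenizer`, `token_filters`, `char_filters`). `analyze_url` is a hypothetical helper, and the host default is an assumption.

```python
from urllib.parse import urlencode


def analyze_url(host="localhost:9200", index=None, **params):
    """Build an _analyze URL; pass analyzer=..., or tokenizer=... with filters."""
    path = f"/{index}/_analyze" if index else "/_analyze"
    query = urlencode(params)  # keeps keyword-argument order
    return f"http://{host}{path}" + (f"?{query}" if query else "")


print(analyze_url(analyzer="standard"))
# http://localhost:9200/_analyze?analyzer=standard
print(analyze_url(tokenizer="keyword", token_filters="lowercase"))
# http://localhost:9200/_analyze?tokenizer=keyword&token_filters=lowercase
```

`urlencode` percent-encodes commas, so a list such as `token_filters="lowercase,stop"` becomes `lowercase%2Cstop`, which the server decodes back to a comma.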
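To examine how an analyzer with custom settings behaves, it appears best to define it in an index.
The example below defines the analyzer under test in an index named `test` and then runs `_analyze` with that analyzer.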
curl -XPUT "localhost:9200/test?pretty" -d "{
  \"settings\": {
    \"analysis\": {
      \"analyzer\": {
        \"my_analyzer\": {
          \"type\": \"custom\",
          \"tokenizer\": \"my_tokenizer\"
        }
      },
      \"tokenizer\": {
        \"my_tokenizer\": {
          \"type\": \"path_hierarchy\",
          \"reverse\": false,
          \"skip\": 0
        }
      }
    }
  }
}"
curl -XGET "localhost:9200/test/_analyze?analyzer=my_analyzer&pretty" -d "C:/Windows/System32/drivers/etc"
[Elasticsearch Reference - Analyze](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/indices-analyze.html)
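Escaping JSON by hand inside cmd.exe double quotes is error-prone: every inner quote needs a backslash. As an alternative, the Python sketch below builds the same `test` index settings as a dictionary and serializes them with `json.dumps`. `path_analyzer_settings` and `create_index` are hypothetical helpers. `create_index` assumes a node reachable at `localhost:9200` and is not called here.

```python
import json
import urllib.request


def path_analyzer_settings(reverse=False, skip=0):
    """Same settings body as the curl -XPUT command, built as a Python dict."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {"type": "custom", "tokenizer": "my_tokenizer"}
                },
                "tokenizer": {
                    "my_tokenizer": {"type": "path_hierarchy", "reverse": reverse, "skip": skip}
                },
            }
        }
    }


def create_index(name, body, host="localhost:9200"):
    """PUT the settings to a running node and return the parsed JSON response."""
    request = urllib.request.Request(
        f"http://{host}/{name}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


print(json.dumps(path_analyzer_settings(), indent=2))
```

With a node running, `create_index("test", path_analyzer_settings())` would create the index; `json.dumps` also turns Python's `False` into JSON `false`.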