Elasticsearch 1.6.0 - Installing on Windows 7

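Overview

This article installs Elasticsearch 1.6.0 on Windows 7 for development and verification purposes and applies some basic settings.
It then registers sample data and walks through basic search methods.

Environment

The content of this article was verified on Windows 7 with Elasticsearch 1.6.0.

Windows does not ship with a curl command, so [cURL](http://curl.haxx.se/) was used.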

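Installation

Install Elasticsearch and several plugins on Windows 7.

Elasticsearch

Download the archive from the download page and extract it to a suitable location.
The downloaded archive is elasticsearch-1.6.0.zip.

Extraction

Installation only requires extracting the archive to a suitable location.
Here it was extracted to D:\dev\elasticsearch-1.6.0.

Configuration

Because this instance is for development and verification, configure it to start with minimal resources.
The configuration file is config/elasticsearch.yml in the extracted directory.

Only the changed settings are excerpted below.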

# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
cluster.name: elasticsearch

# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
node.name: master

# Every node can be configured to allow or deny being eligible as the master,
# and to allow or deny to store the data.
#
# Allow this node to be eligible as a master node (enabled by default):
#
node.master: true
#
# Allow this node to store data (enabled by default):
#
node.data: true
  • node.name: If no node name is specified, a name is assigned automatically when the Elasticsearch instance starts.
# Note, that for development on a local machine, with small indices, it usually
# makes sense to "disable" the distributed features:
#
index.number_of_shards: 1
index.number_of_replicas: 0
  • Because this is a development setup, use a single shard and no replicas.
# Set this property to true to lock the memory:
#
bootstrap.mlockall: true
# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.
#
# 1. Disable multicast discovery (enabled by default):
#
discovery.zen.ping.multicast.enabled: false
#
# 2. Configure an initial list of master nodes in the cluster
#    to perform discovery when new nodes (master or data) are started:
#
discovery.zen.ping.unicast.hosts: ["localhost"]

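Startup

Move to the extracted directory and run the following command.
The memory size to use can be specified with options.

> bin\elasticsearch.bat -Xmx256m -Xms256m

Verification

Access the following URL with curl or a browser and confirm that a response with status 200 is returned.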

GET
> curl -XGET "localhost:9200/"
response
{
  "status" : 200,
  "name" : "master",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.6.0",
    "build_hash" : "cdd3ac4dde4f69524ec0a14de3828cb95bbb86d0",
    "build_timestamp" : "2015-06-09T13:36:34Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
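
The same check can be scripted. The sketch below is only illustrative: it parses a response body like the one above with Python's standard json module and tests the status and version fields. The function name check_root_response and the trimmed sample string are assumptions made for this example and are not part of Elasticsearch.

```python
import json


def check_root_response(body, expected_version="1.6.0"):
    """Return True if the root endpoint reports status 200 and the expected version."""
    info = json.loads(body)
    version = info.get("version", {}).get("number")
    return info.get("status") == 200 and version == expected_version


# Trimmed copy of the response shown above (hypothetical test input).
sample = '{"status": 200, "name": "master", "version": {"number": "1.6.0"}}'
print(check_root_response(sample))
```

In practice the body would come from the GET request to localhost:9200; the parsing step stays the same.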

plugin

elasticsearch-head

URL: http://mobz.github.io/elasticsearch-head/

Installation

install
> bin\plugin -install mobz/elasticsearch-head

Verification

Access the plugin URL and confirm that the head page is displayed.

elasticsearch-analysis-kuromoji

URL: https://github.com/elastic/elasticsearch-analysis-kuromoji

Installation

install
> bin\plugin -install elasticsearch/elasticsearch-analysis-kuromoji/2.6.0

elasticsearch-inquisitor

URL: https://github.com/polyfractal/elasticsearch-inquisitor

Installation

install
> bin\plugin -install polyfractal/elasticsearch-inquisitor

Verification

Access the plugin URL and confirm that the Inquisitor page is displayed.

Checking the installed plugins

> bin\plugin -l
response
Installed plugins:
    - analysis-kuromoji
    - head
    - inquisitor

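Checking basic search methods

Index the sample data and search it in several ways.

Preparing the sample data

The sample data is information about a TV drama series.

|field |data type |description |
|:------------------|:------------|:--------------------------------|
|title |string |Original title |
|original_air_date |string |Original air date |
|runtime |integer |Running time (minutes) |
|guest_staring |string |Guest star |
|guest_staring_role |string |Guest star's role |
|directed_by |string |Director |
|written_by |string array |Writer |
|teleplay |string array |Teleplay |
|season |integer |Season |
|no_in_season |integer |Episode number within the season |
|no_in_series |integer |Episode number within the series |
|japanese_title |string |Japanese title |
|japanese_air_date |date |Japanese air date |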

mapping

index: tvfile
type: columbo

Mapping for the sample data

columbo_mapping.json
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "analysis": {
      "filter": {
        "greek_lowercase_filter": {
          "type": "lowercase",
          "language": "greek"
        },
        "kuromoji_pos_filter": {
          "type": "kuromoji_part_of_speech"
        }
      },
      "tokenizer": {
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        },
        "ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "3",
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_baseform", "kuromoji_pos_filter", "greek_lowercase_filter", "cjk_width"
          ]
        },
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer",
          "filter": [
            "standard"
          ]
        },
        "letter_lower_analyzer": {
          "type": "custom",
          "tokenizer": "letter",
          "filter": [
            "lowercase"
          ]
        },
        "letter_upper_analyzer": {
          "type": "custom",
          "tokenizer": "letter",
          "filter": [
            "uppercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "columbo": {
      "_source": {
        "enabled": true
      },
      "_all": {
        "enabled": true
      },
      "_timestamp": {
        "enabled": true
      },
      "dynamic": "strict",
      "properties": {
        "title": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "original_air_date": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "runtime": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "guest_staring": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "guest_staring_role": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "directed_by": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "written_by": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "teleplay": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "season": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "no_in_season": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "no_in_series": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "japanese_title": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "kuromoji_analyzer",
          "store": true,
          "include_in_all": true
        },
        "japanese_air_date": {
          "type": "date",
          "format": "dateHourMinuteSecond",
          "store": true,
          "include_in_all": false
        }
      }
    }
  }
}
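
Because the mapping sets dynamic to strict, a document that contains a field not declared under properties is rejected at index time. A minimal sketch of a pre-flight check follows; the helper name unknown_fields and the cut-down mapping are assumptions for illustration, and the real columbo_mapping.json could be loaded with json.load instead.

```python
def unknown_fields(mapping, doc_type, doc):
    """Return the document keys that are not declared in the type's properties."""
    declared = mapping["mappings"][doc_type]["properties"]
    return sorted(key for key in doc if key not in declared)


# Cut-down mapping with the same shape as columbo_mapping.json (only two fields kept).
mapping = {
    "mappings": {
        "columbo": {
            "dynamic": "strict",
            "properties": {"title": {"type": "string"}, "season": {"type": "integer"}},
        }
    }
}

# "seasn" is a deliberate typo to show what the check reports.
print(unknown_fields(mapping, "columbo", {"title": "Murder by the Book", "seasn": 1}))
```

Running such a check before indexing points at the misspelled key directly instead of leaving it to the indexing error.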

Creating the index

Create the index and set the mapping using the JSON file above.

POST
> curl -XPOST "localhost:9200/tvfile?pretty" -d @columbo_mapping.json

Checking the mapping

GET
> curl -XGET "localhost:9200/tvfile/_settings,_mappings?pretty"

Deleting the index

DELETE
> curl -XDELETE "localhost:9200/tvfile"

Sample data

Only part of the data is shown because it is long. The full sample data is available on [this page](http://qiita.com/rubytomato@github/private/700be487ddb7221c29cc).

columbo_data.json
{"index": {}}
{"title": "Prescription: Murder",                "original_air_date": "February 20, 1968",   "runtime": 98, "guest_staring": "Gene Barry",        "guest_staring_role": "Dr. Ray Fleming (Gene Barry), a psychiatrist",                                              "directed_by": "Richard Irving",       "written_by": ["Richard Levinson & William Link"],                  "teleplay": [""],                                                                     "season": 0, "no_in_season": 1, "no_in_series": 1,  "japanese_title": "殺人処方箋",                   "japanese_air_date": "1972-08-27T00:00:00"}
{"index": {}}
{"title": "Ransom for a Dead Man",               "original_air_date": "March 1, 1971",       "runtime": 98, "guest_staring": "Lee Grant",         "guest_staring_role": "Leslie Williams, a brilliant lawyer and pilot",                                             "directed_by": "Richard Irving",       "written_by": ["Richard Levinson & William Link"],                  "teleplay": ["Dean Hargrove"],                                                        "season": 0, "no_in_season": 2, "no_in_series": 2,  "japanese_title": "死者の身代金",                 "japanese_air_date": "1973-04-22T00:00:00"}
{"index": {}}
{"title": "Murder by the Book",                  "original_air_date": "September 15, 1971",  "runtime": 73, "guest_staring": "Jack Cassidy",      "guest_staring_role": "Ken Franklin is one half of a mystery writing team",                                        "directed_by": "Steven Spielberg",     "written_by": ["Steven Bochco"],                                    "teleplay": [""],                                                                     "season": 1, "no_in_season": 1, "no_in_series": 3,  "japanese_title": "構想の死角",                   "japanese_air_date": "1972-11-26T00:00:00"}
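
The file above alternates an action line ({"index": {}}) with a document line, each on a single line, and the bulk body has to end with a newline. As a rough sketch, such a body could be generated from Python dictionaries as follows; build_bulk_body is a hypothetical helper name and the two documents are shortened versions of the sample records.

```python
import json


def build_bulk_body(docs):
    """Pair every document with an empty index action and end the body with a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        # ensure_ascii=False keeps Japanese titles readable in the output.
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"


docs = [
    {"title": "Prescription: Murder", "season": 0, "no_in_season": 1},
    {"title": "Ransom for a Dead Man", "season": 0, "no_in_season": 2},
]
body = build_bulk_body(docs)
print(body, end="")
```

When the result is written to a file, saving it as UTF-8 keeps non-ASCII values such as japanese_title intact.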

Indexing the documents

Index the documents using the JSON file above.

POST
> curl -XPOST "localhost:9200/tvfile/columbo/_bulk?pretty" --data-binary @columbo_data.json

Deleting all documents

DELETE
> curl -XDELETE "localhost:9200/tvfile/columbo?pretty"

Counting the documents

GET
> curl -XGET "localhost:9200/tvfile/columbo/_count?pretty" -d "{
  \"query\": {
      \"matchAll\": {}
  }
}"
response
{
  "count" : 45,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  }
}

Searching documents

Searches use the Search APIs. There are two search methods: URI Search and Request Body Search.

Syntax
[host name][:port]/[index name]/[type name]/_search

Common fields in search results

|field |description |
|:----------|:---------------------------------------------------------------|
|took |Time taken by the search, in milliseconds. |
|timed_out |Boolean indicating whether the search timed out. |
|_shards |Number of shards searched successfully and number that failed. |
|hits |Holds the search results. |
|hits.total |Number of documents matching the search conditions. |
|hits.hits |Array of documents matching the search (10 by default). |
|_score |Score of the document. |
|max_score |Maximum score. |

[The Search API](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/_the_search_api.html)

URI Search

URI Search specifies the search conditions in request parameters.

Searching without conditions

The search condition is specified with the q parameter.

GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?q=*&from=0&size=10&pretty"

Searching with conditions

GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?q=September%20Patrick&df=_all&default_operator=OR&from=0&size=10&_source=false&fields=title,original_air_date,runtime,guest_staring,directed_by,written_by,season,no_in_season&sort=season:asc,no_in_season:asc&track_scores=true&pretty"

Parameters

|name |default / description |
|:----------------------------|:--------------------------------------------------------------------------------------------|
|q |The query string. |
|df |The default field to use when no field prefix is defined within the query. |
|analyzer |The analyzer name. |
|lowercase_expanded_terms |Defaults to true. |
|analyze_wildcard |Defaults to false. |
|default_operator |can be AND or OR. Defaults to OR. |
|lenient |Defaults to false. |
|explain |For each hit, contain an explanation of how scoring of the hits was computed. |
|_source |Set to false to disable retrieval of the _source field. |
|fields |The selective stored fields of the document to return for each hit, comma delimited. |
|sort |Sorting to perform. Can either be in the form of fieldName, or fieldName:asc / fieldName:desc. |
|track_scores |When sorting, set to true in order to still track scores and return them as part of each hit.|
|timeout |Defaults to no timeout. |
|terminate_after |The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early.|
|from |Defaults to 0. |
|size |Defaults to 10. |
|search_type |Defaults to query_then_fetch. |

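  • q specifies the keywords to search for.
  • df specifies the field to search. The default is _all.
  • Setting _source to false excludes the _source field from the search results.
  • fields specifies the field names to include in the search results, separated by commas.
  • Setting track_scores to true computes scores even when sorting. (By default, scores are not computed when a sort is applied.)
  • from and size specify the position and number of documents to retrieve.

[Elasticsearch Reference - URI Search](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-uri-request.html)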
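
Long URI Search URLs are easy to mistype, so it can help to assemble them from a dictionary of parameters. The following is a sketch using only Python's standard library; build_uri_search_url is a hypothetical helper, and quote_via=quote is passed so that spaces become %20 as in the curl example rather than +.

```python
from urllib.parse import quote, urlencode


def build_uri_search_url(host, index, doc_type, params):
    """Build a URI Search URL; spaces in values are percent-encoded as %20."""
    query = urlencode(params, quote_via=quote)
    return "http://{}/{}/{}/_search?{}".format(host, index, doc_type, query)


url = build_uri_search_url(
    "localhost:9200",
    "tvfile",
    "columbo",
    {"q": "September Patrick", "df": "_all", "default_operator": "OR", "from": 0, "size": 10},
)
print(url)
```

The parameters are passed as a dictionary rather than keyword arguments because from is a reserved word in Python.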

Request Body Search

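Request Body Search specifies the search conditions in the request body.
There are two kinds of search: Queries and Filters.

They differ as follows:

  • Queries support both full-text search and term search, whereas Filters support term search only.
  • Queries compute a score; Filters do not.
  • Queries are more expensive than Filters.
  • Queries do not cache search results; Filters do.

Queries and Filters can also be combined.

Queries
Match All Query

Specifying matchAll in query searches without any conditions.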

columbo_match_all_query.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "matchAll": {}
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "season": {"order": "asc"}
    },
    {
      "no_in_season": {"order": "asc"}
    }
  ]
}
  • Because _source is set to false, the _source field is not included in the search results.
  • Because from is 0 and size is 100, up to 100 documents are returned starting from the first. (The default size is 10.)
  • fields lists the field names to include in the search results.
  • sort specifies the order of the documents.
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_all_query.json

[Elasticsearch Reference - Match All Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-match-all-query.html)
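
The request bodies in this article share the same skeleton (_source, from, size, fields, track_scores, sort) and differ mainly in the query part. A possible way to generate that skeleton is sketched below; build_search_body and its zero-based page argument are assumptions for this example, with from derived as page * size.

```python
import json


def build_search_body(query, fields, sort, page=0, size=10):
    """Assemble the request body skeleton shared by the examples; page is zero-based."""
    return {
        "_source": False,
        "from": page * size,
        "size": size,
        "query": query,
        "fields": list(fields),
        "track_scores": True,
        "sort": [{name: {"order": order}} for name, order in sort],
    }


body = build_search_body(
    query={"matchAll": {}},
    fields=["title", "season", "no_in_season"],
    sort=[("season", "asc"), ("no_in_season", "asc")],
    page=0,
    size=100,
)
print(json.dumps(body, indent=2))
```

With page=1 and size=3 the helper yields from 3, which corresponds to requesting the second page of three results.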

Match Query

Specifying match in query searches the given field (original_air_date in this example) for the keywords given in query.

columbo_match_query.json
{
  "_source": false,
  "from": 3,
  "size": 3,
  "query": {
    "match": {
      "original_air_date": {
        "query": "September December",
        "operator": "OR"
      }
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_query.json
response
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 7,
    "max_score" : 1.4385337,
    "hits" : [ {
      "_index" : "tvfile",
      "_type" : "columbo",
      "_id" : "AU5lJlhGw7_5S8xhhpHw",
      "_score" : 0.9509891,
      "fields" : {
        "directed_by" : [ "Nicholas Colasanto" ],
        "no_in_season" : [ 1 ],
        "guest_staring" : [ "John Cassavetes" ],
        "original_air_date" : [ "September 17, 1972" ],
        "no_in_series" : [ 10 ],
        "runtime" : [ 98 ],
        "season" : [ 2 ],
        "title" : [ "Étude in Black" ],
        "written_by" : [ "Richard Levinson & William Link" ]
      }
    }, {
      "_index" : "tvfile",
      "_type" : "columbo",
      "_id" : "AU5lJlhGw7_5S8xhhpH4",
      "_score" : 0.9509891,
      "fields" : {
        "directed_by" : [ "Jeannot Szwarc" ],
        "no_in_season" : [ 1 ],
        "guest_staring" : [ "Vera Miles" ],
        "original_air_date" : [ "September 23, 1973" ],
        "no_in_series" : [ 18 ],
        "runtime" : [ 73 ],
        "season" : [ 3 ],
        "title" : [ "Lovely But Lethal" ],
        "written_by" : [ "Myrna Bercovici" ]
      }
    }, {
      "_index" : "tvfile",
      "_type" : "columbo",
      "_id" : "AU5lJlhHw7_5S8xhhpIA",
      "_score" : 0.9509891,
      "fields" : {
        "directed_by" : [ "Bernard L. Kowalski" ],
        "no_in_season" : [ 1 ],
        "guest_staring" : [ "Robert Conrad" ],
        "original_air_date" : [ "September 15, 1974" ],
        "no_in_series" : [ 26 ],
        "runtime" : [ 98 ],
        "season" : [ 4 ],
        "title" : [ "An Exercise in Fatality" ],
        "written_by" : [ "Larry Cohen" ]
      }
    } ]
  }
}

[Elasticsearch Reference - Match Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-match-query.html)
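
As the response above shows, values requested through fields come back wrapped in arrays, even when there is only one value. The sketch below flattens each hit into a plain row; extract_rows is a hypothetical helper and the response dictionary is an abbreviated copy of the one above.

```python
def extract_rows(response):
    """Flatten each hit's "fields" section; single-element lists become plain values."""
    rows = []
    for hit in response["hits"]["hits"]:
        row = {"_id": hit["_id"], "_score": hit.get("_score")}
        for name, values in hit.get("fields", {}).items():
            row[name] = values[0] if len(values) == 1 else values
        rows.append(row)
    return rows


# Abbreviated response with the same structure as the one above.
response = {
    "hits": {
        "total": 7,
        "hits": [
            {
                "_id": "AU5lJlhGw7_5S8xhhpH4",
                "_score": 0.9509891,
                "fields": {"title": ["Lovely But Lethal"], "season": [3]},
            }
        ],
    }
}

for row in extract_rows(response):
    print(row["season"], row["title"])
```

Fields that hold several values, such as written_by with more than one entry, are left as lists.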

Multi Match Query

Specifying multi_match in query searches the multiple fields listed in fields for the keywords given in query.

columbo_multi_match_query.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "multi_match": {
      "query": "October Patrick",
      "type": "cross_fields",
      "fields": ["original_air_date", "guest_staring"],
      "operator": "AND"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_multi_match_query.json

The values that can be specified for type and their meanings are as follows.

Types of multi_match query

|type |description |
|:----------------------|:------------------------------------------------------|
|best_fields |default. Finds documents which match any field, but uses the _score from the best field.|
|most_fields |Finds documents which match any field and combines the _score from each field. |
|cross_fields |Treats fields with the same analyzer as though they were one big field. Looks for each word in any field. |
|phrase |Runs a match_phrase query on each field and combines the _score from each field. |
|phrase_prefix |Runs a match_phrase_prefix query on each field and combines the _score from each field.|

[Elasticsearch Reference - Multi Match Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-multi-match-query.html)

Query String Query

Specifying query_string in query allows more complex conditions than the other queries.

columbo_query_string.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "query_string": {
      "fields" : ["_all"],
      "query": "(September OR Patrick) AND (season:5)"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_query_string.json

default_field

The field used when the query does not explicitly name a field to search.
The default is the _all field.

To specify a different field:

example
{
  "settings": {
    "index": {
      "query": {
        "default_field": "_all"
      }
    }
  }
}

[Elasticsearch Reference - Query String Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-query-string-query.html)

Simple Query String Query

simple_query_string is a simplified version of query_string.

columbo_simple_query_string.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query" : {
    "simple_query_string" : {
      "query": "(September | October | November) +(McGoohan)",
      "fields": ["_all"]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_simple_query_string.json

Available flags

|flag |description |
|:-----------|:-----------|
|ALL | |
|NONE | |
|AND |+ |
|OR |\| |
|NOT |- |
|PREFIX |* |
|PHRASE |" |
|PRECEDENCE|( and ) |
|ESCAPE | |
|WHITESPACE| |
|FUZZY |~N after a word |
|NEAR | |
|SLOP |~N after a phrase|

[Elasticsearch Reference - Simple Query String Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-simple-query-string-query.html)

Term Query

Specifying term in query searches for documents whose field value exactly matches the value given in term.

columbo_term_query.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "term": {
      "japanese_air_date": "1973-02-25T00:00:00"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_term_query.json

[Elasticsearch Reference - Term Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-term-query.html)

Bool Query

Specifying bool in query lets you combine multiple queries.

columbo_bool_query.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": {
              "query": "September Patrick",
              "operator": "OR"
            }
          }
        },
        {
          "term": {
            "season": {"value": 5}
          }
        }
      ]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "no_in_series": {"order": "asc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_bool_query.json

The occurrence types

|occur |description |
|:-------------|:-------------------------------------------------------------------|
|must |The clause (query) must appear in matching documents. |
|should |The clause (query) should appear in the matching document. |
|must_not |The clause (query) must not appear in the matching documents. |

  • When should is used, the minimum_should_match parameter sets the minimum number of should clauses that must match.

[Elasticsearch Reference - Bool Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-bool-query.html)
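
Listing several clauses under one occurrence type as an array avoids repeating the key, which matters for generic JSON tooling: Python's json module, for instance, keeps only the last value when a key appears twice in one object. The sketch below first demonstrates that behaviour and then builds a bool query from lists; bool_query is a hypothetical helper and not part of any Elasticsearch client.

```python
import json

# Two "must" keys in one object, as a generic JSON parser would receive them.
raw = '{"bool": {"must": {"match": {"_all": "September"}}, "must": {"term": {"season": 5}}}}'
parsed = json.loads(raw)
print(list(parsed["bool"]["must"]))  # only the later clause is left after parsing


def bool_query(must=(), should=(), must_not=(), minimum_should_match=None):
    """Build a bool query in which every occurrence type holds a list of clauses."""
    body = {}
    for occur, clauses in (("must", must), ("should", should), ("must_not", must_not)):
        if clauses:
            body[occur] = list(clauses)
    if minimum_should_match is not None:
        body["minimum_should_match"] = minimum_should_match
    return {"bool": body}


query = bool_query(
    must=[
        {"match": {"_all": {"query": "September Patrick", "operator": "OR"}}},
        {"term": {"season": {"value": 5}}},
    ]
)
print(json.dumps(query, indent=2))
```

Building the query from lists keeps both must clauses regardless of which tool reads or rewrites the JSON afterwards.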

Range Query

Specifying range in query performs a range search on the value of the field given in range.

columbo_range_query.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query" : {
    "range" : {
      "no_in_series":{"gte": 20, "lte": 24}
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "no_in_series": {"order": "asc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_range_query.json

[Elasticsearch Reference - Range Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-range-query.html)

Ids Query

Specifying ids in query searches by the value of the _id field.

columbo_ids_query.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "ids": {
      "values": ["AU5YtptcIueIPY5pgX5J","AU5YtptcIueIPY5pgX5K","AU5YtptcIueIPY5pgX5L"]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_ids_query.json

[Elasticsearch Reference - Ids Query](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-ids-query.html)

Filters
Match All Filter
columbo_match_all_filter.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "matchAll": {}
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "season": {"order": "asc"}
    },
    {
      "no_in_season": {"order": "asc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_all_filter.json

[Elasticsearch Reference - Match All Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-match-all-filter.html)

Query Filter
columbo_query_filter.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "query": {
      "query_string" : {
        "fields" : ["_all"],
        "query": "(September OR Patrick) AND (season:5)"
      }
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_query_filter.json

[Elasticsearch Reference - Query Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-query-filter.html)

Term Filter
columbo_term_filter.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "term": {
      "japanese_air_date": "1973-02-25T00:00:00"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_term_filter.json

[Elasticsearch Reference - Term Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-term-filter.html)

Bool Filter
columbo_bool_filter.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "bool": {
      "must": [
        {
          "term": {
            "no_in_season": {"value": 1}
          }
        },
        {
          "term": {
            "season": {"value": 5}
          }
        }
      ]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_bool_filter.json

[Elasticsearch Reference - Bool Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-bool-filter.html)

Range Filter
columbo_range_filter.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter" : {
    "range" : {
      "no_in_series":{"gte": 20, "lte": 24}
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "no_in_series": {"order": "asc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_range_filter.json

[Elasticsearch Reference - Range Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-range-filter.html)

Ids Filter
columbo_ids_filter.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "ids": {
      "values": ["AU5YtptcIueIPY5pgX5J","AU5YtptcIueIPY5pgX5K","AU5YtptcIueIPY5pgX5L"]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ]
}
  • When _id values are generated automatically, they change every time the documents are indexed, so searching with the JSON above as-is returns no results.
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_ids_filter.json

[Elasticsearch Reference - Ids Filter](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/query-dsl-ids-filter.html)
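
Since auto-generated _id values differ on every run, one option is to take them from an earlier search response instead of hard-coding them. A minimal sketch, assuming a response shaped like the search results shown earlier; ids_filter_from_response is a hypothetical helper name.

```python
def ids_filter_from_response(response):
    """Collect the _id of every hit and wrap them in an ids filter."""
    ids = [hit["_id"] for hit in response["hits"]["hits"]]
    return {"ids": {"values": ids}}


# Abbreviated search response; the _id values are copied from the match query example.
response = {
    "hits": {
        "hits": [
            {"_id": "AU5lJlhGw7_5S8xhhpHw"},
            {"_id": "AU5lJlhGw7_5S8xhhpH4"},
        ]
    }
}
print(ids_filter_from_response(response))
```

The resulting dictionary can be placed under filter (or under query for the Ids Query) in the request body.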

Combining a Query and a Filter
columbo_match_query_range_filter.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "match": {
      "_all": {
        "query": "September Patrick",
        "operator": "OR"
      }
    }
  },
  "filter" : {
    "range" : {
      "no_in_series":{"gte": 30, "lte": 39}
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_query_range_filter.json

Query parameters

query parameters

|parameter |default / description |
|:---------------------|:--------------------------------------------------------|
|timeout |Defaults to no timeout. |
|from |Defaults to 0. |
|size |Defaults to 10. |
|search_type |Defaults to query_then_fetch. |
|query_cache |Set to true or false to enable or disable the caching of search results.|
|terminate_after |Defaults to no terminate_after. [experimental] |

  • timeout specifies the time until the search times out as a string. The available units are listed under Time units below.
  • search_type and query_cache are specified as query-string parameters.

Time units

|unit |description |
|:----|:-----------|
|y |Year |
|M |Month |
|w |Week |
|d |Day |
|h |Hour |
|m |Minute |
|s |Second |

[Elasticsearch Reference - Request Body Search](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/search-request-body.html)
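
The unit letters are case-sensitive (M is month, m is minute), so a small validation step can catch a mistyped timeout before the request is sent. The following sketch is an assumption-laden illustration: format_timeout is a hypothetical helper, and the unit set is taken directly from the Time units table above.

```python
TIME_UNITS = {"y", "M", "w", "d", "h", "m", "s"}


def format_timeout(amount, unit):
    """Return a timeout string such as "10s", rejecting units not listed in the table."""
    if unit not in TIME_UNITS:
        raise ValueError("unsupported time unit: {!r}".format(unit))
    if amount <= 0:
        raise ValueError("amount must be positive")
    return "{}{}".format(amount, unit)


# The timeout string goes into the request body next to the query.
body = {"timeout": format_timeout(10, "s"), "query": {"matchAll": {}}}
print(body["timeout"])
```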

Notes on Elasticsearch specifications

mapping

Fields

Fields available in document mappings

|field |default / description |
|:--------------|:------------------------------------------------------------------------------------------|
|_uid |Each document indexed is associated with an id and a type, the internal _uid field is the unique identifier of a document within an index and is composed of the type and the id.|
|_id |By default it is not indexed and not stored (thus, not created). |
|_type |By default, the _type field is indexed (but not analyzed) and not stored. |
|_source |The _source field is an automatically generated field that stores the actual JSON that was used as the indexed document. |
|_all |The idea of the _all field is that it includes the text of one or more other fields within the document indexed. |
|_analyzer |Deprecated in 1.5.0. |
|_boost |Deprecated in 1.0.0.RC1. |
|_parent |The parent field mapping is defined on a child mapping, and points to the parent type this child relates to.|
|_field_names |The _field_names field indexes the field names of a document, which can later be used to search for documents based on the fields that they contain typically using the exists and missing filters.|
|_routing |The routing field allows to control the _routing aspect when indexing data and explicit routing control is required.|
|_index |By default it is disabled.|
|_size |By default it is disabled.|
|_timestamp |By default it is disabled.|
|_ttl |By default it is disabled.|

[Elasticsearch Reference - Fields](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/mapping-fields.html)

Types

Data types available in document mappings

[Elasticsearch Reference - Types](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/mapping-types.html)

Core Types
string

attributes

|attribute |default / description |
|:----------------------------|:----------------------------------------------------------------------------------------|
|index_name |Defaults to the property/field name. |
|store |Defaults to false. |
|index |Defaults to analyzed. not_analyzed, no |
|doc_values |Set to true to store field values in a column-stride fashion. |
|term_vector |Defaults to no. |
|boost |Defaults to 1.0. |
|null_value |Defaults to not adding the field at all. |
|norms: {enabled: <value>} |Defaults to true for analyzed fields, and to false for not_analyzed fields. |
|norms: {loading: <value>} |possible values are eager and lazy (default). |
|index_options |Defaults to positions for analyzed fields, and to docs for not_analyzed fields. |
|analyzer |Defaults to the globally configured analyzer. |
|index_analyzer |The analyzer used to analyze the text contents when analyzed during indexing. |
|search_analyzer |The analyzer used to analyze the field when part of a query string. |
|include_in_all |If index is set to no this defaults to false, otherwise, defaults to true or to the parent object type setting. |
|ignore_above |The analyzer will ignore strings larger than this size. |
|position_offset_gap |Defaults to 0. |

copy_to

With copy_to, the value of a field can be copied into another field.

example
{
  "properties": {
    "title": {
      "type": "string",
      "index": "analyzed",
      "copy_to": "contents"
    },
    "contents": {
      "type": "string"
    }
  }
}
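The effect of the mapping above can be sketched in Python. This is only an illustration, not how Elasticsearch is implemented: the helper `indexed_values` is a hypothetical name, and it simply lists which values each field would receive at index time. The stored _source document itself is not modified by copy_to.

```python
def indexed_values(properties, doc):
    """Return the values each mapped field would index, following copy_to."""
    indexed = {name: [] for name in properties}
    for name, value in doc.items():
        if name not in properties:
            continue
        indexed[name].append(value)
        # copy_to accepts a single field name or a list of field names.
        targets = properties[name].get("copy_to", [])
        if isinstance(targets, str):
            targets = [targets]
        for target in targets:
            indexed.setdefault(target, []).append(value)
    return indexed


# Same mapping as the example above, written as a Python dict.
properties = {
    "title": {"type": "string", "index": "analyzed", "copy_to": "contents"},
    "contents": {"type": "string"},
}
doc = {"title": "Elasticsearch 1.6.0", "contents": "Installing on Windows 7"}
result = indexed_values(properties, doc)
print(result["contents"])
```

In this sketch, a search against contents would see both the title text and the contents text, while title still holds only its own value.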

fields

The multi_field type was [removed from Core Types] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/_multi_fields.html) in version 1.0.
With fields, a single JSON source field can be mapped to multiple fields.

example
{
  "properties": {
    "title": {
      "type": "string",
      "index": "analyzed",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
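As a rough illustration of what the mapping above produces, the following Python sketch lists the terms each field path would hold for one source value. The function `index_terms` is hypothetical, and lowercasing plus splitting on whitespace is only a crude stand-in for a real analyzer.

```python
def index_terms(properties, field, value):
    """Map each indexed field path to the terms produced for one source value."""
    terms = {}

    def add(path, spec):
        if spec.get("index", "analyzed") == "not_analyzed":
            # not_analyzed: the whole value is kept as one unchanged term.
            terms[path] = [value]
        else:
            # Stand-in for an analyzer: lowercase and split on whitespace.
            terms[path] = value.lower().split()

    spec = properties[field]
    add(field, spec)
    for sub_name, sub_spec in spec.get("fields", {}).items():
        add(field + "." + sub_name, sub_spec)
    return terms


properties = {
    "title": {
        "type": "string",
        "index": "analyzed",
        "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
    }
}
terms = index_terms(properties, "title", "Quick Brown Fox")
print(terms)
```

The sub-field is addressed as title.raw, which is the kind of field typically used for sorting or exact matching, while title is used for full-text search.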
Number

The numeric types are float, double, byte, short, integer, and long.

attributes

|attribute |default / description |
|:----------------------------|:-----------------------------------------------|
|type |float, double, integer, long, short, byte. Required.|
|index_name |Defaults to the property/field name. |
|store |Defaults to false. |
|index |Set to no if the value should not be indexed. Setting to no disables include_in_all. |
|doc_values |Set to true to store field values in a column-stride fashion. |
|precision_step |Defaults to 16 for long, double, 8 for short, integer, float, 2147483647 for byte.|
|boost |Defaults to 1.0. |
|null_value |Defaults to not adding the field at all. |
|include_in_all |If index is set to no this defaults to false, otherwise, defaults to true or to the parent object type setting. |
|ignore_malformed |Defaults to false. |
|coerce |Defaults to true. |

Date

attributes

|attribute |description |
|:----------------------------|:-----------------------------------------------|
|index_name |Defaults to the property/field name. |
|format |Defaults to dateOptionalTime. |
|store |Defaults to false. |
|index |Set to no if the value should not be indexed. Setting to no disables include_in_all. |
|doc_values |Set to true to store field values in a column-stride fashion. |
|precision_step |Defaults to 16. |
|boost |Defaults to 1.0. |
|null_value |Defaults to not adding the field at all. |
|include_in_all |If index is set to no this defaults to false, otherwise, defaults to true or to the parent object type setting.|
|ignore_malformed |Defaults to false. |
|numeric_resolution |Possible values include seconds and milliseconds (default). |

Boolean

attributes

|attribute |default / description |
|:----------------------------|:-----------------------------------------------|
|index_name |Defaults to the property/field name. |
|store |Defaults to false. |
|index |Set to no if the value should not be indexed. Setting to no disables include_in_all. |
|boost |Defaults to 1.0. |
|null_value |Defaults to not adding the field at all. |

Binary

attributes

|attribute |default / description |
|:----------------------------|:-----------------------------------------------|
|index_name |Defaults to the property/field name. |
|store |Defaults to false. |
|doc_values |Set to true to store field values in a column-stride fashion. |
|compress |Set to true to compress the stored binary value. |
|compress_threshold |Defaults to -1 |

Root Object Type

|type |default / description |
|:-----------------------|:-------------------------------------------------------------|
|dynamic_date_formats |dynamic_date_formats is the ability to set one or more date formats that will be used to detect date fields. |
|date_detection |Allows to disable automatic date type detection. |
|numeric_detection |Sometimes, even though json has support for native numeric types, numeric values are still provided as strings. |
|dynamic_templates |Dynamic templates allow to define mapping templates that will be applied when dynamic introduction of fields / objects happens. |

[Elasticsearch Reference - Root Object Type] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/mapping-root-object-type.html)

Date Format

An excerpt of the built-in formats.

|format |pattern |expected |
|:--------------------------------|:-----------------------------|:------------------------------|
|basic_date |yyyyMMdd |20060102 |
|basic_date_time |yyyyMMdd'T'HHmmss.SSSZ |20060102T150405.999+0900 |
|basic_date_time_no_millis |yyyyMMdd'T'HHmmssZ |20060102T150405+0900 |
| | | |
|date |yyyy-MM-dd |2006-01-02 |
|date_time |yyyy-MM-dd'T'HH:mm:ss.SSSZZ |2006-01-02T15:04:05.999+09:00 |
|date_time_no_millis |yyyy-MM-dd'T'HH:mm:ssZZ |2006-01-02T15:04:05+09:00 |
|date_optional_time |yyyy-MM-dd |2006-01-02 |
|date_optional_time |yyyy-MM-dd'T'HH:mm:ss |2006-01-02T15:04:05 |
| | | |
|date_hour_minute_second |yyyy-MM-dd'T'HH:mm:ss |2006-01-02T15:04:05 |
|date_hour_minute_second_millis |yyyy-MM-dd'T'HH:mm:ss.SSS |2006-01-02T15:04:05.999 |

[Elasticsearch Reference - mapping-date-format] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/mapping-date-format.html)
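To get a feel for the patterns in the table, some of them can be approximated with Python's `datetime.strptime`. This is a sketch under assumptions: the `STRPTIME_PATTERNS` table and the `parse` helper are hypothetical, the strptime patterns are hand-written approximations of the Joda-style patterns, and parsing an offset written with a colon (ZZ) requires Python 3.7 or later.

```python
from datetime import datetime

# Hypothetical lookup table: built-in format name -> approximate strptime pattern.
STRPTIME_PATTERNS = {
    "basic_date": "%Y%m%d",
    "basic_date_time_no_millis": "%Y%m%dT%H%M%S%z",
    "date": "%Y-%m-%d",
    "date_time": "%Y-%m-%dT%H:%M:%S.%f%z",
    "date_hour_minute_second": "%Y-%m-%dT%H:%M:%S",
}


def parse(format_name, text):
    """Parse text using the approximation of the named built-in format."""
    return datetime.strptime(text, STRPTIME_PATTERNS[format_name])


print(parse("basic_date", "20060102"))
print(parse("date_time", "2006-01-02T15:04:05.999+09:00"))
```

A value that does not match the pattern raises `ValueError` here, which loosely mirrors a document being rejected when a date field does not match its configured format.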

Analysis

An analyzer is a combination of one tokenizer and zero or more token filters. A custom analyzer can additionally apply zero or more char filters.

example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "{logical analyzer name}": {
          "type": "the analyzer to use",
          "settings specific to that analyzer"
        },
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_baseform",
            "kuromoji_pos_filter"
          ]
        },
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer",
          "filter": [
            "my_filter"
          ],
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "tokenizer": {
        "{logical tokenizer name}": {
          "type": "the tokenizer to use",
          "settings specific to that tokenizer"
        },
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        },
        "my_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "3",
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      },
      "filter": {
        "{logical filter name}": {
          "type": "the filter to use",
          "settings specific to that filter"
        },
        "kuromoji_pos_filter": {
          "type": "kuromoji_part_of_speech"
        },
        "my_filter": {
          "type": "stop",
          "stopwords": ["NGWORD_A", "NGWORD_B", "NGWORD_C"]
        }
      },
      "char_filter": {
        "{logical char_filter name}": {
          "type": "the char_filter to use",
          "settings specific to that char_filter"
        },
        "my_char_filter": {
          "type": "mapping",
          "mappings" : ["kb=>kilobyte","mb=>megabyte","gb=>gigabyte"]
        }
      }
    },
    "index": {
      "index settings"
    }
  },
  "mappings": {
    "{type name}": {
      "type settings"
    },
    "{type name}": {
      "type settings"
    }
  }
}
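The order in which the parts of a custom analyzer run (char filters, then the tokenizer, then token filters) can be sketched in Python. This is a simplified illustration, not Elasticsearch code: all function names are hypothetical, the mapping rules and stop words are taken from my_char_filter and my_filter above, and the tokenizer is a plain splitter used as a stand-in rather than the nGram tokenizer configured as my_tokenizer.

```python
import re


def mapping_char_filter(text, mappings):
    """Char filter stage: rewrite the raw text in a single pass before tokenizing."""
    table = dict(rule.split("=>") for rule in mappings)
    keys = sorted(table, key=len, reverse=True)  # prefer the longest match
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: table[match.group(0)], text)


def simple_tokenizer(text):
    """Tokenizer stage (stand-in): split on anything except letters, digits, '_'."""
    return [token for token in re.split(r"[^0-9A-Za-z_]+", text) if token]


def stop_filter(tokens, stopwords):
    """Token filter stage: drop tokens that are listed as stop words."""
    return [token for token in tokens if token not in stopwords]


def analyze(text):
    text = mapping_char_filter(text, ["kb=>kilobyte", "mb=>megabyte", "gb=>gigabyte"])
    tokens = simple_tokenizer(text)
    return stop_filter(tokens, {"NGWORD_A", "NGWORD_B", "NGWORD_C"})


tokens = analyze("NGWORD_A 512 mb disk")
print(tokens)
```

The char filter sees the whole text, the tokenizer turns it into tokens, and the token filters only ever see tokens, which is why the three kinds of component are configured in separate sections.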

Analyzers

Built in Analyzers

|Analyzers |type |description |
|:---|:---|:---|
|[Standard Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-standard-analyzer.html) |standard |An analyzer composed of the Standard Tokenizer, the Standard Token Filter, the Lower Case Token Filter, and the Stop Token Filter. |
|[Simple Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-simple-analyzer.html) |simple |An analyzer composed of the Lower Case Tokenizer. |
|[Whitespace Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-whitespace-analyzer.html) |whitespace |An analyzer composed of the Whitespace Tokenizer. |
|[Stop Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-stop-analyzer.html) |stop |An analyzer composed of the Lower Case Tokenizer and the Stop Token Filter. |
|[Keyword Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-keyword-analyzer.html) |keyword |An analyzer that treats the entire input as a single token. |
|[Pattern Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-pattern-analyzer.html) |pattern |An analyzer that uses a regular expression. |
|[Language Analyzers] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-lang-analyzer.html) |see the table below |Analyzers for specific languages. |
|[Snowball Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-snowball-analyzer.html) |snowball |An analyzer composed of the standard tokenizer, standard filter, lowercase filter, stop filter, and snowball filter. |
|[Custom Analyzer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-custom-analyzer.html) |custom |An analyzer built by combining any tokenizer with zero or more token filters and zero or more char filters. |

The following types are supported

|type |language |
|:---|:---|
|arabic |Arabic |
|armenian |Armenian |
|basque |Basque |
|brazilian |Portuguese (Brazil) |
|bulgarian |Bulgarian |
|catalan |Catalan |
|chinese |Chinese |
|cjk |CJK unified ideographs |
|czech |Czech |
|danish |Danish |
|dutch |Dutch |
|english |English |
|finnish |Finnish |
|french |French |
|galician |Galician |
|german |German |
|greek |Greek |
|hindi |Hindi |
|hungarian |Hungarian |
|indonesian |Indonesian |
|irish |Irish |
|italian |Italian |
|latvian |Latvian |
|norwegian |Norwegian |
|persian |Persian |
|portuguese |Portuguese |
|romanian |Romanian |
|russian |Russian |
|sorani |Kurdish (Sorani) |
|spanish |Spanish |
|swedish |Swedish |
|turkish |Turkish |
|thai |Thai |

Custom Analyzer configuration sample

A sample Custom Analyzer configuration, using kuromoji as the example.

example
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        }
      },
      "filter": {
        "greek_lowercase_filter": {
          "type": "lowercase",
          "language": "greek"
        },
        "kuromoji_pos_filter": {
          "type": "kuromoji_part_of_speech"
        }
      },
      "analyzer": {
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_baseform", "kuromoji_pos_filter", "greek_lowercase_filter", "cjk_width"
          ]
        }
      }
    }
  }
}
  • kuromoji_tokenizer is a built-in tokenizer of kuromoji.
  • kuromoji_baseform and kuromoji_part_of_speech are built-in token filters of kuromoji.

|Setting |Description |
|:---|:---|
|tokenizer |The name of the tokenizer to use. |
|filter |Optional. A list of the names of the token filters to use. |
|char_filter |Optional. A list of the names of the char filters to use. |
|position_offset_gap |An optional number of positions to increment between each field value of a field using this analyzer. |
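In the sample above, some names used by the analyzer are defined in the same settings (kuromoji, kuromoji_pos_filter, greek_lowercase_filter) and others are not (kuromoji_baseform, cjk_width), so the latter have to be provided as built-ins by Elasticsearch or a plugin. The following Python sketch lists such names. The function `undefined_references` is a hypothetical helper for reading a settings body, not part of any Elasticsearch client.

```python
def undefined_references(body):
    """Return names used by analyzers that are not defined in the same settings."""
    analysis = body["settings"]["analysis"]
    kinds = ("tokenizer", "filter", "char_filter")
    defined = {kind: set(analysis.get(kind, {})) for kind in kinds}
    missing = {}
    for spec in analysis.get("analyzer", {}).values():
        used = {
            "tokenizer": [spec["tokenizer"]] if "tokenizer" in spec else [],
            "filter": spec.get("filter", []),
            "char_filter": spec.get("char_filter", []),
        }
        for kind, names in used.items():
            for name in names:
                if name not in defined[kind]:
                    missing.setdefault(kind, set()).add(name)
    return missing


# The same settings as the sample above, written as a Python dict.
sample = {
    "settings": {
        "analysis": {
            "tokenizer": {"kuromoji": {"type": "kuromoji_tokenizer"}},
            "filter": {
                "greek_lowercase_filter": {"type": "lowercase", "language": "greek"},
                "kuromoji_pos_filter": {"type": "kuromoji_part_of_speech"},
            },
            "analyzer": {
                "kuromoji_analyzer": {
                    "type": "custom",
                    "tokenizer": "kuromoji",
                    "filter": [
                        "kuromoji_baseform",
                        "kuromoji_pos_filter",
                        "greek_lowercase_filter",
                        "cjk_width",
                    ],
                }
            },
        }
    }
}
missing = undefined_references(sample)
print(missing)
```

A name that shows up here and is not a built-in is likely a typo or a missing plugin, which is worth checking before creating the index.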

[Elasticsearch Reference - Analyzers] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-analyzers.html)

Tokenizers

Built in Tokenizers

|Tokenizer |type |description |
|:---|:---|:---|
|[Standard Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-standard-tokenizer.html) |standard |A tokenizer suited to European languages. |
|[Edge NGram Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-edgengram-tokenizer.html) |edgeNGram |A tokenizer that splits text into n-gram tokens anchored to the beginning of the text. |
|[Keyword Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-keyword-tokenizer.html) |keyword |A tokenizer that treats the text as a single token. |
|[Letter Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-letter-tokenizer.html) |letter |A tokenizer that splits text into tokens at non-letters. |
|[Lowercase Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-lowercase-tokenizer.html) |lowercase |Equivalent to using the Letter Tokenizer together with the Lower Case Token Filter. |
|[NGram Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-ngram-tokenizer.html) |nGram |A tokenizer that splits text into n-gram tokens. |
|[Whitespace Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-whitespace-tokenizer.html) |whitespace |A tokenizer that splits text into tokens on whitespace. |
|[Pattern Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-pattern-tokenizer.html) |pattern |A tokenizer that splits text into tokens with a regular expression. |
|[UAX Email URL Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-uaxurlemail-tokenizer.html) |uax_url_email |A tokenizer that keeps URLs and e-mail addresses as single tokens. |
|[Path Hierarchy Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-pathhierarchy-tokenizer.html) |path_hierarchy |A tokenizer that emits the hierarchy of a path as tokens. (It does not simply split on the path separator.) |
|[Classic Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-classic-tokenizer.html) |classic |A tokenizer for English text. |
|[Thai Tokenizer] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-thai-tokenizer.html) |thai |A tokenizer for Thai text. |

edgeNGram and nGram

The results of tokenizing the text elastic with nGram and edgeNGram, using the following settings:

  • min_gram: 2
  • max_gram: 3
  • token_chars: letter,digit

nGram

|position |type |token |
|:---|:---|:---|
|1 |word |el |
|2 |word |ela |
|3 |word |la |
|4 |word |las |
|5 |word |as |
|6 |word |ast |
|7 |word |st |
|8 |word |sti |
|9 |word |ti |
|10 |word |tic |
|11 |word |ic |

edgeNGram

|position |type |token |
|:---|:---|:---|
|1 |word |el |
|2 |word |ela |
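The two results above can be reproduced with a small Python sketch. It is an illustration of the n-gram idea, not the Lucene implementation: the functions `ngram` and `edge_ngram` are hypothetical, and the token_chars handling is left out because the input is a single word made of letters.

```python
def ngram(text, min_gram, max_gram):
    """All substrings of length min_gram..max_gram, taken at every start offset."""
    tokens = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                tokens.append(text[start:start + size])
    return tokens


def edge_ngram(text, min_gram, max_gram):
    """Only the substrings that begin at the first character."""
    return [text[:size] for size in range(min_gram, max_gram + 1) if size <= len(text)]


print(ngram("elastic", 2, 3))
print(edge_ngram("elastic", 2, 3))
```

Because edgeNGram only keeps grams anchored at the beginning, it produces far fewer tokens, which makes it a common choice for prefix matching such as search-as-you-type.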

uax_url_email

The result of tokenizing the text Elasticsearch reference <a href =\"https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-uaxurlemail-tokenizer.html\">UAX Email URL Tokenizer</a>:

|position |type |token |
|:---|:---|:---|
|1 |ALPHANUM |Elasticsearch |
|2 |ALPHANUM |reference |
|3 |ALPHANUM |a |
|4 |ALPHANUM |href |
|5 |URL |https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-uaxurlemail-tokenizer.html |
|6 |ALPHANUM |UAX |
|7 |ALPHANUM |Email |
|8 |ALPHANUM |URL |
|9 |ALPHANUM |Tokenizer |
|10 |ALPHANUM |a |
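The key point of the result above is that the URL survives as one token while everything else is split into words. A very rough Python approximation of that behavior is shown below. The regular expressions are simplifying assumptions and are much cruder than the real tokenizer's rules, and the type labels are just this sketch's group names.

```python
import re

# Try URL first, then e-mail, then plain alphanumeric runs.
TOKEN = re.compile(
    r'(?P<URL>https?://[^\s"<>\\]+)'
    r'|(?P<EMAIL>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)'
    r'|(?P<ALPHANUM>[A-Za-z0-9]+)'
)


def tokenize(text):
    """Return (type, token) pairs in the order they appear in the text."""
    return [(match.lastgroup, match.group(0)) for match in TOKEN.finditer(text)]


text = (
    'Elasticsearch reference <a href ="https://www.elastic.co/guide/en/'
    'elasticsearch/reference/1.6/analysis-uaxurlemail-tokenizer.html">'
    'UAX Email URL Tokenizer</a>'
)
for position, (kind, token) in enumerate(tokenize(text), start=1):
    print(position, kind, token)
```

With a standard-style tokenizer the URL would instead be broken into several pieces, so uax_url_email is the one to reach for when URLs or addresses need to be searchable as a whole.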

path_hierarchy

The result of tokenizing the text C:/Windows/System32/drivers/etc:

|position |type |token |
|:---|:---|:---|
|1 |word |C: |
|1 |word |C:/Windows |
|1 |word |C:/Windows/System32 |
|1 |word |C:/Windows/System32/drivers |
|1 |word |C:/Windows/System32/drivers/etc |
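The tokens in the table are the successive ancestors of the path, which a few lines of Python can reproduce. This is a minimal sketch assuming the default forward-slash delimiter; the function name is hypothetical and options such as reverse and skip are not modeled.

```python
def path_hierarchy(path, delimiter="/"):
    """Return every ancestor prefix of the path, shortest first."""
    parts = path.split(delimiter)
    prefixes = [delimiter.join(parts[:count]) for count in range(1, len(parts) + 1)]
    # A leading delimiter yields an empty first prefix; drop it.
    return [prefix for prefix in prefixes if prefix]


for token in path_hierarchy("C:/Windows/System32/drivers/etc"):
    print(token)
```

Indexing these prefixes is what makes it possible to find a document by any of its parent directories with a simple exact-term match.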

[Elasticsearch Reference - Tokenizers] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-tokenizers.html)

Token Filters

|Token Filter |type |
|:---|:---|
|[Standard Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-standard-tokenfilter.html) |standard |
|[ASCII Folding Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-asciifolding-tokenfilter.html) |asciifolding |
|[Length Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-length-tokenfilter.html) |length |
|[Lowercase Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-lowercase-tokenfilter.html) |lowercase |
|[Uppercase Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-uppercase-tokenfilter.html) |uppercase |
|[NGram Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-ngram-tokenfilter.html) |nGram |
|[Edge NGram Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-edgengram-tokenfilter.html) |edgeNGram |
|[Porter Stem Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-porterstem-tokenfilter.html) |porter_stem |
|[Shingle Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-shingle-tokenfilter.html) |shingle |
|[Stop Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-stop-tokenfilter.html) |stop |
|[Word Delimiter Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-word-delimiter-tokenfilter.html) |word_delimiter |
|[Stemmer Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-stemmer-tokenfilter.html) |stemmer |
|[Stemmer Override Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-stemmer-override-tokenfilter.html) |stemmer_override |
|[Keyword Marker Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-keyword-marker-tokenfilter.html) |keyword_marker |
|[Keyword Repeat Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-keyword-repeat-tokenfilter.html) |keyword_repeat |
|[KStem Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-kstem-tokenfilter.html) |kstem |
|[Snowball Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-snowball-tokenfilter.html) |snowball |
|[Phonetic Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-phonetic-tokenfilter.html) |phonetic |
|[Synonym Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-synonym-tokenfilter.html) |synonym |
|[Compound Word Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-compound-word-tokenfilter.html) |dictionary_decompounder, hyphenation_decompounder |
|[Reverse Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-reverse-tokenfilter.html) |reverse |
|[Elision Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-elision-tokenfilter.html) |elision |
|[Truncate Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-truncate-tokenfilter.html) |truncate |
|[Unique Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-unique-tokenfilter.html) |unique |
|[Pattern Capture Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-pattern-capture-tokenfilter.html) |pattern_capture |
|[Pattern Replace Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-pattern_replace-tokenfilter.html) |pattern_replace |
|[Trim Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-trim-tokenfilter.html) |trim |
|[Limit Token Count Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-limit-token-count-tokenfilter.html) |limit |
|[Hunspell Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-hunspell-tokenfilter.html) |hunspell |
|[Common Grams Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-common-grams-tokenfilter.html) |common_grams |
|[Normalization Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-normalization-tokenfilter.html) |see the table below |
|[CJK Width Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-cjk-width-tokenfilter.html) |cjk_width |
|[CJK Bigram Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-cjk-bigram-tokenfilter.html) |cjk_bigram |
|[Delimited Payload Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-delimited-payload-tokenfilter.html) |delimited_payload_filter |
|[Keep Words Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-keep-words-tokenfilter.html) |keep |
|[Keep Types Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-keep-types-tokenfilter.html) |keep_types |
|[Classic Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-classic-tokenfilter.html) |classic |
|[Apostrophe Token Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-apostrophe-tokenfilter.html) |apostrophe |

Normalization Token Filter

|language |type |
|:---|:---|
|Arabic |arabic_normalization |
|German |german_normalization |
|Hindi |hindi_normalization |
|Indic |indic_normalization |
|Kurdish (Sorani) |sorani_normalization |
|Persian |persian_normalization |
|Scandinavian |scandinavian_normalization, scandinavian_folding |

[Elasticsearch Reference - Token Filters] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-tokenfilters.html)

Character Filters

|Character Filter |type |
|:---|:---|
|[Mapping Char Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-mapping-charfilter.html) |mapping |
|[HTML Strip Char Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-htmlstrip-charfilter.html) |html_strip |
|[Pattern Replace Char Filter] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-pattern-replace-charfilter.html) |pattern_replace |

[Elasticsearch Reference - Character Filters] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-charfilters.html)

Analyze

The _analyze endpoint of the Indices APIs lets you check the output of an analyzer. For Elasticsearch's built-in analyzers, you do not need to specify an index.
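Since the index name is optional, the request URL takes one of two shapes. The following Python sketch only assembles that URL and does not send anything. The helper `analyze_url` is a hypothetical name, and the http scheme and the host value are assumptions for a local development node.

```python
from urllib.parse import urlencode


def analyze_url(host, index=None, **params):
    """Build an _analyze URL, with or without an index name in the path."""
    path = "/" + index + "/_analyze" if index else "/_analyze"
    query = urlencode(params)
    return "http://" + host + path + ("?" + query if query else "")


# Built-in analyzer: no index needed.
print(analyze_url("localhost:9200", analyzer="standard"))
# Analyzer defined in an index's settings: the index name goes into the path.
print(analyze_url("localhost:9200", index="test", analyzer="my_analyzer"))
```

The text to analyze is passed separately as the request body, as in the curl commands in this section.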

Syntax
> curl -XGET "[host name][:port]/[index name]/_analyze?analyzer={analyzer name}&tokenizer={tokenizer name}&token_filters={}&char_filters={}" -d "The Bye-Bye Sky High IQ Murder Case"

standard

example
> curl -XGET "localhost:9200/_analyze?analyzer=standard" -d "The Bye-Bye Sky High IQ Murder Case"

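To examine an analyzer's behavior in detail, it seems better to configure it on an index.
The following example configures the analyzer to be checked on an index named test and then runs that analyzer.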

PUT
curl -XPUT "localhost:9200/test?pretty" -d "{
  \"settings\": {
    \"analysis\": {
      \"analyzer\": {
        \"my_analyzer\": {
          \"type\": \"custom\",
          \"tokenizer\": \"my_tokenizer\"
        }
      },
      \"tokenizer\": {
        \"my_tokenizer\": {
          \"type\": \"path_hierarchy\",
          \"reverse\": false,
          \"skip\": 0
        }
      }
    }
  }
}"
GET
curl -XGET "localhost:9200/test/_analyze?analyzer=my_analyzer&pretty" -d "C:/Windows/System32/drivers/etc"
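Escaping every double quote by hand for the Windows command prompt is easy to get wrong, so one option is to generate the request body from a data structure. The Python sketch below builds the same settings as the PUT example with `json.dumps` and then escapes the quotes in the same style. The variable names are illustrative, and the resulting command is printed rather than executed.

```python
import json

# Same settings as the PUT example, written as a Python dict.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {"type": "custom", "tokenizer": "my_tokenizer"}
            },
            "tokenizer": {
                "my_tokenizer": {"type": "path_hierarchy", "reverse": False, "skip": 0}
            },
        }
    }
}

# Serialize to JSON (Python's False becomes JSON false).
body = json.dumps(settings)
# Escape double quotes the same way the hand-written command does.
escaped = body.replace('"', '\\"')
command = 'curl -XPUT "localhost:9200/test?pretty" -d "' + escaped + '"'
print(command)
```

Because the body comes from a serializer, a missing backslash or an unbalanced brace cannot slip in, and the same dict can be reused if the request is later sent from a script instead of curl.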

[Elasticsearch Reference - Analyze] (https://www.elastic.co/guide/en/elasticsearch/reference/1.6/indices-analyze.html)
