Elasticsearch 1.6.0 - Installing on Windows 7

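Overview

This article installs Elasticsearch 1.6.0 on Windows 7 for development and verification purposes and applies some basic configuration.
It then indexes sample data and tries out basic search methods.

Environment

The contents of this article were verified with the following versions.

Windows does not ship with a curl command, so I used cURL.

References

The following sites were used as references.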

Elasticsearch

Slide

Blog

Qiita

Installation

Install Elasticsearch and a few plugins on Windows 7.

Elasticsearch

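Download the archive file from the download page and extract it to a suitable location.
The archive downloaded here is elasticsearch-1.6.0.zip.

Extraction

Installation only requires extracting the archive to a suitable location.
Here it was extracted to D:\dev\elasticsearch-1.6.0.

Configuration

Because this is for development and verification, configure it to start with minimal resources.
The configuration file is config/elasticsearch.yml under the extracted directory.

Only the changed settings are excerpted below.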

# Cluster name identifies your cluster for auto-discovery. If you're running
# multiple clusters on the same network, make sure you're using unique names.
#
cluster.name: elasticsearch

# Node names are generated dynamically on startup, so you're relieved
# from configuring them manually. You can tie this node to a specific name:
#
node.name: master

# Every node can be configured to allow or deny being eligible as the master,
# and to allow or deny to store the data.
#
# Allow this node to be eligible as a master node (enabled by default):
#
node.master: true
#
# Allow this node to store data (enabled by default):
#
node.data: true
  • node.name: if no node name is specified, one is assigned automatically when the Elasticsearch instance starts.
# Note, that for development on a local machine, with small indices, it usually
# makes sense to "disable" the distributed features:
#
index.number_of_shards: 1
index.number_of_replicas: 0
  • Because this is for development, the number of shards is set to 1 and no replicas are created.
# Set this property to true to lock the memory:
#
bootstrap.mlockall: true
# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.
#
# 1. Disable multicast discovery (enabled by default):
#
discovery.zen.ping.multicast.enabled: false
#
# 2. Configure an initial list of master nodes in the cluster
#    to perform discovery when new nodes (master or data) are started:
#
discovery.zen.ping.unicast.hosts: ["localhost"]

Startup

Change to the extracted directory and run the following command.
The memory size to use can be specified with options.

> bin/elasticsearch.bat -Xmx256m -Xms256m

Verification

Access the following URL with curl or a browser and confirm that a response with status 200 is returned.

GET
> curl -XGET "localhost:9200/"
response
{
  "status" : 200,
  "name" : "master",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.6.0",
    "build_hash" : "cdd3ac4dde4f69524ec0a14de3828cb95bbb86d0",
    "build_timestamp" : "2015-06-09T13:36:34Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
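The same check can be scripted. The following is a minimal Python sketch, assuming the response body has already been fetched (for example with curl) and is held as a string. The function name check_root_response is made up for this illustration, and the sample body is a trimmed copy of the response shown above.

```python
import json

# Trimmed copy of the response shown above (only the fields used here).
SAMPLE_RESPONSE = """
{
  "status": 200,
  "name": "master",
  "cluster_name": "elasticsearch",
  "version": {"number": "1.6.0", "lucene_version": "4.10.4"},
  "tagline": "You Know, for Search"
}
"""


def check_root_response(body, expected_version="1.6.0"):
    """Return True when the node reports status 200 and the expected version."""
    info = json.loads(body)
    version = info.get("version", {}).get("number")
    return info.get("status") == 200 and version == expected_version


print(check_root_response(SAMPLE_RESPONSE))
```

Passing a different expected_version makes the same helper usable after an upgrade.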

plugin

elasticsearch-head

URL: http://mobz.github.io/elasticsearch-head/

Installation

install
> bin/plugin -install mobz/elasticsearch-head

Verification

Open the head plugin's page in a browser and confirm that it is displayed.

elasticsearch-analysis-kuromoji

URL: https://github.com/elastic/elasticsearch-analysis-kuromoji

Installation

install
> bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji/2.6.0

elasticsearch-inquisitor

URL: https://github.com/polyfractal/elasticsearch-inquisitor

Installation

install
> bin/plugin -install polyfractal/elasticsearch-inquisitor

Verification

Open the Inquisitor plugin's page in a browser and confirm that it is displayed.

Checking the installed plugins

> bin\plugin -l
response
Installed plugins:
    - analysis-kuromoji
    - head
    - inquisitor

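Checking basic search methods

Index the sample data and search it in several ways.

Preparing the sample data

The sample data is information about a TV drama series.

| field | data type | description |
| --- | --- | --- |
| title | string | Original title |
| original_air_date | string | Original air date |
| runtime | integer | Running time (minutes) |
| guest_staring | string | Guest star |
| guest_staring_role | string | Guest star's role |
| directed_by | string | Director |
| written_by | string array | Writer |
| teleplay | string array | Teleplay |
| season | integer | Season |
| no_in_season | integer | Episode number within the season |
| no_in_series | integer | Episode number within the series |
| japanese_title | string | Japanese title |
| japanese_air_date | date | Air date in Japan |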

mapping

index: tvfile
type: columbo

Mapping for the sample data

columbo_mapping.json
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "analysis": {
      "filter": {
        "greek_lowercase_filter": {
          "type": "lowercase",
          "language": "greek"
        },
        "kuromoji_pos_filter": {
          "type": "kuromoji_part_of_speech"
        }
      },
      "tokenizer": {
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        },
        "ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "3",
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_baseform", "kuromoji_pos_filter", "greek_lowercase_filter", "cjk_width"
          ]
        },
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer",
          "filter": [
            "standard"
          ]
        },
        "letter_lower_analyzer": {
          "type": "custom",
          "tokenizer": "letter",
          "filter": [
            "lowercase"
          ]
        },
        "letter_upper_analyzer": {
          "type": "custom",
          "tokenizer": "letter",
          "filter": [
            "uppercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "columbo": {
      "_source": {
        "enabled": true
      },
      "_all": {
        "enabled": true
      },
      "_timestamp": {
        "enabled": true
      },
      "dynamic": "strict",
      "properties": {
        "title": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "original_air_date": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "runtime": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "guest_staring": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "guest_staring_role": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "directed_by": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "written_by": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "teleplay": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "letter_lower_analyzer",
          "store": true,
          "include_in_all": true
        },
        "season": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "no_in_season": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "no_in_series": {
          "type": "integer",
          "store": true,
          "include_in_all": false
        },
        "japanese_title": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "kuromoji_analyzer",
          "store": true,
          "include_in_all": true
        },
        "japanese_air_date": {
          "type": "date",
          "format": "dateHourMinuteSecond",
          "store": true,
          "include_in_all": false
        }
      }
    }
  }
}

Creating the index

Use the JSON file above to create the index and set its mapping.

POST
> curl -XPOST "localhost:9200/tvfile?pretty" -d @columbo_mapping.json

Checking the mapping

GET
> curl -XGET "localhost:9200/tvfile/_settings,_mappings?pretty"

Deleting the index

DELETE
> curl -XDELETE "localhost:9200/tvfile"

Sample data

Only part of the data is shown here because it is long. The full sample data is on a separate page.

columbo_data.json
{"index": {}}
{"title": "Prescription: Murder",                "original_air_date": "February 20, 1968",   "runtime": 98, "guest_staring": "Gene Barry",        "guest_staring_role": "Dr. Ray Fleming (Gene Barry), a psychiatrist",                                              "directed_by": "Richard Irving",       "written_by": ["Richard Levinson & William Link"],                  "teleplay": [""],                                                                     "season": 0, "no_in_season": 1, "no_in_series": 1,  "japanese_title": "殺人処方箋",                   "japanese_air_date": "1972-08-27T00:00:00"}
{"index": {}}
{"title": "Ransom for a Dead Man",               "original_air_date": "March 1, 1971",       "runtime": 98, "guest_staring": "Lee Grant",         "guest_staring_role": "Leslie Williams, a brilliant lawyer and pilot",                                             "directed_by": "Richard Irving",       "written_by": ["Richard Levinson & William Link"],                  "teleplay": ["Dean Hargrove"],                                                        "season": 0, "no_in_season": 2, "no_in_series": 2,  "japanese_title": "死者の身代金",                 "japanese_air_date": "1973-04-22T00:00:00"}
{"index": {}}
{"title": "Murder by the Book",                  "original_air_date": "September 15, 1971",  "runtime": 73, "guest_staring": "Jack Cassidy",      "guest_staring_role": "Ken Franklin is one half of a mystery writing team",                                        "directed_by": "Steven Spielberg",     "written_by": ["Steven Bochco"],                                    "teleplay": [""],                                                                     "season": 1, "no_in_season": 1, "no_in_series": 3,  "japanese_title": "構想の死角",                   "japanese_air_date": "1972-11-26T00:00:00"}

Indexing documents

Use the JSON file above to index the documents.

POST
> curl -XPOST "localhost:9200/tvfile/columbo/_bulk?pretty" --data-binary @columbo_data.json
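A file in the bulk format shown above can also be generated rather than written by hand. The sketch below is one possible way to do it in Python; the function name build_bulk_body is hypothetical, and the two documents are abbreviated from the sample data (with the strict mapping above, any field that is included must exist in the mapping).

```python
import json


def build_bulk_body(docs):
    """Return a bulk request body: one action line and one source line per document."""
    lines = []
    for doc in docs:
        # Empty action metadata: index and type come from the URL, _id is auto-generated.
        lines.append(json.dumps({"index": {}}))
        # ensure_ascii=False keeps Japanese text readable in the output.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"title": "Prescription: Murder", "runtime": 98, "season": 0,
     "japanese_title": "殺人処方箋"},
    {"title": "Ransom for a Dead Man", "runtime": 98, "season": 0,
     "japanese_title": "死者の身代金"},
]
body = build_bulk_body(docs)
print(body)
```

Writing body to a UTF-8 encoded file gives something that can be sent with the --data-binary option as in the command above.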

Deleting all documents

DELETE
> curl -XDELETE "localhost:9200/tvfile/columbo?pretty"

Counting documents

GET
> curl -XGET "localhost:9200/tvfile/columbo/_count?pretty" -d "{
  \"query\": {
      \"matchAll\": {}
  }
}"
response
{
  "count" : 45,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  }
}

Searching documents

Searching uses the Search APIs. There are two ways to search: URI Search and Request Body Search.

Syntax
[host name][:port]/[index name]/[type name]/_search

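Common fields in search results

| field | description |
| --- | --- |
| took | Time the search took (milliseconds). |
| timed_out | Boolean indicating whether the search timed out. |
| _shards | Number of shards that were searched successfully and number that failed. |
| hits | Holds the search results. |
| hits.total | Number of documents matching the search conditions. |
| hits.hits | Array of documents matching the search (10 by default). |
| _score | Score of the document. |
| max_score | Maximum score. |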
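To make the layout of these fields concrete, here is a small Python sketch that pulls them out of a parsed response. The helper name summarize and the sample values are made up for illustration; only the field names come from the table above.

```python
def summarize(response):
    """Collect the common fields of a search response into a flat dict."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": response["_shards"]["failed"],
        "total": hits["total"],          # all matching documents
        "max_score": hits["max_score"],
        "returned": len(hits["hits"]),   # documents in this page only
        "scores": [hit["_score"] for hit in hits["hits"]],
    }


# Hypothetical response, already parsed from JSON.
sample = {
    "took": 1,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "failed": 0},
    "hits": {
        "total": 7,
        "max_score": 1.5,
        "hits": [
            {"_index": "tvfile", "_type": "columbo", "_id": "id-1", "_score": 1.5},
            {"_index": "tvfile", "_type": "columbo", "_id": "id-2", "_score": 0.5},
        ],
    },
}
print(summarize(sample))
```

Note that hits.total counts every match, while hits.hits holds only the page selected by from and size.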

The Search API

URI Search

URI Search specifies the search conditions in request parameters.

Searching without conditions

The search condition is specified with the q parameter.

GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?q=*&from=0&size=10&pretty"
Searching with conditions
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?q=September%20Patrick&df=_all&default_operator=OR&from=0&size=10&_source=false&fields=title,original_air_date,runtime,guest_staring,directed_by,written_by,season,no_in_season&sort=season:asc,no_in_season:asc&track_scores=true&pretty"

Parameters

name default / description
q The query string.
df The default field to use when no field prefix is defined within the query.
analyzer The analyzer name.
lowercase_expanded_terms Defaults to true.
analyze_wildcard Defaults to false.
default_operator can be AND or OR. Defaults to OR.
lenient Defaults to false.
explain For each hit, contain an explanation of how scoring of the hits was computed.
_source Set to false to disable retrieval of the _source field.
fields The selective stored fields of the document to return for each hit, comma delimited.
sort Sorting to perform. Can either be in the form of fieldName, or fieldName:asc / fieldName:desc.
track_scores When sorting, set to true in order to still track scores and return them as part of each hit.
timeout Defaults to no timeout.
terminate_after The maximum number of documents to collect for each shard, upon reaching which the query execution will terminate early.
from Defaults to 0.
size Defaults to 10.
search_type Defaults to query_then_fetch.
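  • q specifies the keywords to search for.
  • df specifies the field to search. The default is _all.
  • Setting _source to false omits the _source field from the search results.
  • fields specifies, as a comma-separated list, the fields to include in the search results.
  • Setting track_scores to true computes scores even when sorting. (By default, scores are not computed when sorting.)
  • from and size specify which range of documents to return.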
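Long URI Search requests are easier to assemble programmatically. The following Python sketch builds such a URL with the standard library; the function name build_search_url is hypothetical, and the host and port defaults are simply the ones used in this article.

```python
from urllib.parse import quote, urlencode


def build_search_url(index, doc_type, params, host="localhost", port=9200):
    """Build a URI Search URL from a list of (name, value) parameter pairs."""
    # quote_via=quote encodes spaces as %20, as in the curl example above.
    query = urlencode(params, quote_via=quote)
    return "http://{}:{}/{}/{}/_search?{}".format(host, port, index, doc_type, query)


url = build_search_url(
    "tvfile",
    "columbo",
    [
        ("q", "September Patrick"),
        ("df", "_all"),
        ("default_operator", "OR"),
        ("from", 0),
        ("size", 10),
    ],
)
print(url)
```

Using a list of pairs instead of a dict keeps the parameter order explicit, which makes the resulting URL easy to compare with a hand-written one.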

Elasticsearch Reference - URI Search

Request Body Search

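Request Body Search specifies the search conditions in the request body.
There are two kinds of search: Queries and Filters.

The differences between them are:

  • Queries can perform both full-text search and term search, whereas Filters perform term search only.
  • Queries compute a score; Filters do not.
  • Queries cost more than Filters.
  • Queries do not cache search results; Filters do.

Queries and Filters can also be used in combination.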

Queries
Match All Query

Specifying matchAll (also written match_all) in query searches without any conditions.

columbo_match_all_query.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "matchAll": {}
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "season": {"order": "asc"}
    },
    {
      "no_in_season": {"order": "asc"}
    }
  ]
}
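  • Because _source is set to false, the search results do not include the _source field.
  • Because from is 0 and size is 100, up to the first 100 documents are returned. (The default for size is 10.)
  • fields specifies the fields to include in the search results.
  • sort specifies the order of the documents.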
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_all_query.json

Elasticsearch Reference - Match All Query

Match Query

Specifying match in query searches the given field (original_air_date in this example) for the keywords given in query.

columbo_match_query.json
{
  "_source": false,
  "from": 3,
  "size": 3,
  "query": {
    "match": {
      "original_air_date": {
        "query": "September December",
        "operator": "OR"
      }
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_query.json
response
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 7,
    "max_score" : 1.4385337,
    "hits" : [ {
      "_index" : "tvfile",
      "_type" : "columbo",
      "_id" : "AU5lJlhGw7_5S8xhhpHw",
      "_score" : 0.9509891,
      "fields" : {
        "directed_by" : [ "Nicholas Colasanto" ],
        "no_in_season" : [ 1 ],
        "guest_staring" : [ "John Cassavetes" ],
        "original_air_date" : [ "September 17, 1972" ],
        "no_in_series" : [ 10 ],
        "runtime" : [ 98 ],
        "season" : [ 2 ],
        "title" : [ "Étude in Black" ],
        "written_by" : [ "Richard Levinson & William Link" ]
      }
    }, {
      "_index" : "tvfile",
      "_type" : "columbo",
      "_id" : "AU5lJlhGw7_5S8xhhpH4",
      "_score" : 0.9509891,
      "fields" : {
        "directed_by" : [ "Jeannot Szwarc" ],
        "no_in_season" : [ 1 ],
        "guest_staring" : [ "Vera Miles" ],
        "original_air_date" : [ "September 23, 1973" ],
        "no_in_series" : [ 18 ],
        "runtime" : [ 73 ],
        "season" : [ 3 ],
        "title" : [ "Lovely But Lethal" ],
        "written_by" : [ "Myrna Bercovici" ]
      }
    }, {
      "_index" : "tvfile",
      "_type" : "columbo",
      "_id" : "AU5lJlhHw7_5S8xhhpIA",
      "_score" : 0.9509891,
      "fields" : {
        "directed_by" : [ "Bernard L. Kowalski" ],
        "no_in_season" : [ 1 ],
        "guest_staring" : [ "Robert Conrad" ],
        "original_air_date" : [ "September 15, 1974" ],
        "no_in_series" : [ 26 ],
        "runtime" : [ 98 ],
        "season" : [ 4 ],
        "title" : [ "An Exercise in Fatality" ],
        "written_by" : [ "Larry Cohen" ]
      }
    } ]
  }
}

Elasticsearch Reference - Match Query

Multi Match Query

Specifying multi_match in query searches the multiple fields listed in fields for the keywords given in query.

columbo_multi_match_query.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "multi_match": {
      "query": "October Patrick",
      "type": "cross_fields",
      "fields": ["original_air_date", "guest_staring"],
      "operator": "AND"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_multi_match_query.json

The values that can be specified for type and their meanings are as follows.

Types of multi_match query

type description
best_fields default. Finds documents which match any field, but uses the _score from the best field.
most_fields Finds documents which match any field and combines the _score from each field.
cross_fields Treats fields with the same analyzer as though they were one big field. Looks for each word in any field.
phrase Runs a match_phrase query on each field and combines the _score from each field.
phrase_prefix Runs a match_phrase_prefix query on each field and combines the _score from each field.

Elasticsearch Reference - Multi Match Query

Query String Query

Specifying query_string in query allows more complex conditions than the other queries.

columbo_query_string.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "query_string": {
      "fields" : ["_all"],
      "query": "(September OR Patrick) AND (season:5)"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_query_string.json

default_field

This is the field that is searched when the query does not explicitly name a field.
The default is the _all field.

To specify a different field:

example
{
  "settings": {
    "index": {
      "query": {
        "default_field": "_all"
      }
    }
  }
}

Elasticsearch Reference - Query String Query

Simple Query String Query

simple_query_string is a simplified version of query_string.

columbo_simple_query_string.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query" : {
    "simple_query_string" : {
      "query": "(September | October | November) +(McGoohan)",
      "fields": ["_all"]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_simple_query_string.json

Available flags

flag description
ALL
NONE
AND +
OR |
NOT -
PREFIX *
PHRASE "
PRECEDENCE ( and )
ESCAPE
WHITESPACE
FUZZY ~N after a word
NEAR
SLOP ~N after a phrase

Elasticsearch Reference - Simple Query String Query

Term Query

Specifying term in query searches for documents whose field value exactly matches the value given in term.

columbo_term_query.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "term": {
      "japanese_air_date": "1973-02-25T00:00:00"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_term_query.json

Elasticsearch Reference - Term Query

Bool Query

Specifying bool in query makes it possible to combine multiple queries in one search.

columbo_bool_query.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "_all": {
              "query": "September Patrick",
              "operator": "OR"
            }
          }
        },
        {
          "term": {
            "season": {"value": 5}
          }
        }
      ]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "no_in_series": {"order": "asc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_bool_query.json

The occurrence types

occur description
must The clause (query) must appear in matching documents.
should The clause (query) should appear in the matching document.
must_not The clause (query) must not appear in the matching documents.
  • When should is used, the minimum_should_match parameter sets the minimum number of should clauses that must match.
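The interaction of the three occurrence types can be illustrated with a toy model. The Python sketch below is not how Elasticsearch evaluates queries internally and ignores scoring entirely; it only mimics the matching rules on in-memory dicts. The function bool_match, the clause helpers, and the three documents are all invented for this illustration.

```python
def bool_match(doc, must=(), should=(), must_not=(), minimum_should_match=None):
    """Toy model of bool matching: every clause is a function doc -> bool."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if minimum_should_match is None:
        # Assumed default: with no must clause, at least one should clause has to match.
        minimum_should_match = 1 if (should and not must) else 0
    matched = sum(1 for clause in should if clause(doc))
    return matched >= minimum_should_match


def is_september(doc):
    return "september" in doc["original_air_date"].lower()


def in_season_5(doc):
    return doc["season"] == 5


# Made-up documents, not taken from the sample data.
docs = [
    {"title": "doc-a", "season": 5, "original_air_date": "September 1, 2000"},
    {"title": "doc-b", "season": 5, "original_air_date": "October 1, 2000"},
    {"title": "doc-c", "season": 4, "original_air_date": "September 1, 1999"},
]

both = [d["title"] for d in docs if bool_match(d, must=[is_september, in_season_5])]
either = [d["title"] for d in docs if bool_match(d, should=[is_september, in_season_5])]
print(both, either)
```

Here both requires every must clause to hold, while either accepts a document as soon as one should clause matches.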

Elasticsearch Reference - Bool Query

Range Query

Specifying range in query performs a range search on the value of the field given in range.

columbo_range_query.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query" : {
    "range" : {
      "no_in_series":{"gte": 20, "lte": 24}
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "no_in_series": {"order": "asc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_range_query.json

Elasticsearch Reference - Range Query

Ids Query

Specifying ids in query searches by the value of the _id field.

columbo_ids_query.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "ids": {
      "values": ["AU5YtptcIueIPY5pgX5J","AU5YtptcIueIPY5pgX5K","AU5YtptcIueIPY5pgX5L"]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_ids_query.json

Elasticsearch Reference - Ids Query

Filters
Match All Filter
columbo_match_all_filter.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "matchAll": {}
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "season": {"order": "asc"}
    },
    {
      "no_in_season": {"order": "asc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_all_filter.json

Elasticsearch Reference - Match All Filter

Query Filter
columbo_query_filter.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "query": {
      "query_string" : {
        "fields" : ["_all"],
        "query": "(September OR Patrick) AND (season:5)"
      }
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_query_filter.json

Elasticsearch Reference - Query Filter

Term Filter
columbo_term_filter.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "term": {
      "japanese_air_date": "1973-02-25T00:00:00"
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_term_filter.json

Elasticsearch Reference - Term Filter

Bool Filter
columbo_bool_filter.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "bool": {
      "must": [
        {
          "term": {
            "no_in_season": {"value": 1}
          }
        },
        {
          "term": {
            "season": {"value": 5}
          }
        }
      ]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_bool_filter.json

Elasticsearch Reference - Bool Filter

Range Filter
columbo_range_filter.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter" : {
    "range" : {
      "no_in_series":{"gte": 20, "lte": 24}
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "no_in_series": {"order": "asc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_range_filter.json

Elasticsearch Reference - Range Filter

Ids Filter
columbo_ids_filter.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "filter": {
    "ids": {
      "values": ["AU5YtptcIueIPY5pgX5J","AU5YtptcIueIPY5pgX5K","AU5YtptcIueIPY5pgX5L"]
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season", "japanese_air_date"
  ]
}
  • When _id values are auto-generated, they change every time the documents are indexed, so searching with the JSON above as-is returns no results.
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_ids_filter.json

Elasticsearch Reference - Ids Filter
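Because auto-generated _id values differ on every load, one practical approach is to take the ids from an earlier search response instead of hard-coding them. The following Python sketch assumes such a response has already been parsed from JSON; the function name build_ids_filter and the sample ids are hypothetical.

```python
import json


def build_ids_filter(search_response, size=100):
    """Build a request body with an ids filter from the hits of a previous search."""
    ids = [hit["_id"] for hit in search_response["hits"]["hits"]]
    return {
        "_source": False,
        "from": 0,
        "size": size,
        "filter": {"ids": {"values": ids}},
    }


# Hypothetical earlier response; real ids have to come from your own index.
previous = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "example-id-1", "_score": 1.0},
            {"_id": "example-id-2", "_score": 1.0},
        ],
    }
}
request_body = build_ids_filter(previous)
print(json.dumps(request_body, indent=2))
```

The printed JSON can be saved to a file and sent with -d in the same way as the other request bodies in this article.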

Combining a Query and a Filter
columbo_match_query_range_filter.json
{
  "_source": false,
  "from": 0,
  "size": 100,
  "query": {
    "match": {
      "_all": {
        "query": "September Patrick",
        "operator": "OR"
      }
    }
  },
  "filter" : {
    "range" : {
      "no_in_series":{"gte": 30, "lte": 39}
    }
  },
  "fields": [
    "title", "original_air_date", "runtime", "guest_staring", "directed_by", "written_by", "no_in_series", "season", "no_in_season"
  ],
  "track_scores": true,
  "sort": [
    {
      "_score": {"order": "desc"}
    }
  ]
}
GET
> curl -XGET "localhost:9200/tvfile/columbo/_search?pretty" -d @columbo_match_query_range_filter.json
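Request bodies like the one above can be generated when the keywords or the range vary. The Python sketch below builds the same structure from arguments; the function name match_with_range_filter is made up, and the body layout simply mirrors the JSON shown above.

```python
import json


def match_with_range_filter(text, range_field, gte, lte, size=100):
    """Build a body with a match query on _all and a range filter on one field."""
    return {
        "_source": False,
        "from": 0,
        "size": size,
        # The query part decides which documents match and how they are scored.
        "query": {"match": {"_all": {"query": text, "operator": "OR"}}},
        # The filter part narrows the result without contributing to the score.
        "filter": {"range": {range_field: {"gte": gte, "lte": lte}}},
        "sort": [{"_score": {"order": "desc"}}],
    }


combined = match_with_range_filter("September Patrick", "no_in_series", 30, 39)
print(json.dumps(combined, indent=2))
```

Changing only the arguments gives a new body without editing JSON by hand.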

Query parameters

query parameters

parameter default / description
timeout Defaults to no timeout.
from Defaults to 0.
size Defaults to 10.
search_type Defaults to query_then_fetch.
query_cache Set to true or false to enable or disable the caching of search results.
terminate_after Defaults to no terminate_after. [experimental]
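  • timeout is the time after which the search times out, given as a string. The available units are listed under Time units below.
  • search_type and query_cache are specified as query-string parameters.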

Time units

unit description
y Year
M Month
w Week
d Day
h Hour
m Minute
s Second

Elasticsearch Reference - Request Body Search

Notes on the Elasticsearch specification

mapping

Fields

Fields that can be used in a document mapping

field default / description
_uid Each document indexed is associated with an id and a type, the internal _uid field is the unique identifier of a document within an index and is composed of the type and the id.
_id By default it is not indexed and not stored (thus, not created).
_type By default, the _type field is indexed (but not analyzed) and not stored.
_source The _source field is an automatically generated field that stores the actual JSON that was used as the indexed document.
_all The idea of the _all field is that it includes the text of one or more other fields within the document indexed.
_analyzer Deprecated in 1.5.0.
_boost Deprecated in 1.0.0.RC1.
_parent The parent field mapping is defined on a child mapping, and points to the parent type this child relates to.
_field_names The _field_names field indexes the field names of a document, which can later be used to search for documents based on the fields that they contain typically using the exists and missing filters.
_routing The routing field allows to control the _routing aspect when indexing data and explicit routing control is required.
_index By default it is disabled.
_size By default it is disabled.
_timestamp By default it is disabled.
_ttl By default it is disabled.

Elasticsearch Reference - Fields

Types

Data types that can be used in a document mapping

Elasticsearch Reference - Types

Core Types
string

attributes

attribute default / description
index_name Defaults to the property/field name.
store Defaults to false.
index Defaults to analyzed. not_analyzed, no
doc_values Set to true to store field values in a column-stride fashion.
term_vector Defaults to no.
boost Defaults to 1.0.
null_value Defaults to not adding the field at all.
norms: {enabled: <value>} Defaults to true for analyzed fields, and to false for not_analyzed fields.
norms: {loading: <value>} possible values are eager and lazy (default).
index_options Defaults to positions for analyzed fields, and to docs for not_analyzed fields.
analyzer Defaults to the globally configured analyzer.
index_analyzer The analyzer used to analyze the text contents when analyzed during indexing.
search_analyzer The analyzer used to analyze the field when part of a query string.
include_in_all If index is set to no this defaults to false, otherwise, defaults to true or to the parent object type setting.
ignore_above The analyzer will ignore strings larger than this size.
position_offset_gap Defaults to 0.

copy_to

Using copy_to copies a field's value into another field.

example
{
  "properties": {
    "title": {
      "type": "string",
      "index": "analyzed",
      "copy_to": "contents"
    },
    "contents": {
      "type": "string"
    }
  }
}

fields

The multi_field type was removed from Core Types in version 1.0.
Using fields, a single JSON source field can be mapped to multiple fields.

example
{
  "properties": {
    "title": {
      "type": "string",
      "index": "analyzed",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
Number

The numeric types are float, double, byte, short, integer, and long.

attributes

attribute default / description
type float, double, integer, long, short, byte. Required.
index_name Defaults to the property/field name.
store Defaults to false.
index Set to no if the value should not be indexed. Setting to no disables include_in_all.
doc_values Set to true to store field values in a column-stride fashion.
precision_step Defaults to 16 for long, double, 8 for short, integer, float, 2147483647 for byte.
boost Defaults to 1.0.
null_value Defaults to not adding the field at all.
include_in_all If index is set to no this defaults to false, otherwise, defaults to true or to the parent object type setting.
ignore_malformed Defaults to false.
coerce Defaults to true.
Date

attributes

attribute description
index_name Defaults to the property/field name.
format Defaults to dateOptionalTime.
store Defaults to false.
index Set to no if the value should not be indexed. Setting to no disables include_in_all.
doc_values Set to true to store field values in a column-stride fashion.
precision_step Defaults to 16.
boost Defaults to 1.0.
null_value Defaults to not adding the field at all.
include_in_all If index is set to no this defaults to false, otherwise, defaults to true or to the parent object type setting.
ignore_malformed Defaults to false.
numeric_resolution Possible values include seconds and milliseconds (default).
Boolean

attributes

attribute default / description
index_name Defaults to the property/field name.
store Defaults to false.
index Set to no if the value should not be indexed. Setting to no disables include_in_all.
boost Defaults to 1.0.
null_value Defaults to not adding the field at all.
Binary

attributes

attribute default / description
index_name Defaults to the property/field name.
store Defaults to false.
doc_values Set to true to store field values in a column-stride fashion.
compress Set to true to compress the stored binary value.
compress_threshold Defaults to -1
Root Object Type

Root Object Type

type default / description
dynamic_date_formats dynamic_date_formats is the ability to set one or more date formats that will be used to detect date fields.
date_detection Allows to disable automatic date type detection.
numeric_detection Sometimes, even though json has support for native numeric types, numeric values are still provided as strings.
dynamic_templates Dynamic templates allow to define mapping templates that will be applied when dynamic introduction of fields / objects happens.

Elasticsearch Reference - Root Object Type

Date Format

Excerpt from the built-in formats

format pattern expected
basic_date yyyyMMdd 20060102
basic_date_time yyyyMMdd'T'HHmmss.SSSZ 20060102T150405.999+0900
basic_date_time_no_millis yyyyMMdd'T'HHmmssZ 20060102T150405+0900
date yyyy-MM-dd 2006-01-02
date_time yyyy-MM-dd'T'HH:mm:ss.SSSZZ 2006-01-02T15:04:05.999+09:00
date_time_no_millis yyyy-MM-dd'T'HH:mm:ssZZ 2006-01-02T15:04:05+09:00
date_optional_time yyyy-MM-dd 2006-01-02
date_optional_time yyyy-MM-dd'T'HH:mm:ss 2006-01-02T15:04:05
date_hour_minute_second yyyy-MM-dd'T'HH:mm:ss 2006-01-02T15:04:05
date_hour_minute_second_millis yyyy-MM-dd'T'HH:mm:ss.SSS 2006-01-02T15:04:05.999
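
To get a feel for what these patterns accept, the sample values in the table can be parsed outside Elasticsearch. The following is a minimal sketch in Python, not how Elasticsearch itself parses dates (it uses Joda-Time patterns). The `STRPTIME_PATTERNS` table and `parse_es_date` are hypothetical names introduced here, and the `strptime` patterns are hand-written approximations of the Joda patterns. Parsing an offset with a colon such as `+09:00` assumes Python 3.7 or later.

```python
from datetime import datetime

# Hypothetical lookup table: Elasticsearch format name -> approximate strptime pattern.
# Joda "Z" (+0900) and "ZZ" (+09:00) are both handled by %z on Python 3.7+.
STRPTIME_PATTERNS = {
    "basic_date": "%Y%m%d",
    "basic_date_time": "%Y%m%dT%H%M%S.%f%z",
    "basic_date_time_no_millis": "%Y%m%dT%H%M%S%z",
    "date": "%Y-%m-%d",
    "date_time": "%Y-%m-%dT%H:%M:%S.%f%z",
    "date_time_no_millis": "%Y-%m-%dT%H:%M:%S%z",
    "date_hour_minute_second": "%Y-%m-%dT%H:%M:%S",
    "date_hour_minute_second_millis": "%Y-%m-%dT%H:%M:%S.%f",
}


def parse_es_date(format_name, value):
    """Parse value with the strptime pattern approximating the named format."""
    return datetime.strptime(value, STRPTIME_PATTERNS[format_name])


if __name__ == "__main__":
    print(parse_es_date("basic_date", "20060102"))
    print(parse_es_date("date_time", "2006-01-02T15:04:05.999+09:00"))
```

A value that does not match the pattern raises `ValueError`, which is roughly the situation the `ignore_malformed` attribute is meant to deal with on the Elasticsearch side.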

Elasticsearch Reference - mapping-date-format

Analysis

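An analyzer is a combination of exactly one tokenizer and zero or more token filters. A custom analyzer may also apply zero or more char filters before the tokenizer runs.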

example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "{logical analyzer name}": {
          "type": "analyzer type to use",
          "settings specific to that analyzer"
        },
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_baseform",
            "kuromoji_pos_filter"
          ]
        },
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer",
          "filter": [
            "my_filter"
          ],
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "tokenizer": {
        "{logical tokenizer name}": {
          "type": "tokenizer type to use",
          "settings specific to that tokenizer"
        },
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        },
        "my_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "3",
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      },
      "filter": {
        "{logical filter name}": {
          "type": "filter type to use",
          "settings specific to that filter"
        },
        "kuromoji_pos_filter": {
          "type": "kuromoji_part_of_speech"
        },
        "my_filter": {
          "type": "stop",
          "stopwords": ["NGWORD_A", "NGWORD_B", "NGWORD_C"]
        }
      },
      "char_filter": {
        "{logical char_filter name}": {
          "type": "char_filter type to use",
          "settings specific to that char_filter"
        },
        "my_char_filter": {
          "type": "mapping",
          "mappings" : ["kb=>kilobyte","mb=>megabyte","gb=>gigabyte"]
        }
      }
    },
    "index": {
      "index settings"
    }
  },
  "mappings": {
    "{type name}": {
      "type settings"
    },
    "{type name}": {
      "type settings"
    }
  }
}
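
The order in which these pieces run (char filters first, then the tokenizer, then the token filters) can be sketched in a few lines of Python. This is only an illustration of the pipeline idea, not Elasticsearch code. `make_analyzer`, `mapping_char_filter`, `lowercase_filter` and `stop_filter` are hypothetical helpers, the whitespace tokenizer is simply `str.split`, and the stopword `the` is made up for the demo. The mapping rules are the ones from `my_char_filter` above.

```python
def make_analyzer(tokenizer, token_filters=(), char_filters=()):
    """Compose one tokenizer with optional char filters and token filters."""
    def analyze(text):
        for char_filter in char_filters:      # 1. rewrite the raw text
            text = char_filter(text)
        tokens = tokenizer(text)              # 2. split the text into tokens
        for token_filter in token_filters:    # 3. modify or drop tokens
            tokens = token_filter(tokens)
        return tokens
    return analyze


def mapping_char_filter(mappings):
    """Simplified 'mapping' char filter: applies each 'a=>b' rule in order."""
    pairs = [rule.split("=>") for rule in mappings]
    def apply(text):
        for source, target in pairs:
            text = text.replace(source, target)
        return text
    return apply


def lowercase_filter(tokens):
    return [token.lower() for token in tokens]


def stop_filter(stopwords):
    stop = set(stopwords)
    return lambda tokens: [token for token in tokens if token not in stop]


analyzer = make_analyzer(
    tokenizer=str.split,  # stand-in for a whitespace tokenizer
    token_filters=[lowercase_filter, stop_filter(["the"])],
    char_filters=[mapping_char_filter(["kb=>kilobyte", "mb=>megabyte", "gb=>gigabyte"])],
)

if __name__ == "__main__":
    print(analyzer("The disk has 512 kb free"))
```

Swapping `str.split` for another tokenizer function changes the tokens without touching the filters, which is the same separation the `analyzer`, `tokenizer`, `filter` and `char_filter` sections of the settings express.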

Analyzers

Built in Analyzers

Analyzer type description
Standard Analyzer standard Built from the Standard Tokenizer with the Standard Token Filter, Lower Case Token Filter, and Stop Token Filter.
Simple Analyzer simple Built from the Lower Case Tokenizer.
Whitespace Analyzer whitespace Built from the Whitespace Tokenizer.
Stop Analyzer stop Built from the Lower Case Tokenizer with the Stop Token Filter.
Keyword Analyzer keyword Treats the entire input as a single token.
Pattern Analyzer pattern Splits text using a regular expression.
Language Analyzers (see the table below) Analyzers for specific languages.
Snowball Analyzer snowball Built from the standard tokenizer with the standard filter, lowercase filter, stop filter, and snowball filter.
Custom Analyzer custom Combines any one tokenizer with zero or more token filters and zero or more char filters.

The following types are supported

type language
arabic Arabic
armenian Armenian
basque Basque
brazilian Portuguese (Brazil)
bulgarian Bulgarian
catalan Catalan
chinese Chinese
cjk CJK (Chinese, Japanese, and Korean text)
czech Czech
danish Danish
dutch Dutch
english English
finnish Finnish
french French
galician Galician
german German
greek Greek
hindi Hindi
hungarian Hungarian
indonesian Indonesian
irish Irish
italian Italian
latvian Latvian
norwegian Norwegian
persian Persian
portuguese Portuguese
romanian Romanian
russian Russian
sorani Kurdish (Sorani)
spanish Spanish
swedish Swedish
turkish Turkish
thai Thai

Custom Analyzer configuration sample

A sample Custom Analyzer configuration, using the kuromoji settings as an example.

example
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        }
      },
      "filter": {
        "greek_lowercase_filter": {
          "type": "lowercase",
          "language": "greek"
        },
        "kuromoji_pos_filter": {
          "type": "kuromoji_part_of_speech"
        }
      },
      "analyzer": {
        "kuromoji_analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji",
          "filter": [
            "kuromoji_baseform", "kuromoji_pos_filter", "greek_lowercase_filter", "cjk_width"
          ]
        }
      }
    }
  }
}
  • kuromoji_tokenizer is the built-in tokenizer provided by kuromoji.
  • kuromoji_baseform and kuromoji_part_of_speech are built-in token filters provided by kuromoji.

Setting Description
tokenizer Name of the tokenizer to use.
filter Optional. List of names of the token filters to use.
char_filter Optional. List of names of the char filters to use.
position_offset_gap An optional number of positions to increment between each field value of a field using this analyzer.

Elasticsearch Reference - Analyzers

Tokenizers

Built in Tokenizers

Tokenizer type description
Standard Tokenizer standard Tokenizer suited to European languages.
Edge NGram Tokenizer edgeNGram Splits text into n-grams anchored at the start of each token.
Keyword Tokenizer keyword Treats the entire text as a single token.
Letter Tokenizer letter Splits text into tokens at non-letter characters.
Lowercase Tokenizer lowercase Equivalent to using the Letter Tokenizer together with the Lower Case Token Filter.
NGram Tokenizer nGram Splits text into n-gram tokens.
Whitespace Tokenizer whitespace Splits text into tokens at whitespace.
Pattern Tokenizer pattern Splits text into tokens using a regular expression.
UAX Email URL Tokenizer uax_url_email Like the Standard Tokenizer, but keeps URLs and email addresses as single tokens.
Path Hierarchy Tokenizer path_hierarchy Emits each level of a path hierarchy as a token (it does not simply split on the path separator).
Classic Tokenizer classic Tokenizer suited to English text.
Thai Tokenizer thai Tokenizer for Thai text.

edgeNGramとnGram

Result of tokenizing the text elastic with nGram and edgeNGram, using the following settings:

  • min_gram: 2
  • max_gram: 3
  • token_chars: letter,digit

nGram

position type token
1 word el
2 word ela
3 word la
4 word las
5 word as
6 word ast
7 word st
8 word sti
9 word ti
10 word tic
11 word ic

edgeNGram

position type token
1 word el
2 word ela
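
The difference between the two tables can be reproduced with a small Python sketch. It is an approximation for illustration only, not the Lucene implementation. `split_token_chars`, `ngrams` and `edge_ngrams` are hypothetical names, and the regular expression is an assumed stand-in for `token_chars: letter,digit`.

```python
import re


def split_token_chars(text):
    """Keep only runs of letters and digits (stand-in for token_chars: letter, digit)."""
    return re.findall(r"[^\W_]+", text)


def ngrams(text, min_gram=2, max_gram=3):
    """Emit every substring of length min_gram..max_gram, ordered by start offset."""
    tokens = []
    for word in split_token_chars(text):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    tokens.append(word[start:start + size])
    return tokens


def edge_ngrams(text, min_gram=2, max_gram=3):
    """Emit only the n-grams that begin at the first character of each word."""
    tokens = []
    for word in split_token_chars(text):
        for size in range(min_gram, max_gram + 1):
            if size <= len(word):
                tokens.append(word[:size])
    return tokens


if __name__ == "__main__":
    print(ngrams("elastic"))       # 11 tokens, as in the nGram table
    print(edge_ngrams("elastic"))  # 2 tokens, as in the edgeNGram table
```

Because edgeNGram only produces prefixes, it tends to fit search-as-you-type use cases, while nGram also matches in the middle of a word at the cost of many more tokens.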

uax_url_email

Result of tokenizing the following text: Elasticsearch reference <a href =\"https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-uaxurlemail-tokenizer.html\">UAX Email URL Tokenizer</a>

position type token
1 ALPHANUM Elasticsearch
2 ALPHANUM reference
3 ALPHANUM a
4 ALPHANUM href
5 URL https://www.elastic.co/guide/en/elasticsearch/reference/1.6/analysis-uaxurlemail-tokenizer.html
6 ALPHANUM UAX
7 ALPHANUM Email
8 ALPHANUM URL
9 ALPHANUM Tokenizer
10 ALPHANUM a

path_hierarchy

Result of tokenizing the text C:/Windows/System32/drivers/etc

position type token
1 word C:
1 word C:/Windows
1 word C:/Windows/System32
1 word C:/Windows/System32/drivers
1 word C:/Windows/System32/drivers/etc
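
The tokens above are the successive prefixes of the path, which a short Python sketch makes explicit. `path_hierarchy` is a hypothetical helper that only mimics the default behavior. The tokenizer's `reverse` and `skip` options are not modeled here.

```python
def path_hierarchy(path, delimiter="/"):
    """Return every ancestor prefix of path, shortest first."""
    parts = path.split(delimiter)
    prefixes = [delimiter.join(parts[:count]) for count in range(1, len(parts) + 1)]
    # A leading delimiter yields an empty first prefix; drop it.
    return [prefix for prefix in prefixes if prefix]


if __name__ == "__main__":
    for token in path_hierarchy("C:/Windows/System32/drivers/etc"):
        print(token)
```

Indexing these prefixes makes it possible to find a document by any ancestor directory with an exact term match.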

Elasticsearch Reference - Tokenizers

Token Filters

Token Filter type
Standard Token Filter standard
ASCII Folding Token Filter asciifolding
Length Token Filter length
Lowercase Token Filter lowercase
Uppercase Token Filter uppercase
NGram Token Filter nGram
Edge NGram Token Filter edgeNGram
Porter Stem Token Filter porter_stem
Shingle Token Filter shingle
Stop Token Filter stop
Word Delimiter Token Filter word_delimiter
Stemmer Token Filter stemmer
Stemmer Override Token Filter stemmer_override
Keyword Marker Token Filter keyword_marker
Keyword Repeat Token Filter keyword_repeat
KStem Token Filter kstem
Snowball Token Filter snowball
Phonetic Token Filter phonetic
Synonym Token Filter synonym
Compound Word Token Filter dictionary_decompounder, hyphenation_decompounder
Reverse Token Filter reverse
Elision Token Filter elision
Truncate Token Filter truncate
Unique Token Filter unique
Pattern Capture Token Filter pattern_capture
Pattern Replace Token Filter pattern_replace
Trim Token Filter trim
Limit Token Count Token Filter limit
Hunspell Token Filter hunspell
Common Grams Token Filter common_grams
Normalization Token Filter (see the table below)
CJK Width Token Filter cjk_width
CJK Bigram Token Filter cjk_bigram
Delimited Payload Token Filter delimited_payload_filter
Keep Words Token Filter keep
Keep Types Token Filter keep_types
Classic Token Filter classic
Apostrophe Token Filter apostrophe

Normalization Token Filter

language type
Arabic arabic_normalization
German german_normalization
Hindi hindi_normalization
Indic indic_normalization
Kurdish (Sorani) sorani_normalization
Persian persian_normalization
Scandinavian scandinavian_normalization, scandinavian_folding

Elasticsearch Reference - Token Filters

Character Filters

Character Filter type
Mapping Char Filter mapping
HTML Strip Char Filter html_strip
Pattern Replace Char Filter pattern_replace

Elasticsearch Reference - Character Filters

Analyze

The _analyze endpoint of the Indices APIs lets you check the tokens an analyzer produces. For Elasticsearch's built-in analyzers, you do not need to specify an index.
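
As a rough illustration of how the request URL changes with and without an index, here is a minimal Python sketch. `analyze_url` and `analyze` are hypothetical helpers written for this article. The default host `localhost:9200` and the use of POST with the text as the request body are assumptions about a local Elasticsearch 1.x node.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def analyze_url(host="localhost:9200", index=None, **params):
    """Build the _analyze URL; the index segment is omitted when index is None."""
    path = "/{}/_analyze".format(index) if index else "/_analyze"
    query = urlencode(params)
    return "http://{}{}".format(host, path) + ("?" + query if query else "")


def analyze(text, **kwargs):
    """Send text to _analyze and return the decoded JSON (needs a running node)."""
    request = Request(analyze_url(**kwargs), data=text.encode("utf-8"), method="POST")
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(analyze_url(analyzer="standard"))
    print(analyze_url(index="test", analyzer="my_analyzer"))
```

Only `analyze_url` runs without a server. `analyze` is included to show where the text to be analyzed goes.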

Syntax
> curl -XGET "[host name][:port]/[index name]/_analyze?analyzer={analyzer name}&tokenizer={tokenizer name}&token_filters={}&char_filters={}" -d "The Bye-Bye Sky High IQ Murder Case"

standard

example
> curl -XGET "localhost:9200/_analyze?analyzer=standard" -d "The Bye-Bye Sky High IQ Murder Case"

To examine an analyzer's behavior in detail, it seems best to define it on an index.
The example below defines the analyzer to be checked on an index named test and then uses that analyzer.

PUT
curl -XPUT "localhost:9200/test?pretty" -d "{
  \"settings\": {
    \"analysis\": {
      \"analyzer\": {
        \"my_analyzer\": {
          \"type\": \"custom\",
          \"tokenizer\": \"my_tokenizer\"
        }
      },
      \"tokenizer\": {
        \"my_tokenizer\": {
          \"type\": \"path_hierarchy\",
          \"reverse\": false,
          \"skip\": 0
        }
      }
    }
  }
}"
GET
curl -XGET "localhost:9200/test/_analyze?analyzer=my_analyzer&pretty" -d "C:/Windows/System32/drivers/etc"
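
Escaping every double quote by hand is error-prone, so one option is to generate the `-d` argument from a data structure. The following Python sketch builds the same settings as a dictionary and escapes the quotes in the style used above. `to_cmd_argument` is a hypothetical helper that assumes the body contains no backslashes or embedded quotes in its values, and it has not been checked against every Windows shell quoting rule.

```python
import json

# Same index settings as the PUT request above, as a Python dictionary.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {"type": "custom", "tokenizer": "my_tokenizer"}
            },
            "tokenizer": {
                "my_tokenizer": {"type": "path_hierarchy", "reverse": False, "skip": 0}
            },
        }
    }
}


def to_cmd_argument(body):
    """Serialize body to compact JSON and escape double quotes as \\" for a quoted argument."""
    compact = json.dumps(body, separators=(",", ":"))
    return '"' + compact.replace('"', '\\"') + '"'


if __name__ == "__main__":
    print('curl -XPUT "localhost:9200/test?pretty" -d ' + to_cmd_argument(settings))
```

Serializing with `json.dumps` also guarantees that the body is valid JSON before it is sent.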

Elasticsearch Reference - Analyze
