
Trying out various queries, using the Elasticsearch tutorial with the restaurant dataset as a reference

Elasticsearch

Last updated at 2016-08-30, posted at 2016-08-16

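Environment

Elasticsearch: 2.3.5
Kibana: 4.5.1 (4.5.4 was installed to try out the issue where the map fails with an error)

All queries were issued through the Sense plugin.
Search results are abbreviated where appropriate.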

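Preparation

After going through the Elasticsearch tutorial, load the restaurant data into Elasticsearch yourself. A detailed description of the dataset's fields is available here.
The steps for loading the data are described at http://qiita.com/wapa5pow/items/31e966b9d251e7fd76ec.

Getting started

Let's start by searching for everything, with no conditions.

Query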

GET ldgourmet/restaurant/_search
{
  "query": {
    "match_all": {}
  }
}

Result (since this is the first one, explanations are added to the response).

{
  "took": 2,  // time taken to execute the search [msec]
  "timed_out": false,  // whether the search timed out
  "_shards": {  // how many shards were used for the search
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 214241,  // total number of matching documents
    "max_score": 1, 
    "hits": [
      {
        "_index": "ldgourmet",
        "_type": "restaurant",
        "_id": "19222",
        "_score": 1,
        "_source": {
          "description": "東京メトロ日比谷線広尾駅から徒歩7分",
          "access_count": "1813",
          "closed": "0",
          "address": "渋谷区広尾５-８-１４東京建物広尾ビルB1",
          "pref_id": "13",
          "open_lunch": "1",
          "special_count": "0",
          "modified_on": "2011-04-20 22:00:07",
          "name_kana": "あさの",
          "purpose": null,
          "alphabet": null,
          "station_id1": "2934",
          "property": null,
          "name": "浅野",
          "category_id1": "302",
          "station_time1": "6",
          "station_distance2": "1403",
          "fan_count": "0",
          "category_id5": "0",
          "open_late": "0",
          "created_on": "2005-02-13 22:36:40",
          "station_distance1": "482",
          "category_id4": "0",
          "zip": null,
          "area_id": "7",
          "id": "19222",
          "north_latitude": "35.38.42.216",
          "category_id3": "0",
          "station_distance3": "1707",
          "open_morning": "0",
          "photo_count": "0",
          "station_time2": "18",
          "menu_count": "0",
          "category_id2": "332",
          "station_time3": "21",
          "station_id2": "1673",
          "east_longitude": "139.43.28.610",
          "station_id3": "8921"
        }
      },
...
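
The north_latitude and east_longitude values in the result above look like degrees.minutes.seconds rather than decimal degrees. The following is a minimal sketch, assuming that reading of the format, of turning such a value into decimal degrees. dms_to_decimal is a hypothetical helper name, not part of the dataset or of Elasticsearch, and any datum conversion the dataset might need is not handled here.

```python
def dms_to_decimal(value: str) -> float:
    """Convert 'DD.MM.SS.sss' (assumed degrees.minutes.seconds) to decimal degrees."""
    # Split into at most three parts so the fractional seconds stay together.
    degrees, minutes, seconds = value.split(".", 2)
    return int(degrees) + int(minutes) / 60 + float(seconds) / 3600


lat = dms_to_decimal("35.38.42.216")
lon = dms_to_decimal("139.43.28.610")
print(round(lat, 6), round(lon, 6))
```

A value converted this way could be used as the lat/lon of a geo_point field when indexing.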

To page through the results while returning only specific fields, do the following.

GET ldgourmet/restaurant/_search
{
  "from": 9,
  "size": 3,
  "fields": ["name", "description"],
  "query": {
    "term": { "name": "居酒屋" }
  }
}

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4410,
    "max_score": 2.787416,
    "hits": [
      {
        "_index": "ldgourmet",
        "_type": "restaurant",
        "_id": "335330",
        "_score": 2.787416,
        "fields": {
          "name": [
            "居酒屋　味平"
          ]
        }
      },
      {
        "_index": "ldgourmet",
        "_type": "restaurant",
        "_id": "310316",
        "_score": 2.787416,
        "fields": {
          "name": [
            "居酒屋 遊味"
          ],
          "description": [
            "阪神タイガースファンが集まる店  東武線蒲生駅西口から徒歩１分"
          ]
        }
      },
      {
        "_index": "ldgourmet",
        "_type": "restaurant",
        "_id": "475390",
        "_score": 2.787416,
        "fields": {
          "name": [
            "居酒屋　和味"
          ]
        }
      }
    ]
  }
}
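
In the request above, from is a zero-based offset, so "from": 9 with "size": 3 returns the 10th to 12th hits, in other words the fourth page when each page holds three hits. A small sketch of that arithmetic follows; page_to_from_size is a hypothetical helper name introduced only for illustration.

```python
def page_to_from_size(page: int, size: int) -> dict:
    """Return the from/size pair for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    # from is the number of hits to skip before the first returned hit.
    return {"from": (page - 1) * size, "size": size}


print(page_to_from_size(4, 3))  # {'from': 9, 'size': 3}
```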

To delete a specific document, do the following.

DELETE ldgourmet/restaurant/19222

Basic queries

term, terms

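The term query already appeared above. The terms query checks whether any of several terms match the specified field. In the example below, documents containing a term that matches either 居酒屋 or あゆ are returned. Results are ordered by descending score, so 「居酒屋　あゆ」 comes first.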

GET ldgourmet/restaurant/_search
{
  "query": {
    "terms": { 
      "name": ["居酒屋", "あゆ"]
    }
  }
}

match, multi match

match analyzes the query text and searches a single field; multi_match runs the same kind of match against multiple fields.

GET ldgourmet/restaurant/_search
{
  "query": {
    "match" : {
      "name" : "居酒屋 あゆ"  
    }
  }
}

# Specify how the terms are combined. With and, every term must be present
GET ldgourmet/restaurant/_search
{
  "query": {
    "match" : {
      "name" : {
        "query": "居酒屋 あゆ",
        "operator": "and"
      }
    }
  }
}

# Fuzzy search
GET ldgourmet/restaurant/_search
{
  "query": {
    "match" : {
      "name" : {
        "query": "居酒屋 あゆ",
        "operator": "and",
        "fuzziness": 2,
        "prefix_length" : 1
      }
    }
  }
}

# Find documents that match in multiple fields, giving name twice the weight
GET ldgourmet/restaurant/_search
{
  "query": {
    "multi_match" : {
      "query" : "居酒屋 あゆ",
      "fields": ["name^2", "description"]
    }
  }
}

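Reference

Common Terms

A search that takes stopwords into account.

Query String

Lets you search using the Apache Lucene query syntax.

Simple Query String

Searches using the following syntax:

+: and
|: or
-: must not contain
": treats the terms as a phrase
(): treats the terms as a group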

GET ldgourmet/restaurant/_search
{
  "query": {
    "simple_query_string" : {
        "query": "居酒屋 +(あゆ) -花亭",
        "fields": ["name"],
        "default_operator": "and"
    }
  }
}

Bool Query

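Combines and filters queries using must/filter/should/must_not.

must: the clause must match the document, and it contributes to the score
filter: the clause must match the document, but it does not affect the score
should: if clauses are given and there is no must/filter, at least one of them must match the document. The minimum number of matches can be set with minimum_should_match
must_not: the clause must not match the document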
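
The four clause types above can be combined in one request body. The sketch below builds such a body as a plain Python dict and prints it as JSON that could be pasted into Sense. bool_query is a hypothetical helper, and the particular combination of conditions (居酒屋 in name, fan_count of at least 10, not closed) is an assumption chosen for illustration, not a query from this article.

```python
import json


def bool_query(must=(), filter_=(), should=(), must_not=(), minimum_should_match=None):
    """Build a search body with a bool query, omitting empty clause lists."""
    clauses = {
        "must": list(must),
        "filter": list(filter_),
        "should": list(should),
        "must_not": list(must_not),
    }
    body = {name: items for name, items in clauses.items() if items}
    if minimum_should_match is not None:
        body["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": body}}


request = bool_query(
    must=[{"match": {"name": "居酒屋"}}],                   # scored
    filter_=[{"range": {"fan_count": {"gte": 10}}}],        # not scored
    must_not=[{"term": {"closed": True}}],                  # excluded
)
print(json.dumps(request, ensure_ascii=False, indent=2))
```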

Range

Searches by range.

gte: greater than or equal to
gt: greater than
lte: less than or equal to
lt: less than
boost: sets the boost value

# Find restaurants with a fan count of 10 to 20
GET ldgourmet/restaurant/_search
{
  "query": {
    "range" : {
      "fan_count" : {
        "gte" : 10,
        "lte" : 20
      }
    }
  }
}

Exists

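Returns only documents in which the field has a value. A field counts as having no value in the following cases:

null
[]
[null]
the field itself is missing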
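
The four "no value" cases listed above can be mirrored in a few lines of Python. This is only a toy model of the rule applied to a _source-like dict, not Elasticsearch's implementation, and has_value is a hypothetical name.

```python
def has_value(doc: dict, field: str) -> bool:
    """Return True when the field would count as having a value under the rules above."""
    value = doc.get(field)  # a missing field and an explicit null both give None
    if value is None:
        return False
    if isinstance(value, list):
        # [] and [null] have no value; a list with any non-null item does.
        return any(item is not None for item in value)
    return True


doc = {"name": "浅野", "zip": None, "tags": [], "notes": [None]}
print([field for field in ("name", "zip", "tags", "notes", "purpose") if has_value(doc, field)])
```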

Prefix

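Searches for terms that begin with a given prefix.

Wildcard

Wildcard search using * and ?.

Regex

Searches with a regular expression.

Fuzzy

Performs a fuzzy search.

Ids

Returns the documents matching the specified IDs.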

GET ldgourmet/restaurant/_search
{
  "query": {
    "ids" : { "values" : ["25030", "25031", "25032"] }
  }
}

Lite search

Lets you search using only query-string parameters.

GET ldgourmet/restaurant/_search?q=name:蒙古

Suggesters

Used for things like "did you mean" search.
(Note: the following was run against a different index, whose name field is set to not_analyzed.)

Query

GET restaurants_development/_suggest
{
  "my-suggestion": {
    "text" : "すきやばし小次郎",
    "term": {
      "size" : 5,
      "field": "name"
    }
  }
}

Result

{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "my-suggestion": [
    {
      "text": "すきやばし小次郎",
      "offset": 0,
      "length": 8,
      "options": [
        {
          "text": "すきやばし 次郎",
          "score": 0.875,
          "freq": 1
        },
        {
          "text": "すきやばし次郎",
          "score": 0.85714287,
          "freq": 3
        }
      ]
    }
  ]
}
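
The two scores in the result above are consistent with 1 - (edit distance / length of the shorter string). That is an observation from these two values only, not a statement about how the term suggester is implemented. The sketch below reproduces the numbers with a plain Levenshtein distance; levenshtein and similarity are hypothetical helper names.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(text: str, candidate: str) -> float:
    """1 - distance / shorter length (assumed formula, see the note above)."""
    return 1 - levenshtein(text, candidate) / min(len(text), len(candidate))


print(similarity("すきやばし小次郎", "すきやばし 次郎"))  # 0.875
print(similarity("すきやばし小次郎", "すきやばし次郎"))
```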

Sort

# Sort the hits for 居酒屋 by page views (PV)
GET ldgourmet/restaurant/_search
{
  "query": {
    "term": { "name": "居酒屋" }
  },
  "sort": {
    "access_count": "desc"
  }
}

Mapping

Shows the contents of the mapping.

{
  "ldgourmet": {
    "mappings": {
      "restaurant": {
        "properties": {
          "access_count": {
            "type": "integer"
          },
          "address": {
            "type": "string",
            "analyzer": "ngram_analyzer"
          },
          "alphabet": {
            "type": "string"
          },
          "area_id": {
            "type": "string"
          },
...

Index

Settings

Retrieves the settings of the index.

Query

GET ldgourmet/_settings

Result

{
  "ldgourmet": {
    "settings": {
      "index": {
        "creation_date": "1471317478939",
        "analysis": {
          "analyzer": {
            "ngram_analyzer": {
              "tokenizer": "ngram_tokenizer"
            }
          },
          "tokenizer": {
            "ngram_tokenizer": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "min_gram": "2",
              "type": "nGram",
              "max_gram": "3"
            }
          }
        },
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "a83qEcqbTl2M82ri7t7gaw",
        "version": {
          "created": "2030599"
        }
      }
    }
  }
}

Mapping

Retrieves the mapping information.

GET ldgourmet/restaurant/_mapping

{
  "ldgourmet": {
    "mappings": {
      "restaurant": {
        "properties": {
          "access_count": {
            "type": "integer"
          },
          "address": {
            "type": "string",
            "analyzer": "ngram_analyzer"
          },
          "alphabet": {
            "type": "string"
          },
          "area_id": {
            "type": "string"
          },
          "category": {
            "type": "string",
            "analyzer": "whitespace"
          },
          "category_id1": {
            "type": "string"
          },
          "category_id2": {
            "type": "string"
          },
          "category_id3": {
            "type": "string"
          },
          "category_id4": {
            "type": "string"
          },
          "category_id5": {
            "type": "string"
          },
          "closed": {
            "type": "boolean"
          },
          "created_on": {
            "type": "string"
          },
          "description": {
            "type": "string",
            "analyzer": "ngram_analyzer"
          },
...

Analyze

Lets you try out text analysis with a specified analyzer. The Inquisitor plugin can do much the same thing more efficiently.

GET _analyze
{
  "analyzer" : "standard",
  "text" : "this is a test"
}

GET _analyze
{
  "tokenizer" : "keyword",
  "token_filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}

# Setting explain to true gives more detailed output, including the result of each filter.
GET _analyze
{
  "tokenizer" : "keyword",
  "explain": true,
  "token_filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}

Stats

Retrieves statistics for the index.

GET ldgourmet/_stats

Delete Index

# Create
PUT twitter
{
  "settings" : {
    "index" : {
      "number_of_shards" : 3, 
      "number_of_replicas" : 2 
    }
  }
}

# Delete
DELETE twitter

Update Indices Settings

To update non-dynamic index settings such as analyzers, the index must be closed first.

# Close the index
POST twitter/_close

# Change the index settings
PUT twitter/_settings
{
  "analysis" : {
    "analyzer":{
      "content":{
        "type":"custom",
        "tokenizer":"whitespace"
      }
    }
  }
}

# Open the index
POST twitter/_open

Explain

Shows in detail how a document matched the query.

Query

GET ldgourmet/restaurant/326659/_explain
{
  "query": {
    "term": { 
      "name": "居酒屋"
    }
  }
}

Result

{
  "_index": "ldgourmet",
  "_type": "restaurant",
  "_id": "326659",
  "matched": true,
  "explanation": {
    "value": 2.787416,
    "description": "sum of:",
    "details": [
      {
        "value": 2.787416,
        "description": "weight(name:居酒屋 in 11156) [PerFieldSimilarity], result of:",
        "details": [
          {
            "value": 2.787416,
            "description": "fieldWeight in 11156, product of:",
            "details": [
              {
                "value": 1,
                "description": "tf(freq=1.0), with freq of:",
                "details": [
                  {
                    "value": 1,
                    "description": "termFreq=1.0",
                    "details": []
                  }
                ]
              },
              {
                "value": 5.574832,
                "description": "idf(docFreq=851, maxDocs=82654)",
                "details": []
              },
              {
                "value": 0.5,
                "description": "fieldNorm(doc=11156)",
                "details": []
              }
            ]
          }
        ]
      },
      {
        "value": 0,
        "description": "match on required clause, product of:",
        "details": [
          {
            "value": 0,
            "description": "# clause",
            "details": []
          },
          {
            "value": 0.1793776,
            "description": "_type:restaurant, product of:",
            "details": [
              {
                "value": 1,
                "description": "boost",
                "details": []
              },
              {
                "value": 0.1793776,
                "description": "queryNorm",
                "details": []
              }
            ]
          }
        ]
      }
    ]
  }
}
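
The numbers in the explanation above can be checked by hand. The sketch below assumes Lucene's classic TF-IDF similarity, where tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes the fieldNorm of 0.5 directly from the output. classic_idf is a hypothetical helper name.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf as used by Lucene's classic TF-IDF similarity."""
    return 1 + math.log(max_docs / (doc_freq + 1))


tf = math.sqrt(1.0)              # termFreq=1.0 in the explanation
idf = classic_idf(851, 82654)    # docFreq and maxDocs from the explanation
field_norm = 0.5                 # fieldNorm(doc=11156), taken as given
score = tf * idf * field_norm

print(round(idf, 4), round(score, 4))
```

The product matches the fieldWeight of 2.787416. The second clause (the _type filter) contributes 0, so the total is the same value.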

Index Templates

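With an index template you can prepare mappings and settings in advance and have them applied automatically when an index whose name matches the template's pattern is created.
When several templates match at the same time, order controls the sequence in which they are applied; templates with a lower order are applied first.

# List the templates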
GET _template/
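
To make the effect of order concrete, here is a toy model in Python. It is a sketch under simplifying assumptions: settings are flat dicts, pattern matching uses fnmatch (which accepts more syntax than a template pattern), and both templates are hypothetical. Lower orders are applied first, so values from higher orders end up overriding them.

```python
from fnmatch import fnmatchcase


def resolve_settings(index_name: str, templates: list) -> dict:
    """Merge the settings of all matching templates, lowest order first."""
    matching = [t for t in templates if fnmatchcase(index_name, t["template"])]
    merged = {}
    for template in sorted(matching, key=lambda t: t["order"]):
        merged.update(template["settings"])  # later (higher order) wins
    return merged


# Hypothetical templates, for illustration only.
templates = [
    {"template": "restaurants_*", "order": 1, "settings": {"number_of_shards": 5}},
    {"template": "*", "order": 0, "settings": {"number_of_shards": 1, "number_of_replicas": 0}},
]

print(resolve_settings("restaurants_development", templates))
print(resolve_settings("twitter", templates))
```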

cat APIs

aliases

Lists the aliases associated with indices.

Query

GET _cat/aliases

Result. The aliases below were created as a result of using searchkick.

alias                   index                                     filter routing.index routing.search 
restaurants_development restaurants_development_20160821161429131 -      -             -              
areas_development       areas_development_20160821135301707       -      -             -              
prefs_development       prefs_development_20160821133713704       -      -             -              
ratings_development     ratings_development_20160821160422911     -      -             -              
articles_development    articles_development_20160819111717137    -      -             -

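Others

Some of these return the same information as the Cluster APIs.

GET _cat/allocation?v: how many shards are allocated to each node and how much disk they use
GET _cat/indices?v: details for each index, such as its document count
GET _cat/fielddata?v: how much heap memory fielddata is using on each node
GET _cat/health?v: the health status of the cluster, showing whether it is running correctly
GET _cat/nodes?v: the state of each node
GET _cat/pending_tasks?v: tasks that are currently being executed or are waiting in the queue
GET _cat/plugins?v: the installed plugins
GET _cat/shards?v: shard information, including the document count and disk usage for each index
GET _cat/segments?v: segment information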

Validate API

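For more detail, the article 「Validate APIの利用」 is a useful reference.

Below is a query and the result of running it. In this case, however, the query name is misspelled as match_al, so validation fails.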

GET ldgourmet/_validate/query?explain
{
  "query": {
    "match_al": {}
  }
}

Result

{
  "valid": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "ldgourmet",
      "valid": false,
      "error": "org.elasticsearch.index.query.QueryParsingException: No query registered for [match_al]"
    }
  ]
}

The validate API also shows which terms the search text was converted into.

Query

GET ldgourmet/_validate/query?explain
{
  "query": {
    "multi_match": {
      "fields": ["name","name_kana","description"],
      "query": "すきやばし"
    }
  }
}

Result

{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "ldgourmet",
      "valid": true,
      "explanation": "((name:すき name:すきや name:きや name:きやば name:やば name:やばし name:ばし) | (description:すき description:すきや description:きや description:きやば description:やば description:やばし description:ばし) | (name_kana:すき name_kana:すきや name_kana:きや name_kana:きやば name_kana:やば name_kana:やばし name_kana:ばし))"
    }
  ]
}
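
The terms in the explanation above are what you would expect from the ngram_tokenizer shown in the index settings (min_gram 2, max_gram 3). The following sketch reproduces them in Python. It ignores token_chars (the input here consists only of letters), and ngrams is a hypothetical helper, not the tokenizer's actual code.

```python
def ngrams(text: str, min_gram: int = 2, max_gram: int = 3) -> list:
    """Return all substrings of length min_gram..max_gram, ordered by start position."""
    tokens = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                tokens.append(text[start:start + size])
    return tokens


print(ngrams("すきやばし"))
# ['すき', 'すきや', 'きや', 'きやば', 'やば', 'やばし', 'ばし']
```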

Cluster API

Node Stats

Shows detailed statistics for each node, including things like whether request queues are backing up.

GET _nodes/stats

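Applications

From here on, we build queries that seem likely to be useful in practice.

Finding out how many 蒙古タンメン中本 shops there are

Every so often I get a craving for 蒙古タンメン, so how many 中本 shops are there? Let's find out.
The result was 10. Unless default_operator is set to and, other タンメン shops besides 中本 are hit as well.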

GET ldgourmet/restaurant/_search
{
  "query": {
    "simple_query_string" : {
        "query": "蒙古タンメン",
        "fields": ["name"],
        "default_operator": "and"
    }
  }
}

Now that I want to eat some, let's check whether there is a shop in 中目黒 (Nakameguro).

GET ldgourmet/restaurant/_search
{
  "query": {
    "simple_query_string" : {
        "query": "蒙古タンメン 中目黒",
        "fields": ["name", "address"],
        "default_operator": "and"
    }
  }
}

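There isn't one... So let's sort the shops by distance from 中目黒 instead, building the query with this reference as a guide. First, look up the latitude and longitude of 中目黒 station. Using a site called 「Googleマップで緯度・経度を求める」, the result was (latitude, longitude) = (35.644288, 139.699096).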

GET ldgourmet/restaurant/_search
{
  "query": {
    "simple_query_string" : {
        "query": "蒙古タンメン",
        "fields": ["name"],
        "default_operator": "and"
    }
  },
  "sort": [{
    "_geo_distance": {
      "location": {
        "lat": 35.644288,
        "lon": 139.699096
      },
      "order": "asc",
      "unit": "km",
      "distance_type": "plane"
    } 
  }]
}

The result is below. The 渋谷 (Shibuya) shop appears to be the closest.

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 10,
    "max_score": null,
    "hits": [
      {
        "_index": "ldgourmet",
        "_type": "restaurant",
        "_id": "407978",
        "_score": null,
        "fields": {
          "name": [
            "蒙古タンメン中本"
          ],
          "address": [
            "渋谷区道玄坂2-6-17渋東シネタワー　B2"
          ]
        },
        "sort": [
          1.3264480838396593
        ]
      },
      {
        "_index": "ldgourmet",
        "_type": "restaurant",
        "_id": "16628",
        "_score": null,
        "fields": {
          "name": [
            "蒙古タンメン中本"
          ],
          "address": [
            "品川区上大崎2-13-45トランスリンク第3ビル1階"
          ]
        },
        "sort": [
          2.7366820396931413
        ]
      },
...
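
The values under sort in the result above are distances in km from the given point. As a rough way to sanity-check such values, the sketch below computes a great-circle distance with the haversine formula. Note that this is a swapped-in technique: the query uses distance_type plane, whose calculation is a faster approximation and may give slightly different numbers. The second coordinate pair is a hypothetical point, not taken from the dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


nakameguro = (35.644288, 139.699096)   # from the article
somewhere = (35.6580, 139.7016)        # hypothetical point for illustration
print(round(haversine_km(*nakameguro, *somewhere), 3), "km")
```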

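Next, let's use Kibana to check visually where the 中本 shops are. If you want to learn how to use Kibana, 「KPIダッシュボードとして使うKibana4」 is a good place to start. Filtering with name:"蒙古タンメン中本" gives the following.

The dataset itself is somewhat old, but it turns out there is a shop in 横浜 (Yokohama) too.

Finding a ramen shop within walking distance of 中目黒 for lunch

The search conditions are as follows:

Open for lunch: open_lunch=1
Within walking distance: within 0.8 km of 中目黒 station (reference)
Ramen shop: name or description contains ラーメン

Note that the query below applies only the distance and ラーメン conditions; it contains no clause for open_lunch.

Query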

GET ldgourmet/restaurant/_search
{
  "size": 100,
  "fields": ["name", "address", "description"],
  "query": {
    "filtered": {
      "query": {
        "simple_query_string" : {
          "query": "ラーメン",
          "fields": ["name", "description"],
          "default_operator": "and"
        }
      },
      "filter": {
        "geo_distance": {
          "distance":      "0.8km",
          "distance_type": "plane", 
          "location": {
            "lat": 35.644288,
            "lon": 139.699096
          }
        }
      }
    }
  }
}

Result

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 2.0545835,
    "hits": [
      {
        "_index": "ldgourmet",
        "_type": "restaurant",
        "_id": "14956",
        "_score": 2.0545835,
        "fields": {
          "name": [
            "ラーメン香月"
          ],
          "address": [
            "目黒区青葉台１-２９-１２セブンスターマンション第二青葉台1階"
          ]
        }
      },
      {
        "_index": "ldgourmet",
        "_type": "restaurant",
        "_id": "309908",
        "_score": 1.6692734,
        "fields": {
          "name": [
            "屋台系ラーメン 魚鳥"
          ],
          "description": [
            "山手通り沿い。"
          ],
          "address": [
            "目黒区東山2-2-5"
          ]
        }
      },
      {
        "_index": "ldgourmet",
...

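In this case, factors such as popularity, how relevant the text is to ラーメン, and distance are not weighed together. When you need that, you can define a Function Score yourself and decide the score in your own way, as follows.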

GET ldgourmet/restaurant/_search
{
  "size": 100,
  "fields": ["name", "access_count", "address", "description"],
  "query": {
    "filtered": {
      "query": {
        "function_score": {
          "query": {
            "simple_query_string" : {
              "query": "ラーメン",
              "fields": ["name", "description"],
              "default_operator": "and"
            }
          },
          "boost_mode": "multiply",
          "script_score": {
            "script": "_score * doc['access_count'].value"
          }
        }
      },
      "filter": {
        "geo_distance": {
          "distance":      "0.8km",
          "distance_type": "plane", 
          "location": {
            "lat": 35.644288,
            "lon": 139.699096
          }
        }
      }
    }
  }
}
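
One detail of the query above is worth working through. As I understand boost_mode, multiply means the query score is multiplied by the function's result, and the script already multiplies by _score itself, so the final score would come out as _score squared times access_count. The sketch below spells out that arithmetic with hypothetical numbers; final_score is an illustrative helper, and the behavior should be confirmed with the Explain API.

```python
def final_score(query_score: float, access_count: int, boost_mode: str = "multiply") -> float:
    """Combine the query score with the script result for two boost modes."""
    # What the script "_score * doc['access_count'].value" returns.
    function_score = query_score * access_count
    if boost_mode == "multiply":
        return query_score * function_score
    if boost_mode == "replace":
        return function_score
    raise ValueError("unsupported boost_mode in this sketch: " + boost_mode)


print(final_score(2.0, 100))             # 400.0
print(final_score(2.0, 100, "replace"))  # 200.0
```

If the intent is simply _score times access_count, either "boost_mode": "replace" or a script that returns only doc['access_count'].value would give that.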

Reference
