本稿執筆時は、まだES2.3でしたが
本質は変わっていません
6系でも通用する話。
For what
Remind search API in elasticsearch
公式ドキュメントを読みながら、まとめていく。
the search API
pretty parameter
The pretty parameter, again, just tells Elasticsearch to return pretty-printed JSON results.
REST request URI
curl 'localhost:9200/bank/_search?q=*&pretty'
REST request body
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_all": {} }
}'
response
-
took
– time in milliseconds for Elasticsearch to execute the search -
timed_out
– tells us if the search timed out or not -
_shards
– tells us how many shards were searched, as well as a count of the successful/failed searched shards
-
hits
– search results -
hits.total
– total number of documents matching our search criteria -
hits.hits
– actual array of search results (defaults to first 10 documents) -
_score
andmax_score
- ignore these fields for now
カラム指定検索みたいな
{
"query": { "match_all": {} },
"_source": ["account_number", "balance"]
}'
The match_all query is simply a search for all documents in the specified index.
とあるが、実際には上記の通り先頭10件のみ出力される
SQLで言うところのLIMIT句的な
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_all": {} },
"from": 10, # 11番目から
"size": 10 # 20番目まで(件数指定)
}
The from parameter (0-based)
デフォルトはZEROスタート
- 検索結果件数だけを見たい場合は、
size:0
にする
size=0 to not show search hits because we only want to see the aggregation results in the response.
ソート
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_all": {} },
"sort": { "balance": { "order": "desc" } }
}'
account balanceで降順ソート
SQLで言うところのSELECT
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_all": {} },
"_source": ["account_number", "balance"]
}'
_source field in the search hits
ということ。
SQLで言うところのWHERE的な
###match query
- 単一条件指定
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match": { "account_number": 20 } }
}'
- LIKE的な
returns all accounts containing the term "mill" in the address:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match": { "address": "mill" } }
}'
- 複数LIKE的な
returns all accounts containing the term "mill" or "lane" in the address:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match": { "address": "mill lane" } } # mill or laneを含むもの
}'
- 特定のフレーズを含む検索
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_phrase": { "address": "mill lane" } }
}'
- boo(lean) query
returns all accounts containing "mill" and "lane" in the address:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": {
"bool": {
"must": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}'
※and条件みたいなもの"mill" and "lane"
※反対がmust_not
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": {
"bool": {
"should": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}'
※or条件みたいなもの "mill" or "lane"
-
bool
句での複合条件指定
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}'
returns all accounts of anybody who is 40 years old but don’t live in ID(aho):
score field
というものが、responseにあります。
a numeric value that is a relative measure of how well the document matches the search query that we specified
デフォルトのクエリでは検索クエリとのマッチ度を計算して、score フィールドに返している
Filters
In cases where we do not need the relevance scores, Elasticsearch provides another query capability in the form of filters.
メリット
- スコアリング(検索クエリとのマッチ度計算)しないため、高速
- キャッシュが効く
Filters do not score so they are faster to execute than queries
Filters can be cached in memory allowing repeated search executions to be significantly faster than queries
複合条件指定ができる
the filtered query allows you to combine a query (like match_all, match, bool, etc.) together with a filter
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": {
"filtered": {
"query": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 20000,
"lte": 30000
}
}
}
}
}
}'
- query -> SELECT
- filter -> where
みたいなとこ。
aggregations(集約)
合計
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state"
}
}
}
}'
sorted by count descending (also default):
デフォルトは降順
SQLだと、
SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC
みたいな感じとのこと。
平均
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}'
Common option
REST APIで下記が使用可能
-
?pretty=true
# jsonでretrunされる(が、基本デバっグでのみ使うべし!) -
?format=yaml
# yamlでreturnされる -
?human=false
# 人間でもわかりやすいreturnをoffる(デフォルトはfalse)
response filtering
curl -XGET 'localhost:9200/_search?pretty&filter_path=took,hits.hits._id,hits.hits._score' # 返り値を絞りこむ
{
"took" : 3,
"hits" : {
"hits" : [
{
"_id" : "3640",
"_score" : 1.0
},
{
"_id" : "3642",
"_score" : 1.0
}
]
}
}
もちろん、ワイルドカードもあり
curl 'localhost:9200/_segments?pretty&filter_path=indices.**.version'
{
"indices" : {
"movies" : {
"shards" : {
"0" : [ {
"segments" : {
"_0" : {
"version" : "5.2.0"
}
}
} ],
"2" : [ {
"segments" : {
"_0" : {
"version" : "5.2.0"
}
}
} ]
}
},
"books" : {
"shards" : {
"0" : [ {
"segments" : {
"_0" : {
"version" : "5.2.0"
}
}
} ]
}
}
}
}
_source
elasricsearchはrawデータを返す場合もありのことで、下記のようにして、必要最低限のソースのみに絞り込むとよい
curl -XGET 'localhost:9200/_search?pretty&filter_path=hits.hits._source&_source=title'
{
"hits" : {
"hits" : [ {
"_source":{"title":"Book #2"}
}, {
"_source":{"title":"Book #1"}
}, {
"_source":{"title":"Book #3"}
} ]
}
}
index API
特定のindexにドキュメント(新規データ)を追加する
ref
ES5.4で試したかったら..with Docker
ml
version: '3'
services:
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:5.4.1
environment:
# - bootstrap.memory_lock=true
- xpack.security.enabled=false
- xpack.monitoring.enabled=false
- xpack.watcher.enabled=false
- xpack.graph.enabled=false
- xpack.monitoring.history.duration=1d
- "ES_JAVA_OPTS=-Xms1g -Xmx1g"
container_name: elasticsearch
networks:
- elastic
ports:
- "9200:9200"
- "9300:9300"
kibana:
image: docker.elastic.co/kibana/kibana:5.4.1
environment:
- xpack.reporting.enabled=false
- xpack.monitoring.enabled=false
container_name: kibana
networks:
- elastic
links:
- elasticsearch
ports:
- "5601:5601"
networks:
elastic: