More than 5 years have passed since last update.

ElasticsearchとSQL対比しながら理解

Last updated at 2018-12-24Posted at 2016-05-05

本稿執筆時は、まだES2.3でしたが

本質は変わっていません
6系でも通用する話。

For what

Remind search API in elasticsearch

公式ドキュメントを読みながら、まとめていく。

the search API

pretty parameter

The pretty parameter, again, just tells Elasticsearch to return pretty-printed JSON results.

REST request URI

curl 'localhost:9200/bank/_search?q=*&pretty'

REST request body

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} }
}'

response

took – time in milliseconds for Elasticsearch to execute the search
timed_out – tells us if the search timed out or not
_shards – tells us how many shards were searched, as well as a count of the successful/failed searched shards

hits – search results
hits.total – total number of documents matching our search criteria
hits.hits – actual array of search results (defaults to first 10 documents)
_score and max_score - ignore these fields for now

カラム指定検索みたいな

elasticsearch

 
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}'

The match_all query is simply a search for all documents in the specified index.
とあるが、実際には上記の通り先頭10件のみ出力される

SQLで言うところのLIMIT句的な

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "from": 10, # 11番目から
  "size": 10  # 20番目まで（件数指定）
}

The from parameter (0-based) デフォルトはZEROスタート

検索結果件数だけを見たい場合は、size:0にする

size=0 to not show search hits because we only want to see the aggregation results in the response.

ソート

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}'

account balanceで降順ソート

SQLで言うところのSELECT

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "_source": ["account_number", "balance"]
}'

_source field in the search hits
ということ。

SQLで言うところのWHERE的な

###match query

単一条件指定

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match": { "account_number": 20 } }
}'

LIKE的な

returns all accounts containing the term "mill" in the address:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match": { "address": "mill" } }
}'

複数LIKE的な

returns all accounts containing the term "mill" or "lane" in the address:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match": { "address": "mill lane" } } # mill or laneを含むもの 
}'

特定のフレーズを含む検索

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_phrase": { "address": "mill lane" } }
}'

boo(lean) query
returns all accounts containing "mill" and "lane" in the address:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}'

※and条件みたいなもの"mill" and "lane"
※反対がmust_not

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}'

※or条件みたいなもの "mill" or "lane"

bool句での複合条件指定

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}'

returns all accounts of anybody who is 40 years old but don’t live in ID(aho):

score field

というものが、responseにあります。

a numeric value that is a relative measure of how well the document matches the search query that we specified

デフォルトのクエリでは検索クエリとのマッチ度を計算して、score フィールドに返している

Filters

In cases where we do not need the relevance scores, Elasticsearch provides another query capability in the form of filters.

メリット

スコアリング（検索クエリとのマッチ度計算）しないため、高速
キャッシュが効く

Filters do not score so they are faster to execute than queries
Filters can be cached in memory allowing repeated search executions to be significantly faster than queries

複合条件指定ができる

the filtered query allows you to combine a query (like match_all, match, bool, etc.) together with a filter

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}'

query -> SELECT
filter -> where

みたいなとこ。

aggregations(集約)

合計

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state"
      }
    }
  }
}'

sorted by count descending (also default): デフォルトは降順

SQLだと、
SELECT state, COUNT(*) FROM bank GROUP BY state ORDER BY COUNT(*) DESC

みたいな感じとのこと。

平均

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}'

Common option

REST APIで下記が使用可能

?pretty=true # jsonでretrunされる（が、基本デバっグでのみ使うべし！）
?format=yaml # yamlでreturnされる
?human=false # 人間でもわかりやすいreturnをoffる（デフォルトはfalse)

response filtering

curl -XGET 'localhost:9200/_search?pretty&filter_path=took,hits.hits._id,hits.hits._score' # 返り値を絞りこむ
{
  "took" : 3,
  "hits" : {
    "hits" : [
      {
        "_id" : "3640",
        "_score" : 1.0
      },
      {
        "_id" : "3642",
        "_score" : 1.0
      }
    ]
  }
}

もちろん、ワイルドカードもあり

curl 'localhost:9200/_segments?pretty&filter_path=indices.**.version'
{
  "indices" : {
    "movies" : {
      "shards" : {
        "0" : [ {
          "segments" : {
            "_0" : {
              "version" : "5.2.0"
            }
          }
        } ],
        "2" : [ {
          "segments" : {
            "_0" : {
              "version" : "5.2.0"
            }
          }
        } ]
      }
    },
    "books" : {
      "shards" : {
        "0" : [ {
          "segments" : {
            "_0" : {
              "version" : "5.2.0"
            }
          }
        } ]
      }
    }
  }
}

_source

elasricsearchはrawデータを返す場合もありのことで、下記のようにして、必要最低限のソースのみに絞り込むとよい

curl -XGET 'localhost:9200/_search?pretty&filter_path=hits.hits._source&_source=title'
{
  "hits" : {
    "hits" : [ {
      "_source":{"title":"Book #2"}
    }, {
      "_source":{"title":"Book #1"}
    }, {
      "_source":{"title":"Book #3"}
    } ]
  }
}

index API

特定のindexにドキュメント（新規データ）を追加する

ref

mapping in elasticsearch

ES5.4で試したかったら..with Docker

ml
version: '3'                                                                                                                                                  
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.4.1
    environment:
      # - bootstrap.memory_lock=true
      - xpack.security.enabled=false
      - xpack.monitoring.enabled=false
      - xpack.watcher.enabled=false
      - xpack.graph.enabled=false
      - xpack.monitoring.history.duration=1d
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    container_name: elasticsearch
    networks:
      - elastic
    ports:
      - "9200:9200"
      - "9300:9300"
  kibana:
    image: docker.elastic.co/kibana/kibana:5.4.1
    environment:
      - xpack.reporting.enabled=false
      - xpack.monitoring.enabled=false
    container_name: kibana
    networks:
      - elastic
    links:
      - elasticsearch
    ports:
      - "5601:5601"
networks:
  elastic:

153

139

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up