
Elasticsearch: APIs for Validating Search Queries


This post summarizes APIs that are handy to know when investigating and improving Elasticsearch search queries.


Validate API

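It lets you check a search query for problems without actually executing it.

Even a query that returns values and seems to finish normally may, it appears, be raising an error internally.

It is also handy when running the query would put a heavy load on the cluster, for example while debugging analytical queries over large datasets.

All you do is replace the _search part of your usual search URI with _validate/query.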
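As a minimal Python sketch of that substitution, the hypothetical helper to_validate_path below (not part of any Elasticsearch client) swaps the _search path segment for _validate/query and keeps any query string as-is. Matching whole path segments rather than calling str.replace avoids touching an index name that merely contains "_search".

```python
def to_validate_path(search_path: str) -> str:
    """Rewrite '/<index>/_search?...' into '/<index>/_validate/query?...'.

    Hypothetical helper for illustration: it only manipulates the path string.
    """
    path, sep, query = search_path.partition("?")
    segments = path.split("/")
    if "_search" not in segments:
        raise ValueError(f"no _search segment in {search_path!r}")
    # Replace only the whole _search segment; index names are left untouched.
    segments[segments.index("_search")] = "_validate/query"
    return "/".join(segments) + sep + query


print(to_validate_path("/logstash-2015.05.18/_search"))  # /logstash-2015.05.18/_validate/query
print(to_validate_path("/geo/_search?q=response:200"))   # /geo/_validate/query?q=response:200
```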


Request

GET /_validate/query

{ --- query --- }


Result | on success

{
  "valid": true,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  }
}


Result | if there is a problem

{
  "valid": false
}

If valid is true, you know the query can be executed without errors.

Adding the explain parameter makes the response include details, whether validation succeeds or fails.


Request

GET /geo/_validate/query?explain

Result | on success

{
  "valid": true,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "explanations": [
    {
      "index": ".kibana",
      "valid": true,
      "explanation": "like:[Once upon a time]"
    },
    {
      "index": ".marvel",
      "valid": true,
      "explanation": "like:[Once upon a time]"
    },
    ... (omitted) ...


Result | on failure

{
  "valid": false,
  "error": "org.elasticsearch.common.ParsingException: request does not support [sort]"
}

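When there are multiple shards, validation normally runs on just one randomly chosen shard per index, so a problem that only occurs on some other shard goes undetected. Adding the all_shards parameter makes the query run against every shard.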
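The sketch below uses only the Python standard library and assumes a cluster reachable at a base URL such as http://localhost:9200. It builds a Validate API URL with the optional explain and all_shards flags and posts a query to it. The helper names validate_url and run_validate are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request
from urllib.parse import urlencode


def validate_url(base: str, index: str | None = None, *,
                 explain: bool = False, all_shards: bool = False) -> str:
    """Build a Validate API URL; index=None validates against all indices."""
    path = f"/{index}/_validate/query" if index else "/_validate/query"
    params = {}
    if explain:
        params["explain"] = "true"
    if all_shards:
        params["all_shards"] = "true"  # validate on every shard, not one random shard
    return base.rstrip("/") + path + ("?" + urlencode(params) if params else "")


def run_validate(url: str, query: dict) -> dict:
    """POST {"query": ...} to url and return the decoded JSON response.

    urlopen raises urllib.error.HTTPError for non-2xx status codes.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Against a running cluster, run_validate(validate_url("http://localhost:9200", explain=True, all_shards=True), {"match_all": {}}) would return the response as a dict. URL building is kept separate from the HTTP call so it can be checked without a cluster.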


Request | run the query on all shards

GET /_validate/query?all_shards

Result

{
  "valid": true,
  "_shards": {
    "total": 46,
    "successful": 46,
    "failed": 0
  },
  "explanations": [
    {
      "index": ".kibana",
      "shard": 0,
      "valid": true,
      "explanation": "like:[Once upon a time]"
    },
    {
      "index": ".kibana",
      "shard": 1,
      "valid": true,
      "explanation": "like:[Once upon a time]"
    },
    {
      "index": ".kibana",
      "shard": 2,
      "valid": true,
      "explanation": "like:[Once upon a time]"
    },
    ... (omitted) ...
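To read Validate API responses like these programmatically, here is a small sketch (the function summarize_validation is hypothetical) that condenses a response into one line per entry. It relies only on keys visible in the examples in this section: valid, a top-level error, and explanations entries with index, an optional shard, and explanation. The fallback to an error key inside an entry is an assumption.

```python
from __future__ import annotations


def summarize_validation(resp: dict) -> list[str]:
    """Turn a Validate API response into short human-readable lines."""
    lines = [f"valid: {resp.get('valid')}"]
    if "error" in resp:  # top-level error, as in the failure example
        lines.append(f"error: {resp['error']}")
    for exp in resp.get("explanations", []):
        where = exp.get("index", "?")
        if "shard" in exp:  # per-shard entries appear when all_shards is used
            where += f"[{exp['shard']}]"
        status = "ok" if exp.get("valid") else "INVALID"
        # Valid entries carry "explanation"; fall back to an assumed "error" key.
        detail = exp.get("explanation", exp.get("error", ""))
        lines.append(f"{where} {status}: {detail}")
    return lines


# Responses trimmed from the examples in this section.
failed = {"valid": False,
          "error": "org.elasticsearch.common.ParsingException: request does not support [sort]"}
per_shard = {"valid": True, "explanations": [
    {"index": ".kibana", "shard": 0, "valid": True, "explanation": "like:[Once upon a time]"},
    {"index": ".kibana", "shard": 1, "valid": True, "explanation": "like:[Once upon a time]"},
]}
for resp in (failed, per_shard):
    print("\n".join(summarize_validation(resp)))
```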


Explain API

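Given a search query and a specific document, it returns a breakdown of how that document's score was calculated.

It is ideal for tuning queries that rely on scoring.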
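The request path depends on the Elasticsearch version. This article uses the typed form /<index>/<type>/<id>/_explain. From 7.x onward the documented form is /<index>/_explain/<id>, as mapping types were phased out. The hypothetical helper below builds either form and percent-encodes each path segment.

```python
from typing import Optional
from urllib.parse import quote


def explain_path(index: str, doc_id: str, doc_type: Optional[str] = None) -> str:
    """Hypothetical helper: Explain API path for one document.

    With doc_type: the typed form used in this article (/<index>/<type>/<id>/_explain).
    Without it: the typeless form documented from 7.x on (/<index>/_explain/<id>).
    """
    def seg(s: str) -> str:
        return quote(s, safe="")  # encode '/' and other reserved characters

    if doc_type:
        return f"/{seg(index)}/{seg(doc_type)}/{seg(doc_id)}/_explain"
    return f"/{seg(index)}/_explain/{seg(doc_id)}"


print(explain_path("logstash-2015.05.18", "FXgyUWABFywrhlsJO2sZ", "log"))
print(explain_path("logstash-2015.05.18", "FXgyUWABFywrhlsJO2sZ"))
```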


Request

GET logstash-2015.05.18/log/FXgyUWABFywrhlsJO2sZ/_explain

{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": "2015-05-18T08:52:41.823Z"
            }
          }
        },
        {
          "term": {
            "response": {
              "value": "200"
            }
          }
        }
      ]
    }
  }
}



Result

{
  "_index": "logstash-2015.05.18",
  "_type": "log",
  "_id": "FXgyUWABFywrhlsJO2sZ",
  "matched": true,
  "explanation": {
    "value": 1.091051,
    "description": "sum of:",
    "details": [
      {
        "value": 1,
        "description": "@timestamp:[1431939161823 TO 9223372036854775807]",
        "details": []
      },
      {
        "value": 0.09105097,
        "description": "weight(response:200 in 0) [PerFieldSimilarity], result of:",
        "details": [
          {
            "value": 0.09105097,
            "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
            "details": [
              {
                "value": 0.09105097,
                "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                "details": [
                  {
                    "value": 823,
                    "description": "docFreq",
                    "details": []
                  },
                  {
                    "value": 901,
                    "description": "docCount",
                    "details": []
                  }
                ]
              },
              {
                "value": 1,
                "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                "details": [
                  {
                    "value": 1,
                    "description": "termFreq=1.0",
                    "details": []
                  },
                  {
                    "value": 1.2,
                    "description": "parameter k1",
                    "details": []
                  },
                  {
                    "value": 0.75,
                    "description": "parameter b",
                    "details": []
                  },
                  {
                    "value": 1,
                    "description": "avgFieldLength",
                    "details": []
                  },
                  {
                    "value": 1,
                    "description": "fieldLength",
                    "details": []
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}

The score for each component is shown, down to the formula used to compute it, which makes the result easy to follow.
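As a check on reading the tree, the sketch below recomputes this document's score from the numbers in the explanation. It uses the two formulas the explanation prints (BM25 idf and tfNorm), with Python's natural-log math.log. The range clause contributes the constant 1 shown in its entry, so the total is 1 + idf × tfNorm. The walk helper is hypothetical; it flattens an explanation tree into indented lines, which helps with deeper trees.

```python
import math


def bm25_idf(doc_freq: float, doc_count: float) -> float:
    # Same formula as the "idf, computed as ..." entry (natural log).
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq: float, k1: float, b: float,
                 field_length: float, avg_field_length: float) -> float:
    # Same formula as the "tfNorm, computed as ..." entry.
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * field_length / avg_field_length))


def walk(node: dict, depth: int = 0) -> list:
    """Hypothetical helper: flatten an explanation tree into 'value  description' lines."""
    desc_lines = node.get("description", "").splitlines()
    first_line = desc_lines[0] if desc_lines else ""
    lines = ["  " * depth + f"{node['value']:.7g}  {first_line}"]
    for child in node.get("details", []):
        lines.extend(walk(child, depth + 1))
    return lines


# Inputs copied from the explanation above.
idf = bm25_idf(doc_freq=823, doc_count=901)
tf_norm = bm25_tf_norm(freq=1, k1=1.2, b=0.75, field_length=1, avg_field_length=1)
term_score = idf * tf_norm  # "product of:" idf and tfNorm
total = 1 + term_score      # "sum of:" the range clause (1) and the term clause
print(f"idf={idf:.8f} tfNorm={tf_norm:.1f} total={total:.6f}")
```

With these inputs the total comes out at about 1.091051, matching the top-level value in the explanation.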


Profile API

Last is the Profile API.

When you run a search, it returns information such as the time spent at each stage as the query is processed.

To use it, simply add "profile": true, to the request body.
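In code, that amounts to adding one key to the request body. Here is a minimal sketch in which the hypothetical helper with_profile returns a copy, so the original body can still be reused for normal, unprofiled searches. Elasticsearch's documentation notes that profiling itself adds overhead.

```python
import copy
import json


def with_profile(body: dict) -> dict:
    """Return a deep copy of a _search request body with "profile": true added."""
    profiled = copy.deepcopy(body)
    profiled["profile"] = True
    return profiled


# Body reusing the term clause from the Explain API example.
query_body = {"query": {"term": {"response": {"value": "200"}}}}
print(json.dumps(with_profile(query_body), indent=2))
```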


Request

GET logstash-2015.05.18/_search

{
  "profile": true,
  "query": { --- query --- }
}


Result

{
  "": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 2.5134604,
    "hits": [
      {
        "_index": "logstash-2015.05.18",
        "_type": "log",
        "_id": "GXgyUWABFywrhlsJO2sZ",
        "_score": 2.5134604,
        "_source": {
          "@timestamp": "2015-05-18T15:57:27.541Z",
          "ip": "225.44.217.191",
          "extension": "jpg",
          "response": "200",
          "geo": {
            "coordinates": {
              "lat": 38.53146222,
              "lon": -121.7864906
            },
            "src": "ID",
            "dest": "IN",
            "srcdest": "ID:IN"
          },
          "@tags": [
            "success",
            "info"
          ],
          "utc_time": "2015-05-18T15:57:27.541Z",
          "referer": "http://twitter.com/success/ted-freeman",
          "agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
          "clientip": "225.44.217.191",
          "bytes": 2559,
          "host": "media-for-the-masses.theacademyofperformingartsandscience.org",
          "request": "/uploads/charles-fullerton.jpg",
          "url": "https://media-for-the-masses.theacademyofperformingartsandscience.org/uploads/charles-fullerton.jpg",
          "@message": """225.44.217.191 - - [2015-05-18T15:57:27.541Z] "GET /uploads/charles-fullerton.jpg HTTP/1.1" 200 2559 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"""",
          "spaces": "this is a thing with lots of spaces wwwwoooooo",
          "xss": """<script>console.log("xss")</script>""",
          "headings": [
            "<h3>joe-h-engle</h5>",
            "http://www.slate.com/success/robert-s-kimbrough"
          ],
          "links": [
            "michael-massimino@www.slate.com",
            "http://twitter.com/security/patrick-forrester",
            "www.twitter.com"
          ],
          "relatedContent": [
            {
              "url": "http://www.laweekly.com/music/jay-electronica-much-better-than-his-name-would-suggest-2412364",
              "og:type": "article",
              "og:title": "Jay Electronica: Much Better Than His Name Would Suggest",
              "og:description": "You may not know who Jay Electronica is yet, but I&#039;m willing to bet that you would had he chosen a better name. Jay Electronica does not sound like the ...",
              "og:url": "http://www.laweekly.com/music/jay-electronica-much-better-than-his-name-would-suggest-2412364",
              "article:published_time": "2008-04-04T16:00:00-07:00",
              "article:modified_time": "2014-11-27T08:01:03-08:00",
              "article:section": "Music",
              "og:site_name": "LA Weekly",
              "twitter:title": "Jay Electronica: Much Better Than His Name Would Suggest",
              "twitter:description": "You may not know who Jay Electronica is yet, but I&#039;m willing to bet that you would had he chosen a better name. Jay Electronica does not sound like the ...",
              "twitter:card": "summary",
              "twitter:site": "@laweekly"
            }
          ],
          "machine": {
            "os": "win 7",
            "ram": 17179869184
          },
          "@version": "1"
        }
      },
      {
        "_index": "logstash-2015.05.18",
        "_type": "log",
        "_id": "FngyUWABFywrhlsJO34h",
        "_score": 2.5134604,
        "_source": {
          "@timestamp": "2015-05-18T11:40:24.100Z",
          "ip": "202.243.166.195",
          "extension": "jpg",
          "response": "200",
          "geo": {
            "coordinates": {
              "lat": 38.99017472,
              "lon": -122.8997175
            },
            "src": "BD",
            "dest": "BD",
            "srcdest": "BD:BD"
          },
          "@tags": [
            "success",
            "info"
          ],
          "utc_time": "2015-05-18T11:40:24.100Z",
          "referer": "http://www.slate.com/success/donald-deke-slayton",
          "agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24",
          "clientip": "202.243.166.195",
          "bytes": 2559,
          "host": "media-for-the-masses.theacademyofperformingartsandscience.org",
          "request": "/uploads/mark-polansky.jpg",
          "url": "https://media-for-the-masses.theacademyofperformingartsandscience.org/uploads/mark-polansky.jpg",
          "@message": """202.243.166.195 - - [2015-05-18T11:40:24.100Z] "GET /uploads/mark-polansky.jpg HTTP/1.1" 200 2559 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24"""",
          "spaces": "this is a thing with lots of spaces wwwwoooooo",
          "xss": """<script>console.log("xss")</script>""",
          "headings": [
            "<h3>richard-linnehan</h5>",
            "http://twitter.com/warning/guy-gardner"
          ],
          "links": [
            "robert-satcher@twitter.com",
            "http://www.slate.com/info/valentina-tereshkova",
            "www.www.slate.com"
          ],
          "relatedContent": [],
          "machine": {
            "os": "win xp",
            "ram": 10737418240
          },
          "@version": "1"
        }
      }
    ]
  },
  "profile": {
    "shards": [
      {
        "id": "[4x5gS5I3QgOjn6fWrFsfsA][logstash-2015.05.18][0]",
        "searches": [
          {
            "query": [
              {
                "type": "BooleanQuery",
                "description": "+@timestamp:[1431939161823 TO 9223372036854775807] +response:200 +extension:jpg +bytes:[2559 TO 2559]",
                "time_in_nanos": 534278,
                "breakdown": {
                  "score": 3288,
                  "build_scorer_count": 4,
                  "match_count": 2,
                  "create_weight": 61361,
                  "next_doc": 68268,
                  "match": 1323,
                  "create_weight_count": 1,
                  "next_doc_count": 4,
                  "score_count": 2,
                  "build_scorer": 400025,
                  "advance": 0,
                  "advance_count": 0
                },
                "children": [
                  {
                    "type": "IndexOrDocValuesQuery",
                    "description": "@timestamp:[1431939161823 TO 9223372036854775807]",
                    "time_in_nanos": 114173,
                    "breakdown": {
                      "score": 405,
                      "build_scorer_count": 8,
                      "match_count": 2,
                      "create_weight": 4494,
                      "next_doc": 0,
                      "match": 821,
                      "create_weight_count": 1,
                      "next_doc_count": 0,
                      "score_count": 2,
                      "build_scorer": 108014,
                      "advance": 422,
                      "advance_count": 4
                    }
                  },
                  {
                    "type": "TermQuery",
                    "description": "response:200",
                    "time_in_nanos": 60101,
                    "breakdown": {
                      "score": 1082,
                      "build_scorer_count": 8,
                      "match_count": 0,
                      "create_weight": 20625,
                      "next_doc": 0,
                      "match": 0,
                      "create_weight_count": 1,
                      "next_doc_count": 0,
                      "score_count": 2,
                      "build_scorer": 20752,
                      "advance": 17627,
                      "advance_count": 4
                    }
                  },
                  {
                    "type": "TermQuery",
                    "description": "extension:jpg",
                    "time_in_nanos": 31650,
                    "breakdown": {
                      "score": 278,
                      "build_scorer_count": 8,
                      "match_count": 0,
                      "create_weight": 12306,
                      "next_doc": 0,
                      "match": 0,
                      "create_weight_count": 1,
                      "next_doc_count": 0,
                      "score_count": 2,
                      "build_scorer": 7651,
                      "advance": 11400,
                      "advance_count": 4
                    }
                  },
                  {
                    "type": "PointRangeQuery",
                    "description": "bytes:[2559 TO 2559]",
                    "time_in_nanos": 113261,
                    "breakdown": {
                      "score": 190,
                      "build_scorer_count": 8,
                      "match_count": 0,
                      "create_weight": 733,
                      "next_doc": 1003,
                      "match": 0,
                      "create_weight_count": 1,
                      "next_doc_count": 4,
                      "score_count": 2,
                      "build_scorer": 111320,
                      "advance": 0,
                      "advance_count": 0
                    }
                  }
                ]
              }
            ],
            "rewrite_time": 2650,
            "collector": [
              {
                "name": "CancellableCollector",
                "reason": "search_cancelled",
                "time_in_nanos": 15895,
                "children": [
                  {
                    "name": "SimpleTopScoreDocCollector",
                    "reason": "search_top_hits",
                    "time_in_nanos": 7791
                  }
                ]
              }
            ]
          }
        ],
        "aggregations": []
      },
  --- (omitted) ---
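The profile.shards[].searches[].query section is a tree, and a parent's time_in_nanos includes the time spent in its children, so siblings are the fair comparison. The hypothetical sketch below flattens that tree into rows. For each node it also names the largest timing in its breakdown; keys ending in _count are call counts, not times. The sample is a trimmed copy of the shard shown above that keeps only the timing keys.

```python
from __future__ import annotations


def dominant_phase(breakdown: dict) -> str:
    """Largest timing in a breakdown; *_count keys are call counts, not times."""
    timings = {k: v for k, v in breakdown.items() if not k.endswith("_count")}
    return max(timings, key=timings.get)


def query_rows(response: dict) -> list[tuple[str, int, str, str, int, str]]:
    """Flatten profile.shards[].searches[].query[] trees into rows of
    (shard id, depth, type, description, time_in_nanos, dominant phase)."""
    rows = []

    def visit(shard_id: str, item: dict, depth: int) -> None:
        rows.append((shard_id, depth, item["type"], item["description"],
                     item["time_in_nanos"], dominant_phase(item["breakdown"])))
        for child in item.get("children", []):
            visit(shard_id, child, depth + 1)

    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for root in search.get("query", []):
                visit(shard["id"], root, 0)
    return rows


def make_node(qtype, desc, nanos, breakdown, children=()):
    """Hypothetical helper to build a trimmed profile node for the sample."""
    return {"type": qtype, "description": desc, "time_in_nanos": nanos,
            "breakdown": breakdown, "children": list(children)}


# Trimmed copy of the shard profiled above (timing keys only, counts dropped).
sample = {"profile": {"shards": [{
    "id": "[4x5gS5I3QgOjn6fWrFsfsA][logstash-2015.05.18][0]",
    "searches": [{"query": [make_node(
        "BooleanQuery",
        "+@timestamp:[1431939161823 TO 9223372036854775807] +response:200 "
        "+extension:jpg +bytes:[2559 TO 2559]",
        534278,
        {"score": 3288, "create_weight": 61361, "next_doc": 68268,
         "match": 1323, "build_scorer": 400025, "advance": 0},
        [
            make_node("IndexOrDocValuesQuery", "@timestamp:[1431939161823 TO 9223372036854775807]", 114173,
                      {"score": 405, "create_weight": 4494, "next_doc": 0,
                       "match": 821, "build_scorer": 108014, "advance": 422}),
            make_node("TermQuery", "response:200", 60101,
                      {"score": 1082, "create_weight": 20625, "next_doc": 0,
                       "match": 0, "build_scorer": 20752, "advance": 17627}),
            make_node("TermQuery", "extension:jpg", 31650,
                      {"score": 278, "create_weight": 12306, "next_doc": 0,
                       "match": 0, "build_scorer": 7651, "advance": 11400}),
            make_node("PointRangeQuery", "bytes:[2559 TO 2559]", 113261,
                      {"score": 190, "create_weight": 733, "next_doc": 1003,
                       "match": 0, "build_scorer": 111320, "advance": 0}),
        ])]}],
}]}}

for shard_id, depth, qtype, desc, nanos, phase in query_rows(sample):
    print(f"{'  ' * depth}{qtype:<22} {nanos / 1e6:7.3f} ms  mostly {phase:<13} {desc}")
```

On this shard the Boolean query spends most of its time in build_scorer. Among its children, the @timestamp range (IndexOrDocValuesQuery) and the bytes range (PointRangeQuery) take the longest.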

All of these APIs have many parameters, and just understanding each of them takes effort, but they are worth checking when tuning queries.


References

+Validation API https://www.elastic.co/guide/en/elasticsearch/reference/current/search-validate.html

+Explain API https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html

+Profile API https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html