This post summarizes some APIs that are handy to know when investigating and improving Elasticsearch search queries.
Validate API
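This API lets you check whether a search query has any problems without actually executing it.
Apparently, even a query that returns results and completes normally can be hitting errors internally.
It is also useful when running the query would put a heavy load on the cluster, for example when debugging analytics queries over large data sets.
Just replace the _search part of your usual search URI with _validate/query and send the request.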
GET /_validate/query
{ --- query --- }
{
"valid": true,
"_shards": {
"total": 10,
"successful": 10,
"failed": 0
}
}
{
"valid": false
}
If valid is true, the query can be executed normally.
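The endpoint swap described above can be sketched in Python. This is a minimal illustration, not part of any Elasticsearch client: the helper names `to_validate_path` and `is_valid` are hypothetical, and the sample dictionaries simply mirror the two responses shown above.

```python
def to_validate_path(search_path: str) -> str:
    """Turn a search path ending in `_search` into the matching `_validate/query` path."""
    parts = search_path.split("/")
    if parts[-1] != "_search":
        raise ValueError(f"not a _search path: {search_path!r}")
    # Replace only the final `_search` segment; index names are left untouched.
    return "/".join(parts[:-1] + ["_validate", "query"])


def is_valid(validate_response: dict) -> bool:
    """Read the top-level `valid` flag from a validate response body."""
    return validate_response.get("valid") is True


if __name__ == "__main__":
    print(to_validate_path("/_search"))
    print(to_validate_path("/geo/_search"))
    print(is_valid({"valid": True, "_shards": {"total": 10, "successful": 10, "failed": 0}}))
    print(is_valid({"valid": False}))
```

Splitting on `/` rather than doing a plain string replace keeps an index whose name happens to contain `_search` from being rewritten by accident.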
In addition, adding the explain parameter returns detailed information regardless of whether validation succeeds or fails.
GET /geo/_validate/query?explain
{
"valid": true,
"_shards": {
"total": 10,
"successful": 10,
"failed": 0
},
"explanations": [
{
"index": ".kibana",
"valid": true,
"explanation": "like:[Once upon a time]"
},
{
"index": ".marvel",
"valid": true,
"explanation": "like:[Once upon a time]"
},
... omitted ...
{
"valid": false,
"error": "org.elasticsearch.common.ParsingException: request does not support [sort]"
}
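When the response lists many indices, it helps to pull out only the entries that failed. The sketch below assumes the two response shapes shown above (a per-index `explanations` list, or a top-level `error` string); `validation_errors` is a hypothetical helper name, and the `"bad-index"` entry is made-up sample data for illustration.

```python
def validation_errors(validate_response: dict) -> list:
    """Return (index, error) pairs for everything the validate response reports as invalid."""
    errors = []
    # Shape 1: a single top-level error, as in the failing response above.
    if "error" in validate_response:
        errors.append((None, validate_response["error"]))
    # Shape 2: per-index entries under "explanations".
    for item in validate_response.get("explanations", []):
        if not item.get("valid", False):
            errors.append((item.get("index"), item.get("error")))
    return errors


if __name__ == "__main__":
    failed = {
        "valid": False,
        "error": "org.elasticsearch.common.ParsingException: request does not support [sort]",
    }
    mixed = {
        "valid": False,
        "explanations": [
            {"index": ".kibana", "valid": True, "explanation": "like:[Once upon a time]"},
            # Hypothetical failing entry, not taken from a real cluster.
            {"index": "bad-index", "valid": False, "error": "sample error text"},
        ],
    }
    print(validation_errors(failed))
    print(validation_errors(mixed))
```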
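When there are multiple shards, validation normally runs on only one randomly selected shard, so a problem that occurs only on some other shard cannot be detected. Adding the all_shards parameter makes the query run against every shard.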
GET /_validate/query?all_shards
{
"valid": true,
"_shards": {
"total": 46,
"successful": 46,
"failed": 0
},
"explanations": [
{
"index": ".kibana",
"shard": 0,
"valid": true,
"explanation": "like:[Once upon a time]"
},
{
"index": ".kibana",
"shard": 1,
"valid": true,
"explanation": "like:[Once upon a time]"
},
{
"index": ".kibana",
"shard": 2,
"valid": true,
"explanation": "like:[Once upon a time]"
},
... omitted ...
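The `explain` and `all_shards` parameters can be combined on one request. Here is a small sketch of building such a URL with the standard library; `validate_url` is a hypothetical helper, and the base URL `http://localhost:9200` is an assumption about where the cluster is listening.

```python
from typing import Optional
from urllib.parse import urlencode


def validate_url(base_url: str, index: Optional[str] = None,
                 explain: bool = False, all_shards: bool = False) -> str:
    """Build a _validate/query URL, optionally scoped to an index and with flags set."""
    path = f"/{index}/_validate/query" if index else "/_validate/query"
    params = {}
    if explain:
        params["explain"] = "true"
    if all_shards:
        params["all_shards"] = "true"
    query = urlencode(params)
    return base_url.rstrip("/") + path + (f"?{query}" if query else "")


if __name__ == "__main__":
    base = "http://localhost:9200"  # assumed local cluster address
    print(validate_url(base, all_shards=True))
    print(validate_url(base, index="geo", explain=True, all_shards=True))
```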
Explain API
Given a search query and a specific document, this API returns a breakdown of how the score was computed.
It is ideal for tuning scoring queries.
GET logstash-2015.05.18/log/FXgyUWABFywrhlsJO2sZ/_explain
{
"query": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"gte": "2015-05-18T08:52:41.823Z"
}
}
},
{
"term": {
"response": {
"value": "200"
}
}
}
]
}
}
}
{
"_index": "logstash-2015.05.18",
"_type": "log",
"_id": "FXgyUWABFywrhlsJO2sZ",
"matched": true,
"explanation": {
"value": 1.091051,
"description": "sum of:",
"details": [
{
"value": 1,
"description": "@timestamp:[1431939161823 TO 9223372036854775807]",
"details": []
},
{
"value": 0.09105097,
"description": "weight(response:200 in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.09105097,
"description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 0.09105097,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 823,
"description": "docFreq",
"details": []
},
{
"value": 901,
"description": "docCount",
"details": []
}
]
},
{
"value": 1,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 1,
"description": "avgFieldLength",
"details": []
},
{
"value": 1,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
}
]
}
}
The score of each component is shown along with the formula used to compute it, which makes the result easy to follow.
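As a sanity check, the numbers in the response above can be recomputed from the formulas given in its `description` fields. The following sketch plugs in the reported `docFreq`, `docCount`, `k1`, `b`, and field lengths; the function names `bm25_idf` and `bm25_tf_norm` are my own, not Elasticsearch identifiers.

```python
import math


def bm25_idf(doc_freq: float, doc_count: float) -> float:
    """idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))"""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq: float, k1: float, b: float,
                 field_length: float, avg_field_length: float) -> float:
    """tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))"""
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * field_length / avg_field_length))


# Values copied from the explain response above.
idf = bm25_idf(doc_freq=823, doc_count=901)
tf_norm = bm25_tf_norm(freq=1.0, k1=1.2, b=0.75, field_length=1.0, avg_field_length=1.0)
term_score = idf * tf_norm          # weight(response:200)
range_score = 1.0                   # the @timestamp range clause
total_score = range_score + term_score  # "sum of:" at the top level

if __name__ == "__main__":
    print(f"idf={idf:.8f} tfNorm={tf_norm:.2f} total={total_score:.6f}")
```

The recomputed values agree with the response to its printed precision: the term clause contributes roughly 0.091 because `response:200` appears in 823 of 901 documents, so its idf is low.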
Profile API
Last is the Profile API.
When a search runs, it returns information such as the time spent at each stage of processing the query.
To use it, just add "profile": true to the search request body.
GET logstash-2015.05.18/_search
{
"profile": true,
"query": { --- query --- }
}
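If search bodies are built in code, the flag can be switched on without touching the original query. A minimal sketch, assuming the body is a plain Python dictionary; `with_profile` is a hypothetical helper name.

```python
import copy
import json


def with_profile(search_body: dict) -> dict:
    """Return a copy of the search body with profiling turned on."""
    profiled = copy.deepcopy(search_body)  # leave the caller's body untouched
    profiled["profile"] = True
    return profiled


if __name__ == "__main__":
    body = {"query": {"term": {"response": {"value": "200"}}}}
    print(json.dumps(with_profile(body), indent=2))
```

Copying first means the same body can still be sent unprofiled in production, since profiling adds overhead to the search.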
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 2.5134604,
"hits": [
{
"_index": "logstash-2015.05.18",
"_type": "log",
"_id": "GXgyUWABFywrhlsJO2sZ",
"_score": 2.5134604,
"_source": {
"@timestamp": "2015-05-18T15:57:27.541Z",
"ip": "225.44.217.191",
"extension": "jpg",
"response": "200",
"geo": {
"coordinates": {
"lat": 38.53146222,
"lon": -121.7864906
},
"src": "ID",
"dest": "IN",
"srcdest": "ID:IN"
},
"@tags": [
"success",
"info"
],
"utc_time": "2015-05-18T15:57:27.541Z",
"referer": "http://twitter.com/success/ted-freeman",
"agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
"clientip": "225.44.217.191",
"bytes": 2559,
"host": "media-for-the-masses.theacademyofperformingartsandscience.org",
"request": "/uploads/charles-fullerton.jpg",
"url": "https://media-for-the-masses.theacademyofperformingartsandscience.org/uploads/charles-fullerton.jpg",
"@message": """225.44.217.191 - - [2015-05-18T15:57:27.541Z] "GET /uploads/charles-fullerton.jpg HTTP/1.1" 200 2559 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"""",
"spaces": "this is a thing with lots of spaces wwwwoooooo",
"xss": """<script>console.log("xss")</script>""",
"headings": [
"<h3>joe-h-engle</h5>",
"http://www.slate.com/success/robert-s-kimbrough"
],
"links": [
"michael-massimino@www.slate.com",
"http://twitter.com/security/patrick-forrester",
"www.twitter.com"
],
"relatedContent": [
{
"url": "http://www.laweekly.com/music/jay-electronica-much-better-than-his-name-would-suggest-2412364",
"og:type": "article",
"og:title": "Jay Electronica: Much Better Than His Name Would Suggest",
"og:description": "You may not know who Jay Electronica is yet, but I'm willing to bet that you would had he chosen a better name. Jay Electronica does not sound like the ...",
"og:url": "http://www.laweekly.com/music/jay-electronica-much-better-than-his-name-would-suggest-2412364",
"article:published_time": "2008-04-04T16:00:00-07:00",
"article:modified_time": "2014-11-27T08:01:03-08:00",
"article:section": "Music",
"og:site_name": "LA Weekly",
"twitter:title": "Jay Electronica: Much Better Than His Name Would Suggest",
"twitter:description": "You may not know who Jay Electronica is yet, but I'm willing to bet that you would had he chosen a better name. Jay Electronica does not sound like the ...",
"twitter:card": "summary",
"twitter:site": "@laweekly"
}
],
"machine": {
"os": "win 7",
"ram": 17179869184
},
"@version": "1"
}
},
{
"_index": "logstash-2015.05.18",
"_type": "log",
"_id": "FngyUWABFywrhlsJO34h",
"_score": 2.5134604,
"_source": {
"@timestamp": "2015-05-18T11:40:24.100Z",
"ip": "202.243.166.195",
"extension": "jpg",
"response": "200",
"geo": {
"coordinates": {
"lat": 38.99017472,
"lon": -122.8997175
},
"src": "BD",
"dest": "BD",
"srcdest": "BD:BD"
},
"@tags": [
"success",
"info"
],
"utc_time": "2015-05-18T11:40:24.100Z",
"referer": "http://www.slate.com/success/donald-deke-slayton",
"agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24",
"clientip": "202.243.166.195",
"bytes": 2559,
"host": "media-for-the-masses.theacademyofperformingartsandscience.org",
"request": "/uploads/mark-polansky.jpg",
"url": "https://media-for-the-masses.theacademyofperformingartsandscience.org/uploads/mark-polansky.jpg",
"@message": """202.243.166.195 - - [2015-05-18T11:40:24.100Z] "GET /uploads/mark-polansky.jpg HTTP/1.1" 200 2559 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24"""",
"spaces": "this is a thing with lots of spaces wwwwoooooo",
"xss": """<script>console.log("xss")</script>""",
"headings": [
"<h3>richard-linnehan</h5>",
"http://twitter.com/warning/guy-gardner"
],
"links": [
"robert-satcher@twitter.com",
"http://www.slate.com/info/valentina-tereshkova",
"www.www.slate.com"
],
"relatedContent": [],
"machine": {
"os": "win xp",
"ram": 10737418240
},
"@version": "1"
}
}
]
},
"profile": {
"shards": [
{
"id": "[4x5gS5I3QgOjn6fWrFsfsA][logstash-2015.05.18][0]",
"searches": [
{
"query": [
{
"type": "BooleanQuery",
"description": "+@timestamp:[1431939161823 TO 9223372036854775807] +response:200 +extension:jpg +bytes:[2559 TO 2559]",
"time_in_nanos": 534278,
"breakdown": {
"score": 3288,
"build_scorer_count": 4,
"match_count": 2,
"create_weight": 61361,
"next_doc": 68268,
"match": 1323,
"create_weight_count": 1,
"next_doc_count": 4,
"score_count": 2,
"build_scorer": 400025,
"advance": 0,
"advance_count": 0
},
"children": [
{
"type": "IndexOrDocValuesQuery",
"description": "@timestamp:[1431939161823 TO 9223372036854775807]",
"time_in_nanos": 114173,
"breakdown": {
"score": 405,
"build_scorer_count": 8,
"match_count": 2,
"create_weight": 4494,
"next_doc": 0,
"match": 821,
"create_weight_count": 1,
"next_doc_count": 0,
"score_count": 2,
"build_scorer": 108014,
"advance": 422,
"advance_count": 4
}
},
{
"type": "TermQuery",
"description": "response:200",
"time_in_nanos": 60101,
"breakdown": {
"score": 1082,
"build_scorer_count": 8,
"match_count": 0,
"create_weight": 20625,
"next_doc": 0,
"match": 0,
"create_weight_count": 1,
"next_doc_count": 0,
"score_count": 2,
"build_scorer": 20752,
"advance": 17627,
"advance_count": 4
}
},
{
"type": "TermQuery",
"description": "extension:jpg",
"time_in_nanos": 31650,
"breakdown": {
"score": 278,
"build_scorer_count": 8,
"match_count": 0,
"create_weight": 12306,
"next_doc": 0,
"match": 0,
"create_weight_count": 1,
"next_doc_count": 0,
"score_count": 2,
"build_scorer": 7651,
"advance": 11400,
"advance_count": 4
}
},
{
"type": "PointRangeQuery",
"description": "bytes:[2559 TO 2559]",
"time_in_nanos": 113261,
"breakdown": {
"score": 190,
"build_scorer_count": 8,
"match_count": 0,
"create_weight": 733,
"next_doc": 1003,
"match": 0,
"create_weight_count": 1,
"next_doc_count": 4,
"score_count": 2,
"build_scorer": 111320,
"advance": 0,
"advance_count": 0
}
}
]
}
],
"rewrite_time": 2650,
"collector": [
{
"name": "CancellableCollector",
"reason": "search_cancelled",
"time_in_nanos": 15895,
"children": [
{
"name": "SimpleTopScoreDocCollector",
"reason": "search_top_hits",
"time_in_nanos": 7791
}
]
}
]
}
],
"aggregations": []
},
--- omitted ---
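The profile output is long, so a small script helps find where the time goes. The sketch below walks the `query` tree of each shard and ranks nodes by their own time, which it estimates as `time_in_nanos` minus the children's `time_in_nanos`, on the assumption that a parent's time includes its children. The function names are hypothetical, and `sample` is a trimmed copy of the first shard in the response above.

```python
def walk_queries(node: dict, depth: int = 0):
    """Yield (depth, type, description, total_ns, self_ns) for a node and its descendants."""
    children = node.get("children", [])
    child_ns = sum(child["time_in_nanos"] for child in children)
    yield (depth, node["type"], node["description"],
           node["time_in_nanos"], node["time_in_nanos"] - child_ns)
    for child in children:
        yield from walk_queries(child, depth + 1)


def slowest_queries(response: dict, top: int = 3) -> list:
    """Collect query nodes from every shard and sort them by self time, largest first."""
    rows = []
    for shard in response["profile"]["shards"]:
        for search in shard["searches"]:
            for query in search["query"]:
                rows.extend(walk_queries(query))
    return sorted(rows, key=lambda row: row[4], reverse=True)[:top]


# Trimmed from the profile response above: only the fields this sketch reads.
sample = {"profile": {"shards": [{"searches": [{"query": [{
    "type": "BooleanQuery",
    "description": "+@timestamp:[1431939161823 TO 9223372036854775807] +response:200 +extension:jpg +bytes:[2559 TO 2559]",
    "time_in_nanos": 534278,
    "children": [
        {"type": "IndexOrDocValuesQuery", "description": "@timestamp:[1431939161823 TO 9223372036854775807]", "time_in_nanos": 114173},
        {"type": "TermQuery", "description": "response:200", "time_in_nanos": 60101},
        {"type": "TermQuery", "description": "extension:jpg", "time_in_nanos": 31650},
        {"type": "PointRangeQuery", "description": "bytes:[2559 TO 2559]", "time_in_nanos": 113261},
    ],
}]}]}]}}

if __name__ == "__main__":
    for depth, qtype, desc, total_ns, self_ns in slowest_queries(sample):
        print(f"{'  ' * depth}{qtype}: self={self_ns}ns total={total_ns}ns  {desc}")
```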
Each of these APIs has many parameters, and just understanding them all takes effort, but they are worth checking when tuning queries.
Reference URLs
+Validation API https://www.elastic.co/guide/en/elasticsearch/reference/current/search-validate.html
+Explain API https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html
+Profile API https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html