
[Elasticsearch 6.6.0] Retrieving large numbers of search results with the scroll API

Elasticsearch

Last updated at 2019-04-05, posted at 2019-04-05

When you run a search in Elasticsearch without specifying any options, at most 10 hits are returned.
You can raise this with the size option, but size has an upper limit of its own: 10,000 hits by default.

To retrieve more than 10,000 hits, use the scroll API and fetch them in several batches.
(Result sets of 10,000 hits or fewer can also be split into multiple batches with scroll.)

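The scroll API saves a snapshot of the results as of the time the search was run, so you can keep walking through the hits that did not fit in the first response.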

$ curl -H "Content-Type: application/json" -XGET 'http://localhost:9200/index_name/type_name/_search?scroll=1m&size=1000&pretty=true' -d '
{
  "query": {
    "match": {
      "field_name": "text to search for"
    }
  }
}'

# Result
# 1,000 hits are returned.
{
  "_scroll_id" : "DnF1ZXJ5..........",
  "took" : ,
  "timed_out" : false,
  "_shards" : {
    "total" : ,
    "successful" : ,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : ,
    "max_score" : ,
    "hits" : [
      {
        "_index" : "index_name",
        "_type" : "type_name",
        "_id" : "8x0a6......",
        "_score" : ,
        "_source" : {
          "field" : "text"
        }
      },
...........
...........
......

1m is how long the snapshot is kept alive.
size=1000 is the number of hits returned per request.
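
As a rough illustration of how the two parameters fit into the request URL, here is a minimal Python sketch. The helper name `build_search_url` and the `MAX_SIZE` constant are hypothetical, not part of Elasticsearch; the 10,000 cap is assumed to be the default limit described earlier.

```python
from urllib.parse import quote, urlencode

# Assumed default upper bound for `size`, as described in the text.
MAX_SIZE = 10_000


def build_search_url(base_url, index, doc_type, scroll="1m", size=1000):
    """Build the URL of the initial scroll search (hypothetical helper)."""
    if not 1 <= size <= MAX_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_SIZE}, got {size}")
    # quote() percent-encodes index or type names that are not URL-safe.
    path = f"{quote(index, safe='')}/{quote(doc_type, safe='')}/_search"
    # scroll: how long the snapshot is kept; size: hits returned per request.
    params = urlencode({"scroll": scroll, "size": size})
    return f"{base_url}/{path}?{params}"
```

Calling `build_search_url("http://localhost:9200", "my_index", "my_type")` yields a URL with the same `scroll=1m&size=1000` query string as the curl command above.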

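After that, the remaining data can be fetched with the command below.
Because size=1000, a result set of 10,000 hits needs 9 more runs of the command to retrieve everything: the initial search already returned the first 1,000 hits, and a 10th run returns an empty hits array.
The snapshot is kept for only 1m, so a scroll request sent after it has expired fails with a 404 error.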

# Pass the _scroll_id from the previous response; "scroll" keeps the snapshot alive for another 1m.
curl -H "Content-Type: application/json" -XGET 'localhost:9200/_search/scroll' -d '
{
    "scroll" : "1m",
    "scroll_id" : "DnF1ZXJ5.........."
}'
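
The number of follow-up scroll requests can be worked out from the total hit count and size. The sketch below uses a hypothetical helper name and assumes the initial search has already returned the first page.

```python
def scroll_calls_needed(total_hits, size):
    """Follow-up scroll requests needed after the initial search (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Ceiling division: total number of pages, including the first one.
    pages = -(-total_hits // size)
    # The initial search already delivered the first page.
    return max(pages - 1, 0)
```

For 10,000 hits with size=1000 this gives 9. One further call would return an empty hits array, which is a convenient signal to stop.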

Once you no longer need it, it is good manners to delete the scroll_id.


curl -H "Content-Type: application/json" -XDELETE  'localhost:9200/_search/scroll' -d '
{
    "scroll_id" : "DnF1ZXJ5.........."
}'
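
The three requests (initial search, repeated scroll, delete) could be tied together in a loop along the following lines. This is a minimal sketch using only the Python standard library, not a definitive client: `es_request` and `scroll_all` are hypothetical names, index and type names are assumed to be URL-safe, and the requests are sent as POST where the curl commands use GET with a body (Elasticsearch accepts both for these endpoints).

```python
import json
import urllib.request


def es_request(method, url, body):
    """Send a JSON request and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def scroll_all(base_url, index, doc_type, query,
               scroll="1m", size=1000, request=es_request):
    """Yield every hit for `query`, fetching up to `size` hits per request."""
    # Initial search: opens the snapshot and returns the first page.
    search_url = f"{base_url}/{index}/{doc_type}/_search?scroll={scroll}&size={size}"
    page = request("POST", search_url, {"query": query})
    scroll_id = page.get("_scroll_id")
    try:
        # An empty hits array means everything has been returned.
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            page = request("POST", f"{base_url}/_search/scroll",
                           {"scroll": scroll, "scroll_id": scroll_id})
            # Always continue with the most recently returned scroll ID.
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Delete the scroll once done, even if the caller stops early.
        if scroll_id:
            request("DELETE", f"{base_url}/_search/scroll",
                    {"scroll_id": scroll_id})
```

Usage might look like `for hit in scroll_all("http://localhost:9200", "index_name", "type_name", {"match": {"field_name": "text to search for"}}): print(hit["_id"])`. The `request` parameter exists only so that the HTTP call can be swapped out, for example in tests.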

References

Elasticsearch でつまづいた話 (3)
ElasticsearchのScroll APIをためしてみた
