What to do when Elasticsearch queries suddenly start failing (getting burned by the fielddata cache being unbounded by default) #Elasticsearch

Our Elasticsearch cluster suddenly stopped responding to queries. I investigated, and this post shares what I found.

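The error

We got reports that Elasticsearch queries had suddenly started failing and that the Kibana graphs had stopped responding.

In the logs, the following terms looked suspicious, so I looked into them.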

In our environment HEAP_SIZE was set to 18g, so this error is exactly a case of hitting [indices.breaker.fielddata.limit] (18g × 60% = 10.8g).
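The arithmetic above can be reproduced for all three breakers. The snippet below is a minimal sketch, assuming an 18 GB heap and the default percentages quoted in this post; `breaker_limit_mb` is a hypothetical helper written for illustration, not an Elasticsearch command, and it uses integer arithmetic, so results are rounded down to whole megabytes.

```shell
#!/bin/sh
# Hypothetical helper: estimate a circuit breaker limit in MB
# from the heap size in GB ($1) and the breaker percentage ($2).
breaker_limit_mb() {
  # Convert GB to MB before dividing so less precision is lost.
  echo $(( $1 * 1024 * $2 / 100 ))
}

breaker_limit_mb 18 60   # fielddata breaker: 11059 MB (about 10.8 GB)
breaker_limit_mb 18 40   # request breaker:   7372 MB
breaker_limit_mb 18 70   # total breaker:     12902 MB
```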

Term

Description

CircuitBreaker

A mechanism for guarding against OOM (OutOfMemory) errors.
The default values are as follows.
[indices.breaker.fielddata.limit]
The fielddata circuit breaker limits the size of fielddata to 60% of the heap, by default.
[indices.breaker.request.limit]
The request circuit breaker estimates the size of structures required to complete other parts of a request, such as creating aggregation buckets, and limits them to 40% of the heap, by default.
[indices.breaker.total.limit]
The total circuit breaker wraps the request and fielddata circuit breakers to ensure that the combination of the two doesn’t use more than 70% of the heap by default.

FielddataCache

Elasticsearch's in-heap cache of field values, used for sorting and aggregations.
By default it has no size limit and no expiry, so entries are never evicted.

How to fix it

Temporary fix

Clear the cache through the API.
Note: Elasticsearch started responding to queries again immediately after the clear.

$ curl -XPOST 'http://localhost:9200/_cache/clear' -d '{ "fielddata": "true" }'
{"_shards":{"total":1430,"successful":715,"failed":0}}
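The Elasticsearch reference describes selecting the cache type with a query parameter (`fielddata=true`), and the same endpoint can be scoped to a single index. The snippet below is a minimal sketch of that form, assuming a node at http://localhost:9200 as in the command above. `ES_URL` and `clear_fielddata_url` are hypothetical names introduced for illustration, and `logstash-2015.04.06` is a made-up index name. I have not verified how the JSON body form above is treated across versions, so the query-parameter form may be the safer one to rely on.

```shell
#!/bin/sh
# ES_URL is an assumption; point it at your own node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: build the clear-cache URL for fielddata only.
# An optional index name ($1) narrows the scope to that index.
clear_fielddata_url() {
  if [ -n "${1:-}" ]; then
    echo "$ES_URL/$1/_cache/clear?fielddata=true"
  else
    echo "$ES_URL/_cache/clear?fielddata=true"
  fi
}

# To actually send the request (not run here):
#   curl -XPOST "$(clear_fielddata_url)"
#   curl -XPOST "$(clear_fielddata_url logstash-2015.04.06)"
```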

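Permanent fix

Add the settings below to elasticsearch.yml and restart the service.

Honestly, I do not know the optimal values yet, so they will need tuning as we operate the cluster.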

indices.fielddata.cache.expire: 60m #default -1
indices.fielddata.cache.size: 30% #default unbounded

Checking the current FielddataCache size

_all is the total across all indices.
fielddata.memory_size_in_bytes is the size in question.
Per-index sizes are shown as well.

# per index
curl -XGET 'http://localhost:9200/_stats/fielddata/?fields=field1,field2&pretty'
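Fielddata usage can also be read per node, and the circuit breaker counters are exposed through the nodes stats API. The snippet below is a minimal sketch, assuming the same local node. `ES_URL`, `node_fielddata_url`, and `node_breaker_url` are hypothetical helper names, and `fields=*` asks for a breakdown over every field instead of naming them one by one.

```shell
#!/bin/sh
# ES_URL is an assumption; point it at your own node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: fielddata usage per node, broken down by field.
node_fielddata_url() {
  echo "$ES_URL/_nodes/stats/indices/fielddata?fields=*&pretty"
}

# Hypothetical helper: circuit breaker limits and estimated sizes per node.
node_breaker_url() {
  echo "$ES_URL/_nodes/stats/breaker?pretty"
}

# To actually query the node (not run here):
#   curl -XGET "$(node_fielddata_url)"
#   curl -XGET "$(node_breaker_url)"
```

Comparing the fielddata size reported per node against the breaker limit shows how close a node is to tripping the breaker.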

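Thoughts

If you are loading time-series data, this seems like a problem you are fairly likely to hit, so I feel that at least an expire setting could be enabled by default.
It was painful to have this happen out of the blue.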

References

https://www.elastic.co/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html#circuit-breaker
http://igor.kupczynski.info/2015/04/06/fielddata.html
https://www.elastic.co/guide/en/elasticsearch/reference/1.5/cluster-nodes-stats.html