### Theme
Aggregating Japan's "Top 100 Cherry Blossom Spots" (日本のさくら名所100選) by prefecture
#### Data used
#### Environment
- MacBook Pro (Retina, 15-inch, Mid 2014)
- 2.2 GHz Intel Core i7
- 16 GB 1600 MHz DDR3
- OS X El Capitan 10.11(15A284)
- Elasticsearch 2.0.0
#### Preparation
- Converted the data into a JSON format suitable for bulk indexing (details omitted)
- Create the index
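```sh
curl -s -H 'Content-Type: application/json' -XPUT localhost:9200/100_cherry -d '{
  "settings": {
    "index": {
      "number_of_replicas": 0, // single-node cluster, so no replicas
      "number_of_shards": 1,
      "refresh_interval": -1 // personal preference; automatic refresh is disabled, so run _refresh after loading or the documents will not be searchable
    }
  }
}'
```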
- Set the mapping
```sh
curl -s -H 'Content-Type: application/json' -X PUT 'localhost:9200/100_cherry/_mapping/doc' -d '{
  "properties": {
    "wikipedia_url": {
      "type": "string",
      "index": "not_analyzed"
    },
    "location": {
      "type": "string",
      "index": "not_analyzed"
    },
    "pref": {
      "type": "string",
      "index": "not_analyzed"
    },
    "geo_point": {
      "type": "geo_point"
    }
  }
}'
```
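The conversion into bulk-indexing format is skipped above. As a rough idea of what that step produces, here is a minimal sketch assuming the `_bulk` newline-delimited format of Elasticsearch 2.x (one action line plus one source line per document, with a trailing newline). The function name `to_bulk_body`, the sequential `_id` values, and every field value in `sample` are hypothetical placeholders; only the field names come from the mapping above.

```python
import json


def to_bulk_body(records, index="100_cherry", doc_type="doc"):
    """Build a _bulk request body (newline-delimited JSON).

    Each document becomes two lines: an action/metadata line and the
    document source. The whole body must end with a newline.
    """
    lines = []
    for i, rec in enumerate(records, start=1):
        # Hypothetical choice: use a running number as the document _id
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(i)}}
        lines.append(json.dumps(action))
        # ensure_ascii=False keeps the Japanese prefecture names readable
        lines.append(json.dumps(rec, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Hypothetical records: field names follow the mapping, values are placeholders
sample = [
    {"pref": "東京都", "location": "spot-a", "wikipedia_url": "https://example.com/a",
     "geo_point": {"lat": 35.7, "lon": 139.7}},
    {"pref": "京都府", "location": "spot-b", "wikipedia_url": "https://example.com/b",
     "geo_point": {"lat": 35.0, "lon": 135.8}},
]

if __name__ == "__main__":
    print(to_bulk_body(sample), end="")
```

Saved to a file, such a body could be loaded with `curl -s -XPOST localhost:9200/_bulk --data-binary @bulk.json`. Because the index was created with `"refresh_interval": -1`, a `curl -XPOST localhost:9200/100_cherry/_refresh` is needed afterwards before the documents show up in searches.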
#### Query
```sh:query.json
curl -XGET "http://localhost:9200/100_cherry/_search" -d'
{
  "size": 0, // return no hits, only aggregations (search_type=count is deprecated as of 2.0)
  "query": {
    "match_all": {} // match all documents
  },
  "aggs": {
    "pref": {
      "terms": {
        "field": "pref", // count the spots per prefecture
        "size": 5 // return only the top 5 buckets by document count
      }
    },
    "max": {
      "max_bucket": { // maximum
        "buckets_path": "pref._count"
      }
    },
    "min": {
      "min_bucket": { // minimum
        "buckets_path": "pref._count"
      }
    },
    "ave": {
      "avg_bucket": { // average
        "buckets_path": "pref._count"
      }
    },
    "sum": {
      "sum_bucket": { // sum
        "buckets_path": "pref._count"
      }
    }
  }
}'
```
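What the four sibling pipeline aggregations compute can be mimicked locally. This is a rough sketch of the semantics, not Elasticsearch's implementation (gap policies and formatting are ignored): each one reads `_count` from the buckets that the `pref` terms aggregation actually returned, and `max_bucket`/`min_bucket` also report every key that reaches the extreme value. The function name `bucket_stats` is hypothetical; the bucket data is the one from the response in this article.

```python
def bucket_stats(buckets):
    """Local approximation of max_bucket / min_bucket / avg_bucket / sum_bucket
    with buckets_path "pref._count".

    `buckets` is whatever the parent terms aggregation returned, i.e. only the
    top-`size` buckets, not every prefecture in the index.
    """
    if not buckets:
        raise ValueError("no buckets to aggregate")
    counts = [b["doc_count"] for b in buckets]
    max_v, min_v, total = max(counts), min(counts), sum(counts)
    return {
        # max/min report the value plus every bucket key that reaches it
        "max": {"value": max_v, "keys": [b["key"] for b in buckets if b["doc_count"] == max_v]},
        "min": {"value": min_v, "keys": [b["key"] for b in buckets if b["doc_count"] == min_v]},
        "ave": {"value": total / len(counts)},
        "sum": {"value": total},
    }


# The five `pref` buckets from the response shown in this article
pref_buckets = [
    {"key": "東京都", "doc_count": 5},
    {"key": "京都府", "doc_count": 4},
    {"key": "愛知県", "doc_count": 4},
    {"key": "兵庫県", "doc_count": 3},
    {"key": "千葉県", "doc_count": 3},
]

if __name__ == "__main__":
    for name, result in bucket_stats(pref_buckets).items():
        print(name, result)
```

Feeding it the five returned buckets reproduces the `max`, `min`, `ave`, and `sum` values in the response below, which shows that the statistics never look beyond those five buckets.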
#### Result
```json:response.json
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 100,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "pref": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 81, // documents that fall outside the top `size` buckets
      "buckets": [
        {
          "key": "東京都",
          "doc_count": 5
        },
        {
          "key": "京都府",
          "doc_count": 4
        },
        {
          "key": "愛知県",
          "doc_count": 4
        },
        {
          "key": "兵庫県",
          "doc_count": 3
        },
        {
          "key": "千葉県",
          "doc_count": 3
        }
      ]
    },
    "max": {
      "value": 5,
      "keys": [
        "東京都"
      ]
    },
    "min": {
      "value": 3,
      "keys": [
        "兵庫県",
        "千葉県"
      ]
    },
    "ave": {
      "value": 3.8
    },
    "sum": {
      "value": 19
    }
  }
}
```
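The numbers can be checked by hand against the five buckets: the pipeline statistics cover only those buckets, and together with `sum_other_doc_count` they account for every document.

$$
\begin{aligned}
\text{sum} &= 5 + 4 + 4 + 3 + 3 = 19 \\
\text{ave} &= 19 / 5 = 3.8 \\
\text{sum} + \text{sum\_other\_doc\_count} &= 19 + 81 = 100 = \text{hits.total}
\end{aligned}
$$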
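#### Summary and impressions
- When `size` is set on the parent aggregation, max, min, average, and sum are computed only over the buckets it returns (here the top 5 prefectures, covering 19 of the 100 spots), not over all prefectures
- Both the query syntax and the feature are easy to understand and behave as expected
- This was also a chance to try working with open data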
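To get statistics over every prefecture rather than the top 5, the terms `size` has to be large enough to return all buckets. A minimal sketch that builds the same request body with the terms size as a parameter; the function name `build_pref_stats_query` is hypothetical, and using 47 relies on Japan having 47 prefectures and on each document carrying one `pref` value (not verified against this data set). With that assumption, `sum_bucket` should come out equal to `hits.total`.

```python
import json


def build_pref_stats_query(terms_size):
    """Build the article's request body with a configurable terms size.

    The bucket pipeline aggregations only see the buckets that the `pref`
    terms aggregation returns, so `terms_size` controls their scope.
    """
    def bucket_agg(kind):
        # Every sibling pipeline aggregation reads the per-bucket doc count
        return {kind: {"buckets_path": "pref._count"}}

    return {
        "size": 0,  # no hits needed, only aggregations
        "query": {"match_all": {}},
        "aggs": {
            "pref": {"terms": {"field": "pref", "size": terms_size}},
            "max": bucket_agg("max_bucket"),
            "min": bucket_agg("min_bucket"),
            "ave": bucket_agg("avg_bucket"),
            "sum": bucket_agg("sum_bucket"),
        },
    }


if __name__ == "__main__":
    # 47 prefectures, so every prefecture present in the data gets a bucket
    print(json.dumps(build_pref_stats_query(47), ensure_ascii=False, indent=2))
```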
#### References
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-avg-bucket-aggregation.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-max-bucket-aggregation.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-min-bucket-aggregation.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-sum-bucket-aggregation.html