
[elasticsearch2.0] Trying Pipeline Aggregations: Avg/Max/Min/Sum Bucket Aggregations

Posted at 2015-11-13

Theme

"Aggregate Japan's Top 100 Cherry Blossom Spots by prefecture"

Data used

Environment

  • MacBook Pro (Retina, 15-inch, Mid 2014)
  • 2.2 GHz Intel Core i7
  • 16 GB 1600 MHz DDR3
  • OS X El Capitan 10.11(15A284)
  • Elasticsearch 2.0.0

Preparation

  • Convert the data into a JSON format suitable for bulk indexing (details omitted)
  • Create the index
curl -s -H 'Content-Type: application/json' -XPUT localhost:9200/100_cherry -d '{
  "settings": {
    "index": {
      "number_of_replicas": 0, // single-node cluster, so no replicas
      "number_of_shards": 1,
      "refresh_interval": -1 // automatic refresh disabled; a personal preference
    }
  }
}'
  • Set the mapping
curl -X PUT 'localhost:9200/100_cherry/_mapping/doc' -d '{
  "properties": {
    "wikipedia_url": {
      "type": "string",
      "index": "not_analyzed" 
    },
    "location": {
      "type": "string",
      "index": "not_analyzed" 
    },
    "pref": {
      "type": "string",
      "index": "not_analyzed" 
    },
    "geo_point": {
      "type" : "geo_point"
    }
  }
}'
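
The bulk file itself is not shown in this post. As a rough sketch of the layout the `_bulk` endpoint expects (one action line followed by one source line per document), the following writes two placeholder documents and wraps the load in a helper. The file name `cherry_bulk.json`, the helper name `load_cherry`, and every sample value are assumptions for illustration, not rows from the actual data set. Because the index was created with `refresh_interval: -1`, the helper also calls `_refresh` so that the documents become searchable.

```shell
# Write a tiny bulk file: each document is preceded by an action line.
# The values below are placeholders, not rows from the real data set.
cat > cherry_bulk.json <<'EOF'
{"index":{"_index":"100_cherry","_type":"doc"}}
{"location":"Example Park A","pref":"東京都","wikipedia_url":"https://example.com/a","geo_point":{"lat":35.0,"lon":139.0}}
{"index":{"_index":"100_cherry","_type":"doc"}}
{"location":"Example Park B","pref":"京都府","wikipedia_url":"https://example.com/b","geo_point":{"lat":35.0,"lon":135.0}}
EOF

# Hypothetical helper: send the file to _bulk, then refresh manually
# because automatic refresh is disabled on this index.
load_cherry() {
  curl -s -XPOST 'localhost:9200/_bulk' --data-binary @cherry_bulk.json
  curl -s -XPOST 'localhost:9200/100_cherry/_refresh'
}
```

Running `load_cherry` against a local node would index the two placeholder documents. `--data-binary` is used rather than `-d` so that the newlines separating the bulk lines are preserved.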

Query

query.json
curl -XGET "http://localhost:9200/100_cherry/_search?search_type=count" -d'
{
  "query": {
    "match_all": {} // match all documents
  },
  "aggs": {
    "pref": {
      "terms": {
        "field": "pref", // count the spots per prefecture
        "size": 5 // return 5 buckets
      }
    },
    "max": {
      "max_bucket": { // maximum
        "buckets_path": "pref._count"
      }
    },
    "min": {
      "min_bucket": { // minimum
        "buckets_path": "pref._count"
      }
    },
    "ave": {
      "avg_bucket": { // average
        "buckets_path": "pref._count"
      }
    },
    "sum": {
      "sum_bucket": { // sum
        "buckets_path": "pref._count"
      }
    }
  }
}'

Result

response.json
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 100,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "pref": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 81, // documents that fall outside the buckets returned for the given size
      "buckets": [
        {
          "key": "東京都",
          "doc_count": 5
        },
        {
          "key": "京都府",
          "doc_count": 4
        },
        {
          "key": "愛知県",
          "doc_count": 4
        },
        {
          "key": "兵庫県",
          "doc_count": 3
        },
        {
          "key": "千葉県",
          "doc_count": 3
        }
      ]
    },
    "max": {
      "value": 5,
      "keys": [
        "東京都"
      ]
    },
    "min": {
      "value": 3,
      "keys": [
        "兵庫県",
        "千葉県"
      ]
    },
    "ave": {
      "value": 3.8
    },
    "sum": {
      "value": 19
    }
  }
}
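
As a sanity check, the pipeline values can be recomputed by hand from the five `doc_count` values in the response. The snippet below is only a local arithmetic check and does not call Elasticsearch; the variable names are arbitrary.

```shell
# Recompute max/min/avg/sum from the doc_count values of the five buckets.
sum=0; max=0; min=""; n=0
for c in 5 4 4 3 3; do
  sum=$((sum + c))
  n=$((n + 1))
  if [ "$c" -gt "$max" ]; then max=$c; fi
  if [ -z "$min" ] || [ "$c" -lt "$min" ]; then min=$c; fi
done

# Shell arithmetic is integer-only, so use awk for the average.
avg=$(LC_ALL=C awk -v s="$sum" -v n="$n" 'BEGIN { printf "%.1f", s / n }')

# Documents outside the returned buckets: total hits minus the bucket sum.
other=$((100 - sum))

echo "max=$max min=$min avg=$avg sum=$sum other=$other"
# prints: max=5 min=3 avg=3.8 sum=19 other=81
```

The numbers line up with the response: `sum` is 19, `ave` is 19 / 5 = 3.8, and `sum_other_doc_count` is 100 - 19 = 81.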

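Summary and impressions

  • When `size` is set on the upstream aggregation, the max, min, average, and sum are computed only over that many buckets
  • Both the query and the feature are easy to understand and behave as expected
  • Tried working with open data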
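
Since the pipeline aggregations only see the buckets that the `terms` aggregation returns, one way to get statistics over every prefecture is to raise `size` so that no bucket is cut off. A minimal sketch, assuming 47 (the number of prefectures in Japan) is enough room; the file name `all_prefs.json` and the helper name `search_all_prefs` are made up here.

```shell
# Request body: same four sibling pipeline aggregations,
# but with room for all 47 prefectures in the terms aggregation.
cat > all_prefs.json <<'EOF'
{
  "aggs": {
    "pref": {
      "terms": { "field": "pref", "size": 47 }
    },
    "max": { "max_bucket": { "buckets_path": "pref._count" } },
    "min": { "min_bucket": { "buckets_path": "pref._count" } },
    "ave": { "avg_bucket": { "buckets_path": "pref._count" } },
    "sum": { "sum_bucket": { "buckets_path": "pref._count" } }
  }
}
EOF

# Hypothetical helper: run the search against the local node.
search_all_prefs() {
  curl -s -XGET 'localhost:9200/100_cherry/_search?search_type=count' -d @all_prefs.json
}
```

Note that a prefecture with no documents produces no bucket at all, so it would not pull `min` down to 0 or count toward the average.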
