I'm using Elasticsearch 2.3 and I'm trying to perform a two-step computation using a pipeline aggregation. I'm only interested in the final result of the pipeline aggregation, but Elasticsearch also returns the information for every bucket.
Since I have a huge number of buckets (tens or hundreds of millions), this is prohibitive. Unfortunately, I cannot find a way to tell Elasticsearch not to return all this information.
Here is a toy example. I have an index test-index with a document type obj. obj has two fields, key and value.
curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 100,
  "key": "foo"
}'
curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 20,
  "key": "foo"
}'
curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 50,
  "key": "bar"
}'
curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 60,
  "key": "bar"
}'
curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 70,
  "key": "bar"
}'
I want to compute, for each key, the minimum value among the objs sharing that key, and then take the average of those minima over all keys.
An average of minima.
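To make the intended computation concrete, here is a minimal client-side sketch in Python of what "average of minima" means for the toy documents above. The function name average_of_minima is only illustrative and not part of any Elasticsearch API; this is obviously not something I can run over the real data set.

```python
# Toy documents, copied from the indexing requests above.
docs = [
    {"key": "foo", "value": 100},
    {"key": "foo", "value": 20},
    {"key": "bar", "value": 50},
    {"key": "bar", "value": 60},
    {"key": "bar", "value": 70},
]


def average_of_minima(documents):
    """Return the mean, over distinct keys, of the smallest value per key."""
    minima = {}
    for doc in documents:
        k, v = doc["key"], doc["value"]
        # Keep the smallest value seen so far for this key.
        minima[k] = min(v, minima.get(k, v))
    return sum(minima.values()) / len(minima)


print(average_of_minima(docs))  # 35.0 (min for foo is 20, min for bar is 50)
```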
Elasticsearch allows me to do this:
curl -XPOST 'http://10.10.0.7:9200/test-index/obj/_search' -d '{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "key_aggregates": {
      "terms": {
        "field": "key",
        "size": 0
      },
      "aggs": {
        "min_value": {
          "min": {
            "field": "value"
          }
        }
      }
    },
    "avg_min_value": {
      "avg_bucket": {
        "buckets_path": "key_aggregates>min_value"
      }
    }
  }
}'
But this query returns the minimum for every bucket, although I don't need it:
{
  "took": 21,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": [
    ]
  },
  "aggregations": {
    "key_aggregates": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "bar",
          "doc_count": 2,
          "min_value": {
            "value": 50
          }
        },
        {
          "key": "foo",
          "doc_count": 2,
          "min_value": {
            "value": 20
          }
        }
      ]
    },
    "avg_min_value": {
      "value": 35
    }
  }
}
Is there a way to get rid of all the information inside "buckets": [...]? I'm only interested in avg_min_value.
This might not seem like a problem in this toy example, but when the number of distinct keys is large (tens or hundreds of millions), the query response is prohibitively large, and I would like to prune it.
Is there a way to do this with Elasticsearch? Or am I modelling my data wrong?
NB: it is not acceptable to pre-aggregate my data per key, since the match_all part of my query might be replaced by complex and unknown filters.
NB2: changing size to a positive number in my terms aggregation is not acceptable, because only that many buckets would then feed the average, which would change the result.
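To illustrate NB2, here is a small Python sketch that mimics, on the toy data, what happens when only the top buckets are kept. It assumes the terms aggregation keeps the keys with the highest document counts (its default ordering, as far as I understand); the helper avg_min_over_top_keys is hypothetical and only simulates the behaviour client-side.

```python
from collections import Counter

# (key, value) pairs for the toy documents indexed above.
pairs = [("foo", 100), ("foo", 20), ("bar", 50), ("bar", 60), ("bar", 70)]


def avg_min_over_top_keys(data, size=None):
    """Average of per-key minima, keeping only the `size` most frequent keys.

    size=None keeps every key, which plays the role of "size": 0 in
    Elasticsearch 2.x.
    """
    counts = Counter(k for k, _ in data)
    kept = [k for k, _ in counts.most_common(size)]
    minima = [min(v for k2, v in data if k2 == k) for k in kept]
    return sum(minima) / len(minima)


print(avg_min_over_top_keys(pairs))          # 35.0, all keys included
print(avg_min_over_top_keys(pairs, size=1))  # 50.0, only "bar" survives
```

With a single bucket kept, the average collapses to the minimum of the most frequent key, which is why truncating the terms aggregation is not an option for me.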