Range aggregation
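A multi-bucket, value-source-based aggregation that lets the user define a set of ranges, each representing a bucket. During aggregation, the values extracted from each document are checked against each bucket range, and the matching documents are "bucketed" accordingly. Note that for each range this aggregation includes the from value and excludes the to value.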
Example:
response = client.search(
  index: 'sales',
  body: {
    aggregations: {
      price_ranges: {
        range: {
          field: 'price',
          ranges: [
            { to: 100 },
            { from: 100, to: 200 },
            { from: 200 }
          ]
        }
      }
    }
  }
)
puts response

GET sales/_search
{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 100.0 },
          { "from": 100.0, "to": 200.0 },
          { "from": 200.0 }
        ]
      }
    }
  }
}
Response:
{
  ...
  "aggregations": {
    "price_ranges": {
      "buckets": [
        { "key": "*-100.0", "to": 100.0, "doc_count": 2 },
        { "key": "100.0-200.0", "from": 100.0, "to": 200.0, "doc_count": 2 },
        { "key": "200.0-*", "from": 200.0, "doc_count": 3 }
      ]
    }
  }
}
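The from-inclusive, to-exclusive rule can be illustrated with a small client-side sketch. This is not how Elasticsearch implements the aggregation; the helper names (in_range?, range_doc_counts) and the sample prices are hypothetical and chosen only to show where boundary values land.

```ruby
# Range definitions mirroring the request above.
# A missing :from or :to means the range is open on that side.
ranges = [
  { to: 100.0 },
  { from: 100.0, to: 200.0 },
  { from: 200.0 }
]

# True when value falls in [from, to): from is inclusive, to is exclusive.
def in_range?(value, range)
  (range[:from].nil? || value >= range[:from]) &&
    (range[:to].nil? || value < range[:to])
end

# Count the values that land in each range. Every value is checked
# against every range, so overlapping ranges would each count it.
def range_doc_counts(values, ranges)
  ranges.map { |range| values.count { |v| in_range?(v, range) } }
end

# Hypothetical prices, for illustration only.
prices = [10, 50, 150, 175, 200, 200, 200]
counts = range_doc_counts(prices, ranges)
puts counts.inspect # prints [2, 2, 3]
```

A price of exactly 200 is counted in the "200.0-*" bucket and not in "100.0-200.0", because the upper bound of a range is excluded.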
Keyed response
Setting the keyed flag to true associates a unique string key with each bucket and returns the ranges as a hash rather than an array:
response = client.search(
  index: 'sales',
  body: {
    aggregations: {
      price_ranges: {
        range: {
          field: 'price',
          keyed: true,
          ranges: [
            { to: 100 },
            { from: 100, to: 200 },
            { from: 200 }
          ]
        }
      }
    }
  }
)
puts response

GET sales/_search
{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "keyed": true,
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 200 },
          { "from": 200 }
        ]
      }
    }
  }
}
Response:
{
  ...
  "aggregations": {
    "price_ranges": {
      "buckets": {
        "*-100.0": { "to": 100.0, "doc_count": 2 },
        "100.0-200.0": { "from": 100.0, "to": 200.0, "doc_count": 2 },
        "200.0-*": { "from": 200.0, "doc_count": 3 }
      }
    }
  }
}
It is also possible to customize the key for each range:
response = client.search(
  index: 'sales',
  body: {
    aggregations: {
      price_ranges: {
        range: {
          field: 'price',
          keyed: true,
          ranges: [
            { key: 'cheap', to: 100 },
            { key: 'average', from: 100, to: 200 },
            { key: 'expensive', from: 200 }
          ]
        }
      }
    }
  }
)
puts response

GET sales/_search
{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "keyed": true,
        "ranges": [
          { "key": "cheap", "to": 100 },
          { "key": "average", "from": 100, "to": 200 },
          { "key": "expensive", "from": 200 }
        ]
      }
    }
  }
}
Response:
{
  ...
  "aggregations": {
    "price_ranges": {
      "buckets": {
        "cheap": { "to": 100.0, "doc_count": 2 },
        "average": { "from": 100.0, "to": 200.0, "doc_count": 2 },
        "expensive": { "from": 200.0, "doc_count": 3 }
      }
    }
  }
}
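How bucket keys relate to the range definitions can be sketched as follows. The default key format ("*" for an open side, bounds rendered as floats) is inferred from the responses shown in this section, and the helper names (bound_label, bucket_key, keyed_buckets) are hypothetical, so treat this as an approximation rather than the actual implementation.

```ruby
# Render a bound the way the default keys above appear:
# "*" for an open side, a float string such as "100.0" otherwise.
def bound_label(bound)
  bound.nil? ? '*' : bound.to_f.to_s
end

# Use the custom :key when one is given, else build a "from-to" key.
def bucket_key(range)
  range[:key] || "#{bound_label(range[:from])}-#{bound_label(range[:to])}"
end

# Build a hash of buckets indexed by key, as keyed: true does,
# instead of an array of buckets.
def keyed_buckets(ranges)
  ranges.each_with_object({}) do |range, buckets|
    buckets[bucket_key(range)] = range.reject { |name, _| name == :key }
  end
end

default_ranges = [{ to: 100 }, { from: 100, to: 200 }, { from: 200 }]
custom_ranges = [
  { key: 'cheap', to: 100 },
  { key: 'average', from: 100, to: 200 },
  { key: 'expensive', from: 200 }
]

default_keys = keyed_buckets(default_ranges).keys
custom_keys = keyed_buckets(custom_ranges).keys
puts default_keys.inspect # prints ["*-100.0", "100.0-200.0", "200.0-*"]
puts custom_keys.inspect  # prints ["cheap", "average", "expensive"]
```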
Script
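If the data in your documents doesn't exactly match what you'd like to aggregate, use a runtime field. For example, if you need to apply a particular currency conversion rate: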
response = client.search(
  index: 'sales',
  body: {
    runtime_mappings: {
      'price.euros' => {
        type: 'double',
        script: {
          source: "\n emit(doc['price'].value * params.conversion_rate)\n ",
          params: {
            conversion_rate: 0.835526591
          }
        }
      }
    },
    aggregations: {
      price_ranges: {
        range: {
          field: 'price.euros',
          ranges: [
            { to: 100 },
            { from: 100, to: 200 },
            { from: 200 }
          ]
        }
      }
    }
  }
)
puts response

GET sales/_search
{
  "runtime_mappings": {
    "price.euros": {
      "type": "double",
      "script": {
        "source": """
          emit(doc['price'].value * params.conversion_rate)
        """,
        "params": {
          "conversion_rate": 0.835526591
        }
      }
    }
  },
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price.euros",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 200 },
          { "from": 200 }
        ]
      }
    }
  }
}
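The effect of the runtime field is that values are converted before they are compared against the ranges, so a document can land in a different bucket than its raw price would suggest. A rough client-side sketch, using the conversion rate from the request above and hypothetical prices and helper names:

```ruby
# True when value falls in [from, to): from is inclusive, to is exclusive.
def in_range?(value, range)
  (range[:from].nil? || value >= range[:from]) &&
    (range[:to].nil? || value < range[:to])
end

def range_doc_counts(values, ranges)
  ranges.map { |range| values.count { |v| in_range?(v, range) } }
end

ranges = [{ to: 100 }, { from: 100, to: 200 }, { from: 200 }]
conversion_rate = 0.835526591

# Hypothetical raw prices, for illustration only.
prices = [110, 150, 250]

# Mirrors emit(doc['price'].value * params.conversion_rate).
euros = prices.map { |price| price * conversion_rate }

raw_counts = range_doc_counts(prices, ranges)
euro_counts = range_doc_counts(euros, ranges)
puts raw_counts.inspect  # prints [0, 2, 1]
puts euro_counts.inspect # prints [1, 1, 1]
```

The price of 110 converts to roughly 91.9, which moves it from the 100-200 range into the range below 100.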
Sub-aggregations
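The following example not only "buckets" the documents into the different ranges but also computes statistics over the prices in each price range: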
response = client.search(
  index: 'sales',
  body: {
    aggregations: {
      price_ranges: {
        range: {
          field: 'price',
          ranges: [
            { to: 100 },
            { from: 100, to: 200 },
            { from: 200 }
          ]
        },
        aggregations: {
          price_stats: {
            stats: {
              field: 'price'
            }
          }
        }
      }
    }
  }
)
puts response

GET sales/_search
{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 200 },
          { "from": 200 }
        ]
      },
      "aggs": {
        "price_stats": {
          "stats": { "field": "price" }
        }
      }
    }
  }
}
Response:
{
  ...
  "aggregations": {
    "price_ranges": {
      "buckets": [
        {
          "key": "*-100.0",
          "to": 100.0,
          "doc_count": 2,
          "price_stats": { "count": 2, "min": 10.0, "max": 50.0, "avg": 30.0, "sum": 60.0 }
        },
        {
          "key": "100.0-200.0",
          "from": 100.0,
          "to": 200.0,
          "doc_count": 2,
          "price_stats": { "count": 2, "min": 150.0, "max": 175.0, "avg": 162.5, "sum": 325.0 }
        },
        {
          "key": "200.0-*",
          "from": 200.0,
          "doc_count": 3,
          "price_stats": { "count": 3, "min": 200.0, "max": 200.0, "avg": 200.0, "sum": 600.0 }
        }
      ]
    }
  }
}
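Conceptually, the sub-aggregation runs once per bucket, over only the documents that fell into that bucket. The sketch below reproduces the statistics in the response above from a hypothetical price list that is consistent with it; the actual contents of the sales index are not shown in this section, and the helper names and the handling of empty buckets are assumptions.

```ruby
# True when value falls in [from, to): from is inclusive, to is exclusive.
def in_range?(value, range)
  (range[:from].nil? || value >= range[:from]) &&
    (range[:to].nil? || value < range[:to])
end

# Stats over one bucket's values. The shape returned for an empty
# bucket is an assumption made for this sketch.
def stats(values)
  return { count: 0, min: nil, max: nil, avg: nil, sum: 0.0 } if values.empty?

  sum = values.sum.to_f
  {
    count: values.size,
    min: values.min.to_f,
    max: values.max.to_f,
    avg: sum / values.size,
    sum: sum
  }
end

# First bucket the values, then compute stats inside each bucket.
def range_stats(values, ranges)
  ranges.map do |range|
    members = values.select { |v| in_range?(v, range) }
    { doc_count: members.size, price_stats: stats(members) }
  end
end

ranges = [{ to: 100 }, { from: 100, to: 200 }, { from: 200 }]
prices = [10, 50, 150, 175, 200, 200, 200] # hypothetical sample data

result = range_stats(prices, ranges)
result.each { |bucket| puts bucket.inspect }
```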
Histogram fields
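Running a range aggregation over a histogram field computes the total number of counts for each configured range.
This is done without interpolating between the histogram field values. Consequently, a range can fall "in between" two histogram values, in which case the resulting range bucket has a document count of zero.
The following example runs a range aggregation against an index that stores pre-aggregated histograms of latency metrics (in milliseconds) for different networks: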
response = client.indices.create(
  index: 'metrics_index',
  body: {
    mappings: {
      properties: {
        network: {
          properties: {
            name: { type: 'keyword' }
          }
        },
        latency_histo: { type: 'histogram' }
      }
    }
  }
)
puts response

response = client.index(
  index: 'metrics_index',
  id: 1,
  refresh: true,
  body: {
    'network.name' => 'net-1',
    latency_histo: {
      values: [1, 3, 8, 12, 15],
      counts: [3, 7, 23, 12, 6]
    }
  }
)
puts response

response = client.index(
  index: 'metrics_index',
  id: 2,
  refresh: true,
  body: {
    'network.name' => 'net-2',
    latency_histo: {
      values: [1, 6, 8, 12, 14],
      counts: [8, 17, 8, 7, 6]
    }
  }
)
puts response

response = client.search(
  index: 'metrics_index',
  size: 0,
  filter_path: 'aggregations',
  body: {
    aggregations: {
      latency_ranges: {
        range: {
          field: 'latency_histo',
          ranges: [
            { to: 2 },
            { from: 2, to: 3 },
            { from: 3, to: 10 },
            { from: 10 }
          ]
        }
      }
    }
  }
)
puts response

PUT metrics_index
{
  "mappings": {
    "properties": {
      "network": {
        "properties": {
          "name": { "type": "keyword" }
        }
      },
      "latency_histo": { "type": "histogram" }
    }
  }
}

PUT metrics_index/_doc/1?refresh
{
  "network.name" : "net-1",
  "latency_histo" : {
    "values" : [1, 3, 8, 12, 15],
    "counts" : [3, 7, 23, 12, 6]
  }
}

PUT metrics_index/_doc/2?refresh
{
  "network.name" : "net-2",
  "latency_histo" : {
    "values" : [1, 6, 8, 12, 14],
    "counts" : [8, 17, 8, 7, 6]
  }
}

GET metrics_index/_search?size=0&filter_path=aggregations
{
  "aggs": {
    "latency_ranges": {
      "range": {
        "field": "latency_histo",
        "ranges": [
          {"to": 2},
          {"from": 2, "to": 3},
          {"from": 3, "to": 10},
          {"from": 10}
        ]
      }
    }
  }
}
The range aggregation sums the counts that fall into each range, as determined by the values, and returns the following output:
{
  "aggregations": {
    "latency_ranges": {
      "buckets": [
        { "key": "*-2.0", "to": 2.0, "doc_count": 11 },
        { "key": "2.0-3.0", "from": 2.0, "to": 3.0, "doc_count": 0 },
        { "key": "3.0-10.0", "from": 3.0, "to": 10.0, "doc_count": 55 },
        { "key": "10.0-*", "from": 10.0, "doc_count": 31 }
      ]
    }
  }
}
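The doc_count values above can be reproduced by hand from the two indexed histograms: for each range, add up the counts of every histogram value that falls inside it, with no interpolation between values. A minimal sketch of that arithmetic (the helper names are hypothetical, and this is not the server-side implementation):

```ruby
# True when value falls in [from, to): from is inclusive, to is exclusive.
def in_range?(value, range)
  (range[:from].nil? || value >= range[:from]) &&
    (range[:to].nil? || value < range[:to])
end

# For each range, sum the counts of the histogram values inside it,
# across all documents. Values are taken as-is, with no interpolation.
def histogram_range_counts(histograms, ranges)
  ranges.map do |range|
    histograms.sum do |histo|
      histo[:values].zip(histo[:counts]).sum do |value, count|
        in_range?(value, range) ? count : 0
      end
    end
  end
end

# The two documents indexed into metrics_index above.
histograms = [
  { values: [1, 3, 8, 12, 15], counts: [3, 7, 23, 12, 6] },
  { values: [1, 6, 8, 12, 14], counts: [8, 17, 8, 7, 6] }
]
ranges = [{ to: 2 }, { from: 2, to: 3 }, { from: 3, to: 10 }, { from: 10 }]

latency_counts = histogram_range_counts(histograms, ranges)
puts latency_counts.inspect # prints [11, 0, 55, 31]
```

The 2.0-3.0 bucket is empty because no histogram value lies in [2, 3): the value 3 belongs to the next range, and the counts recorded at 1 and 3 are not spread across the gap between them.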
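The range aggregation is a bucket aggregation: it partitions documents into buckets rather than calculating metrics over fields as metric aggregations do. Each bucket represents a collection of documents that sub-aggregations can run on. A histogram field, on the other hand, is a pre-aggregated field that represents multiple values inside a single field: buckets of numerical data and a count of items or documents for each bucket. This mismatch between the input the range aggregation expects (raw documents) and what the histogram field provides (summary information) limits the outcome of the aggregation to the document counts for each bucket.
Consequently, sub-aggregations are not allowed when a range aggregation is executed over a histogram field.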