Multi Terms aggregation
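The multi terms aggregation is a multi-bucket, value-source-based aggregation in which buckets are built dynamically, one per unique set of values. It is very similar to the terms aggregation, but in most cases it is slower than the terms aggregation and consumes more memory. Therefore, if the same set of fields is always used, it is more efficient to index a combined key for these fields as a separate field and run a terms aggregation on that field.
The multi terms aggregation is most useful when you need to sort by document count or by a metric aggregation on a composite key and retrieve the top N results. If sorting is not required and all values are expected to be retrieved, nested terms aggregations or a composite aggregation will be a faster and more memory-efficient solution.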
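The combined-key alternative described above can be sketched as follows. This is a minimal illustration rather than part of the original page: the field name genre_product, the "|" separator, and both helper functions are assumptions made for the example, and the combined field is assumed to be mapped as a keyword field so that a terms aggregation can run on it.

```python
# Sketch: pre-compute a combined key before indexing, then aggregate on it
# with a plain terms aggregation. Field name and separator are assumptions.

SEPARATOR = "|"  # assumed; pick a character that cannot occur in the values


def with_combined_key(doc, fields, target="genre_product"):
    """Return a copy of doc with an extra field joining the given fields.

    Documents missing any component are returned unchanged, so they do
    not contribute to any bucket of the combined field.
    """
    if any(doc.get(f) is None for f in fields):
        return dict(doc)
    enriched = dict(doc)
    enriched[target] = SEPARATOR.join(str(doc[f]) for f in fields)
    return enriched


def combined_terms_aggs(target="genre_product", size=10):
    """Build the aggs section for a terms aggregation on the combined field."""
    return {"genres_and_products": {"terms": {"field": target, "size": size}}}


doc = with_combined_key({"genre": "rock", "product": "Product A"}, ["genre", "product"])
print(doc["genre_product"])
print(combined_terms_aggs())
```

The enriched documents would be indexed in place of the originals, and the dictionary returned by combined_terms_aggs could be passed as the aggs argument of a search request.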
Example
resp = client.search(
    index="products",
    aggs={
        "genres_and_products": {
            "multi_terms": {
                "terms": [
                    {"field": "genre"},
                    {"field": "product"}
                ]
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'products',
  body: {
    aggregations: {
      genres_and_products: {
        multi_terms: {
          terms: [
            { field: 'genre' },
            { field: 'product' }
          ]
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "products",
  aggs: {
    genres_and_products: {
      multi_terms: {
        terms: [
          { field: "genre" },
          { field: "product" },
        ],
      },
    },
  },
});
console.log(response);

GET /products/_search
{
  "aggs": {
    "genres_and_products": {
      "multi_terms": {
        "terms": [
          { "field": "genre" },
          { "field": "product" }
        ]
      }
    }
  }
}
Response
{
  ...
  "aggregations" : {
    "genres_and_products" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : [ "rock", "Product A" ],
          "key_as_string" : "rock|Product A",
          "doc_count" : 2
        },
        {
          "key" : [ "electronic", "Product B" ],
          "key_as_string" : "electronic|Product B",
          "doc_count" : 1
        },
        {
          "key" : [ "jazz", "Product B" ],
          "key_as_string" : "jazz|Product B",
          "doc_count" : 1
        },
        {
          "key" : [ "rock", "Product B" ],
          "key_as_string" : "rock|Product B",
          "doc_count" : 1
        }
      ]
    }
  }
}
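doc_count_error_upper_bound: an upper bound of the error on the document counts for each term.
sum_other_doc_count: when there are many unique terms, Elasticsearch returns only the top terms; this number is the sum of the document counts of all buckets that are not part of the response.
buckets: the list of the top buckets.
key: the keys are arrays of values, ordered in the same way as the entries in the aggregation's terms parameter.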
By default, the multi_terms aggregation returns the buckets for the top ten terms ordered by doc_count. This default behavior can be changed by setting the size parameter.
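The bucketing behavior described above can be emulated in plain Python to make the semantics concrete. This is only a sketch under stated assumptions: the sample documents are hypothetical (chosen to be consistent with the response above), the function name is made up, ties are assumed to be broken by ascending key, and everything runs in a single process, so it does not model the per-shard collection that makes real counts approximate.

```python
from collections import Counter

# Hypothetical sample documents, consistent with the example response.
DOCS = [
    {"genre": "rock", "product": "Product A"},
    {"genre": "rock", "product": "Product A"},
    {"genre": "rock", "product": "Product B"},
    {"genre": "jazz", "product": "Product B"},
    {"genre": "electronic", "product": "Product B"},
]


def multi_terms_buckets(docs, fields, size=10):
    """Group docs by the tuple of field values and keep the top `size` buckets."""
    counts = Counter(
        tuple(doc[f] for f in fields)
        for doc in docs
        if all(f in doc for f in fields)  # skip docs missing a key component
    )
    # Highest doc_count first; ties broken by the key itself, ascending (assumed).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "sum_other_doc_count": sum(n for _, n in ranked[size:]),
        "buckets": [
            {"key": list(k), "key_as_string": "|".join(map(str, k)), "doc_count": n}
            for k, n in ranked[:size]
        ],
    }


result = multi_terms_buckets(DOCS, ["genre", "product"], size=2)
for bucket in result["buckets"]:
    print(bucket)
print(result["sum_other_doc_count"])
```

With size=2 only the two largest buckets are kept, and the documents of the remaining buckets are accounted for in sum_other_doc_count.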
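Aggregation parameters
The following parameters are supported. For a more detailed explanation of these parameters, see the terms aggregation.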
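size
Optional. Defines how many term buckets should be returned out of the overall list of terms. Defaults to 10.
shard_size
Optional. The requested …
show_term_doc_count_error
Optional. Calculates the document count error for each term. Defaults to …
order
Optional. Specifies the order of the buckets. Defaults to the number of documents per bucket. For buckets with the same document count, the bucket term value is used as a tiebreaker.
min_doc_count
Optional. The minimum number of documents a bucket must contain to be returned. Defaults to 1.
shard_min_doc_count
Optional. The minimum number of documents a bucket must contain on each shard to be returned. Defaults to …
collect_mode
Optional. Specifies the strategy for data collection. Supports …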
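To show where these parameters sit in a request, here is a small hypothetical helper that assembles the body of a multi_terms aggregation. The helper name build_multi_terms and its validation are assumptions made for illustration; only the parameter names come from the list above.

```python
# Parameter names taken from the list above.
SUPPORTED_PARAMS = {
    "size",
    "shard_size",
    "show_term_doc_count_error",
    "order",
    "min_doc_count",
    "shard_min_doc_count",
    "collect_mode",
}


def build_multi_terms(fields, **params):
    """Build a multi_terms aggregation body from field names and optional parameters."""
    unknown = set(params) - SUPPORTED_PARAMS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    body = {"terms": [{"field": f} for f in fields]}
    body.update(params)  # parameters sit next to "terms", inside "multi_terms"
    return {"multi_terms": body}


aggs = {
    "genres_and_products": build_multi_terms(
        ["genre", "product"], size=5, min_doc_count=2
    )
}
print(aggs)
```

The resulting dictionary has the same shape as the aggs argument used in the search examples on this page.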
Script
Generating the terms using a script:
resp = client.search(
    index="products",
    runtime_mappings={
        "genre.length": {
            "type": "long",
            "script": "emit(doc['genre'].value.length())"
        }
    },
    aggs={
        "genres_and_products": {
            "multi_terms": {
                "terms": [
                    {"field": "genre.length"},
                    {"field": "product"}
                ]
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'products',
  body: {
    runtime_mappings: {
      'genre.length' => {
        type: 'long',
        script: "emit(doc['genre'].value.length())"
      }
    },
    aggregations: {
      genres_and_products: {
        multi_terms: {
          terms: [
            { field: 'genre.length' },
            { field: 'product' }
          ]
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "products",
  runtime_mappings: {
    "genre.length": {
      type: "long",
      script: "emit(doc['genre'].value.length())",
    },
  },
  aggs: {
    genres_and_products: {
      multi_terms: {
        terms: [
          { field: "genre.length" },
          { field: "product" },
        ],
      },
    },
  },
});
console.log(response);

GET /products/_search
{
  "runtime_mappings": {
    "genre.length": {
      "type": "long",
      "script": "emit(doc['genre'].value.length())"
    }
  },
  "aggs": {
    "genres_and_products": {
      "multi_terms": {
        "terms": [
          { "field": "genre.length" },
          { "field": "product" }
        ]
      }
    }
  }
}
Response
{
  ...
  "aggregations" : {
    "genres_and_products" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : [ 4, "Product A" ],
          "key_as_string" : "4|Product A",
          "doc_count" : 2
        },
        {
          "key" : [ 4, "Product B" ],
          "key_as_string" : "4|Product B",
          "doc_count" : 2
        },
        {
          "key" : [ 10, "Product B" ],
          "key_as_string" : "10|Product B",
          "doc_count" : 1
        }
      ]
    }
  }
}
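When consuming a response like the one above, it can help to turn each bucket into a flat row. The sketch below is an assumption-laden illustration: the function bucket_rows is hypothetical, and the response is hard-coded as a plain dictionary copied from the example (with the elided parts left out) instead of coming from a live cluster.

```python
# Aggregation part of the example response, as a plain dictionary.
response = {
    "aggregations": {
        "genres_and_products": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": [4, "Product A"], "key_as_string": "4|Product A", "doc_count": 2},
                {"key": [4, "Product B"], "key_as_string": "4|Product B", "doc_count": 2},
                {"key": [10, "Product B"], "key_as_string": "10|Product B", "doc_count": 1},
            ],
        }
    }
}


def bucket_rows(resp, agg_name, columns):
    """Pair each component of a bucket key with the field it came from."""
    rows = []
    for bucket in resp["aggregations"][agg_name]["buckets"]:
        row = dict(zip(columns, bucket["key"]))  # same order as the terms list
        row["doc_count"] = bucket["doc_count"]
        rows.append(row)
    return rows


for row in bucket_rows(response, "genres_and_products", ["genre.length", "product"]):
    print(row)
```

Reading key instead of key_as_string keeps the numeric component as a number, which avoids having to split the string on "|".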
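Missing value
The missing parameter defines how documents that are missing a value should be treated. By default, if any of the key components is missing, the entire document is ignored, but such documents can also be treated as if they had a value by using the missing parameter.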
resp = client.search(
    index="products",
    aggs={
        "genres_and_products": {
            "multi_terms": {
                "terms": [
                    {"field": "genre"},
                    {"field": "product", "missing": "Product Z"}
                ]
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'products',
  body: {
    aggregations: {
      genres_and_products: {
        multi_terms: {
          terms: [
            { field: 'genre' },
            { field: 'product', missing: 'Product Z' }
          ]
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "products",
  aggs: {
    genres_and_products: {
      multi_terms: {
        terms: [
          { field: "genre" },
          { field: "product", missing: "Product Z" },
        ],
      },
    },
  },
});
console.log(response);

GET /products/_search
{
  "aggs": {
    "genres_and_products": {
      "multi_terms": {
        "terms": [
          { "field": "genre" },
          { "field": "product", "missing": "Product Z" }
        ]
      }
    }
  }
}
Response
{
  ...
  "aggregations" : {
    "genres_and_products" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : [ "rock", "Product A" ],
          "key_as_string" : "rock|Product A",
          "doc_count" : 2
        },
        {
          "key" : [ "electronic", "Product B" ],
          "key_as_string" : "electronic|Product B",
          "doc_count" : 1
        },
        {
          "key" : [ "electronic", "Product Z" ],
          "key_as_string" : "electronic|Product Z",
          "doc_count" : 1
        },
        {
          "key" : [ "jazz", "Product B" ],
          "key_as_string" : "jazz|Product B",
          "doc_count" : 1
        },
        {
          "key" : [ "rock", "Product B" ],
          "key_as_string" : "rock|Product B",
          "doc_count" : 1
        }
      ]
    }
  }
}
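The effect of the missing parameter can be mimicked in plain Python. The sketch below is illustrative only: the sample documents are hypothetical (the last one has no product value), and composite_keys is a made-up helper that takes entries shaped like those in the request's terms list.

```python
from collections import Counter

# Hypothetical sample documents; the last one has no "product" value.
DOCS = [
    {"genre": "rock", "product": "Product A"},
    {"genre": "rock", "product": "Product A"},
    {"genre": "rock", "product": "Product B"},
    {"genre": "jazz", "product": "Product B"},
    {"genre": "electronic", "product": "Product B"},
    {"genre": "electronic"},
]


def composite_keys(docs, terms):
    """Yield one key tuple per document, honoring an optional "missing" default."""
    for doc in docs:
        key = []
        for term in terms:
            value = doc.get(term["field"], term.get("missing"))
            if value is None:
                break  # a key component is missing and no default was given
            key.append(value)
        else:
            yield tuple(key)


without_default = Counter(
    composite_keys(DOCS, [{"field": "genre"}, {"field": "product"}])
)
with_default = Counter(
    composite_keys(DOCS, [{"field": "genre"}, {"field": "product", "missing": "Product Z"}])
)
print(sum(without_default.values()))
print(with_default[("electronic", "Product Z")])
```

Without a default, the document lacking a product is dropped entirely; with the default it lands in its own bucket, as in the response above.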
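Mixing field types
When aggregating across multiple indices, the type of the aggregated field may not be the same in all indices. Some types are compatible with each other (integer and long, or float and double), but when the types are a mix of decimal and non-decimal numbers, the terms aggregation promotes the non-decimal numbers to decimal numbers. This can result in a loss of precision in the bucket values.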
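The precision loss mentioned above is a general property of converting large integers to double-precision floating point, and it can be reproduced outside Elasticsearch. The following sketch uses Python, whose float type is an IEEE 754 double; the helper name survives_promotion is hypothetical.

```python
def survives_promotion(n):
    """Return True if the integer n is unchanged after a round trip through a double."""
    return int(float(n)) == n


small = 42
big = 2**53 + 1  # fits in a 64-bit long, but is not representable as a double

print(survives_promotion(small))
print(survives_promotion(big))
print(big, int(float(big)))
```

Integers up to 2**53 in magnitude convert exactly; beyond that, neighboring values can collapse onto the same double and would therefore end up in the same bucket.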
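Sub-aggregation and sorting example
Like most bucket aggregations, multi_terms supports sub-aggregations and ordering the buckets by a metric sub-aggregation: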
resp = client.search(
    index="products",
    aggs={
        "genres_and_products": {
            "multi_terms": {
                "terms": [
                    {"field": "genre"},
                    {"field": "product"}
                ],
                "order": {"total_quantity": "desc"}
            },
            "aggs": {
                "total_quantity": {
                    "sum": {"field": "quantity"}
                }
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'products',
  body: {
    aggregations: {
      genres_and_products: {
        multi_terms: {
          terms: [
            { field: 'genre' },
            { field: 'product' }
          ],
          order: { total_quantity: 'desc' }
        },
        aggregations: {
          total_quantity: {
            sum: { field: 'quantity' }
          }
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "products",
  aggs: {
    genres_and_products: {
      multi_terms: {
        terms: [
          { field: "genre" },
          { field: "product" },
        ],
        order: { total_quantity: "desc" },
      },
      aggs: {
        total_quantity: {
          sum: { field: "quantity" },
        },
      },
    },
  },
});
console.log(response);

GET /products/_search
{
  "aggs": {
    "genres_and_products": {
      "multi_terms": {
        "terms": [
          { "field": "genre" },
          { "field": "product" }
        ],
        "order": { "total_quantity": "desc" }
      },
      "aggs": {
        "total_quantity": {
          "sum": { "field": "quantity" }
        }
      }
    }
  }
}
{
  ...
  "aggregations" : {
    "genres_and_products" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : [ "jazz", "Product B" ],
          "key_as_string" : "jazz|Product B",
          "doc_count" : 1,
          "total_quantity" : { "value" : 10.0 }
        },
        {
          "key" : [ "rock", "Product A" ],
          "key_as_string" : "rock|Product A",
          "doc_count" : 2,
          "total_quantity" : { "value" : 9.0 }
        },
        {
          "key" : [ "electronic", "Product B" ],
          "key_as_string" : "electronic|Product B",
          "doc_count" : 1,
          "total_quantity" : { "value" : 3.0 }
        },
        {
          "key" : [ "rock", "Product B" ],
          "key_as_string" : "rock|Product B",
          "doc_count" : 1,
          "total_quantity" : { "value" : 1.0 }
        }
      ]
    }
  }
}
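Ordering buckets by a metric sub-aggregation amounts to computing the metric per composite key and sorting on it. The plain-Python sketch below illustrates this under stated assumptions: the sample documents and their quantity values are hypothetical (chosen so the totals match the response above), and buckets_ordered_by_sum is a made-up helper, not an Elasticsearch API.

```python
from collections import defaultdict

# Hypothetical sample documents whose totals match the example response.
DOCS = [
    {"genre": "rock", "product": "Product A", "quantity": 4},
    {"genre": "rock", "product": "Product A", "quantity": 5},
    {"genre": "rock", "product": "Product B", "quantity": 1},
    {"genre": "jazz", "product": "Product B", "quantity": 10},
    {"genre": "electronic", "product": "Product B", "quantity": 3},
]


def buckets_ordered_by_sum(docs, fields, metric_field, size=10):
    """Sum metric_field per composite key and return buckets, largest sum first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc in docs:
        if not all(f in doc for f in fields):
            continue  # documents missing a key component are ignored
        key = tuple(doc[f] for f in fields)
        totals[key] += doc.get(metric_field, 0)
        counts[key] += 1
    ranked = sorted(totals, key=lambda k: totals[k], reverse=True)
    return [
        {"key": list(k), "doc_count": counts[k], "total_quantity": {"value": totals[k]}}
        for k in ranked[:size]
    ]


for bucket in buckets_ordered_by_sum(DOCS, ["genre", "product"], "quantity"):
    print(bucket)
```

Note that the bucket with the most documents is no longer first: the ordering follows the summed quantity, not doc_count.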