Multi terms aggregation

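A multi-bucket value source based aggregation where buckets are dynamically built, one per unique set of values. The multi_terms aggregation is very similar to the terms aggregation; however, in most cases it is slower than the terms aggregation and consumes more memory. Therefore, if the same set of fields is used constantly, it is more efficient to index a combined key for these fields as a separate field and use the terms aggregation on that field.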
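Here is a minimal sketch of that alternative. It assumes documents are prepared in Python before indexing, adds a combined key field to each one, and builds a terms aggregation body for that field. The field name genre_product, the | separator, and both helper names are hypothetical. The combined field would also need a keyword mapping, which is assumed and not shown. No Elasticsearch client call is made.

```python
def add_combined_key(doc: dict, fields: list, target: str, sep: str = "|") -> dict:
    """Return a copy of doc with an extra field holding the combined key.

    Documents missing any of the fields get no combined key, mirroring the
    default multi_terms behaviour of ignoring such documents.
    """
    out = dict(doc)
    if all(f in doc for f in fields):
        out[target] = sep.join(str(doc[f]) for f in fields)
    return out


def terms_agg_on(field: str, size: int = 10) -> dict:
    # Request body for a plain terms aggregation on the precomputed key field.
    return {"aggs": {"genres_and_products": {"terms": {"field": field, "size": size}}}}


doc = add_combined_key({"genre": "rock", "product": "Product A"},
                       ["genre", "product"], "genre_product")
print(doc["genre_product"])  # rock|Product A
print(terms_agg_on("genre_product"))
```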

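The multi_terms aggregation is most useful when you need to sort by document count, or by a metric aggregation, on a composite key and get the top N results. If sorting is not required and all values are expected to be retrieved, nested terms aggregations or a composite aggregation will be a faster and more memory-efficient solution.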
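For the case where sorting is not needed, the following sketch builds a composite aggregation request body over the same two fields. It only constructs the dict, and the helper name composite_body is hypothetical. Composite buckets come back in key order rather than by doc_count. To page through every combination, pass the after_key from one response back in as after.

```python
from typing import Optional


def composite_body(size: int = 100, after: Optional[dict] = None) -> dict:
    """Request body for a composite aggregation over genre and product."""
    composite = {
        "size": size,
        "sources": [
            {"genre": {"terms": {"field": "genre"}}},
            {"product": {"terms": {"field": "product"}}},
        ],
    }
    if after is not None:
        # Resume after the last combination seen (the response's "after_key").
        composite["after"] = after
    # "size": 0 at the top level skips returning search hits.
    return {"size": 0, "aggs": {"genres_and_products": {"composite": composite}}}


first_page = composite_body()
next_page = composite_body(after={"genre": "jazz", "product": "Product B"})
```

With the 8.x Python client used in the examples, a dict like this should be usable as client.search(index="products", **first_page).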

Example

resp = client.search(
    index="products",
    aggs={
        "genres_and_products": {
            "multi_terms": {
                "terms": [
                    {
                        "field": "genre"
                    },
                    {
                        "field": "product"
                    }
                ]
            }
        }
    },
)
print(resp)
response = client.search(
  index: 'products',
  body: {
    aggregations: {
      genres_and_products: {
        multi_terms: {
          terms: [
            {
              field: 'genre'
            },
            {
              field: 'product'
            }
          ]
        }
      }
    }
  }
)
puts response
const response = await client.search({
  index: "products",
  aggs: {
    genres_and_products: {
      multi_terms: {
        terms: [
          {
            field: "genre",
          },
          {
            field: "product",
          },
        ],
      },
    },
  },
});
console.log(response);
GET /products/_search
{
  "aggs": {
    "genres_and_products": {
      "multi_terms": {
        "terms": [{
          "field": "genre" 
        }, {
          "field": "product"
        }]
      }
    }
  }
}

The multi_terms aggregation supports the same field types as the terms aggregation and supports most of the terms aggregation parameters.

Response

{
  ...
  "aggregations" : {
    "genres_and_products" : {
      "doc_count_error_upper_bound" : 0,  
      "sum_other_doc_count" : 0,          
      "buckets" : [                       
        {
          "key" : [                       
            "rock",
            "Product A"
          ],
          "key_as_string" : "rock|Product A",
          "doc_count" : 2
        },
        {
          "key" : [
            "electronic",
            "Product B"
          ],
          "key_as_string" : "electronic|Product B",
          "doc_count" : 1
        },
        {
          "key" : [
            "jazz",
            "Product B"
          ],
          "key_as_string" : "jazz|Product B",
          "doc_count" : 1
        },
        {
          "key" : [
            "rock",
            "Product B"
          ],
          "key_as_string" : "rock|Product B",
          "doc_count" : 1
        }
      ]
    }
  }
}

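doc_count_error_upper_bound: an upper bound of the error on the document counts for each term (see the terms aggregation documentation on approximate document counts).

sum_other_doc_count: when there are many unique terms, Elasticsearch returns only the top terms; this number is the sum of the document counts of all buckets that are not part of the response.

buckets: the list of top buckets.

key: an array of values, in the same order as the expressions in the terms parameter of the aggregation.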

By default, the multi_terms aggregation returns the top ten buckets ordered by doc_count. You can change this default behavior by setting the size parameter.
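The bucket semantics described above can be imitated in memory with a little Python. This is an illustration, not Elasticsearch's implementation, and it ignores sharding and count errors. It groups documents by the tuple of field values, orders buckets by doc_count descending with the key values as the tiebreaker, and keeps the top size buckets. The sample documents are hypothetical, chosen to reproduce the example response.

```python
from collections import Counter


def multi_terms_sim(docs, fields, size=10):
    """Local imitation of multi_terms bucketing on in-memory documents."""
    counts = Counter()
    for doc in docs:
        # By default a document missing any key component is ignored.
        if not all(f in doc for f in fields):
            continue
        counts[tuple(doc[f] for f in fields)] += 1

    # doc_count descending; ties broken by the key values ascending.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top, rest = ordered[:size], ordered[size:]
    return {
        # Documents in buckets that did not make the top `size`.
        "sum_other_doc_count": sum(c for _, c in rest),
        "buckets": [
            {"key": list(k), "key_as_string": "|".join(map(str, k)), "doc_count": c}
            for k, c in top
        ],
    }


docs = [
    {"genre": "rock", "product": "Product A"},
    {"genre": "rock", "product": "Product A"},
    {"genre": "electronic", "product": "Product B"},
    {"genre": "jazz", "product": "Product B"},
    {"genre": "rock", "product": "Product B"},
]
for b in multi_terms_sim(docs, ["genre", "product"])["buckets"]:
    print(b["key_as_string"], b["doc_count"])
```

With size=1, only rock|Product A would be returned, and the remaining three documents would be counted in sum_other_doc_count.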

Aggregation parameters

The following parameters are supported. See the terms aggregation for a more detailed explanation of these parameters.

size

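Optional. Defines how many term buckets should be returned out of the overall terms list. Defaults to 10.

shard_size

Optional. The number of buckets each shard returns to the coordinating node. The larger the value, the more accurate the results, but the more expensive it is to compute the final results. Defaults to (size * 1.5 + 10).

show_term_doc_count_error

Optional. Calculates the document count error on a per-term basis. Defaults to false.

order

Optional. Specifies the order of the buckets. Defaults to the number of documents in each bucket. For buckets with the same document count, the bucket's term values are used as a tiebreaker.

min_doc_count

Optional. The minimum number of documents a bucket must contain to be returned. Defaults to 1.

shard_min_doc_count

Optional. The minimum number of documents a bucket must contain on each shard to be returned. Defaults to min_doc_count.

collect_mode

Optional. Specifies the strategy for data collection. The depth_first and breadth_first modes are supported. Defaults to breadth_first.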
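The sketch below writes the parameters above out explicitly in a request body. The helper names are hypothetical. The default shard_size follows the formula in the list, and this sketch drops the fractional part. That rounding behavior is an assumption. For size = 10 the result is 25 either way.

```python
def default_shard_size(size: int) -> int:
    # size * 1.5 + 10, fractional part dropped (rounding is an assumption here).
    return int(size * 1.5 + 10)


def multi_terms_body(fields, size=10, shard_size=None, min_doc_count=1, order=None):
    """Build a multi_terms request body with the listed parameters made explicit."""
    agg = {
        "terms": [{"field": f} for f in fields],
        "size": size,
        "shard_size": shard_size if shard_size is not None else default_shard_size(size),
        "min_doc_count": min_doc_count,
    }
    if order is not None:
        agg["order"] = order
    return {"aggs": {"genres_and_products": {"multi_terms": agg}}}


body = multi_terms_body(["genre", "product"], order={"_count": "desc"})
print(body["aggs"]["genres_and_products"]["multi_terms"]["shard_size"])  # 25
```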

Script

Generating the terms using a script (here, a runtime field defined in runtime_mappings):

resp = client.search(
    index="products",
    runtime_mappings={
        "genre.length": {
            "type": "long",
            "script": "emit(doc['genre'].value.length())"
        }
    },
    aggs={
        "genres_and_products": {
            "multi_terms": {
                "terms": [
                    {
                        "field": "genre.length"
                    },
                    {
                        "field": "product"
                    }
                ]
            }
        }
    },
)
print(resp)
response = client.search(
  index: 'products',
  body: {
    runtime_mappings: {
      'genre.length' => {
        type: 'long',
        script: "emit(doc['genre'].value.length())"
      }
    },
    aggregations: {
      genres_and_products: {
        multi_terms: {
          terms: [
            {
              field: 'genre.length'
            },
            {
              field: 'product'
            }
          ]
        }
      }
    }
  }
)
puts response
const response = await client.search({
  index: "products",
  runtime_mappings: {
    "genre.length": {
      type: "long",
      script: "emit(doc['genre'].value.length())",
    },
  },
  aggs: {
    genres_and_products: {
      multi_terms: {
        terms: [
          {
            field: "genre.length",
          },
          {
            field: "product",
          },
        ],
      },
    },
  },
});
console.log(response);
GET /products/_search
{
  "runtime_mappings": {
    "genre.length": {
      "type": "long",
      "script": "emit(doc['genre'].value.length())"
    }
  },
  "aggs": {
    "genres_and_products": {
      "multi_terms": {
        "terms": [
          {
            "field": "genre.length"
          },
          {
            "field": "product"
          }
        ]
      }
    }
  }
}

Response

{
  ...
  "aggregations" : {
    "genres_and_products" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : [
            4,
            "Product A"
          ],
          "key_as_string" : "4|Product A",
          "doc_count" : 2
        },
        {
          "key" : [
            4,
            "Product B"
          ],
          "key_as_string" : "4|Product B",
          "doc_count" : 2
        },
        {
          "key" : [
            10,
            "Product B"
          ],
          "key_as_string" : "10|Product B",
          "doc_count" : 1
        }
      ]
    }
  }
}
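The runtime field can be mirrored locally to check the response. This sketch uses hypothetical documents that are consistent with the response. Python's len stands in for the Painless length() call; the two agree for ASCII strings such as these genres.

```python
from collections import Counter

# Hypothetical source documents consistent with the response above.
docs = [
    {"genre": "rock", "product": "Product A"},
    {"genre": "rock", "product": "Product A"},
    {"genre": "electronic", "product": "Product B"},
    {"genre": "jazz", "product": "Product B"},
    {"genre": "rock", "product": "Product B"},
]


def genre_length(doc):
    # Local counterpart of the script emit(doc['genre'].value.length()).
    return len(doc["genre"])


counts = Counter((genre_length(d), d["product"]) for d in docs)
# doc_count descending, then key ascending, as in the response.
ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
for key, count in ordered:
    print(f"{key[0]}|{key[1]}", count)
```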

Missing value

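The missing parameter defines how documents that are missing a value should be treated. By default, if any of the key components is missing, the entire document is ignored. The missing parameter instead makes it possible to treat such documents as if they had the given value.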

resp = client.search(
    index="products",
    aggs={
        "genres_and_products": {
            "multi_terms": {
                "terms": [
                    {
                        "field": "genre"
                    },
                    {
                        "field": "product",
                        "missing": "Product Z"
                    }
                ]
            }
        }
    },
)
print(resp)
response = client.search(
  index: 'products',
  body: {
    aggregations: {
      genres_and_products: {
        multi_terms: {
          terms: [
            {
              field: 'genre'
            },
            {
              field: 'product',
              missing: 'Product Z'
            }
          ]
        }
      }
    }
  }
)
puts response
const response = await client.search({
  index: "products",
  aggs: {
    genres_and_products: {
      multi_terms: {
        terms: [
          {
            field: "genre",
          },
          {
            field: "product",
            missing: "Product Z",
          },
        ],
      },
    },
  },
});
console.log(response);
GET /products/_search
{
  "aggs": {
    "genres_and_products": {
      "multi_terms": {
        "terms": [
          {
            "field": "genre"
          },
          {
            "field": "product",
            "missing": "Product Z"
          }
        ]
      }
    }
  }
}

Response

{
   ...
   "aggregations" : {
    "genres_and_products" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : [
            "rock",
            "Product A"
          ],
          "key_as_string" : "rock|Product A",
          "doc_count" : 2
        },
        {
          "key" : [
            "electronic",
            "Product B"
          ],
          "key_as_string" : "electronic|Product B",
          "doc_count" : 1
        },
        {
          "key" : [
            "electronic",
            "Product Z"
          ],
          "key_as_string" : "electronic|Product Z",  
          "doc_count" : 1
        },
        {
          "key" : [
            "jazz",
            "Product B"
          ],
          "key_as_string" : "jazz|Product B",
          "doc_count" : 1
        },
        {
          "key" : [
            "rock",
            "Product B"
          ],
          "key_as_string" : "rock|Product B",
          "doc_count" : 1
        }
      ]
    }
  }
}

Documents without a value in the product field fall into the same bucket as documents that have the value Product Z.
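The missing rule can be sketched in plain Python, using hypothetical documents chosen to reproduce the response above. The helper key_for is illustrative only. It treats a missing value as an absent dict key, which simplifies how Elasticsearch decides that a field has no value.

```python
from collections import Counter


def key_for(doc, terms):
    """Return the composite key for doc, or None if the doc is skipped.

    `terms` mirrors the request's terms list: dicts with "field" and an
    optional "missing" substitute value.
    """
    parts = []
    for t in terms:
        if t["field"] in doc:
            parts.append(doc[t["field"]])
        elif "missing" in t:
            parts.append(t["missing"])  # substitute value from the request
        else:
            return None  # absent component without a substitute drops the doc
    return tuple(parts)


terms = [{"field": "genre"}, {"field": "product", "missing": "Product Z"}]
docs = [
    {"genre": "rock", "product": "Product A"},
    {"genre": "rock", "product": "Product A"},
    {"genre": "electronic", "product": "Product B"},
    {"genre": "electronic"},  # no product value
    {"genre": "jazz", "product": "Product B"},
    {"genre": "rock", "product": "Product B"},
]
counts = Counter(k for k in (key_for(d, terms) for d in docs) if k is not None)
ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
for key, count in ordered:
    print("|".join(key), count)
```

Without the missing entry, key_for returns None for the electronic document that has no product, so that document would not be counted.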

Mixing field types

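When aggregating on multiple indices, the type of the aggregated field may not be the same in all indices. Some types are compatible with each other (integer and long, or float and double). However, when the types are a mix of decimal and non-decimal numbers, the terms aggregation promotes the non-decimal numbers to decimal numbers, which can cause a loss of precision in the bucket values.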
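Python floats are IEEE 754 doubles, like Elasticsearch's double type, so they can reproduce this precision loss. A long just above 2^53 has no exact double representation, so two distinct long values can become the same decimal value and end up in one bucket. This shows only the numeric effect, not Elasticsearch's code path.

```python
# A long value just above 2**53 cannot be represented exactly as a double.
big_long = 2**53 + 1              # 9007199254740993
as_double = float(big_long)       # what promotion to a decimal type amounts to
print(big_long, int(as_double))   # 9007199254740993 9007199254740992

# Two distinct longs collapse to the same double value.
print(float(2**53) == float(2**53 + 1))  # True
```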

Sub-aggregation and sorting examples

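Like most bucket aggregations, the multi_terms aggregation supports sub-aggregations and ordering the buckets by a metrics sub-aggregation: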

resp = client.search(
    index="products",
    aggs={
        "genres_and_products": {
            "multi_terms": {
                "terms": [
                    {
                        "field": "genre"
                    },
                    {
                        "field": "product"
                    }
                ],
                "order": {
                    "total_quantity": "desc"
                }
            },
            "aggs": {
                "total_quantity": {
                    "sum": {
                        "field": "quantity"
                    }
                }
            }
        }
    },
)
print(resp)
response = client.search(
  index: 'products',
  body: {
    aggregations: {
      genres_and_products: {
        multi_terms: {
          terms: [
            {
              field: 'genre'
            },
            {
              field: 'product'
            }
          ],
          order: {
            total_quantity: 'desc'
          }
        },
        aggregations: {
          total_quantity: {
            sum: {
              field: 'quantity'
            }
          }
        }
      }
    }
  }
)
puts response
const response = await client.search({
  index: "products",
  aggs: {
    genres_and_products: {
      multi_terms: {
        terms: [
          {
            field: "genre",
          },
          {
            field: "product",
          },
        ],
        order: {
          total_quantity: "desc",
        },
      },
      aggs: {
        total_quantity: {
          sum: {
            field: "quantity",
          },
        },
      },
    },
  },
});
console.log(response);
GET /products/_search
{
  "aggs": {
    "genres_and_products": {
      "multi_terms": {
        "terms": [
          {
            "field": "genre"
          },
          {
            "field": "product"
          }
        ],
        "order": {
          "total_quantity": "desc"
        }
      },
      "aggs": {
        "total_quantity": {
          "sum": {
            "field": "quantity"
          }
        }
      }
    }
  }
}

Response

{
  ...
  "aggregations" : {
    "genres_and_products" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : [
            "jazz",
            "Product B"
          ],
          "key_as_string" : "jazz|Product B",
          "doc_count" : 1,
          "total_quantity" : {
            "value" : 10.0
          }
        },
        {
          "key" : [
            "rock",
            "Product A"
          ],
          "key_as_string" : "rock|Product A",
          "doc_count" : 2,
          "total_quantity" : {
            "value" : 9.0
          }
        },
        {
          "key" : [
            "electronic",
            "Product B"
          ],
          "key_as_string" : "electronic|Product B",
          "doc_count" : 1,
          "total_quantity" : {
            "value" : 3.0
          }
        },
        {
          "key" : [
            "rock",
            "Product B"
          ],
          "key_as_string" : "rock|Product B",
          "doc_count" : 1,
          "total_quantity" : {
            "value" : 1.0
          }
        }
      ]
    }
  }
}
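The following local sketch shows what the sum sub-aggregation and the order setting do. The response shows only the bucket totals, so the per-document quantities here are hypothetical, picked so that the totals match.

```python
from collections import defaultdict

# Hypothetical documents; individual quantities are invented, only the
# per-bucket sums are taken from the response above.
docs = [
    {"genre": "rock", "product": "Product A", "quantity": 4},
    {"genre": "rock", "product": "Product A", "quantity": 5},
    {"genre": "electronic", "product": "Product B", "quantity": 3},
    {"genre": "jazz", "product": "Product B", "quantity": 10},
    {"genre": "rock", "product": "Product B", "quantity": 1},
]

buckets = defaultdict(lambda: {"doc_count": 0, "total_quantity": 0.0})
for d in docs:
    b = buckets[(d["genre"], d["product"])]
    b["doc_count"] += 1
    b["total_quantity"] += d["quantity"]  # the "sum" sub-aggregation

# Equivalent of "order": {"total_quantity": "desc"}
ordered = sorted(buckets.items(), key=lambda kv: kv[1]["total_quantity"], reverse=True)
for key, b in ordered:
    print("|".join(key), b["doc_count"], b["total_quantity"])
```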