
Aggregations

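An aggregation summarizes your data as metrics, statistics, or other analytics. Aggregations help you answer questions like:

What is the average load time for my website?
Who are my most valuable customers based on transaction volume?
What would be considered a large file on my network?
How many products are in each product category?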

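Elasticsearch organizes aggregations into three categories:

Metric aggregations, which calculate metrics such as a sum or average from field values.
Bucket aggregations, which group documents into buckets (also called bins) based on field values, ranges, or other criteria.
Pipeline aggregations, which take their input from other aggregations instead of from documents or fields.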
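
To make the three categories concrete, here is a minimal pure-Python sketch that mimics each one on a small in-memory list. It does not call Elasticsearch; the documents and the `category` and `price` field names are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical documents; field names are made up for this sketch.
docs = [
    {"category": "books", "price": 10.0},
    {"category": "books", "price": 20.0},
    {"category": "games", "price": 60.0},
]

# Metric aggregation: a single number computed from field values.
avg_price = mean(d["price"] for d in docs)

# Bucket aggregation: documents grouped by a field value.
buckets = defaultdict(list)
for d in docs:
    buckets[d["category"]].append(d)

# Metric computed inside each bucket.
avg_per_bucket = {
    key: mean(d["price"] for d in group) for key, group in buckets.items()
}

# Pipeline aggregation: the input is the output of another aggregation,
# here the largest of the per-bucket averages.
max_bucket_avg = max(avg_per_bucket.values())

print(avg_price, avg_per_bucket, max_bucket_avg)
```

The real aggregations run inside Elasticsearch across shards; this sketch only shows what kind of input each category consumes.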

Run an aggregation

You can run aggregations as part of a search by specifying the search API's aggs parameter. The following search runs a terms aggregation on my-field:

resp = client.search(
    index="my-index-000001",
    aggs={
        "my-agg-name": {
            "terms": {
                "field": "my-field"
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'my-index-000001',
  body: {
    aggregations: {
      "my-agg-name": {
        terms: {
          field: 'my-field'
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "my-index-000001",
  aggs: {
    "my-agg-name": {
      terms: {
        field: "my-field",
      },
    },
  },
});
console.log(response);

GET /my-index-000001/_search
{
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      }
    }
  }
}

Aggregation results are in the response's aggregations object:

{
  "took": 78,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [...]
  },
  "aggregations": {
    "my-agg-name": {                           
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}

The my-agg-name object holds the results for the my-agg-name aggregation.
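
Once the response is available as a Python dictionary, the aggregation results can be read by name. The following is a minimal sketch: the `bucket_counts` helper is hypothetical, and the sample response is a trimmed copy of the one above with one bucket added for illustration.

```python
# Trimmed sample response; the single bucket is added for illustration.
response = {
    "hits": {"total": {"value": 5, "relation": "eq"}},
    "aggregations": {
        "my-agg-name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{"key": "foo", "doc_count": 5}],
        }
    },
}


def bucket_counts(resp, agg_name):
    """Return {bucket key: doc_count} for a bucket aggregation result."""
    agg = resp.get("aggregations", {}).get(agg_name, {})
    return {b["key"]: b["doc_count"] for b in agg.get("buckets", [])}


counts = bucket_counts(response, "my-agg-name")
print(counts)
```

The helper returns an empty dictionary when the aggregation is missing or has no buckets, so callers do not need to special-case those responses.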

Change an aggregation's scope

Use the query parameter to limit the documents on which an aggregation runs:

resp = client.search(
    index="my-index-000001",
    query={
        "range": {
            "@timestamp": {
                "gte": "now-1d/d",
                "lt": "now/d"
            }
        }
    },
    aggs={
        "my-agg-name": {
            "terms": {
                "field": "my-field"
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'my-index-000001',
  body: {
    query: {
      range: {
        "@timestamp": {
          gte: 'now-1d/d',
          lt: 'now/d'
        }
      }
    },
    aggregations: {
      "my-agg-name": {
        terms: {
          field: 'my-field'
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "my-index-000001",
  query: {
    range: {
      "@timestamp": {
        gte: "now-1d/d",
        lt: "now/d",
      },
    },
  },
  aggs: {
    "my-agg-name": {
      terms: {
        field: "my-field",
      },
    },
  },
});
console.log(response);

GET /my-index-000001/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  },
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      }
    }
  }
}

Return only aggregation results

By default, searches that contain an aggregation return both search hits and aggregation results. To return only aggregation results, set size to 0:

resp = client.search(
    index="my-index-000001",
    size=0,
    aggs={
        "my-agg-name": {
            "terms": {
                "field": "my-field"
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'my-index-000001',
  body: {
    size: 0,
    aggregations: {
      "my-agg-name": {
        terms: {
          field: 'my-field'
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "my-index-000001",
  size: 0,
  aggs: {
    "my-agg-name": {
      terms: {
        field: "my-field",
      },
    },
  },
});
console.log(response);

GET /my-index-000001/_search
{
  "size": 0,
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      }
    }
  }
}

Run multiple aggregations

You can specify multiple aggregations in the same request:

resp = client.search(
    index="my-index-000001",
    aggs={
        "my-first-agg-name": {
            "terms": {
                "field": "my-field"
            }
        },
        "my-second-agg-name": {
            "avg": {
                "field": "my-other-field"
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'my-index-000001',
  body: {
    aggregations: {
      "my-first-agg-name": {
        terms: {
          field: 'my-field'
        }
      },
      "my-second-agg-name": {
        avg: {
          field: 'my-other-field'
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "my-index-000001",
  aggs: {
    "my-first-agg-name": {
      terms: {
        field: "my-field",
      },
    },
    "my-second-agg-name": {
      avg: {
        field: "my-other-field",
      },
    },
  },
});
console.log(response);

GET /my-index-000001/_search
{
  "aggs": {
    "my-first-agg-name": {
      "terms": {
        "field": "my-field"
      }
    },
    "my-second-agg-name": {
      "avg": {
        "field": "my-other-field"
      }
    }
  }
}

Run sub-aggregations

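Bucket aggregations support bucket or metric sub-aggregations. For example, a terms aggregation with an avg sub-aggregation calculates an average value for each bucket of documents. There is no limit on the level or depth of nested sub-aggregations.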

resp = client.search(
    index="my-index-000001",
    aggs={
        "my-agg-name": {
            "terms": {
                "field": "my-field"
            },
            "aggs": {
                "my-sub-agg-name": {
                    "avg": {
                        "field": "my-other-field"
                    }
                }
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'my-index-000001',
  body: {
    aggregations: {
      "my-agg-name": {
        terms: {
          field: 'my-field'
        },
        aggregations: {
          "my-sub-agg-name": {
            avg: {
              field: 'my-other-field'
            }
          }
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "my-index-000001",
  aggs: {
    "my-agg-name": {
      terms: {
        field: "my-field",
      },
      aggs: {
        "my-sub-agg-name": {
          avg: {
            field: "my-other-field",
          },
        },
      },
    },
  },
});
console.log(response);

GET /my-index-000001/_search
{
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      },
      "aggs": {
        "my-sub-agg-name": {
          "avg": {
            "field": "my-other-field"
          }
        }
      }
    }
  }
}

The response nests sub-aggregation results under their parent aggregation:

{
  ...
  "aggregations": {
    "my-agg-name": {                           
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "foo",
          "doc_count": 5,
          "my-sub-agg-name": {                 
            "value": 75.0
          }
        }
      ]
    }
  }
}

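The my-agg-name object holds the results for the parent aggregation.
Inside each bucket, the my-sub-agg-name object holds the results for my-agg-name's sub-aggregation.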
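
Because sub-aggregation results sit inside each parent bucket, reading them means iterating the parent's buckets. The sketch below assumes a response shaped like the one above; `sub_agg_values` is a hypothetical helper, not part of any client library.

```python
# Sample response with the same shape as the one shown above.
response = {
    "aggregations": {
        "my-agg-name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "foo",
                    "doc_count": 5,
                    "my-sub-agg-name": {"value": 75.0},
                }
            ],
        }
    }
}


def sub_agg_values(resp, parent, child):
    """Map each parent bucket key to the single value of a metric sub-aggregation."""
    buckets = resp["aggregations"][parent]["buckets"]
    return {b["key"]: b[child]["value"] for b in buckets}


averages = sub_agg_values(response, "my-agg-name", "my-sub-agg-name")
print(averages)
```

This only handles single-value metric sub-aggregations such as avg. A bucket sub-aggregation would carry its own buckets list and need another level of iteration.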

Add custom metadata

Use the meta object to associate custom metadata with an aggregation:

resp = client.search(
    index="my-index-000001",
    aggs={
        "my-agg-name": {
            "terms": {
                "field": "my-field"
            },
            "meta": {
                "my-metadata-field": "foo"
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'my-index-000001',
  body: {
    aggregations: {
      "my-agg-name": {
        terms: {
          field: 'my-field'
        },
        meta: {
          "my-metadata-field": 'foo'
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "my-index-000001",
  aggs: {
    "my-agg-name": {
      terms: {
        field: "my-field",
      },
      meta: {
        "my-metadata-field": "foo",
      },
    },
  },
});
console.log(response);

GET /my-index-000001/_search
{
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      },
      "meta": {
        "my-metadata-field": "foo"
      }
    }
  }
}

The response returns the meta object unchanged:

{
  ...
  "aggregations": {
    "my-agg-name": {
      "meta": {
        "my-metadata-field": "foo"
      },
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}

Return the aggregation type

By default, aggregation results include the aggregation's name but not its type. To return the aggregation type, use the typed_keys query parameter.

resp = client.search(
    index="my-index-000001",
    typed_keys=True,
    aggs={
        "my-agg-name": {
            "histogram": {
                "field": "my-field",
                "interval": 1000
            }
        }
    },
)
print(resp)

response = client.search(
  index: 'my-index-000001',
  typed_keys: true,
  body: {
    aggregations: {
      "my-agg-name": {
        histogram: {
          field: 'my-field',
          interval: 1000
        }
      }
    }
  }
)
puts response

const response = await client.search({
  index: "my-index-000001",
  typed_keys: true,
  aggs: {
    "my-agg-name": {
      histogram: {
        field: "my-field",
        interval: 1000,
      },
    },
  },
});
console.log(response);

GET /my-index-000001/_search?typed_keys
{
  "aggs": {
    "my-agg-name": {
      "histogram": {
        "field": "my-field",
        "interval": 1000
      }
    }
  }
}

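The response returns the aggregation type as a prefix to the aggregation's name.

Some aggregations return a different aggregation type from the type in the request. For example, the terms, significant terms, and percentiles aggregations return different aggregation types depending on the data type of the aggregated field.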

{
  ...
  "aggregations": {
    "histogram#my-agg-name": {                 
      "buckets": []
    }
  }
}

The key histogram#my-agg-name consists of the aggregation type, histogram, followed by a # separator and the aggregation's name, my-agg-name.
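
A client that receives typed keys usually needs to split each key back into its type and name. The following minimal sketch assumes keys follow the type#name form described above; `split_typed_key` is a hypothetical helper name.

```python
def split_typed_key(key):
    """Split 'type#name' into (type, name); return (None, key) if untyped."""
    agg_type, sep, name = key.partition("#")
    if not sep:
        # No separator: the response was requested without typed_keys.
        return None, key
    return agg_type, name


aggregations = {"histogram#my-agg-name": {"buckets": []}}

# Re-key the results by plain name while keeping the type alongside.
by_name = {}
for typed_key, result in aggregations.items():
    agg_type, name = split_typed_key(typed_key)
    by_name[name] = {"type": agg_type, "result": result}

print(by_name)
```

The split happens at the first # so that any later # characters stay in the name.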

Use scripts in an aggregation

When a field doesn't exactly match the aggregation you need, you should aggregate on a runtime field:

resp = client.search(
    index="my-index-000001",
    size=0,
    runtime_mappings={
        "message.length": {
            "type": "long",
            "script": "emit(doc['message.keyword'].value.length())"
        }
    },
    aggs={
        "message_length": {
            "histogram": {
                "interval": 10,
                "field": "message.length"
            }
        }
    },
)
print(resp)

const response = await client.search({
  index: "my-index-000001",
  size: 0,
  runtime_mappings: {
    "message.length": {
      type: "long",
      script: "emit(doc['message.keyword'].value.length())",
    },
  },
  aggs: {
    message_length: {
      histogram: {
        interval: 10,
        field: "message.length",
      },
    },
  },
});
console.log(response);

GET /my-index-000001/_search?size=0
{
  "runtime_mappings": {
    "message.length": {
      "type": "long",
      "script": "emit(doc['message.keyword'].value.length())"
    }
  },
  "aggs": {
    "message_length": {
      "histogram": {
        "interval": 10,
        "field": "message.length"
      }
    }
  }
}

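Scripts calculate field values dynamically, which adds some overhead to the aggregation. In addition to the time spent calculating, some aggregations, such as terms and filters, can't use some of their optimizations with runtime fields. Overall, the performance cost of using a runtime field varies from aggregation to aggregation.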
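
To see what the request above computes, the sketch below reproduces it locally in plain Python: it derives a length per message, as the runtime field script does, and then groups the lengths into histogram buckets of width 10. The sample messages are hypothetical, and the bucket key rule (round the value down to a multiple of the interval) assumes the histogram's default offset of 0.

```python
import math
from collections import Counter

# Hypothetical message values, standing in for doc['message.keyword'].value.
messages = ["short", "a bit longer text", "x" * 25]


def histogram_key(value, interval):
    """Round a value down to the start of its bucket (offset assumed to be 0)."""
    return math.floor(value / interval) * interval


# Equivalent of the runtime field: one length per document.
lengths = [len(m) for m in messages]

# Equivalent of the histogram aggregation with interval 10.
counts = Counter(histogram_key(n, 10) for n in lengths)
print(sorted(counts.items()))
```

Python's `len` counts Unicode code points, so lengths can differ from the script's result for text outside the Basic Multilingual Plane.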

Aggregation caches

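For faster responses, Elasticsearch caches the results of frequently run aggregations in the shard request cache. To get cached results, use the same preference string for each search. If you don't need search hits, set size to 0 to avoid filling the cache.

Elasticsearch routes searches with the same preference string to the same shards. If the shards' data doesn't change between searches, the shards return cached aggregation results.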
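
One way to keep the preference string consistent is to build every request through a single helper. This is a minimal sketch: `cached_agg_request` and the preference value `my-dashboard` are hypothetical, and it only builds the keyword arguments rather than sending a request.

```python
def cached_agg_request(index, aggs, preference):
    """Build search arguments that are friendly to the shard request cache."""
    return {
        "index": index,
        "size": 0,                 # no search hits, only aggregation results
        "preference": preference,  # same string on every call, same shards
        "aggs": aggs,
    }


aggs = {"my-agg-name": {"terms": {"field": "my-field"}}}

first = cached_agg_request("my-index-000001", aggs, "my-dashboard")
second = cached_agg_request("my-index-000001", aggs, "my-dashboard")

# Assuming a configured client, each request would be sent as:
# resp = client.search(**first)
print(first == second)
```

Whether a repeated search is actually served from the cache still depends on the shard data staying unchanged between the two searches.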

Limits for `long` values

When running aggregations, Elasticsearch uses double values to hold and represent numeric data. As a result, aggregations on long numbers greater than 2⁵³ are approximate.
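
The 2⁵³ threshold comes from the double format itself, which can represent every integer only up to that point. The short sketch below shows the effect with Python's float, which is an IEEE 754 double on common platforms; it illustrates the number format and is not Elasticsearch code.

```python
limit = 2**53

# Below the limit, converting to a double and back is lossless.
small = limit - 1
small_roundtrip = int(float(small))

# Above the limit, neighbouring integers collapse onto the same double.
big = limit + 1
big_roundtrip = int(float(big))

print(small == small_roundtrip)        # exact below the limit
print(big == big_roundtrip)            # no longer exact above it
print(float(limit) == float(limit + 1))
```

An aggregation such as sum or avg over long values in that range can therefore return a result that differs from the exact integer arithmetic.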
