Bucket count K-S test correlation aggregation
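A sibling pipeline aggregation that executes a two-sample Kolmogorov–Smirnov test (referred to as a "K-S test" from now on) against a provided distribution and the distribution implied by the document counts in the configured sibling aggregation. Specifically, for some metric, assuming that the percentile intervals of the metric are known beforehand or have been computed by an aggregation, one would use a range aggregation for the sibling to compute the p-value of the distribution difference between the metric and the restriction of that metric to a subset of the documents. A natural use case is when the sibling range aggregation is nested in a terms aggregation, in which case one compares the overall distribution of the metric to its restriction to each term.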
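To make the comparison concrete, here is a minimal sketch of the quantity a K-S test is built on: the largest gap between the cumulative distribution implied by the bucket counts and the cumulative reference distribution. The function name `ks_statistics` and the sample counts are hypothetical. The sketch does not compute p-values and does not attempt to reproduce how Elasticsearch samples CDF points internally.

```python
from itertools import accumulate


def ks_statistics(counts, fractions=None):
    """Return (d_plus, d_minus, d) for bucket counts against reference fractions.

    d_plus is the largest amount by which the observed CDF exceeds the
    reference CDF, d_minus the largest amount by which it falls below it,
    and d the larger of the two (the two-sided statistic).
    """
    if not counts:
        raise ValueError("counts must not be empty")
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one document")
    if fractions is None:
        # Assumption taken from the text: documents are spread uniformly.
        fractions = [1.0 / len(counts)] * len(counts)
    if len(fractions) != len(counts):
        raise ValueError("fractions and counts must have the same length")
    fraction_total = sum(fractions)
    if fraction_total <= 0:
        raise ValueError("fractions must sum to a positive value")

    # Cumulative distributions over the buckets, both normalised to end at 1.
    observed_cdf = [c / total for c in accumulate(counts)]
    reference_cdf = [f / fraction_total for f in accumulate(fractions)]
    diffs = [o - r for o, r in zip(observed_cdf, reference_cdf)]

    d_plus = max(0.0, max(diffs))
    d_minus = max(0.0, -min(diffs))
    return d_plus, d_minus, max(d_plus, d_minus)


# Hypothetical counts: every document falls into the first of four buckets.
print(ks_statistics([4, 0, 0, 0]))
```

A large statistic means the subset's documents are concentrated in different buckets than the reference distribution predicts, which is what drives a small p-value.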
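Parameters
buckets_path
- (Required, string) Path to the buckets that contain one set of values to correlate. Must be a _count path. For syntax, see buckets_path syntax.
alternative
- (Optional, list) A list of string values indicating which K-S test alternative hypotheses to calculate. Valid values are "greater", "less", and "two_sided". This parameter determines which K-S statistic is used when calculating the K-S test. Defaults to all possible alternative hypotheses.
fractions
- (Optional, list) A list of doubles indicating the distribution of the samples to compare with the buckets_path results. In typical usage this is the overall proportion of documents in each bucket, which is compared with the actual document proportions in each bucket from the sibling aggregation counts. By default, the overall documents are assumed to be uniformly distributed across these buckets, which they would be if equal percentiles of a metric were used to define the bucket end points.
sampling_method
- (Optional, string) Indicates the sampling methodology when calculating the K-S test. Note that this is sampling of the returned values. It determines the cumulative distribution function (CDF) points used to compare the two samples. Defaults to upper_tail, which emphasizes the upper end of the CDF points. Valid options are upper_tail, uniform, and lower_tail.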
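As an illustration of the fractions parameter, the sketch below turns overall per-bucket document counts into proportions that could be passed as fractions. The helper name `fractions_from_counts` and the counts are hypothetical, and the sketch assumes the list is given in the same order as the sibling aggregation's buckets.

```python
def fractions_from_counts(overall_counts):
    """Convert overall per-bucket document counts into proportions."""
    total = sum(overall_counts)
    if total <= 0:
        raise ValueError("overall_counts must contain at least one document")
    return [count / total for count in overall_counts]


# Hypothetical overall counts for three buckets of the sibling aggregation.
fractions = fractions_from_counts([50, 30, 20])

# The proportions are then supplied alongside the _count path.
# "latency_ranges" is an assumed name for the sibling range aggregation.
ks_test_body = {
    "bucket_count_ks_test": {
        "buckets_path": "latency_ranges>_count",
        "fractions": fractions,
    }
}
print(ks_test_body)
```

When the bucket end points are equal percentiles of the metric, this step can be skipped, because the default already assumes a uniform distribution.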
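Syntax
A bucket_count_ks_test aggregation, taken in isolation, is an object keyed by bucket_count_ks_test whose fields are the parameters described above.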
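The shape of the aggregation in isolation can be sketched as a small Python builder. The function name `bucket_count_ks_test`, the sibling name `range_values`, and the local validation are assumptions made for illustration. The accepted values are the ones listed under Parameters, and the `_count` check is only a client-side approximation of the "must be a _count path" requirement.

```python
VALID_ALTERNATIVES = {"greater", "less", "two_sided"}
VALID_SAMPLING_METHODS = {"upper_tail", "uniform", "lower_tail"}


def bucket_count_ks_test(buckets_path, alternative=None, fractions=None,
                         sampling_method=None):
    """Build the body of a bucket_count_ks_test aggregation as a dict."""
    # Local sanity check only: the path is expected to end in _count.
    if not buckets_path.endswith("_count"):
        raise ValueError("buckets_path must be a _count path")
    body = {"buckets_path": buckets_path}

    if alternative is not None:
        unknown = set(alternative) - VALID_ALTERNATIVES
        if unknown:
            raise ValueError(f"unknown alternative(s): {sorted(unknown)}")
        body["alternative"] = list(alternative)
    if fractions is not None:
        body["fractions"] = [float(f) for f in fractions]
    if sampling_method is not None:
        if sampling_method not in VALID_SAMPLING_METHODS:
            raise ValueError(f"unknown sampling_method: {sampling_method}")
        body["sampling_method"] = sampling_method

    return {"bucket_count_ks_test": body}


# "range_values" is a hypothetical name for the sibling range aggregation.
print(bucket_count_ks_test(
    "range_values>_count",
    alternative=["less", "greater", "two_sided"],
    sampling_method="upper_tail",
))
```

Omitted optional parameters are left out of the body so that the server-side defaults described above apply.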
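Example
The following snippet runs the bucket_count_ks_test on the individual terms in the field version against a uniform distribution. The uniform distribution reflects the latency percentile buckets. Not shown is the pre-calculation of the latency percentile values, which was done using the percentiles aggregation.
This example uses only the deciles of latency.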
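Since the pre-calculation step is not shown, here is a minimal sketch of how a list of percentile boundary values could be turned into the ranges of a range aggregation. The helper name `ranges_from_boundaries` is hypothetical. The boundary values are copied from the ranges used in this example's request, and how they were derived from the percentiles aggregation output is not covered by the source.

```python
def ranges_from_boundaries(boundaries):
    """Turn sorted boundary values into open-ended and closed range entries."""
    boundaries = list(boundaries)
    if not boundaries:
        raise ValueError("at least one boundary is required")
    if boundaries != sorted(boundaries):
        raise ValueError("boundaries must be sorted in ascending order")

    # Everything below the first boundary.
    ranges = [{"to": boundaries[0]}]
    # One range per pair of neighbouring boundaries.
    for lower, upper in zip(boundaries, boundaries[1:]):
        ranges.append({"from": lower, "to": upper})
    # Everything from the last boundary upwards.
    ranges.append({"from": boundaries[-1]})
    return ranges


# Boundary values as they appear in the example request.
latency_boundaries = [0, 105, 225, 445, 665, 885, 1115, 1335, 1555, 1775]
for entry in ranges_from_boundaries(latency_boundaries):
    print(entry)
```

Building the ranges from boundary values keeps the bucket end points and the reference distribution in step: if the boundaries are equal percentiles, the default uniform assumption applies.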
# Assumes `client` is an already configured Elasticsearch Python client.
resp = client.search(
    index="correlate_latency",
    size="0",
    filter_path="aggregations",
    aggs={
        "buckets": {
            "terms": {"field": "version", "size": 2},
            "aggs": {
                "latency_ranges": {
                    "range": {
                        "field": "latency",
                        "ranges": [
                            {"to": 0},
                            {"from": 0, "to": 105},
                            {"from": 105, "to": 225},
                            {"from": 225, "to": 445},
                            {"from": 445, "to": 665},
                            {"from": 665, "to": 885},
                            {"from": 885, "to": 1115},
                            {"from": 1115, "to": 1335},
                            {"from": 1335, "to": 1555},
                            {"from": 1555, "to": 1775},
                            {"from": 1775},
                        ],
                    }
                },
                "ks_test": {
                    "bucket_count_ks_test": {
                        "buckets_path": "latency_ranges>_count",
                        "alternative": ["less", "greater", "two_sided"],
                    }
                },
            },
        }
    },
)
print(resp)
const response = await client.search({
  index: "correlate_latency",
  size: 0,
  filter_path: "aggregations",
  aggs: {
    buckets: {
      terms: { field: "version", size: 2 },
      aggs: {
        latency_ranges: {
          range: {
            field: "latency",
            ranges: [
              { to: 0 },
              { from: 0, to: 105 },
              { from: 105, to: 225 },
              { from: 225, to: 445 },
              { from: 445, to: 665 },
              { from: 665, to: 885 },
              { from: 885, to: 1115 },
              { from: 1115, to: 1335 },
              { from: 1335, to: 1555 },
              { from: 1555, to: 1775 },
              { from: 1775 },
            ],
          },
        },
        ks_test: {
          bucket_count_ks_test: {
            buckets_path: "latency_ranges>_count",
            alternative: ["less", "greater", "two_sided"],
          },
        },
      },
    },
  },
});
console.log(response);
POST correlate_latency/_search?size=0&filter_path=aggregations
{
  "aggs": {
    "buckets": {
      "terms": {
        "field": "version",
        "size": 2
      },
      "aggs": {
        "latency_ranges": {
          "range": {
            "field": "latency",
            "ranges": [
              { "to": 0 },
              { "from": 0, "to": 105 },
              { "from": 105, "to": 225 },
              { "from": 225, "to": 445 },
              { "from": 445, "to": 665 },
              { "from": 665, "to": 885 },
              { "from": 885, "to": 1115 },
              { "from": 1115, "to": 1335 },
              { "from": 1335, "to": 1555 },
              { "from": 1555, "to": 1775 },
              { "from": 1775 }
            ]
          }
        },
        "ks_test": {
          "bucket_count_ks_test": {
            "buckets_path": "latency_ranges>_count",
            "alternative": ["less", "greater", "two_sided"]
          }
        }
      }
    }
  }
}
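- buckets: the term buckets containing a range aggregation and the bucket correlation aggregation. Both are used to calculate the correlation of the term values with the latency.
- latency_ranges: the range aggregation on the latency field. The ranges were created with reference to the percentiles of the latency field.
- ks_test: the bucket count K-S test aggregation, which tests whether the bucket counts come from the same distribution as the reference distribution. No fractions are set in this request, so the reference is the default uniform distribution.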
The following may be the response:
{
  "aggregations" : {
    "buckets" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              { "key" : "*-0.0", "to" : 0.0, "doc_count" : 0 },
              { "key" : "0.0-105.0", "from" : 0.0, "to" : 105.0, "doc_count" : 1 },
              { "key" : "105.0-225.0", "from" : 105.0, "to" : 225.0, "doc_count" : 9 },
              { "key" : "225.0-445.0", "from" : 225.0, "to" : 445.0, "doc_count" : 0 },
              { "key" : "445.0-665.0", "from" : 445.0, "to" : 665.0, "doc_count" : 0 },
              { "key" : "665.0-885.0", "from" : 665.0, "to" : 885.0, "doc_count" : 0 },
              { "key" : "885.0-1115.0", "from" : 885.0, "to" : 1115.0, "doc_count" : 10 },
              { "key" : "1115.0-1335.0", "from" : 1115.0, "to" : 1335.0, "doc_count" : 20 },
              { "key" : "1335.0-1555.0", "from" : 1335.0, "to" : 1555.0, "doc_count" : 20 },
              { "key" : "1555.0-1775.0", "from" : 1555.0, "to" : 1775.0, "doc_count" : 20 },
              { "key" : "1775.0-*", "from" : 1775.0, "doc_count" : 20 }
            ]
          },
          "ks_test" : {
            "less" : 2.248673241788478E-4,
            "greater" : 1.0,
            "two_sided" : 5.791639181800257E-4
          }
        },
        {
          "key" : "2.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              { "key" : "*-0.0", "to" : 0.0, "doc_count" : 0 },
              { "key" : "0.0-105.0", "from" : 0.0, "to" : 105.0, "doc_count" : 19 },
              { "key" : "105.0-225.0", "from" : 105.0, "to" : 225.0, "doc_count" : 11 },
              { "key" : "225.0-445.0", "from" : 225.0, "to" : 445.0, "doc_count" : 20 },
              { "key" : "445.0-665.0", "from" : 445.0, "to" : 665.0, "doc_count" : 20 },
              { "key" : "665.0-885.0", "from" : 665.0, "to" : 885.0, "doc_count" : 20 },
              { "key" : "885.0-1115.0", "from" : 885.0, "to" : 1115.0, "doc_count" : 10 },
              { "key" : "1115.0-1335.0", "from" : 1115.0, "to" : 1335.0, "doc_count" : 0 },
              { "key" : "1335.0-1555.0", "from" : 1335.0, "to" : 1555.0, "doc_count" : 0 },
              { "key" : "1555.0-1775.0", "from" : 1555.0, "to" : 1775.0, "doc_count" : 0 },
              { "key" : "1775.0-*", "from" : 1775.0, "doc_count" : 0 }
            ]
          },
          "ks_test" : {
            "less" : 0.9642895789647244,
            "greater" : 4.58718174664754E-9,
            "two_sided" : 5.916656831139733E-9
          }
        }
      ]
    }
  }
}
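One way to read such a response is to collect, per term, the p-value of a chosen alternative and keep the terms that fall below a significance threshold. The sketch below assumes the aggregation names used in this example (`buckets`, `ks_test`). The function name `significant_terms` and the threshold `alpha` are hypothetical choices, not values from the source. The sample dictionary is trimmed to the `ks_test` results shown above.

```python
def significant_terms(response, alpha=0.01, terms_agg="buckets",
                      test_agg="ks_test", alternative="two_sided"):
    """Return {term key: p-value} for terms whose p-value is below alpha."""
    result = {}
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        p_value = bucket[test_agg][alternative]
        if p_value < alpha:
            result[bucket["key"]] = p_value
    return result


# Trimmed copy of the response above: only the keys and ks_test results.
sample_response = {
    "aggregations": {
        "buckets": {
            "buckets": [
                {"key": "1.0", "ks_test": {
                    "less": 2.248673241788478e-4,
                    "greater": 1.0,
                    "two_sided": 5.791639181800257e-4}},
                {"key": "2.0", "ks_test": {
                    "less": 0.9642895789647244,
                    "greater": 4.58718174664754e-9,
                    "two_sided": 5.916656831139733e-9}},
            ]
        }
    }
}

print(significant_terms(sample_response))
print(significant_terms(sample_response, alpha=1e-6))
```

With the hypothetical threshold of 0.01, both versions have a two-sided p-value below it, which matches the bucket counts: version 1.0 is concentrated in the upper latency ranges and version 2.0 in the lower ones.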