Bucket count K-S test correlation aggregation
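A sibling pipeline aggregation that runs a two-sample Kolmogorov-Smirnov test (referred to as a "K-S test" from here on) comparing a provided distribution with the distribution implied by the document counts in the configured sibling aggregation. Specifically, for some metric, assuming its percentile intervals are known beforehand or have been computed by an aggregation, one would use a range aggregation as the sibling to compute the p-value of the difference between the distribution of the metric and the distribution of that metric restricted to a subset of the documents. A natural use case is a sibling range aggregation nested inside a terms aggregation, in which case the overall distribution of the metric is compared with its restriction to each term.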
Parameters
buckets_path
- (Required, string) Path to the buckets that contain one set of values to correlate. Must be a _count path. For syntax, see buckets_path syntax.
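alternative
- (Optional, list) A list of string values indicating which K-S test alternative hypotheses to calculate. Valid values are "greater", "less", and "two_sided". This parameter determines which K-S statistic is used when calculating the K-S test. By default, all possible alternative hypotheses are calculated.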
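fractions
- (Optional, list) A list of doubles describing the sample distribution that the buckets_path results are compared against. In typical usage this is the overall proportion of documents in each bucket, which is compared with the actual proportion of documents in each bucket from the sibling aggregation counts. By default, the overall documents are assumed to be uniformly distributed across these buckets, which holds if equal percentiles of a metric are used to define the bucket end points.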
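sampling_method
- (Optional, string) Indicates the sampling method used when calculating the K-S test. Note that this is sampling of the returned values. It determines which cumulative distribution function (CDF) points are used to compare the two samples. The default is upper_tail, which emphasizes the upper end of the CDF points. Valid options are upper_tail, uniform, and lower_tail.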
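To make the fractions parameter more concrete, the sketch below derives per-bucket proportions from overall document counts. The helper name fractions_from_counts and the sample counts are illustrative assumptions and are not part of Elasticsearch.

```python
from typing import List, Sequence


def fractions_from_counts(counts: Sequence[int]) -> List[float]:
    """Turn per-bucket document counts into proportions that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one bucket must contain documents")
    return [count / total for count in counts]


# Hypothetical overall counts for four buckets whose end points are
# quartiles of the metric: each bucket holds a quarter of the documents,
# so the proportions come out uniform (the documented default).
uniform = fractions_from_counts([250, 250, 250, 250])

# Hypothetical overall counts for hand-picked (non-percentile) end points:
# the proportions are uneven, so they would have to be passed explicitly.
skewed = fractions_from_counts([100, 300, 400, 200])

print(uniform)
print(skewed)
```

When the bucket end points are equal percentiles of the metric, the list is uniform and fractions can simply be left out; a list like skewed is the kind of value one would pass when the end points were chosen some other way.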
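Syntax
In isolation, a bucket_count_ks_test aggregation is a single bucket_count_ks_test object holding the parameters described above; only buckets_path is required.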
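As a minimal sketch of that standalone form, the following builds the aggregation body as a Python dictionary and prints it as JSON. The sibling aggregation name range_values is a hypothetical placeholder; the parameter names and values are the ones listed under Parameters.

```python
import json

# Hypothetical name of the sibling range aggregation.
SIBLING_RANGE_AGG = "range_values"

standalone = {
    "bucket_count_ks_test": {
        # Required: must be a _count path into the sibling aggregation.
        "buckets_path": f"{SIBLING_RANGE_AGG}>_count",
        # Optional: which alternative hypotheses to calculate.
        "alternative": ["less", "greater", "two_sided"],
        # Optional: which CDF points are used for the comparison.
        "sampling_method": "upper_tail",
    }
}

print(json.dumps(standalone, indent=2))
```

Here fractions is left out, which according to the parameter description means the reference distribution is assumed to be uniform.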
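Example
The following snippet runs bucket_count_ks_test on the individual terms in the field version against a uniform distribution. The uniform distribution reflects the latency percentile buckets. The pre-calculation of the latency indicator values is not shown; it was done with the percentiles aggregation.
This example uses only the deciles of latency.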
# Assumes `client` is an already configured Elasticsearch Python client.
resp = client.search(
    index="correlate_latency",
    size=0,
    filter_path="aggregations",
    aggs={
        "buckets": {
            "terms": {"field": "version", "size": 2},
            "aggs": {
                # Range buckets whose end points are latency percentiles.
                "latency_ranges": {
                    "range": {
                        "field": "latency",
                        "ranges": [
                            {"to": 0}, {"from": 0, "to": 105},
                            {"from": 105, "to": 225}, {"from": 225, "to": 445},
                            {"from": 445, "to": 665}, {"from": 665, "to": 885},
                            {"from": 885, "to": 1115}, {"from": 1115, "to": 1335},
                            {"from": 1335, "to": 1555}, {"from": 1555, "to": 1775},
                            {"from": 1775},
                        ],
                    }
                },
                # K-S test on the document counts of the range buckets.
                "ks_test": {
                    "bucket_count_ks_test": {
                        "buckets_path": "latency_ranges>_count",
                        "alternative": ["less", "greater", "two_sided"],
                    }
                },
            },
        }
    },
)
print(resp)
const response = await client.search({
  index: "correlate_latency",
  size: 0,
  filter_path: "aggregations",
  aggs: {
    buckets: {
      terms: { field: "version", size: 2 },
      aggs: {
        latency_ranges: {
          range: {
            field: "latency",
            ranges: [
              { to: 0 }, { from: 0, to: 105 },
              { from: 105, to: 225 }, { from: 225, to: 445 },
              { from: 445, to: 665 }, { from: 665, to: 885 },
              { from: 885, to: 1115 }, { from: 1115, to: 1335 },
              { from: 1335, to: 1555 }, { from: 1555, to: 1775 },
              { from: 1775 },
            ],
          },
        },
        ks_test: {
          bucket_count_ks_test: {
            buckets_path: "latency_ranges>_count",
            alternative: ["less", "greater", "two_sided"],
          },
        },
      },
    },
  },
});
console.log(response);
POST correlate_latency/_search?size=0&filter_path=aggregations
{
  "aggs": {
    "buckets": {
      "terms": { "field": "version", "size": 2 },
      "aggs": {
        "latency_ranges": {
          "range": {
            "field": "latency",
            "ranges": [
              { "to": 0 }, { "from": 0, "to": 105 },
              { "from": 105, "to": 225 }, { "from": 225, "to": 445 },
              { "from": 445, "to": 665 }, { "from": 665, "to": 885 },
              { "from": 885, "to": 1115 }, { "from": 1115, "to": 1335 },
              { "from": 1335, "to": 1555 }, { "from": 1555, "to": 1775 },
              { "from": 1775 }
            ]
          }
        },
        "ks_test": {
          "bucket_count_ks_test": {
            "buckets_path": "latency_ranges>_count",
            "alternative": ["less", "greater", "two_sided"]
          }
        }
      }
    }
  }
}
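buckets
- The terms buckets containing a range aggregation and the bucket correlation aggregation. Both are used to calculate the correlation of the term values with the latency.
latency_ranges
- The range aggregation on the latency field. The ranges were created from the percentiles of the latency field.
ks_test
- The bucket count K-S test aggregation, which tests whether the bucket counts come from the same distribution as fractions. No fractions are given in this request, so the default uniform distribution is used.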
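The preparation step that produces the range end points is not shown in the source. A minimal sketch of how it might look follows, under stated assumptions: the aggregation name latency_percentiles, the list of percents, and the helper ranges_from_boundaries are all hypothetical, and the exact percents used to obtain the boundaries in the example are not known.

```python
from typing import Dict, List, Sequence

# Request body for a percentiles aggregation on the latency field.
# The aggregation name and the percents are assumptions for illustration.
percentiles_body = {
    "size": 0,
    "aggs": {
        "latency_percentiles": {
            "percentiles": {
                "field": "latency",
                "percents": [10, 20, 30, 40, 50, 60, 70, 80, 90],
            }
        }
    },
}


def ranges_from_boundaries(boundaries: Sequence[float]) -> List[Dict[str, float]]:
    """Build range buckets, open at both ends, from sorted boundary values."""
    if not boundaries:
        raise ValueError("need at least one boundary")
    if list(boundaries) != sorted(boundaries):
        raise ValueError("boundaries must be sorted in ascending order")
    ranges: List[Dict[str, float]] = [{"to": boundaries[0]}]
    for low, high in zip(boundaries, boundaries[1:]):
        ranges.append({"from": low, "to": high})
    ranges.append({"from": boundaries[-1]})
    return ranges


# Boundary values copied from the ranges in the example request.
latency_boundaries = [0, 105, 225, 445, 665, 885, 1115, 1335, 1555, 1775]
latency_ranges = ranges_from_boundaries(latency_boundaries)
print(len(latency_ranges))
```

Fed with the ten boundary values from the example request, the helper yields the same eleven ranges that the request passes to the range aggregation.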
The following may be the response:
{
  "aggregations" : {
    "buckets" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              { "key" : "*-0.0", "to" : 0.0, "doc_count" : 0 },
              { "key" : "0.0-105.0", "from" : 0.0, "to" : 105.0, "doc_count" : 1 },
              { "key" : "105.0-225.0", "from" : 105.0, "to" : 225.0, "doc_count" : 9 },
              { "key" : "225.0-445.0", "from" : 225.0, "to" : 445.0, "doc_count" : 0 },
              { "key" : "445.0-665.0", "from" : 445.0, "to" : 665.0, "doc_count" : 0 },
              { "key" : "665.0-885.0", "from" : 665.0, "to" : 885.0, "doc_count" : 0 },
              { "key" : "885.0-1115.0", "from" : 885.0, "to" : 1115.0, "doc_count" : 10 },
              { "key" : "1115.0-1335.0", "from" : 1115.0, "to" : 1335.0, "doc_count" : 20 },
              { "key" : "1335.0-1555.0", "from" : 1335.0, "to" : 1555.0, "doc_count" : 20 },
              { "key" : "1555.0-1775.0", "from" : 1555.0, "to" : 1775.0, "doc_count" : 20 },
              { "key" : "1775.0-*", "from" : 1775.0, "doc_count" : 20 }
            ]
          },
          "ks_test" : {
            "less" : 2.248673241788478E-4,
            "greater" : 1.0,
            "two_sided" : 5.791639181800257E-4
          }
        },
        {
          "key" : "2.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              { "key" : "*-0.0", "to" : 0.0, "doc_count" : 0 },
              { "key" : "0.0-105.0", "from" : 0.0, "to" : 105.0, "doc_count" : 19 },
              { "key" : "105.0-225.0", "from" : 105.0, "to" : 225.0, "doc_count" : 11 },
              { "key" : "225.0-445.0", "from" : 225.0, "to" : 445.0, "doc_count" : 20 },
              { "key" : "445.0-665.0", "from" : 445.0, "to" : 665.0, "doc_count" : 20 },
              { "key" : "665.0-885.0", "from" : 665.0, "to" : 885.0, "doc_count" : 20 },
              { "key" : "885.0-1115.0", "from" : 885.0, "to" : 1115.0, "doc_count" : 10 },
              { "key" : "1115.0-1335.0", "from" : 1115.0, "to" : 1335.0, "doc_count" : 0 },
              { "key" : "1335.0-1555.0", "from" : 1335.0, "to" : 1555.0, "doc_count" : 0 },
              { "key" : "1555.0-1775.0", "from" : 1555.0, "to" : 1775.0, "doc_count" : 0 },
              { "key" : "1775.0-*", "from" : 1775.0, "doc_count" : 0 }
            ]
          },
          "ks_test" : {
            "less" : 0.9642895789647244,
            "greater" : 4.58718174664754E-9,
            "two_sided" : 5.916656831139733E-9
          }
        }
      ]
    }
  }
}
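To give some intuition for why the two versions get such different p-values, the sketch below measures the largest gaps between each version's cumulative bucket proportions and a uniform reference. This is only the textbook idea behind a K-S statistic, not Elasticsearch's computation: it does not reproduce the sampling of CDF points or the p-values, and it does not claim which gap corresponds to "less" or "greater". The helper names and the assumption that the uniform reference spans all eleven listed buckets are illustrative.

```python
from fractions import Fraction
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple


def cdf(weights: Sequence[int]) -> List[Fraction]:
    """Cumulative share of the total weight at the end of each bucket."""
    total = sum(weights)
    return [Fraction(running, total) for running in accumulate(weights)]


def cdf_gaps(
    observed_counts: Sequence[int],
    reference_weights: Optional[Sequence[int]] = None,
) -> Tuple[float, float, float]:
    """Return (max(observed - reference), max(reference - observed), largest gap)."""
    if reference_weights is None:
        # Assumption: a uniform reference spread over every listed bucket.
        reference_weights = [1] * len(observed_counts)
    pairs = list(zip(cdf(observed_counts), cdf(reference_weights)))
    above = max(obs - ref for obs, ref in pairs)
    below = max(ref - obs for obs, ref in pairs)
    return float(above), float(below), float(max(above, below))


# doc_count values copied from the latency_ranges buckets in the response.
version_1 = [0, 1, 9, 0, 0, 0, 10, 20, 20, 20, 20]
version_2 = [0, 19, 11, 20, 20, 20, 10, 0, 0, 0, 0]

for name, counts in (("1.0", version_1), ("2.0", version_2)):
    above, below, largest = cdf_gaps(counts)
    print(name, round(above, 4), round(below, 4), round(largest, 4))
```

Under these assumptions, the cumulative proportions for version 1.0 never rise above the uniform reference and trail it by about 0.445 at most, because its documents sit in the high-latency buckets. Version 2.0 runs ahead of the reference by about 0.364, because its documents sit in the low-latency buckets.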