Bucket count K-S test correlation aggregation

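A sibling pipeline aggregation that runs a two-sample Kolmogorov-Smirnov test (referred to as a "K-S test" from here on) between a provided distribution and the distribution implied by the document counts in the configured sibling aggregation. Specifically, for some metric, assuming that the percentile intervals of the metric are known beforehand or have been computed by an aggregation, you can use a range aggregation as the sibling to compute the p-value of the difference between the distribution of the metric and the distribution of that metric restricted to a subset of the documents. A typical use case is when the sibling range aggregation is nested in a terms aggregation, in which case the overall distribution of the metric is compared with its restriction to each term.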
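
To make the comparison concrete, the following Python sketch computes the K-S distances between the distribution implied by a list of bucket document counts and a reference distribution given as fractions. It is an illustration only, not Elasticsearch's implementation: the function name `ks_statistics` and its return keys are hypothetical, the sample counts are illustrative, and the sketch omits both the p-value calculation and the `sampling_method` handling that the aggregation performs.

```python
from itertools import accumulate


def ks_statistics(bucket_counts, fractions=None):
    """Return one-sided and two-sided K-S distances for binned data.

    bucket_counts: document counts per bucket (the observed sample).
    fractions: reference proportion per bucket; uniform if omitted.
    """
    total = sum(bucket_counts)
    if total == 0:
        raise ValueError("bucket_counts must contain at least one document")
    n = len(bucket_counts)
    if fractions is None:
        fractions = [1.0] * n  # uniform reference distribution
    if len(fractions) != n:
        raise ValueError("fractions must have one entry per bucket")

    # Empirical CDFs evaluated at the upper edge of each bucket.
    observed_cdf = [c / total for c in accumulate(bucket_counts)]
    frac_total = sum(fractions)
    expected_cdf = [f / frac_total for f in accumulate(fractions)]

    diffs = [o - e for o, e in zip(observed_cdf, expected_cdf)]
    d_plus = max(0.0, max(diffs))    # observed CDF above the reference
    d_minus = max(0.0, -min(diffs))  # observed CDF below the reference
    return {"d_plus": d_plus, "d_minus": d_minus, "d": max(d_plus, d_minus)}


# Illustrative counts skewed toward the upper buckets.
stats = ks_statistics([0, 1, 9, 0, 0, 0, 10, 20, 20, 20, 20])
print(stats)
```

The one-sided distances correspond to the one-sided alternatives and the larger of the two to the two-sided alternative; how each distance maps to the `less` and `greater` labels in the aggregation is not asserted here.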

Parameters

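buckets_path
(Required, string) Path to the buckets that contain one set of values to correlate. Must be a _count path. For the syntax, see buckets_path syntax.
alternative
(Optional, list) A list of string values indicating which K-S test alternative hypotheses to calculate. Valid values are "greater", "less", and "two_sided". This parameter determines which K-S statistic is used when calculating the K-S test. Defaults to all possible alternative hypotheses.
fractions
(Optional, list) A list of doubles describing the sample distribution to compare with the buckets_path results. In typical usage this is the overall proportion of documents in each bucket, which is compared with the actual proportion of documents in each bucket from the sibling aggregation counts. By default, the overall documents are assumed to be uniformly distributed across the buckets, which holds if the bucket endpoints are defined by equally spaced percentiles of the metric.
sampling_method
(Optional, string) Indicates the sampling method used when calculating the K-S test. Note that this is sampling of the returned values. It determines the cumulative distribution function (CDF) points used to compare the two samples. Defaults to upper_tail, which emphasizes the upper end of the CDF points. Valid options are upper_tail, uniform, and lower_tail.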

Syntax


A bucket_count_ks_test aggregation looks like this in isolation:

{
  "bucket_count_ks_test": {
    "buckets_path": "range_values>_count", 
    "alternative": ["less", "greater", "two_sided"], 
    "sampling_method": "upper_tail" 
  }
}

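buckets_path: the buckets containing the values to test against.

alternative: the alternative hypotheses to calculate.

sampling_method: the sampling method for the K-S statistic.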
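
As a rough illustration of how the parameters fit together, the sketch below assembles the aggregation body as a Python dictionary and rejects values that the parameter descriptions list as invalid. The helper name `build_ks_test_agg` and its client-side checks are hypothetical conveniences, not part of any Elasticsearch client; the server performs its own validation.

```python
VALID_ALTERNATIVES = {"greater", "less", "two_sided"}
VALID_SAMPLING_METHODS = {"upper_tail", "uniform", "lower_tail"}


def build_ks_test_agg(buckets_path, alternative=None, fractions=None,
                      sampling_method=None):
    """Build a bucket_count_ks_test aggregation body (hypothetical helper)."""
    if not buckets_path.endswith("_count"):
        raise ValueError("buckets_path must be a _count path")
    body = {"buckets_path": buckets_path}

    if alternative is not None:
        unknown = set(alternative) - VALID_ALTERNATIVES
        if unknown:
            raise ValueError(f"unknown alternative(s): {sorted(unknown)}")
        body["alternative"] = list(alternative)

    if fractions is not None:
        # One expected proportion per bucket of the sibling aggregation.
        body["fractions"] = [float(f) for f in fractions]

    if sampling_method is not None:
        if sampling_method not in VALID_SAMPLING_METHODS:
            raise ValueError(f"unknown sampling_method: {sampling_method}")
        body["sampling_method"] = sampling_method

    # Omitted optional parameters fall back to the server-side defaults.
    return {"bucket_count_ks_test": body}


agg = build_ks_test_agg(
    "range_values>_count",
    alternative=["less", "greater", "two_sided"],
    sampling_method="upper_tail",
)
print(agg)
```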

Example


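The following snippet runs the bucket_count_ks_test on the individual terms in the field version against a uniform distribution. The uniform distribution reflects the latency percentile buckets. Not shown is the pre-calculation of the latency indicator values, which was done using the percentiles aggregation.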

This example uses only the deciles of latency.

Python:

# Assumes `client` is an already configured elasticsearch.Elasticsearch instance.
resp = client.search(
    index="correlate_latency",
    size=0,
    filter_path="aggregations",
    aggs={
        "buckets": {
            "terms": {
                "field": "version",
                "size": 2
            },
            "aggs": {
                "latency_ranges": {
                    "range": {
                        "field": "latency",
                        "ranges": [
                            {
                                "to": 0
                            },
                            {
                                "from": 0,
                                "to": 105
                            },
                            {
                                "from": 105,
                                "to": 225
                            },
                            {
                                "from": 225,
                                "to": 445
                            },
                            {
                                "from": 445,
                                "to": 665
                            },
                            {
                                "from": 665,
                                "to": 885
                            },
                            {
                                "from": 885,
                                "to": 1115
                            },
                            {
                                "from": 1115,
                                "to": 1335
                            },
                            {
                                "from": 1335,
                                "to": 1555
                            },
                            {
                                "from": 1555,
                                "to": 1775
                            },
                            {
                                "from": 1775
                            }
                        ]
                    }
                },
                "ks_test": {
                    "bucket_count_ks_test": {
                        "buckets_path": "latency_ranges>_count",
                        "alternative": [
                            "less",
                            "greater",
                            "two_sided"
                        ]
                    }
                }
            }
        }
    },
)
print(resp)

JavaScript:

// Assumes `client` is an already configured @elastic/elasticsearch Client instance.
const response = await client.search({
  index: "correlate_latency",
  size: 0,
  filter_path: "aggregations",
  aggs: {
    buckets: {
      terms: {
        field: "version",
        size: 2,
      },
      aggs: {
        latency_ranges: {
          range: {
            field: "latency",
            ranges: [
              {
                to: 0,
              },
              {
                from: 0,
                to: 105,
              },
              {
                from: 105,
                to: 225,
              },
              {
                from: 225,
                to: 445,
              },
              {
                from: 445,
                to: 665,
              },
              {
                from: 665,
                to: 885,
              },
              {
                from: 885,
                to: 1115,
              },
              {
                from: 1115,
                to: 1335,
              },
              {
                from: 1335,
                to: 1555,
              },
              {
                from: 1555,
                to: 1775,
              },
              {
                from: 1775,
              },
            ],
          },
        },
        ks_test: {
          bucket_count_ks_test: {
            buckets_path: "latency_ranges>_count",
            alternative: ["less", "greater", "two_sided"],
          },
        },
      },
    },
  },
});
console.log(response);

Console:

POST correlate_latency/_search?size=0&filter_path=aggregations
{
  "aggs": {
    "buckets": {
      "terms": { 
        "field": "version",
        "size": 2
      },
      "aggs": {
        "latency_ranges": {
          "range": { 
            "field": "latency",
            "ranges": [
              { "to": 0 },
              { "from": 0, "to": 105 },
              { "from": 105, "to": 225 },
              { "from": 225, "to": 445 },
              { "from": 445, "to": 665 },
              { "from": 665, "to": 885 },
              { "from": 885, "to": 1115 },
              { "from": 1115, "to": 1335 },
              { "from": 1335, "to": 1555 },
              { "from": 1555, "to": 1775 },
              { "from": 1775 }
            ]
          }
        },
        "ks_test": { 
          "bucket_count_ks_test": {
            "buckets_path": "latency_ranges>_count",
            "alternative": ["less", "greater", "two_sided"]
          }
        }
      }
    }
  }
}

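terms: the terms buckets containing a range aggregation and the bucket correlation aggregation. Both are used to calculate the correlation of the term values with the latency.

range: the range aggregation on the latency field. The ranges were created with reference to the percentiles of the latency field.

ks_test: the bucket count K-S test aggregation, which tests whether the bucket counts come from the same distribution as fractions, where fractions is a uniform distribution.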
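
The ranges in the request follow a regular pattern: one open-ended range below the first boundary, one range between each pair of consecutive boundaries, and one open-ended range above the last boundary. The sketch below generates that list from a sequence of boundary values. The helper name `ranges_from_boundaries` is hypothetical, and the boundaries are simply the values that appear in the request, taken here as given rather than computed.

```python
def ranges_from_boundaries(boundaries):
    """Turn sorted boundary values into range aggregation entries."""
    if not boundaries:
        raise ValueError("at least one boundary is required")
    if any(b <= a for a, b in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be strictly increasing")

    ranges = [{"to": boundaries[0]}]  # everything below the first boundary
    for lower, upper in zip(boundaries, boundaries[1:]):
        ranges.append({"from": lower, "to": upper})
    ranges.append({"from": boundaries[-1]})  # everything above the last boundary
    return ranges


# Boundary values as used in the request.
latency_boundaries = [0, 105, 225, 445, 665, 885, 1115, 1335, 1555, 1775]
latency_ranges = ranges_from_boundaries(latency_boundaries)
print(len(latency_ranges), latency_ranges[:2], latency_ranges[-1])
```

Generating the list this way keeps the range definitions in step with the boundary values whenever the percentiles are recomputed.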

The response might look like this:

{
  "aggregations" : {
    "buckets" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              {
                "key" : "*-0.0",
                "to" : 0.0,
                "doc_count" : 0
              },
              {
                "key" : "0.0-105.0",
                "from" : 0.0,
                "to" : 105.0,
                "doc_count" : 1
              },
              {
                "key" : "105.0-225.0",
                "from" : 105.0,
                "to" : 225.0,
                "doc_count" : 9
              },
              {
                "key" : "225.0-445.0",
                "from" : 225.0,
                "to" : 445.0,
                "doc_count" : 0
              },
              {
                "key" : "445.0-665.0",
                "from" : 445.0,
                "to" : 665.0,
                "doc_count" : 0
              },
              {
                "key" : "665.0-885.0",
                "from" : 665.0,
                "to" : 885.0,
                "doc_count" : 0
              },
              {
                "key" : "885.0-1115.0",
                "from" : 885.0,
                "to" : 1115.0,
                "doc_count" : 10
              },
              {
                "key" : "1115.0-1335.0",
                "from" : 1115.0,
                "to" : 1335.0,
                "doc_count" : 20
              },
              {
                "key" : "1335.0-1555.0",
                "from" : 1335.0,
                "to" : 1555.0,
                "doc_count" : 20
              },
              {
                "key" : "1555.0-1775.0",
                "from" : 1555.0,
                "to" : 1775.0,
                "doc_count" : 20
              },
              {
                "key" : "1775.0-*",
                "from" : 1775.0,
                "doc_count" : 20
              }
            ]
          },
          "ks_test" : {
            "less" : 2.248673241788478E-4,
            "greater" : 1.0,
            "two_sided" : 5.791639181800257E-4
          }
        },
        {
          "key" : "2.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              {
                "key" : "*-0.0",
                "to" : 0.0,
                "doc_count" : 0
              },
              {
                "key" : "0.0-105.0",
                "from" : 0.0,
                "to" : 105.0,
                "doc_count" : 19
              },
              {
                "key" : "105.0-225.0",
                "from" : 105.0,
                "to" : 225.0,
                "doc_count" : 11
              },
              {
                "key" : "225.0-445.0",
                "from" : 225.0,
                "to" : 445.0,
                "doc_count" : 20
              },
              {
                "key" : "445.0-665.0",
                "from" : 445.0,
                "to" : 665.0,
                "doc_count" : 20
              },
              {
                "key" : "665.0-885.0",
                "from" : 665.0,
                "to" : 885.0,
                "doc_count" : 20
              },
              {
                "key" : "885.0-1115.0",
                "from" : 885.0,
                "to" : 1115.0,
                "doc_count" : 10
              },
              {
                "key" : "1115.0-1335.0",
                "from" : 1115.0,
                "to" : 1335.0,
                "doc_count" : 0
              },
              {
                "key" : "1335.0-1555.0",
                "from" : 1335.0,
                "to" : 1555.0,
                "doc_count" : 0
              },
              {
                "key" : "1555.0-1775.0",
                "from" : 1555.0,
                "to" : 1775.0,
                "doc_count" : 0
              },
              {
                "key" : "1775.0-*",
                "from" : 1775.0,
                "doc_count" : 0
              }
            ]
          },
          "ks_test" : {
            "less" : 0.9642895789647244,
            "greater" : 4.58718174664754E-9,
            "two_sided" : 5.916656831139733E-9
          }
        }
      ]
    }
  }
}
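
One way to consume such a response is to collect, for each term, the p-value of a chosen alternative and keep the terms whose p-value falls below a significance threshold. The sketch below does this on a trimmed copy of the response above. The function name `significant_terms` and the threshold of 0.05 are assumptions made for illustration; choose a threshold that suits your analysis.

```python
def significant_terms(response, agg_name="buckets", test_name="ks_test",
                      alternative="two_sided", alpha=0.05):
    """Return (term key, p-value) pairs whose p-value is below alpha."""
    results = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        p_value = bucket[test_name][alternative]
        if p_value < alpha:
            results.append((bucket["key"], p_value))
    return results


# Trimmed copy of the example response (range buckets omitted).
response = {
    "aggregations": {
        "buckets": {
            "buckets": [
                {"key": "1.0", "ks_test": {"less": 2.248673241788478e-4,
                                           "greater": 1.0,
                                           "two_sided": 5.791639181800257e-4}},
                {"key": "2.0", "ks_test": {"less": 0.9642895789647244,
                                           "greater": 4.58718174664754e-9,
                                           "two_sided": 5.916656831139733e-9}},
            ]
        }
    }
}

for alt in ("less", "greater", "two_sided"):
    print(alt, significant_terms(response, alternative=alt))
```

With these values, both versions differ from the uniform reference under the two-sided alternative, while each one-sided alternative singles out a different version.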