Histogram field type


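A field that stores pre-aggregated numerical data representing a histogram. The data is defined by two paired arrays:

A values array of double numbers, representing the buckets of the histogram. These values must be provided in ascending order.
A corresponding counts array of long numbers, representing how many values fall into each bucket. These numbers must be positive or zero.

Because each element in the values array corresponds to the element at the same position in the counts array, the two arrays must have the same length.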
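
The constraints above can be checked on the client side before a document is sent. The following is a minimal sketch, not the server's own validation: the function name validate_histogram is hypothetical, and treating equal neighbouring values as acceptable is an assumption.

```python
import math
from typing import Sequence


def validate_histogram(values: Sequence[float], counts: Sequence[int]) -> None:
    """Raise ValueError if the paired arrays break one of the stated constraints."""
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    for v in values:
        # Every element must be a plain finite number (no nested arrays).
        if isinstance(v, bool) or not isinstance(v, (int, float)) or not math.isfinite(v):
            raise ValueError(f"invalid bucket value: {v!r}")
    for earlier, later in zip(values, values[1:]):
        # Assumption: equal neighbours are tolerated, only a decrease is rejected.
        if later < earlier:
            raise ValueError("values must be in ascending order")
    for c in counts:
        # Counts are whole numbers that are positive or zero.
        if isinstance(c, bool) or not isinstance(c, int) or c < 0:
            raise ValueError(f"invalid count: {c!r}")
```

Calling validate_histogram([0.1, 0.2, 0.3], [3, 7, 0]) returns without error, while a descending values array or a negative count raises ValueError.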

A histogram field can store only a single pair of values and counts arrays per document. Nested arrays are not supported.
histogram fields do not support sorting.

Uses

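histogram fields are primarily intended for use with aggregations. To make the data more readily accessible to aggregations, histogram field data is stored as binary doc values and is not indexed. Its size in bytes is at most 13 * numValues, where numValues is the length of the provided arrays.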
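
As a quick illustration of the size bound, the sketch below applies the 13 * numValues formula. The function and constant names are hypothetical, and the result is only the stated upper bound, not a measurement of actual storage.

```python
# Upper bound on bytes per bucket, taken from the 13 * numValues formula above.
MAX_BYTES_PER_BUCKET = 13


def max_histogram_bytes(num_values: int) -> int:
    """Return the stated upper bound on the doc-values size of one histogram."""
    if num_values < 0:
        raise ValueError("num_values must not be negative")
    return MAX_BYTES_PER_BUCKET * num_values
```

For a histogram with five buckets, max_histogram_bytes(5) gives an upper bound of 65 bytes.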

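Because the data is not indexed, you can use histogram fields only with the following aggregations and queries:

min aggregation
max aggregation
sum aggregation
value_count aggregation
avg aggregation
percentiles aggregation
percentile ranks aggregation
boxplot aggregation
histogram aggregation
range aggregation
exists query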
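
To give a feel for what the simpler metric aggregations compute over a single histogram, here is a local sketch in plain Python. It assumes the semantics described for the individual aggregations: min and max look only at the values array, sum multiplies each value by its count, value_count adds up the counts, and avg is the weighted mean. The function name histogram_metrics is hypothetical and the code does not call Elasticsearch.

```python
from typing import Optional, Sequence


def histogram_metrics(values: Sequence[float], counts: Sequence[int]) -> dict:
    """Compute min, max, sum, value_count and avg for one non-empty histogram."""
    total_count = sum(counts)
    # Each bucket contributes its value once per observation counted in it.
    weighted_sum = sum(v * c for v, c in zip(values, counts))
    average: Optional[float] = weighted_sum / total_count if total_count else None
    return {
        "min": min(values),  # assumption: counts are ignored for min and max
        "max": max(values),
        "sum": weighted_sum,
        "value_count": total_count,
        "avg": average,
    }
```

For values [0.1, 0.2, 0.3, 0.4, 0.5] with counts [3, 7, 23, 12, 6], value_count is 51 and sum is approximately 16.4.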

Building a histogram

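When a histogram is used as part of an aggregation, the accuracy of the results depends on how the histogram was constructed. It is important to consider which percentiles aggregation mode will be used to build it. Some possibilities include:

For the T-Digest mode, the values array represents the mean centroid positions and the counts array represents the number of values attributed to each centroid. If the algorithm has already started to approximate the percentiles, this inaccuracy is carried over into the histogram.
For the High Dynamic Range (HDR) histogram mode, the values array represents the fixed upper limit of each bucket interval, and the counts array represents the number of values attributed to each interval. This implementation maintains a fixed worst-case percentage error (specified as a number of significant digits), so the value used when generating the histogram is the maximum accuracy you can achieve at aggregation time.

The histogram field is "algorithm agnostic" and does not store data specific to either T-Digest or HDRHistogram. While this means the field can technically be aggregated with either algorithm, in practice you should choose one algorithm and index data in that manner (for example, centroids for T-Digest or intervals for HDRHistogram) to ensure the best accuracy.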
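
To make the interval-style layout concrete, the sketch below pre-aggregates raw samples into buckets keyed by each interval's upper limit. It uses plain fixed-width bucketing, which is a stand-in chosen for brevity and is neither the HDRHistogram nor the T-Digest algorithm. The function name and the width parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def build_fixed_width_histogram(samples: Iterable[float], width: float) -> dict:
    """Bucket samples into intervals of `width`, keyed by each interval's upper limit."""
    if width <= 0:
        raise ValueError("width must be positive")
    per_bucket: Counter = Counter()
    for sample in samples:
        # A sample in ((k - 1) * width, k * width] is assigned to interval k.
        per_bucket[math.ceil(sample / width)] += 1
    indices = sorted(per_bucket)  # ascending order, as the field requires
    return {
        "values": [float(i * width) for i in indices],
        "counts": [per_bucket[i] for i in indices],
    }
```

With samples [1, 9, 10, 11, 25] and a width of 10.0, the result has values [10.0, 20.0, 30.0] and counts [3, 1, 1]. Only non-empty intervals are emitted, so the output contains no zero-count buckets.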

Synthetic `_source`

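Synthetic _source is Generally Available only for TSDB indices (indices that have index.mode set to time_series). For other indices, synthetic _source is in technical preview. Features in technical preview may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

histogram fields support synthetic _source in their default configuration.

To save space, zero-count buckets are not stored in the histogram doc values. As a result, when a histogram field is indexed in an index with synthetic source enabled, indexing a histogram that includes zero-count buckets results in those buckets being missing when the histogram is fetched back.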
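
The effect on a round trip can be approximated locally. This sketch only models the described behavior (zero-count buckets disappearing) and does not reproduce how Elasticsearch encodes doc values. The function name drop_zero_count_buckets is hypothetical.

```python
def drop_zero_count_buckets(histogram: dict) -> dict:
    """Return a copy of the histogram without buckets whose count is zero."""
    kept = [
        (value, count)
        for value, count in zip(histogram["values"], histogram["counts"])
        if count != 0
    ]
    return {
        "values": [value for value, _ in kept],
        "counts": [count for _, count in kept],
    }
```

A histogram indexed with values [0.1, 0.2, 0.3] and counts [3, 0, 5] would, under this model, come back with values [0.1, 0.3] and counts [3, 5].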

Examples

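The following create index API request creates a new index with two field mappings:

my_histogram, a histogram field used to store percentile data
my_text, a keyword field used to store a title for the histogram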

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "my_histogram": {
                "type": "histogram"
            },
            "my_text": {
                "type": "keyword"
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    mappings: {
      properties: {
        my_histogram: {
          type: 'histogram'
        },
        my_text: {
          type: 'keyword'
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
    properties: {
      my_histogram: {
        type: "histogram",
      },
      my_text: {
        type: "keyword",
      },
    },
  },
});
console.log(response);

PUT my-index-000001
{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}

The following index API requests store pre-aggregated data for two histograms: histogram_1 and histogram_2.

resp = client.index(
    index="my-index-000001",
    id="1",
    document={
        "my_text": "histogram_1",
        "my_histogram": {
            "values": [
                0.1,
                0.2,
                0.3,
                0.4,
                0.5
            ],
            "counts": [
                3,
                7,
                23,
                12,
                6
            ]
        }
    },
)
print(resp)

resp1 = client.index(
    index="my-index-000001",
    id="2",
    document={
        "my_text": "histogram_2",
        "my_histogram": {
            "values": [
                0.1,
                0.25,
                0.35,
                0.4,
                0.45,
                0.5
            ],
            "counts": [
                8,
                17,
                8,
                7,
                6,
                2
            ]
        }
    },
)
print(resp1)

response = client.index(
  index: 'my-index-000001',
  id: 1,
  body: {
    my_text: 'histogram_1',
    my_histogram: {
      values: [
        0.1,
        0.2,
        0.3,
        0.4,
        0.5
      ],
      counts: [
        3,
        7,
        23,
        12,
        6
      ]
    }
  }
)
puts response

response = client.index(
  index: 'my-index-000001',
  id: 2,
  body: {
    my_text: 'histogram_2',
    my_histogram: {
      values: [
        0.1,
        0.25,
        0.35,
        0.4,
        0.45,
        0.5
      ],
      counts: [
        8,
        17,
        8,
        7,
        6,
        2
      ]
    }
  }
)
puts response

const response = await client.index({
  index: "my-index-000001",
  id: 1,
  document: {
    my_text: "histogram_1",
    my_histogram: {
      values: [0.1, 0.2, 0.3, 0.4, 0.5],
      counts: [3, 7, 23, 12, 6],
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "my-index-000001",
  id: 2,
  document: {
    my_text: "histogram_2",
    my_histogram: {
      values: [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
      counts: [8, 17, 8, 7, 6, 2],
    },
  },
});
console.log(response1);

PUT my-index-000001/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
      "values" : [0.1, 0.2, 0.3, 0.4, 0.5], 
      "counts" : [3, 7, 23, 12, 6] 
   }
}

PUT my-index-000001/_doc/2
{
  "my_text" : "histogram_2",
  "my_histogram" : {
      "values" : [0.1, 0.25, 0.35, 0.4, 0.45, 0.5], 
      "counts" : [8, 17, 8, 7, 6, 2] 
   }
}

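	values: The values for each bucket. Values in the array are treated as doubles and must be given in increasing order. For T-Digest histograms, each value represents the mean value. For HDR histograms, each value represents the value iterated to.
	counts: The count for each bucket. Values in the array are treated as long integers and must be positive or zero. Negative values are rejected. The relation between a bucket and its count is given by the position in the array.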
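
Once the two documents are indexed, the field can be queried with one of the supported aggregations. The sketch below only builds the body of a percentiles aggregation request. The helper name percentiles_request and the aggregation name histogram_percentiles are hypothetical, and the way the body is passed to the client is an assumption based on the 8.x Python client style used in the examples above.

```python
from typing import Sequence


def percentiles_request(field: str, percents: Sequence[float]) -> dict:
    """Build search parameters for a percentiles aggregation over a histogram field."""
    return {
        "size": 0,  # return only aggregation results, no hits
        "aggs": {
            "histogram_percentiles": {  # hypothetical aggregation name
                "percentiles": {"field": field, "percents": list(percents)}
            }
        },
    }


# Assumed usage with an existing client object:
# resp = client.search(index="my-index-000001",
#                      **percentiles_request("my_histogram", [50, 95, 99]))
```

Keeping the request as a plain dictionary makes it easy to inspect or log before sending it.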
