eager_global_ordinals

What are global ordinals?

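To support aggregations and other operations that look up field values on a per-document basis, Elasticsearch uses a data structure called doc values. Term-based field types such as keyword store their doc values using an ordinal mapping for a more compact representation. This mapping works by assigning each term an incrementing integer, or ordinal, based on its lexicographic order. The field's doc values store only the ordinals for each document instead of the original terms, and a separate lookup structure converts between ordinals and terms.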
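
The ordinal mapping described above can be illustrated with a small sketch. This is a simplified model in plain Python, not how Lucene or Elasticsearch actually stores doc values; the function name `build_ordinals` and the sample terms are made up for illustration, and the field is assumed to be single-valued.

```python
def build_ordinals(doc_terms):
    """Model an ordinal mapping for a single-valued, term-based field.

    doc_terms holds one term per document. Returns the ordinal-to-term
    lookup and the per-document ordinals (the "doc values" in this model).
    """
    # Sorted unique terms: the list index of each term is its ordinal.
    lookup = sorted(set(doc_terms))
    term_to_ord = {term: ordinal for ordinal, term in enumerate(lookup)}
    # Each document stores only the ordinal, not the term itself.
    doc_ords = [term_to_ord[term] for term in doc_terms]
    return lookup, doc_ords


docs = ["red", "blue", "red", "green"]
lookup, doc_ords = build_ordinals(docs)
print(lookup)    # ['blue', 'green', 'red']
print(doc_ords)  # [2, 0, 2, 1]
```

Converting an ordinal back to its term is then a single index into `lookup`, for example `lookup[doc_ords[0]]`.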

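Using ordinals during aggregations can greatly improve performance. For example, the terms aggregation relies only on ordinals to collect documents into buckets at the shard level, then converts the ordinals back to their original term values when combining results across shards.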
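
As a rough sketch of why this helps, the following models a shard-level terms aggregation that counts by ordinal and resolves terms only for the buckets it returns. The function `terms_agg_by_ordinal` and its inputs are hypothetical and only illustrate the idea; the real aggregation is considerably more involved.

```python
def terms_agg_by_ordinal(doc_ords, lookup, size=2):
    """Count documents per ordinal, then resolve only the top buckets to terms."""
    # Collection phase: integer array indexing, no string comparisons.
    counts = [0] * len(lookup)
    for ordinal in doc_ords:
        counts[ordinal] += 1
    # Pick the top ordinals by count (ties broken by ordinal, i.e. term order).
    top = sorted(range(len(counts)), key=lambda o: (-counts[o], o))[:size]
    # Only now are ordinals converted back to their original terms.
    return [(lookup[o], counts[o]) for o in top]


lookup = ["blue", "green", "red"]   # ordinal -> term
doc_ords = [2, 0, 2, 1]             # one ordinal per document
buckets = terms_agg_by_ordinal(doc_ords, lookup)
print(buckets)  # [('red', 2), ('blue', 1)]
```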

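Each index segment defines its own ordinal mapping, but aggregations collect data across an entire shard. To be able to use ordinals for shard-level operations such as aggregations, Elasticsearch creates a unified mapping called global ordinals. The global ordinal mapping is built on top of segment ordinals, and works by maintaining a map from global ordinals to the local ordinals of each segment.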
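
The relationship between segment ordinals and global ordinals can be sketched as follows. This is a deliberately simplified model, not the structure Lucene uses internally: it assumes each segment is represented by its sorted term list, and it stores the translation as one array per segment (local ordinal to global ordinal). The name `build_global_ordinals` is hypothetical.

```python
def build_global_ordinals(segment_lookups):
    """Merge per-segment term lists into one shard-wide ordinal space.

    segment_lookups: one sorted list of unique terms per segment, where the
    list index is that segment's local ordinal.
    """
    # Shard-wide sorted term list: the index is the global ordinal.
    global_lookup = sorted(set().union(*segment_lookups))
    global_of = {term: g for g, term in enumerate(global_lookup)}
    # For every segment, translate each local ordinal to its global ordinal.
    seg_to_global = [[global_of[term] for term in seg] for seg in segment_lookups]
    return global_lookup, seg_to_global


segments = [
    ["apple", "cherry"],           # segment 0: apple=0, cherry=1
    ["banana", "cherry", "date"],  # segment 1: banana=0, cherry=1, date=2
]
global_lookup, seg_to_global = build_global_ordinals(segments)
print(global_lookup)  # ['apple', 'banana', 'cherry', 'date']
print(seg_to_global)  # [[0, 2], [1, 2, 3]]
```

In this model "cherry" has local ordinal 1 in both segments but a single global ordinal, 2, which is what lets a shard-level aggregation put documents from different segments into the same bucket. The sketch also shows why a new segment forces a rebuild: adding a term list can shift every global ordinal.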

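Global ordinals are used if a search contains any of the following components:

Certain bucket aggregations on keyword, ip, and flattened fields. This includes terms aggregations as mentioned above, as well as composite, diversified_sampler, and significant_terms.
Bucket aggregations on text fields that require fielddata to be enabled.
Operations on parent and child documents from a join field, including has_child queries and parent aggregations.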

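The global ordinal mapping uses heap memory as part of the field data cache. Aggregations on high-cardinality fields can use a lot of memory and trigger the field data circuit breaker.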
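
One way to keep an eye on this memory is the cat fielddata API. The helper below is a minimal sketch that assumes the 8.x Python client used in the examples on this page and an already constructed `client`; the function name `fielddata_usage` and the default field `tags` are illustrative only.

```python
def fielddata_usage(client, fields="tags"):
    """Return per-node field data cache usage for the given fields.

    `client` is assumed to be an elasticsearch.Elasticsearch instance.
    With format="json" the cat API returns a list of dicts, one per
    node and field, rather than a text table.
    """
    return client.cat.fielddata(fields=fields, format="json")
```

Calling `fielddata_usage(client)` before and after running an aggregation on a field gives a rough idea of how much heap that field occupies in the field data cache.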

Loading global ordinals

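The global ordinal mapping must be built before ordinals can be used during a search. By default, the mapping is loaded during search the first time that global ordinals are needed. This is the right approach if you are optimizing for indexing speed, but if search performance is a priority, it is recommended to eagerly load global ordinals on fields that will be used in aggregations: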

resp = client.indices.put_mapping(
    index="my-index-000001",
    properties={
        "tags": {
            "type": "keyword",
            "eager_global_ordinals": True
        }
    },
)
print(resp)

response = client.indices.put_mapping(
  index: 'my-index-000001',
  body: {
    properties: {
      tags: {
        type: 'keyword',
        eager_global_ordinals: true
      }
    }
  }
)
puts response

const response = await client.indices.putMapping({
  index: "my-index-000001",
  properties: {
    tags: {
      type: "keyword",
      eager_global_ordinals: true,
    },
  },
});
console.log(response);

PUT my-index-000001/_mapping
{
  "properties": {
    "tags": {
      "type": "keyword",
      "eager_global_ordinals": true
    }
  }
}

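When eager_global_ordinals is enabled, global ordinals are built when a shard is refreshed: Elasticsearch always loads them before exposing changes to the content of the index. This shifts the cost of building global ordinals from search time to index time. Elasticsearch also eagerly builds global ordinals when creating a new copy of a shard, as can occur when increasing the number of replicas or when relocating a shard onto a new node.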

Eager loading can be disabled at any time by updating the eager_global_ordinals setting:

resp = client.indices.put_mapping(
    index="my-index-000001",
    properties={
        "tags": {
            "type": "keyword",
            "eager_global_ordinals": False
        }
    },
)
print(resp)

response = client.indices.put_mapping(
  index: 'my-index-000001',
  body: {
    properties: {
      tags: {
        type: 'keyword',
        eager_global_ordinals: false
      }
    }
  }
)
puts response

const response = await client.indices.putMapping({
  index: "my-index-000001",
  properties: {
    tags: {
      type: "keyword",
      eager_global_ordinals: false,
    },
  },
});
console.log(response);

PUT my-index-000001/_mapping
{
  "properties": {
    "tags": {
      "type": "keyword",
      "eager_global_ordinals": false
    }
  }
}

Avoiding global ordinal loading

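Usually, global ordinals do not present a large overhead in terms of their loading time and memory usage. However, loading global ordinals can be expensive on indices with large shards, or if the field contains a large number of unique term values. Because global ordinals provide a unified mapping for all segments on the shard, they also need to be rebuilt entirely when a new segment becomes visible.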

In some cases it is possible to avoid global ordinal loading altogether:

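The terms, sampler, and significant_terms aggregations support a parameter execution_hint that helps control how buckets are collected. It defaults to global_ordinals, but can be set to map to use the term values directly instead.
If a shard has been force-merged down to a single segment, then its segment ordinals are already global to the shard. In this case, Elasticsearch does not need to build a global ordinal mapping and there is no additional overhead from using global ordinals. Note that for performance reasons you should only force-merge an index that you will never write to again.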
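
Both options can be sketched with the Python client. The snippet below is a minimal example under a few assumptions: it targets the 8.x client used elsewhere on this page, `client` is an already constructed `Elasticsearch` instance, and the aggregation name `top_terms` and the helper function names are made up for illustration.

```python
def terms_agg_with_map_hint(field, size=10):
    """Build search keyword arguments for a terms aggregation that buckets
    on term values directly instead of on global ordinals."""
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "top_terms": {  # hypothetical aggregation name
                "terms": {
                    "field": field,
                    "size": size,
                    "execution_hint": "map",
                }
            }
        },
    }


def force_merge_to_single_segment(client, index):
    """Force-merge an index down to one segment per shard.

    Only appropriate for an index that will no longer be written to.
    """
    return client.indices.forcemerge(index=index, max_num_segments=1)


# Usage, assuming an existing `client`:
# resp = client.search(index="my-index-000001", **terms_agg_with_map_hint("tags"))
# force_merge_to_single_segment(client, "my-index-000001")
```

The map hint tends to pay off when a query matches only a small number of documents, since building global ordinals for the whole shard would then cost more than comparing the few term values directly.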
