Index Sorting

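When creating a new index in Elasticsearch, you can configure how the segments inside each shard are sorted. By default, Lucene does not apply any sort. The index.sort.* settings define which fields are used to sort the documents inside each segment.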

Index sorting can be applied to mappings with nested objects, as long as the index.sort.* settings contain no nested fields.

For example, the following defines a sort on a single field:

resp = client.indices.create(
    index="my-index-000001",
    settings={
        "index": {
            "sort.field": "date",
            "sort.order": "desc"
        }
    },
    mappings={
        "properties": {
            "date": {
                "type": "date"
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      index: {
        'sort.field' => 'date',
        'sort.order' => 'desc'
      }
    },
    mappings: {
      properties: {
        date: {
          type: 'date'
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    index: {
      "sort.field": "date",
      "sort.order": "desc",
    },
  },
  mappings: {
    properties: {
      date: {
        type: "date",
      },
    },
  },
});
console.log(response);

PUT my-index-000001
{
  "settings": {
    "index": {
      "sort.field": "date", 
      "sort.order": "desc"  
    }
  },
  "mappings": {
    "properties": {
      "date": {
        "type": "date"
      }
    }
  }
}

This index is sorted by the date field, in descending order.

An index can also be sorted by more than one field:

resp = client.indices.create(
    index="my-index-000001",
    settings={
        "index": {
            "sort.field": [
                "username",
                "date"
            ],
            "sort.order": [
                "asc",
                "desc"
            ]
        }
    },
    mappings={
        "properties": {
            "username": {
                "type": "keyword",
                "doc_values": True
            },
            "date": {
                "type": "date"
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      index: {
        'sort.field' => [
          'username',
          'date'
        ],
        'sort.order' => [
          'asc',
          'desc'
        ]
      }
    },
    mappings: {
      properties: {
        username: {
          type: 'keyword',
          doc_values: true
        },
        date: {
          type: 'date'
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    index: {
      "sort.field": ["username", "date"],
      "sort.order": ["asc", "desc"],
    },
  },
  mappings: {
    properties: {
      username: {
        type: "keyword",
        doc_values: true,
      },
      date: {
        type: "date",
      },
    },
  },
});
console.log(response);

PUT my-index-000001
{
  "settings": {
    "index": {
      "sort.field": [ "username", "date" ], 
      "sort.order": [ "asc", "desc" ]       
    }
  },
  "mappings": {
    "properties": {
      "username": {
        "type": "keyword",
        "doc_values": true
      },
      "date": {
        "type": "date"
      }
    }
  }
}

This index is sorted by username first and then by date: ascending on the username field and descending on the date field.
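To make the combined ordering concrete, here is a minimal Python sketch that reproduces it on a handful of in-memory documents. It does not talk to Elasticsearch; the sample documents and the index_sort_key helper are hypothetical and only illustrate username ascending with date descending as the tie-breaker.

```python
from datetime import date

# Hypothetical sample documents; the field names match the mapping above.
docs = [
    {"username": "bob", "date": date(2024, 1, 5)},
    {"username": "alice", "date": date(2024, 1, 1)},
    {"username": "alice", "date": date(2024, 3, 9)},
    {"username": "bob", "date": date(2024, 2, 2)},
]


def index_sort_key(doc):
    # username ascending, then date descending (negating the ordinal reverses it).
    return (doc["username"], -doc["date"].toordinal())


segment_order = sorted(docs, key=index_sort_key)
for doc in segment_order:
    print(doc["username"], doc["date"].isoformat())
```

Within each username the newest date comes first, which is the order the documents would take inside a segment under these settings.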

Index sorting supports the following settings:

index.sort.field
The list of fields used to sort the index. Only boolean, numeric, date, and keyword fields with doc_values are allowed here.
index.sort.order

The sort order to use for each field. The order option can take the following values:

  • asc: ascending order.
  • desc: descending order.
index.sort.mode

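Elasticsearch supports sorting by multi-valued fields. The mode option controls which value is picked to sort the document. It can take the following values:

  • min: picks the lowest value.
  • max: picks the highest value.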
index.sort.missing

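The missing parameter specifies how documents that lack the field are treated. It can take the following values:

  • _last: documents without a value for the field are sorted last.
  • _first: documents without a value for the field are sorted first.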
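The four settings can be assembled and sanity-checked before an index is created. The sketch below is one possible way to do that in Python; build_index_sort_settings and _check are hypothetical helpers, not part of any client library, and the sketch assumes that mode and missing take one value per sort field in the same way order does.

```python
VALID_ORDERS = {"asc", "desc"}
VALID_MODES = {"min", "max"}
VALID_MISSING = {"_last", "_first"}


def _check(name, values, allowed, expected_len):
    # Assumption: each option carries exactly one value per sort field.
    if len(values) != expected_len:
        raise ValueError(f"{name} needs {expected_len} value(s), got {len(values)}")
    for value in values:
        if value not in allowed:
            raise ValueError(f"{name} does not accept {value!r}")


def build_index_sort_settings(fields, orders, modes=None, missing=None):
    """Hypothetical helper that assembles index.sort.* settings as a dict."""
    _check("sort.order", orders, VALID_ORDERS, len(fields))
    sort = {"sort.field": list(fields), "sort.order": list(orders)}
    if modes is not None:
        _check("sort.mode", modes, VALID_MODES, len(fields))
        sort["sort.mode"] = list(modes)
    if missing is not None:
        _check("sort.missing", missing, VALID_MISSING, len(fields))
        sort["sort.missing"] = list(missing)
    return {"index": sort}


settings = build_index_sort_settings(
    fields=["username", "date"],
    orders=["asc", "desc"],
    modes=["min", "max"],
    missing=["_last", "_first"],
)
print(settings)
```

The returned dictionary has the same shape as the settings argument in the Python examples above, so under the stated assumption it could be passed to client.indices.create together with a matching mapping.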

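Index sorting can be defined only once, at index creation. Adding or updating a sort on an existing index is not allowed. Index sorting also has a cost in indexing throughput, because documents must be sorted at flush and merge time. Test the impact on your application before enabling this feature.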

Early termination of search requests

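By default in Elasticsearch, a search request must visit every document that matches the query in order to retrieve the top documents sorted by a specified sort. However, when the index sort and the search sort are the same, the number of documents that must be visited per segment to retrieve the N top-ranked documents globally can be limited. For example, suppose we have an index that contains events sorted by a timestamp field: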

resp = client.indices.create(
    index="events",
    settings={
        "index": {
            "sort.field": "timestamp",
            "sort.order": "desc"
        }
    },
    mappings={
        "properties": {
            "timestamp": {
                "type": "date"
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'events',
  body: {
    settings: {
      index: {
        'sort.field' => 'timestamp',
        'sort.order' => 'desc'
      }
    },
    mappings: {
      properties: {
        timestamp: {
          type: 'date'
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "events",
  settings: {
    index: {
      "sort.field": "timestamp",
      "sort.order": "desc",
    },
  },
  mappings: {
    properties: {
      timestamp: {
        type: "date",
      },
    },
  },
});
console.log(response);

PUT events
{
  "settings": {
    "index": {
      "sort.field": "timestamp",
      "sort.order": "desc" 
    }
  },
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date"
      }
    }
  }
}

This index is sorted by timestamp in descending order (most recent first).

You can search for the last 10 events with:

resp = client.search(
    index="events",
    size=10,
    sort=[
        {
            "timestamp": "desc"
        }
    ],
)
print(resp)

response = client.search(
  index: 'events',
  body: {
    size: 10,
    sort: [
      {
        timestamp: 'desc'
      }
    ]
  }
)
puts response

const response = await client.search({
  index: "events",
  size: 10,
  sort: [
    {
      timestamp: "desc",
    },
  ],
});
console.log(response);

GET /events/_search
{
  "size": 10,
  "sort": [
    { "timestamp": "desc" }
  ]
}

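Elasticsearch detects that the top documents of each segment are already sorted in the index and compares only the first N documents per segment. The remaining documents that match the query are still collected to count the total number of results and to build aggregations.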
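The idea can be illustrated with a toy model in plain Python. This is only a sketch of the principle, not how Lucene or Elasticsearch implement it: each segment is modelled as a list of timestamps already sorted in descending order, and the function name, the sample segments, and the visited counter are all hypothetical. The track_total_hits flag in the sketch mirrors the search option of the same name.

```python
import heapq
from itertools import islice


def top_n_from_sorted_segments(segments, n, track_total_hits=True):
    """Toy model: every segment is a list already sorted in descending order."""
    candidates = []
    visited = 0
    for segment in segments:
        # Only the first n entries of a pre-sorted segment can reach the top n.
        candidates.append(segment[:n])
        # Counting total hits forces a pass over every matching document;
        # without it, collection can stop after the first n documents.
        visited += len(segment) if track_total_hits else min(n, len(segment))
    # Merge the per-segment candidates (each still descending) and keep n.
    merged = heapq.merge(*candidates, reverse=True)
    return list(islice(merged, n)), visited


segments = [[90, 70, 50, 30, 10], [85, 80, 20], [95, 60, 40, 5]]
print(top_n_from_sorted_segments(segments, 3))
print(top_n_from_sorted_segments(segments, 3, track_total_hits=False))
```

Both calls return the same top three values; only the number of documents visited differs, which is where the saving from early termination comes from.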

If you only need the last 10 events and have no interest in the total number of documents that match the query, you can set track_total_hits to false:

resp = client.search(
    index="events",
    size=10,
    sort=[
        {
            "timestamp": "desc"
        }
    ],
    track_total_hits=False,
)
print(resp)

response = client.search(
  index: 'events',
  body: {
    size: 10,
    sort: [
      {
        timestamp: 'desc'
      }
    ],
    track_total_hits: false
  }
)
puts response

const response = await client.search({
  index: "events",
  size: 10,
  sort: [
    {
      timestamp: "desc",
    },
  ],
  track_total_hits: false,
});
console.log(response);

GET /events/_search
{
  "size": 10,
  "sort": [ 
      { "timestamp": "desc" }
  ],
  "track_total_hits": false
}

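The index sort is used to rank the top documents, and each segment terminates collection early after its first 10 matches.

This time, Elasticsearch does not try to count the documents and can terminate the query as soon as N documents have been collected per segment.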

{
  "_shards": ...
  "hits" : {
    "max_score" : null,
    "hits" : []
  },
  "took": 20,
  "timed_out": false
}

Because of early termination, the total number of hits matching the query is unknown.
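Client code that reads the response therefore has to cope with a missing total. The following Python sketch shows one defensive way to do so on a plain dictionary. The describe_hits helper is hypothetical, and the sketch assumes that when a total is present it is an object with value and relation keys; the sample omits _shards because the response above elides it.

```python
def describe_hits(response):
    """Summarise a search response whose hit total may be absent."""
    hits = response["hits"]
    total = hits.get("total")  # absent when the total was not tracked
    if total is None:
        total_text = "unknown"
    else:
        # Assumed shape: {"value": <int>, "relation": "eq" or "gte"}
        total_text = f'{total["relation"]} {total["value"]}'
    return f'{len(hits["hits"])} hits returned, total: {total_text}'


# Mirrors the response shown above, without the elided _shards section.
sample = {
    "hits": {"max_score": None, "hits": []},
    "took": 20,
    "timed_out": False,
}
print(describe_hits(sample))
```

Using hits.get("total") instead of indexing directly keeps the same code path working whether or not the total was tracked.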

Aggregations collect all documents that match the query regardless of the value of track_total_hits.