Index Sorting
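When creating a new index in Elasticsearch, you can configure how the segments inside each shard are sorted. By default, Lucene does not apply any sort. The index.sort.* settings define which fields are used to sort the documents inside each segment.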
Index sorting can be applied to mappings that contain nested objects, as long as the index.sort.* settings do not reference any nested field.
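To make the nested-object rule concrete, the sketch below builds a create-index request body whose mapping contains a nested field while the sort uses only a top-level field. The field names (`comments`, `posted`) and the helpers `nested_paths` and `sort_fields_avoid_nested` are hypothetical, and the check is a simplified client-side approximation, not the validation Elasticsearch itself performs.

```python
def nested_paths(properties, prefix=""):
    """Collect the dotted paths of every field mapped as type 'nested'."""
    paths = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "nested":
            paths.append(path)
        # Recurse into the sub-fields of object and nested mappings.
        paths.extend(nested_paths(spec.get("properties", {}), prefix=f"{path}."))
    return paths


def sort_fields_avoid_nested(body):
    """Return True if no sort field is a nested field or lives under one."""
    sort_fields = body["settings"]["index"]["sort.field"]
    if isinstance(sort_fields, str):
        sort_fields = [sort_fields]
    nested = nested_paths(body["mappings"]["properties"])
    return not any(
        field == path or field.startswith(path + ".")
        for field in sort_fields
        for path in nested
    )


# Hypothetical request body: a nested "comments" field plus a top-level "date".
body = {
    "settings": {"index": {"sort.field": "date", "sort.order": "desc"}},
    "mappings": {
        "properties": {
            "date": {"type": "date"},
            "comments": {
                "type": "nested",
                "properties": {"posted": {"type": "date"}},
            },
        }
    },
}

print(sort_fields_avoid_nested(body))
```

Sorting on `date` passes the check because it sits outside the nested `comments` object; a sort on `comments.posted` would not.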
The following example shows how to define a sort on a single field:
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "index": {
            "sort.field": "date",
            "sort.order": "desc"
        }
    },
    mappings={
        "properties": {
            "date": {"type": "date"}
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      index: {
        'sort.field' => 'date',
        'sort.order' => 'desc'
      }
    },
    mappings: {
      properties: {
        date: { type: 'date' }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    index: {
      "sort.field": "date",
      "sort.order": "desc",
    },
  },
  mappings: {
    properties: {
      date: { type: "date" },
    },
  },
});
console.log(response);

PUT my-index-000001
{
  "settings": {
    "index": {
      "sort.field": "date",
      "sort.order": "desc"
    }
  },
  "mappings": {
    "properties": {
      "date": { "type": "date" }
    }
  }
}
It is also possible to sort the index by more than one field:
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "index": {
            "sort.field": ["username", "date"],
            "sort.order": ["asc", "desc"]
        }
    },
    mappings={
        "properties": {
            "username": {"type": "keyword", "doc_values": True},
            "date": {"type": "date"}
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      index: {
        'sort.field' => ['username', 'date'],
        'sort.order' => ['asc', 'desc']
      }
    },
    mappings: {
      properties: {
        username: { type: 'keyword', doc_values: true },
        date: { type: 'date' }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    index: {
      "sort.field": ["username", "date"],
      "sort.order": ["asc", "desc"],
    },
  },
  mappings: {
    properties: {
      username: { type: "keyword", doc_values: true },
      date: { type: "date" },
    },
  },
});
console.log(response);

PUT my-index-000001
{
  "settings": {
    "index": {
      "sort.field": ["username", "date"],
      "sort.order": ["asc", "desc"]
    }
  },
  "mappings": {
    "properties": {
      "username": { "type": "keyword", "doc_values": true },
      "date": { "type": "date" }
    }
  }
}
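The ordering that a two-field sort produces can be illustrated in plain Python. This is only a sketch of the resulting order (username ascending, then date descending within each username), using made-up documents; it does not reflect how Lucene sorts segments internally.

```python
from operator import itemgetter

# Made-up documents for illustration; ISO date strings compare chronologically.
docs = [
    {"username": "bob", "date": "2024-01-05"},
    {"username": "alice", "date": "2024-01-02"},
    {"username": "bob", "date": "2024-01-09"},
    {"username": "alice", "date": "2024-01-07"},
]

# Python's sort is stable, so sort by the secondary key first (date, descending),
# then by the primary key (username, ascending).
by_date_desc = sorted(docs, key=itemgetter("date"), reverse=True)
ordered = sorted(by_date_desc, key=itemgetter("username"))

for doc in ordered:
    print(doc["username"], doc["date"])
```

All documents for `alice` come before those for `bob`, and within each username the most recent date comes first.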
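Index sorting supports the following settings:

- index.sort.field
  The list of fields used to sort the index. Only boolean, numeric, date, and keyword fields with doc_values are allowed here.
- index.sort.order
  The sort order to use for each field. The order option can have the following values:
  - asc: ascending order.
  - desc: descending order.
- index.sort.mode
  Elasticsearch supports sorting by multi-valued fields. The mode option controls which value is picked to sort the document. The mode option can have the following values:
  - min: pick the lowest value.
  - max: pick the highest value.
- index.sort.missing
  The missing parameter specifies how documents that lack the field should be treated. The missing value can have the following values:
  - _last: documents without a value for the field are sorted last.
  - _first: documents without a value for the field are sorted first.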
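The effect of the mode and missing settings can be modeled with a small sketch. The function `sort_position_key` and the sample documents are hypothetical, and the model covers ascending order only; it illustrates the documented semantics rather than the actual Lucene implementation.

```python
def sort_position_key(values, mode="min", missing="_last"):
    """Build an ascending sort key from a document's values for the sort field.

    values  -- list of values for the sort field (empty if the field is absent)
    mode    -- "min" or "max": which value represents a multi-valued field
    missing -- "_last" or "_first": where documents without a value are placed
    """
    if mode not in ("min", "max"):
        raise ValueError(f"unsupported mode: {mode}")
    if missing not in ("_last", "_first"):
        raise ValueError(f"unsupported missing: {missing}")
    if not values:
        # The first tuple element decides where documents with no value go.
        return (1 if missing == "_last" else -1, 0)
    chosen = min(values) if mode == "min" else max(values)
    return (0, chosen)


# Hypothetical documents: "a" is multi-valued, "c" has no value for the field.
docs = {"a": [5, 1], "b": [3], "c": []}

asc_min_last = sorted(docs, key=lambda d: sort_position_key(docs[d], "min", "_last"))
asc_max_first = sorted(docs, key=lambda d: sort_position_key(docs[d], "max", "_first"))

print(asc_min_last)   # ['a', 'b', 'c']
print(asc_max_first)  # ['c', 'b', 'a']
```

With mode min, document `a` is represented by 1 and sorts before `b`; with mode max it is represented by 5 and sorts after `b`. The missing setting only moves `c` to the front or the back.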
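Index sorting can be defined only once, at index creation. Adding or updating a sort on an existing index is not allowed. Index sorting also has a cost in indexing throughput, because documents must be sorted at flush and merge time. You should test the impact on your application before activating this feature.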
Early termination of search requests
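By default in Elasticsearch, a search request must visit every document that matches the query in order to retrieve the top documents sorted by the specified sort. However, when the index sort and the search sort are the same, it is possible to limit the number of documents visited per segment and still retrieve the top N documents globally. For example, suppose we have an index that contains events sorted by a timestamp field: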
resp = client.indices.create(
    index="events",
    settings={
        "index": {
            "sort.field": "timestamp",
            "sort.order": "desc"
        }
    },
    mappings={
        "properties": {
            "timestamp": {"type": "date"}
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'events',
  body: {
    settings: {
      index: {
        'sort.field' => 'timestamp',
        'sort.order' => 'desc'
      }
    },
    mappings: {
      properties: {
        timestamp: { type: 'date' }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "events",
  settings: {
    index: {
      "sort.field": "timestamp",
      "sort.order": "desc",
    },
  },
  mappings: {
    properties: {
      timestamp: { type: "date" },
    },
  },
});
console.log(response);

PUT events
{
  "settings": {
    "index": {
      "sort.field": "timestamp",
      "sort.order": "desc"
    }
  },
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" }
    }
  }
}
You can search for the 10 most recent events with:
resp = client.search(
    index="events",
    size=10,
    sort=[
        {"timestamp": "desc"}
    ],
)
print(resp)

response = client.search(
  index: 'events',
  body: {
    size: 10,
    sort: [
      { timestamp: 'desc' }
    ]
  }
)
puts response

const response = await client.search({
  index: "events",
  size: 10,
  sort: [
    { timestamp: "desc" },
  ],
});
console.log(response);

GET /events/_search
{
  "size": 10,
  "sort": [
    { "timestamp": "desc" }
  ]
}
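Elasticsearch will detect that the top documents of each segment are already sorted in the index and will compare only the first N documents per segment. The remaining documents that match the query are still collected, in order to count the total number of results and to build aggregations.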
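The idea of comparing only the first N documents per segment can be sketched in plain Python. The function `top_n_from_sorted_segments` and the integer timestamps are hypothetical, and the sketch assumes every document in every segment matches the query; it models the principle, not Elasticsearch's internal collector.

```python
import heapq
from itertools import islice


def top_n_from_sorted_segments(segments, n):
    """Return the global top n values from segments already sorted descending,
    together with the number of documents that had to be compared."""
    # Only the first n entries of each segment can reach the global top n.
    candidates = [list(islice(segment, n)) for segment in segments]
    compared = sum(len(c) for c in candidates)
    # Merge the per-segment candidates while preserving descending order.
    merged = heapq.merge(*candidates, reverse=True)
    return list(islice(merged, n)), compared


# Three hypothetical segments, each sorted by timestamp in descending order.
segments = [
    [90, 70, 50, 30, 10],
    [95, 60, 20],
    [80, 75, 40, 5],
]

top, compared = top_n_from_sorted_segments(segments, n=2)
print(top)       # [95, 90]
print(compared)  # 6
```

Only 6 of the 12 documents are compared to find the two most recent ones, because each segment contributes at most its first two entries.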
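If you are only looking for the 10 most recent events and have no interest in the total number of documents that match the query, you can set track_total_hits to false: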
resp = client.search(
    index="events",
    size=10,
    sort=[
        {"timestamp": "desc"}
    ],
    track_total_hits=False,
)
print(resp)

response = client.search(
  index: 'events',
  body: {
    size: 10,
    sort: [
      { timestamp: 'desc' }
    ],
    track_total_hits: false
  }
)
puts response

const response = await client.search({
  index: "events",
  size: 10,
  sort: [
    { timestamp: "desc" },
  ],
  track_total_hits: false,
});
console.log(response);
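This time, Elasticsearch will not try to count the number of documents and will be able to terminate the query as soon as N documents have been collected per segment.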
Aggregations will collect all documents that match the query, regardless of the value of track_total_hits.
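The interaction between track_total_hits and aggregations can be summarized with a deliberately simplified model. The function `visited_per_segment` and the segment sizes are hypothetical, and the model assumes that the search sort matches the index sort and that the sizes count only documents matching the query.

```python
def visited_per_segment(matching_per_segment, n, track_total_hits, has_aggregations=False):
    """Estimate how many matching documents must be visited in each segment."""
    if track_total_hits or has_aggregations:
        # Counting hits or building aggregations needs every matching document.
        return list(matching_per_segment)
    # Otherwise collection can stop after n documents per segment.
    return [min(count, n) for count in matching_per_segment]


# Hypothetical numbers of matching documents in three segments.
matching = [1000, 250, 4]

print(visited_per_segment(matching, 10, track_total_hits=True))
print(visited_per_segment(matching, 10, track_total_hits=False))
print(visited_per_segment(matching, 10, track_total_hits=False, has_aggregations=True))
```

Early termination only reduces the work in the second call; once aggregations are requested, every matching document is visited again even though track_total_hits is false.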