Percolate query
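The percolate query can be used to match queries stored in an index. The percolate query itself contains the document that will be used as a query to match with the stored queries.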
Sample usage
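To provide a simple example, this documentation uses one index, my-index-000001, for both the percolate queries and the documents. This set-up can work well when only a few percolate queries are registered. For heavier usage, we recommend you store queries and documents in separate indices. For more details, refer to How it works under the hood.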
Create an index with two fields:
resp = client.indices.create( index="my-index-000001", mappings={ "properties": { "message": { "type": "text" }, "query": { "type": "percolator" } } }, )
print(resp)
response = client.indices.create( index: 'my-index-000001', body: { mappings: { properties: { message: { type: 'text' }, query: { type: 'percolator' } } } } )
puts response
const response = await client.indices.create({ index: "my-index-000001", mappings: { properties: { message: { type: "text", }, query: { type: "percolator", }, }, }, }); console.log(response);
PUT /my-index-000001
{ "mappings": { "properties": { "message": { "type": "text" }, "query": { "type": "percolator" } } } }
The message field is the field used to preprocess the document defined in the percolate query before it gets indexed into a temporary index.
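The query field is used for indexing the query documents. It will hold a JSON object that represents an actual Elasticsearch query. The query field has been configured to use the percolator field type. This field type understands the Query DSL and stores queries in such a way that they can later be used to match documents defined in the percolate query.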
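As a minimal sketch, the mapping above can be produced by a small Python function, which makes it easy to reuse the same two-field layout with other field names. The function name percolator_mappings and its parameters are hypothetical and not part of any Elasticsearch client; the returned dict has the same shape as the mappings argument in the Python example above.

```python
def percolator_mappings(document_field="message", query_field="query"):
    """Hypothetical helper: build a mappings body with one text field and one percolator field."""
    return {
        "properties": {
            # Field used to analyze the documents passed to the percolate query.
            document_field: {"type": "text"},
            # Field that stores each registered query using the percolator field type.
            query_field: {"type": "percolator"},
        }
    }


mappings = percolator_mappings()
print(mappings)
```

The result could then be passed as client.indices.create(index="my-index-000001", mappings=mappings).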
Register a query in the percolator:
resp = client.index( index="my-index-000001", id="1", refresh=True, document={ "query": { "match": { "message": "bonsai tree" } } }, )
print(resp)
response = client.index( index: 'my-index-000001', id: 1, refresh: true, body: { query: { match: { message: 'bonsai tree' } } } )
puts response
const response = await client.index({ index: "my-index-000001", id: 1, refresh: "true", document: { query: { match: { message: "bonsai tree", }, }, }, }); console.log(response);
PUT /my-index-000001/_doc/1?refresh
{ "query": { "match": { "message": "bonsai tree" } } }
Match a document to the registered percolator queries:
resp = client.search( index="my-index-000001", query={ "percolate": { "field": "query", "document": { "message": "A new bonsai tree in the office" } } }, )
print(resp)
const response = await client.search({ index: "my-index-000001", query: { percolate: { field: "query", document: { message: "A new bonsai tree in the office", }, }, }, }); console.log(response);
GET /my-index-000001/_search
{ "query": { "percolate": { "field": "query", "document": { "message": "A new bonsai tree in the office" } } } }
The above request will yield the following response:
{ "took": 13, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped" : 0, "failed": 0 }, "hits": { "total" : { "value": 1, "relation": "eq" }, "max_score": 0.26152915, "hits": [ { "_index": "my-index-000001", "_id": "1", "_score": 0.26152915, "_source": { "query": { "match": { "message": "bonsai tree" } } }, "fields" : { "_percolator_document_slot" : [0] } } ] } }
Parameters
The percolate query accepts the following parameters when percolating a document:
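field: The field of type percolator that holds the indexed queries. This is a required parameter.
name: The suffix to be used for the _percolator_document_slot field in case multiple percolate queries have been specified. This is an optional parameter.
document: The source of the document being percolated.
documents: Like the document parameter, but accepts multiple documents via a JSON array.
document_type: The type / mapping of the document being percolated. This parameter is deprecated and will be removed in Elasticsearch 8.0.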
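Instead of specifying the source of the document being percolated, the source can also be retrieved from an already stored document. The percolate query will then internally execute a get request to fetch that document.
In that case, the document parameter can be substituted with the following parameters: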
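index: The index the document resides in. This is a required parameter.
type: The type of the document to fetch. This parameter is deprecated and will be removed in Elasticsearch 8.0.
id: The id of the document to fetch. This is a required parameter.
routing: Optionally, the routing to be used to fetch the document to percolate.
preference: Optionally, the preference to be used to fetch the document to percolate.
version: Optionally, the expected version of the document to be fetched.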
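Percolating in a filter context
If you are not interested in the score, better performance can be expected by wrapping the percolate query in a bool query's filter clause or in a constant_score query: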
resp = client.search( index="my-index-000001", query={ "constant_score": { "filter": { "percolate": { "field": "query", "document": { "message": "A new bonsai tree in the office" } } } } }, )
print(resp)
const response = await client.search({ index: "my-index-000001", query: { constant_score: { filter: { percolate: { field: "query", document: { message: "A new bonsai tree in the office", }, }, }, }, }, }); console.log(response);
GET /my-index-000001/_search
{ "query": { "constant_score": { "filter": { "percolate": { "field": "query", "document": { "message": "A new bonsai tree in the office" } } } } } }
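At index time, terms are extracted from the percolator query, and the percolator can often determine whether a query matches just by looking at those extracted terms. However, computing scores requires deserializing each matching query and running it against the percolated document, which is a much more expensive operation. Hence, if computing scores is not required, the percolate query should be wrapped in a constant_score query or a bool query's filter clause.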
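A minimal Python sketch of the two non-scoring wrappers described above follows. in_filter_context is a hypothetical helper, and the style names "constant_score" and "bool_filter" are this sketch's own labels, not Elasticsearch parameters; only the returned query shapes follow the text.

```python
def in_filter_context(percolate_query, style="constant_score"):
    """Hypothetical helper: wrap a percolate query so that it runs without scoring."""
    if style == "constant_score":
        # Same shape as the constant_score example above.
        return {"constant_score": {"filter": percolate_query}}
    if style == "bool_filter":
        # The percolate query placed in a bool query's filter clause.
        return {"bool": {"filter": [percolate_query]}}
    raise ValueError(f"unknown style: {style!r}")


percolate = {"percolate": {"field": "query", "document": {"message": "A new bonsai tree in the office"}}}
wrapped = in_filter_context(percolate)
print(wrapped)
```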
Note that the percolate query never gets cached by the query cache.
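Percolating multiple documents
The percolate query can match multiple documents simultaneously with the indexed percolator queries. Percolating multiple documents in a single request can improve performance, as queries only need to be parsed and matched once instead of multiple times.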
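The _percolator_document_slot field returned with each matched percolator query is important when percolating multiple documents simultaneously. It indicates which documents matched a particular percolator query. The numbers correspond to the slots in the documents array specified in the percolate query.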
resp = client.search( index="my-index-000001", query={ "percolate": { "field": "query", "documents": [ { "message": "bonsai tree" }, { "message": "new tree" }, { "message": "the office" }, { "message": "office tree" } ] } }, )
print(resp)
const response = await client.search({ index: "my-index-000001", query: { percolate: { field: "query", documents: [ { message: "bonsai tree", }, { message: "new tree", }, { message: "the office", }, { message: "office tree", }, ], }, }, }); console.log(response);
GET /my-index-000001/_search
{ "query": { "percolate": { "field": "query", "documents": [ { "message": "bonsai tree" }, { "message": "new tree" }, { "message": "the office" }, { "message": "office tree" } ] } } }
{ "took": 13, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped" : 0, "failed": 0 }, "hits": { "total" : { "value": 1, "relation": "eq" }, "max_score": 0.7093853, "hits": [ { "_index": "my-index-000001", "_id": "1", "_score": 0.7093853, "_source": { "query": { "match": { "message": "bonsai tree" } } }, "fields" : { "_percolator_document_slot" : [0, 1, 3] } } ] } }
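To make the slot numbers concrete, the following Python sketch maps each hit back to the documents it matched. matched_documents is a hypothetical helper; the response dict is a trimmed copy of the response above, keeping only the fields the helper reads.

```python
def matched_documents(response, documents):
    """Hypothetical helper: map each matching stored query id to the documents it matched.

    Slot numbers in _percolator_document_slot are indices into the
    documents array that was sent with the percolate query.
    """
    result = {}
    for hit in response["hits"]["hits"]:
        slots = hit.get("fields", {}).get("_percolator_document_slot", [])
        result[hit["_id"]] = [documents[slot] for slot in slots]
    return result


documents = [
    {"message": "bonsai tree"},
    {"message": "new tree"},
    {"message": "the office"},
    {"message": "office tree"},
]
# Trimmed copy of the response shown above.
response = {"hits": {"hits": [{"_id": "1", "fields": {"_percolator_document_slot": [0, 1, 3]}}]}}
print(matched_documents(response, documents))
```

Here stored query 1 ("bonsai tree") matched slots 0, 1 and 3; "the office" (slot 2) contains neither term.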
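Percolating an existing document
In order to percolate a newly indexed document, the percolate query can be used. Based on the response from an index request, the _id and other meta information can be used to immediately percolate the newly added document.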
Example
Based on the previous example.
Index the document we want to percolate:
resp = client.index( index="my-index-000001", id="2", document={ "message": "A new bonsai tree in the office" }, )
print(resp)
response = client.index( index: 'my-index-000001', id: 2, body: { message: 'A new bonsai tree in the office' } )
puts response
const response = await client.index({ index: "my-index-000001", id: 2, document: { message: "A new bonsai tree in the office", }, }); console.log(response);
PUT /my-index-000001/_doc/2
{ "message" : "A new bonsai tree in the office" }
Index response:
{ "_index": "my-index-000001", "_id": "2", "_version": 1, "_shards": { "total": 2, "successful": 1, "failed": 0 }, "result": "created", "_seq_no" : 1, "_primary_term" : 1 }
Percolating an existing document, using the index response as the basis for building the new search request:
resp = client.search( index="my-index-000001", query={ "percolate": { "field": "query", "index": "my-index-000001", "id": "2", "version": 1 } }, )
print(resp)
const response = await client.search({ index: "my-index-000001", query: { percolate: { field: "query", index: "my-index-000001", id: "2", version: 1, }, }, }); console.log(response);
GET /my-index-000001/_search
{ "query": { "percolate": { "field": "query", "index": "my-index-000001", "id": "2", "version": 1 } } }
The returned search response is the same as in the previous example.
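The step from index response to search request can be sketched in Python as follows. percolate_existing is a hypothetical helper; it copies _index, _id and _version from the index response shown above, so the internal get request expects exactly the version that was just written.

```python
def percolate_existing(index_response, field="query"):
    """Hypothetical helper: build a percolate query for a freshly indexed document."""
    return {
        "percolate": {
            "field": field,
            # Where to fetch the stored document from.
            "index": index_response["_index"],
            "id": index_response["_id"],
            # Expected version of the stored document.
            "version": index_response["_version"],
        }
    }


# Fields taken from the index response shown above.
index_response = {"_index": "my-index-000001", "_id": "2", "_version": 1, "result": "created"}
query = percolate_existing(index_response)
print(query)
```

The resulting dict has the same shape as the query argument in the Python search example above.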
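Percolate query and highlighting
The percolate query is handled in a special way when it comes to highlighting. The query hits are used to highlight the document that is provided in the percolate query, whereas with regular highlighting the query in the search request is used to highlight the hits.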
Example
This example is based on the mapping of the first example.
Save a query:
resp = client.index( index="my-index-000001", id="3", refresh=True, document={ "query": { "match": { "message": "brown fox" } } }, )
print(resp)
response = client.index( index: 'my-index-000001', id: 3, refresh: true, body: { query: { match: { message: 'brown fox' } } } )
puts response
const response = await client.index({ index: "my-index-000001", id: 3, refresh: "true", document: { query: { match: { message: "brown fox", }, }, }, }); console.log(response);
PUT /my-index-000001/_doc/3?refresh
{ "query": { "match": { "message": "brown fox" } } }
Save another query:
resp = client.index( index="my-index-000001", id="4", refresh=True, document={ "query": { "match": { "message": "lazy dog" } } }, )
print(resp)
response = client.index( index: 'my-index-000001', id: 4, refresh: true, body: { query: { match: { message: 'lazy dog' } } } )
puts response
const response = await client.index({ index: "my-index-000001", id: 4, refresh: "true", document: { query: { match: { message: "lazy dog", }, }, }, }); console.log(response);
PUT /my-index-000001/_doc/4?refresh
{ "query": { "match": { "message": "lazy dog" } } }
Execute a search request with the percolate query and highlighting enabled:
resp = client.search( index="my-index-000001", query={ "percolate": { "field": "query", "document": { "message": "The quick brown fox jumps over the lazy dog" } } }, highlight={ "fields": { "message": {} } }, )
print(resp)
const response = await client.search({ index: "my-index-000001", query: { percolate: { field: "query", document: { message: "The quick brown fox jumps over the lazy dog", }, }, }, highlight: { fields: { message: {}, }, }, }); console.log(response);
GET /my-index-000001/_search
{ "query": { "percolate": { "field": "query", "document": { "message": "The quick brown fox jumps over the lazy dog" } } }, "highlight": { "fields": { "message": {} } } }
This will yield the following response.
{ "took": 7, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped" : 0, "failed": 0 }, "hits": { "total" : { "value": 2, "relation": "eq" }, "max_score": 0.26152915, "hits": [ { "_index": "my-index-000001", "_id": "3", "_score": 0.26152915, "_source": { "query": { "match": { "message": "brown fox" } } }, "highlight": { "message": [ "The quick <em>brown</em> <em>fox</em> jumps over the lazy dog" ] }, "fields" : { "_percolator_document_slot" : [0] } }, { "_index": "my-index-000001", "_id": "4", "_score": 0.26152915, "_source": { "query": { "match": { "message": "lazy dog" } } }, "highlight": { "message": [ "The quick brown fox jumps over the <em>lazy</em> <em>dog</em>" ] }, "fields" : { "_percolator_document_slot" : [0] } } ] } }
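Instead of the query in the search request highlighting the percolator hits, the percolator queries highlight the document defined in the percolate query.
When percolating multiple documents at the same time, as in the request below, the highlight response is different: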
resp = client.search( index="my-index-000001", query={ "percolate": { "field": "query", "documents": [ { "message": "bonsai tree" }, { "message": "new tree" }, { "message": "the office" }, { "message": "office tree" } ] } }, highlight={ "fields": { "message": {} } }, )
print(resp)
const response = await client.search({ index: "my-index-000001", query: { percolate: { field: "query", documents: [ { message: "bonsai tree", }, { message: "new tree", }, { message: "the office", }, { message: "office tree", }, ], }, }, highlight: { fields: { message: {}, }, }, }); console.log(response);
GET /my-index-000001/_search
{ "query": { "percolate": { "field": "query", "documents": [ { "message": "bonsai tree" }, { "message": "new tree" }, { "message": "the office" }, { "message": "office tree" } ] } }, "highlight": { "fields": { "message": {} } } }
The slightly different response:
{ "took": 13, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped" : 0, "failed": 0 }, "hits": { "total" : { "value": 1, "relation": "eq" }, "max_score": 0.7093853, "hits": [ { "_index": "my-index-000001", "_id": "1", "_score": 0.7093853, "_source": { "query": { "match": { "message": "bonsai tree" } } }, "fields" : { "_percolator_document_slot" : [0, 1, 3] }, "highlight" : { "0_message" : [ "<em>bonsai</em> <em>tree</em>" ], "3_message" : [ "office <em>tree</em>" ], "1_message" : [ "new <em>tree</em>" ] } } ] } }
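In the response above, each highlight field name is prefixed with the slot number of the document it belongs to (0_message, 1_message, 3_message). A minimal Python sketch that groups such fragments by slot follows. highlights_by_slot is a hypothetical helper, and it assumes that keys without a numeric prefix (as in the single-document response earlier) should simply be skipped.

```python
import re

# Highlight keys of the form "<slot>_<field>", e.g. "3_message".
SLOT_KEY = re.compile(r"^(\d+)_(.+)$")


def highlights_by_slot(hit):
    """Hypothetical helper: group a hit's highlight fragments by slot, then by field name."""
    grouped = {}
    for key, fragments in hit.get("highlight", {}).items():
        match = SLOT_KEY.match(key)
        if match is None:
            continue  # Not slot-prefixed, e.g. single-document percolation.
        slot, field = int(match.group(1)), match.group(2)
        grouped.setdefault(slot, {})[field] = fragments
    return grouped


# The highlight section of the hit in the response above.
hit = {
    "highlight": {
        "0_message": ["<em>bonsai</em> <em>tree</em>"],
        "3_message": ["office <em>tree</em>"],
        "1_message": ["new <em>tree</em>"],
    }
}
print(highlights_by_slot(hit))
```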
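Named queries within percolator queries
If a stored percolator query is a complex query, and you want to track which of its sub-queries matched a percolated document, then you can use the _name parameter for its sub-queries. In this case, in a response, each hit, together with a _percolator_document_slot field, contains _percolator_document_slot_<slotNumber>_matched_queries fields that show which sub-queries matched each percolated document.
For example: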
resp = client.index( index="my-index-000001", id="5", refresh=True, document={ "query": { "bool": { "should": [ { "match": { "message": { "query": "Japanese art", "_name": "query1" } } }, { "match": { "message": { "query": "Holand culture", "_name": "query2" } } } ] } } }, )
print(resp)
response = client.index( index: 'my-index-000001', id: 5, refresh: true, body: { query: { bool: { should: [ { match: { message: { query: 'Japanese art', _name: 'query1' } } }, { match: { message: { query: 'Holand culture', _name: 'query2' } } } ] } } } )
puts response
const response = await client.index({ index: "my-index-000001", id: 5, refresh: "true", document: { query: { bool: { should: [ { match: { message: { query: "Japanese art", _name: "query1", }, }, }, { match: { message: { query: "Holand culture", _name: "query2", }, }, }, ], }, }, }, }); console.log(response);
PUT /my-index-000001/_doc/5?refresh
{ "query": { "bool": { "should": [ { "match": { "message": { "query": "Japanese art", "_name": "query1" } } }, { "match": { "message": { "query": "Holand culture", "_name": "query2" } } } ] } } }
resp = client.search( index="my-index-000001", query={ "percolate": { "field": "query", "documents": [ { "message": "Japanse art" }, { "message": "Holand culture" }, { "message": "Japanese art and Holand culture" }, { "message": "no-match" } ] } }, )
print(resp)
const response = await client.search({ index: "my-index-000001", query: { percolate: { field: "query", documents: [ { message: "Japanse art", }, { message: "Holand culture", }, { message: "Japanese art and Holand culture", }, { message: "no-match", }, ], }, }, }); console.log(response);
GET /my-index-000001/_search
{ "query": { "percolate": { "field": "query", "documents": [ { "message": "Japanse art" }, { "message": "Holand culture" }, { "message": "Japanese art and Holand culture" }, { "message": "no-match" } ] } } }
{ "took": 55, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped" : 0, "failed": 0 }, "hits": { "total" : { "value": 1, "relation": "eq" }, "max_score": 1.1181908, "hits": [ { "_index": "my-index-000001", "_id": "5", "_score": 1.1181908, "_source": { "query": { "bool": { "should": [ { "match": { "message": { "query": "Japanese art", "_name": "query1" } } }, { "match": { "message": { "query": "Holand culture", "_name": "query2" } } } ] } } }, "fields" : { "_percolator_document_slot" : [0, 1, 2], "_percolator_document_slot_0_matched_queries" : ["query1"], "_percolator_document_slot_1_matched_queries" : ["query2"], "_percolator_document_slot_2_matched_queries" : ["query1", "query2"] } } ] } }
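The per-slot matched-query fields in the response above can be collected with a short Python sketch. matched_queries_by_slot is a hypothetical helper; it assumes a single percolate query in the request, so the field names carry no extra name suffix, and it ignores the plain _percolator_document_slot field.

```python
import re

# Field names such as _percolator_document_slot_2_matched_queries.
MATCHED_QUERIES_FIELD = re.compile(r"^_percolator_document_slot_(\d+)_matched_queries$")


def matched_queries_by_slot(hit):
    """Hypothetical helper: return {slot: [names of matched sub-queries]} for one hit."""
    result = {}
    for key, names in hit.get("fields", {}).items():
        match = MATCHED_QUERIES_FIELD.match(key)
        if match:
            result[int(match.group(1))] = names
    return result


# The fields section of the hit in the response above.
hit = {
    "fields": {
        "_percolator_document_slot": [0, 1, 2],
        "_percolator_document_slot_0_matched_queries": ["query1"],
        "_percolator_document_slot_1_matched_queries": ["query2"],
        "_percolator_document_slot_2_matched_queries": ["query1", "query2"],
    }
}
print(matched_queries_by_slot(hit))
```

Slot 3 ("no-match") is absent because that document matched no sub-query.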
Specifying multiple percolate queries
It is possible to specify multiple percolate queries in a single search request:
resp = client.search( index="my-index-000001", query={ "bool": { "should": [ { "percolate": { "field": "query", "document": { "message": "bonsai tree" }, "name": "query1" } }, { "percolate": { "field": "query", "document": { "message": "tulip flower" }, "name": "query2" } } ] } }, )
print(resp)
const response = await client.search({ index: "my-index-000001", query: { bool: { should: [ { percolate: { field: "query", document: { message: "bonsai tree", }, name: "query1", }, }, { percolate: { field: "query", document: { message: "tulip flower", }, name: "query2", }, }, ], }, }, }); console.log(response);
GET /my-index-000001/_search
{ "query": { "bool": { "should": [ { "percolate": { "field": "query", "document": { "message": "bonsai tree" }, "name": "query1" } }, { "percolate": { "field": "query", "document": { "message": "tulip flower" }, "name": "query2" } } ] } } }
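The _percolator_document_slot field name will be suffixed with what is specified in the name parameter. If that isn't specified, then the field parameter will be used, which in this case will result in ambiguity.
The above search request returns a response similar to this: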
{ "took": 13, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped" : 0, "failed": 0 }, "hits": { "total" : { "value": 1, "relation": "eq" }, "max_score": 0.26152915, "hits": [ { "_index": "my-index-000001", "_id": "1", "_score": 0.26152915, "_source": { "query": { "match": { "message": "bonsai tree" } } }, "fields" : { "_percolator_document_slot_query1" : [0] } } ] } }
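The naming rule above can be sketched in Python to predict which slot field each percolate query will produce. slot_field_names is a hypothetical helper; the rule that a lone percolate query gets the un-suffixed _percolator_document_slot name is inferred from the earlier single-query responses, not from a documented guarantee.

```python
def slot_field_names(percolate_queries):
    """Hypothetical helper: predict the slot field name for each percolate query in a request.

    Each entry is the body of one percolate query, e.g. {"field": "query", "name": "query1"}.
    """
    prefix = "_percolator_document_slot"
    if len(percolate_queries) == 1:
        # Inferred from the single-query examples: no suffix.
        return [prefix]
    # Suffix with the name parameter, falling back to the field parameter.
    return [f"{prefix}_{q.get('name', q['field'])}" for q in percolate_queries]


names = slot_field_names([{"field": "query", "name": "query1"}, {"field": "query", "name": "query2"}])
print(names)
```

Two unnamed percolate queries on the same field would both map to _percolator_document_slot_query, which is the ambiguity the text warns about.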
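How it works under the hood
When indexing a document into an index that has the percolator field type mapping configured, the query part of the document gets parsed into a Lucene query and is stored in the Lucene index. A binary representation of the query gets stored, but the query's terms are also analyzed and stored in an indexed field.
At search time, the document specified in the request gets parsed into a Lucene document and is stored in a temporary in-memory Lucene index. This in-memory index can hold just this one document and is optimized for that. After this, a special query is built from the terms in the in-memory index that selects candidate percolator queries based on their indexed query terms. The in-memory index then evaluates whether these candidate queries actually match.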
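The two-phase flow described above (cheap candidate selection on indexed query terms, then full evaluation of only the candidates) can be illustrated with a toy Python model. This is not how Elasticsearch or Lucene implement it: the analyze function, the StoredQuery class and the percolate function are all hypothetical, documents are reduced to sets of lowercase terms, and query "9" is an invented stand-in for a query whose terms could not be extracted, which the model treats as always being a candidate.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


def analyze(text: str) -> Set[str]:
    """Rough stand-in for an analyzer: lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class StoredQuery:
    query_id: str
    # Terms extracted when the query was indexed; None models a failed extraction.
    extracted_terms: Optional[Set[str]]
    # Full evaluation of the query against the terms of one document.
    matches: Callable[[Set[str]], bool]


def percolate(doc_text: str, stored: List[StoredQuery]) -> Tuple[List[str], int]:
    """Return (ids of matching queries, number of candidates that were evaluated)."""
    doc_terms = analyze(doc_text)
    # Phase 1: cheap candidate selection on the indexed query terms.
    # Queries without extracted terms cannot be pre-filtered, so they are always candidates.
    candidates = [
        q for q in stored
        if q.extracted_terms is None or q.extracted_terms & doc_terms
    ]
    # Phase 2: run the full query only against the candidates.
    matched = [q.query_id for q in candidates if q.matches(doc_terms)]
    return matched, len(candidates)


stored_queries = [
    # Like {"match": {"message": "bonsai tree"}}: any of the terms matches.
    StoredQuery("1", {"bonsai", "tree"}, lambda t: bool({"bonsai", "tree"} & t)),
    StoredQuery("3", {"brown", "fox"}, lambda t: bool({"brown", "fox"} & t)),
    # Stand-in for a wildcard-style query such as bons*, whose terms were not extracted.
    StoredQuery("9", None, lambda t: any(term.startswith("bons") for term in t)),
]

matched, evaluated = percolate("A new bonsai tree in the office", stored_queries)
print(matched, evaluated)
```

Query 3 is never evaluated for this document because none of its extracted terms occur in it, which is the saving the candidate-selection step is meant to provide.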
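Selecting candidate percolator query matches is an important performance optimization during the execution of the percolate query, as it can significantly reduce the number of candidate matches the in-memory index needs to evaluate. The percolate query can do this because, when percolator queries are indexed, the query terms are extracted and indexed together with the percolator query. Unfortunately, the percolator cannot extract terms from all queries (for example the wildcard or geo_shape query), and as a result, in certain cases the percolator can't apply the selection optimization (for example, if an unsupported query is defined in a required clause of a boolean query, or if the unsupported query is the only query in the percolator document). These queries are marked by the percolator and can be found by running the following search: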
resp = client.search( query={ "term": { "query.extraction_result": "failed" } }, )
print(resp)
response = client.search( body: { query: { term: { 'query.extraction_result' => 'failed' } } } )
puts response
const response = await client.search({ query: { term: { "query.extraction_result": "failed", }, }, }); console.log(response);
GET /_search
{ "query": { "term" : { "query.extraction_result" : "failed" } } }
The above example assumes that there is a query field of type percolator in the mappings.
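When the percolator field has a different name, the search above changes accordingly. The following Python sketch builds that term query for an arbitrary percolator field; failed_extraction_query is a hypothetical helper, and it assumes the <field>.extraction_result naming seen in the example carries over to other field names.

```python
def failed_extraction_query(percolator_field="query"):
    """Hypothetical helper: term query that finds stored queries whose term extraction failed."""
    # Sub-field naming follows the example above: <percolator field>.extraction_result
    return {"term": {f"{percolator_field}.extraction_result": "failed"}}


print(failed_extraction_query())
print(failed_extraction_query("stored_query"))
```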
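Given the design of percolation, it often makes sense to use separate indices for the percolate queries and the documents being percolated, as opposed to the single index used in the examples. There are a few benefits to this approach:
- Because percolate queries contain a different set of fields from the percolated documents, using two separate indices allows fields to be stored in a denser, more efficient way.
- Percolate queries do not scale in the same way as other queries, so percolation performance may benefit from using a different index configuration, such as the number of primary shards.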
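A two-index layout can be sketched in Python as a pair of mappings bodies. split_index_mappings is a hypothetical helper. It assumes, per the percolator field type documentation, that fields referenced by stored queries must also be mapped in the index holding the percolator field, so the queries index repeats the document field mappings and adds the percolator field on top.

```python
def split_index_mappings(document_properties, query_field="query"):
    """Hypothetical helper: return (documents_mappings, queries_mappings) for two separate indices."""
    # Index holding the documents themselves: only the document fields.
    documents_mappings = {"properties": dict(document_properties)}
    # Index holding the stored queries: the same document fields, so that
    # queries and percolated documents are analyzed the same way, plus the percolator field.
    queries_mappings = {
        "properties": {**document_properties, query_field: {"type": "percolator"}}
    }
    return documents_mappings, queries_mappings


docs_mappings, queries_mappings = split_index_mappings({"message": {"type": "text"}})
print(docs_mappings)
print(queries_mappings)
```

Each dict could then be passed as the mappings argument of its own client.indices.create call.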
Notes
Allow expensive queries
Percolate queries will not be executed if search.allow_expensive_queries is set to false.