Percolator field type
The percolator field type parses a JSON structure into a native query and stores that query, so that the percolate query can use it to match provided documents.

Any field that contains a JSON object can be configured to be a percolator field. The percolator field type has no settings. Just configuring the percolator field type is sufficient to instruct Elasticsearch to treat a field as a query.
If the following mapping configures the percolator field type for the query field:
Python:
resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "query": { "type": "percolator" },
            "field": { "type": "text" }
        }
    },
)
print(resp)

Ruby:
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    mappings: {
      properties: {
        query: { type: 'percolator' },
        field: { type: 'text' }
      }
    }
  }
)
puts response

JavaScript:
const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
    properties: {
      query: { type: "percolator" },
      field: { type: "text" },
    },
  },
});
console.log(response);

Console:
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "query": { "type": "percolator" },
      "field": { "type": "text" }
    }
  }
}
then you can index a query:
Python:
resp = client.index(
    index="my-index-000001",
    id="match_value",
    document={ "query": { "match": { "field": "value" } } },
)
print(resp)

Ruby:
response = client.index(
  index: 'my-index-000001',
  id: 'match_value',
  body: { query: { match: { field: 'value' } } }
)
puts response

JavaScript:
const response = await client.index({
  index: "my-index-000001",
  id: "match_value",
  document: { query: { match: { field: "value" } } },
});
console.log(response);

Console:
PUT my-index-000001/_doc/match_value
{
  "query": { "match": { "field": "value" } }
}
Reindexing your percolator queries
Sometimes it is necessary to reindex percolator queries in order to benefit from improvements made to the percolator field type in new releases.
The reindex API can be used to reindex percolator queries. Let's take a look at the following index with a percolator field type:
Python:
resp = client.indices.create(
    index="index",
    mappings={
        "properties": {
            "query": { "type": "percolator" },
            "body": { "type": "text" }
        }
    },
)
print(resp)

resp1 = client.indices.update_aliases(
    actions=[
        { "add": { "index": "index", "alias": "queries" } }
    ],
)
print(resp1)

resp2 = client.index(
    index="queries",
    id="1",
    refresh=True,
    document={ "query": { "match": { "body": "quick brown fox" } } },
)
print(resp2)

Ruby:
response = client.indices.create(
  index: 'index',
  body: {
    mappings: {
      properties: {
        query: { type: 'percolator' },
        body: { type: 'text' }
      }
    }
  }
)
puts response

response = client.indices.update_aliases(
  body: {
    actions: [
      { add: { index: 'index', alias: 'queries' } }
    ]
  }
)
puts response

response = client.index(
  index: 'queries',
  id: 1,
  refresh: true,
  body: { query: { match: { body: 'quick brown fox' } } }
)
puts response

JavaScript:
const response = await client.indices.create({
  index: "index",
  mappings: {
    properties: {
      query: { type: "percolator" },
      body: { type: "text" },
    },
  },
});
console.log(response);

const response1 = await client.indices.updateAliases({
  actions: [
    { add: { index: "index", alias: "queries" } },
  ],
});
console.log(response1);

const response2 = await client.index({
  index: "queries",
  id: 1,
  refresh: "true",
  document: { query: { match: { body: "quick brown fox" } } },
});
console.log(response2);

Console:
PUT index
{
  "mappings": {
    "properties": {
      "query": { "type": "percolator" },
      "body": { "type": "text" }
    }
  }
}

POST _aliases
{
  "actions": [
    { "add": { "index": "index", "alias": "queries" } }
  ]
}

PUT queries/_doc/1?refresh
{
  "query": { "match": { "body": "quick brown fox" } }
}
Let's say you are going to upgrade to a new major version. In order for the new Elasticsearch version to still be able to read your queries, you need to reindex your queries into a new index on the current Elasticsearch version:
Python:
resp = client.indices.create(
    index="new_index",
    mappings={
        "properties": {
            "query": { "type": "percolator" },
            "body": { "type": "text" }
        }
    },
)
print(resp)

resp1 = client.reindex(
    refresh=True,
    source={ "index": "index" },
    dest={ "index": "new_index" },
)
print(resp1)

resp2 = client.indices.update_aliases(
    actions=[
        { "remove": { "index": "index", "alias": "queries" } },
        { "add": { "index": "new_index", "alias": "queries" } }
    ],
)
print(resp2)

Ruby:
response = client.indices.create(
  index: 'new_index',
  body: {
    mappings: {
      properties: {
        query: { type: 'percolator' },
        body: { type: 'text' }
      }
    }
  }
)
puts response

response = client.reindex(
  refresh: true,
  body: {
    source: { index: 'index' },
    dest: { index: 'new_index' }
  }
)
puts response

response = client.indices.update_aliases(
  body: {
    actions: [
      { remove: { index: 'index', alias: 'queries' } },
      { add: { index: 'new_index', alias: 'queries' } }
    ]
  }
)
puts response

JavaScript:
const response = await client.indices.create({
  index: "new_index",
  mappings: {
    properties: {
      query: { type: "percolator" },
      body: { type: "text" },
    },
  },
});
console.log(response);

const response1 = await client.reindex({
  refresh: "true",
  source: { index: "index" },
  dest: { index: "new_index" },
});
console.log(response1);

const response2 = await client.indices.updateAliases({
  actions: [
    { remove: { index: "index", alias: "queries" } },
    { add: { index: "new_index", alias: "queries" } },
  ],
});
console.log(response2);

Console:
PUT new_index
{
  "mappings": {
    "properties": {
      "query": { "type": "percolator" },
      "body": { "type": "text" }
    }
  }
}

POST /_reindex?refresh
{
  "source": { "index": "index" },
  "dest": { "index": "new_index" }
}

POST _aliases
{
  "actions": [
    { "remove": { "index": "index", "alias": "queries" } },
    { "add": { "index": "new_index", "alias": "queries" } }
  ]
}
Executing the percolate query via the queries alias:
Python:
resp = client.search(
    index="queries",
    query={
        "percolate": {
            "field": "query",
            "document": { "body": "fox jumps over the lazy dog" }
        }
    },
)
print(resp)

Ruby:
response = client.search(
  index: 'queries',
  body: {
    query: {
      percolate: {
        field: 'query',
        document: { body: 'fox jumps over the lazy dog' }
      }
    }
  }
)
puts response

JavaScript:
const response = await client.search({
  index: "queries",
  query: {
    percolate: {
      field: "query",
      document: { body: "fox jumps over the lazy dog" },
    },
  },
});
console.log(response);

Console:
GET /queries/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": { "body": "fox jumps over the lazy dog" }
    }
  }
}
now returns matches from the new index:
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 1, "relation": "eq" },
    "max_score": 0.13076457,
    "hits": [
      {
        "_index": "new_index",
        "_id": "1",
        "_score": 0.13076457,
        "_source": { "query": { "match": { "body": "quick brown fox" } } },
        "fields": { "_percolator_document_slot": [0] }
      }
    ]
  }
}
Optimizing query time text analysis
When the percolator verifies a percolator candidate match, it is going to parse, perform query time text analysis, and actually run the percolator query on the document being percolated. This is done for each candidate match and every time the percolate query runs. If your query time text analysis is a relatively expensive part of query parsing, then text analysis can become the dominating factor in the time spent percolating. This query parsing overhead can become noticeable when the percolator ends up verifying many candidate percolator query matches.
To avoid the most expensive part of text analysis at percolate time, one can choose to do the expensive part of text analysis when indexing the percolator query instead. This requires using two different analyzers. The first analyzer actually performs the text analysis that needs to be performed (the expensive part). The second analyzer (usually whitespace) just splits the tokens that the first analyzer has produced. Then, before indexing the percolator query, the analyze API should be used to analyze the query text with the more expensive analyzer. The result of the analyze API, the tokens, should be used to substitute the original query text in the percolator query. It is important that the query is now configured to override the analyzer from the mapping and just use the second analyzer. Most text-based queries support an analyzer option (match, query_string, simple_query_string). Using this approach, the expensive text analysis only needs to be performed once instead of many times.
Let's demonstrate this workflow with a simplified example.
Let's say we want to index the following percolator query:
{
  "query": {
    "match": {
      "body": { "query": "missing bicycles" }
    }
  }
}
with these settings and mapping:
Python:
resp = client.indices.create(
    index="test_index",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "filter": [ "lowercase", "porter_stem" ]
                }
            }
        }
    },
    mappings={
        "properties": {
            "query": { "type": "percolator" },
            "body": { "type": "text", "analyzer": "my_analyzer" }
        }
    },
)
print(resp)

Ruby:
response = client.indices.create(
  index: 'test_index',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'standard',
            filter: [ 'lowercase', 'porter_stem' ]
          }
        }
      }
    },
    mappings: {
      properties: {
        query: { type: 'percolator' },
        body: { type: 'text', analyzer: 'my_analyzer' }
      }
    }
  }
)
puts response

JavaScript:
const response = await client.indices.create({
  index: "test_index",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "standard",
          filter: ["lowercase", "porter_stem"],
        },
      },
    },
  },
  mappings: {
    properties: {
      query: { type: "percolator" },
      body: { type: "text", analyzer: "my_analyzer" },
    },
  },
});
console.log(response);

Console:
PUT /test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "porter_stem"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "query": { "type": "percolator" },
      "body": { "type": "text", "analyzer": "my_analyzer" }
    }
  }
}
First we need to use the analyze API to perform the text analysis prior to indexing:
Python:
resp = client.indices.analyze(
    index="test_index",
    analyzer="my_analyzer",
    text="missing bicycles",
)
print(resp)

Ruby:
response = client.indices.analyze(
  index: 'test_index',
  body: {
    analyzer: 'my_analyzer',
    text: 'missing bicycles'
  }
)
puts response

JavaScript:
const response = await client.indices.analyze({
  index: "test_index",
  analyzer: "my_analyzer",
  text: "missing bicycles",
});
console.log(response);

Console:
POST /test_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "missing bicycles"
}
This results in the following response:
{
  "tokens": [
    { "token": "miss", "start_offset": 0, "end_offset": 7, "type": "<ALPHANUM>", "position": 0 },
    { "token": "bicycl", "start_offset": 8, "end_offset": 16, "type": "<ALPHANUM>", "position": 1 }
  ]
}
All the tokens, in the order returned, need to replace the query text in the percolator query:
Python:
resp = client.index(
    index="test_index",
    id="1",
    refresh=True,
    document={
        "query": {
            "match": {
                "body": { "query": "miss bicycl", "analyzer": "whitespace" }
            }
        }
    },
)
print(resp)

Ruby:
response = client.index(
  index: 'test_index',
  id: 1,
  refresh: true,
  body: {
    query: {
      match: {
        body: { query: 'miss bicycl', analyzer: 'whitespace' }
      }
    }
  }
)
puts response

JavaScript:
const response = await client.index({
  index: "test_index",
  id: 1,
  refresh: "true",
  document: {
    query: {
      match: {
        body: { query: "miss bicycl", analyzer: "whitespace" },
      },
    },
  },
});
console.log(response);

Console:
PUT /test_index/_doc/1?refresh
{
  "query": {
    "match": {
      "body": { "query": "miss bicycl", "analyzer": "whitespace" }
    }
  }
}
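The substitution step itself is trivial: the replacement query text is just the tokens from the _analyze response, joined in the returned order. A minimal sketch in plain Python (the helper function is hypothetical, not part of any client):

```python
def tokens_to_query_text(analyze_response):
    """Join the tokens from an _analyze response, in returned order,
    into the text that replaces the original query text."""
    return " ".join(t["token"] for t in analyze_response["tokens"])

# The analyze API response from above, abbreviated to the relevant keys.
analyze_response = {
    "tokens": [
        {"token": "miss", "position": 0},
        {"token": "bicycl", "position": 1},
    ]
}

print(tokens_to_query_text(analyze_response))  # miss bicycl
```

The resulting string is what gets indexed in the match query above, together with the whitespace analyzer override.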
This analyze-before-indexing step should be performed for each percolator query.
At percolate time nothing changes, and the percolate query can be defined normally:
Python:
resp = client.search(
    index="test_index",
    query={
        "percolate": {
            "field": "query",
            "document": { "body": "Bycicles are missing" }
        }
    },
)
print(resp)

Ruby:
response = client.search(
  index: 'test_index',
  body: {
    query: {
      percolate: {
        field: 'query',
        document: { body: 'Bycicles are missing' }
      }
    }
  }
)
puts response

JavaScript:
const response = await client.search({
  index: "test_index",
  query: {
    percolate: {
      field: "query",
      document: { body: "Bycicles are missing" },
    },
  },
});
console.log(response);

Console:
GET /test_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": { "body": "Bycicles are missing" }
    }
  }
}
This results in a response like this:
{
  "took": 6,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 1, "relation": "eq" },
    "max_score": 0.13076457,
    "hits": [
      {
        "_index": "test_index",
        "_id": "1",
        "_score": 0.13076457,
        "_source": {
          "query": {
            "match": {
              "body": { "query": "miss bicycl", "analyzer": "whitespace" }
            }
          }
        },
        "fields": { "_percolator_document_slot": [0] }
      }
    ]
  }
}
Optimizing wildcard queries
Wildcard queries are more expensive than other queries for the percolator, especially if the wildcard expressions are large.

In the case of wildcard queries with prefix wildcard expressions, or just prefix queries, the edge_ngram token filter can be used to replace these queries with regular term queries on a field where the edge_ngram token filter is configured.
Creating an index with custom analysis settings:
Python:
resp = client.indices.create(
    index="my_queries1",
    settings={
        "analysis": {
            "analyzer": {
                "wildcard_prefix": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [ "lowercase", "wildcard_edge_ngram" ]
                }
            },
            "filter": {
                "wildcard_edge_ngram": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 32
                }
            }
        }
    },
    mappings={
        "properties": {
            "query": { "type": "percolator" },
            "my_field": {
                "type": "text",
                "fields": {
                    "prefix": {
                        "type": "text",
                        "analyzer": "wildcard_prefix",
                        "search_analyzer": "standard"
                    }
                }
            }
        }
    },
)
print(resp)

Ruby:
response = client.indices.create(
  index: 'my_queries1',
  body: {
    settings: {
      analysis: {
        analyzer: {
          wildcard_prefix: {
            type: 'custom',
            tokenizer: 'standard',
            filter: [ 'lowercase', 'wildcard_edge_ngram' ]
          }
        },
        filter: {
          wildcard_edge_ngram: {
            type: 'edge_ngram',
            min_gram: 1,
            max_gram: 32
          }
        }
      }
    },
    mappings: {
      properties: {
        query: { type: 'percolator' },
        my_field: {
          type: 'text',
          fields: {
            prefix: {
              type: 'text',
              analyzer: 'wildcard_prefix',
              search_analyzer: 'standard'
            }
          }
        }
      }
    }
  }
)
puts response

JavaScript:
const response = await client.indices.create({
  index: "my_queries1",
  settings: {
    analysis: {
      analyzer: {
        wildcard_prefix: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "wildcard_edge_ngram"],
        },
      },
      filter: {
        wildcard_edge_ngram: {
          type: "edge_ngram",
          min_gram: 1,
          max_gram: 32,
        },
      },
    },
  },
  mappings: {
    properties: {
      query: { type: "percolator" },
      my_field: {
        type: "text",
        fields: {
          prefix: {
            type: "text",
            analyzer: "wildcard_prefix",
            search_analyzer: "standard",
          },
        },
      },
    },
  },
});
console.log(response);

Console:
PUT my_queries1
{
  "settings": {
    "analysis": {
      "analyzer": {
        "wildcard_prefix": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "wildcard_edge_ngram" ]
        }
      },
      "filter": {
        "wildcard_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 32
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "query": { "type": "percolator" },
      "my_field": {
        "type": "text",
        "fields": {
          "prefix": {
            "type": "text",
            "analyzer": "wildcard_prefix",
            "search_analyzer": "standard"
          }
        }
      }
    }
  }
}
Notes on the example above:
- The wildcard_prefix analyzer generates the prefix tokens, to be used at index time only.
- Increase the min_gram and decrease the max_gram settings based on your prefix search needs.
- The my_field.prefix multi field should be used to do the prefix search with a term or match query instead of a prefix or wildcard query.
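To see why a term query can stand in for a prefix or wildcard query here, consider what the edge_ngram token filter emits: every leading substring of each token. A small plain-Python simulation (the analyzer is approximated by lowercasing and whitespace splitting; the helper names are illustrative, not Elasticsearch APIs):

```python
def edge_ngrams(token, min_gram=1, max_gram=32):
    """Emit the leading substrings of a token, like the edge_ngram token filter."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def index_time_terms(text):
    """Approximate the wildcard_prefix analyzer: lowercase, split on
    whitespace, then expand each token into its edge n-grams."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token))
    return terms

# The document value "abcd" is indexed as the terms a, ab, abc, abcd,
# so a stored term query on "abc" matches with a plain term lookup,
# without any wildcard expansion at percolate time.
terms = index_time_terms("abcd")
print(sorted(terms))       # ['a', 'ab', 'abc', 'abcd']
print("abc" in terms)      # True
```

This is exactly the trade-off the section describes: a little extra index-time work (more terms) in exchange for cheap exact term lookups when percolating.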
Then instead of indexing the following query:
{
  "query": {
    "wildcard": { "my_field": "abc*" }
  }
}
this query below should be indexed:
Python:
resp = client.index(
    index="my_queries1",
    id="1",
    refresh=True,
    document={ "query": { "term": { "my_field.prefix": "abc" } } },
)
print(resp)

Ruby:
response = client.index(
  index: 'my_queries1',
  id: 1,
  refresh: true,
  body: { query: { term: { 'my_field.prefix' => 'abc' } } }
)
puts response

JavaScript:
const response = await client.index({
  index: "my_queries1",
  id: 1,
  refresh: "true",
  document: { query: { term: { "my_field.prefix": "abc" } } },
});
console.log(response);

Console:
PUT /my_queries1/_doc/1?refresh
{
  "query": { "term": { "my_field.prefix": "abc" } }
}
This way the second query can be handled more efficiently than the first query.
The following search request will match with the previously indexed percolator query:
Python:
resp = client.search(
    index="my_queries1",
    query={
        "percolate": {
            "field": "query",
            "document": { "my_field": "abcd" }
        }
    },
)
print(resp)

Ruby:
response = client.search(
  index: 'my_queries1',
  body: {
    query: {
      percolate: {
        field: 'query',
        document: { my_field: 'abcd' }
      }
    }
  }
)
puts response

JavaScript:
const response = await client.search({
  index: "my_queries1",
  query: {
    percolate: {
      field: "query",
      document: { my_field: "abcd" },
    },
  },
});
console.log(response);

Console:
GET /my_queries1/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": { "my_field": "abcd" }
    }
  }
}
{
  "took": 6,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 1, "relation": "eq" },
    "max_score": 0.18864399,
    "hits": [
      {
        "_index": "my_queries1",
        "_id": "1",
        "_score": 0.18864399,
        "_source": { "query": { "term": { "my_field.prefix": "abc" } } },
        "fields": { "_percolator_document_slot": [0] }
      }
    ]
  }
}
The same technique can also be used to speed up suffix wildcard searches, by using the reverse token filter before the edge_ngram token filter.
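The reversal trick can be followed step by step in plain Python: at index time tokens are reversed and then expanded into edge n-grams, while at search time the stored query term is only reversed, so a suffix match turns into an exact term lookup. A sketch approximating the two analyzers defined below (lowercase plus whitespace splitting stands in for the standard tokenizer; function names are illustrative):

```python
def edge_ngrams(token, min_gram=1, max_gram=32):
    """Leading substrings of a token, like the edge_ngram token filter."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def wildcard_suffix_terms(text):
    """Approximate the wildcard_suffix analyzer (index time):
    lowercase, reverse each token, then expand into edge n-grams."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token[::-1]))
    return terms

def wildcard_suffix_search_term(token):
    """Approximate wildcard_suffix_search_time (query time):
    lowercase and reverse only."""
    return token.lower()[::-1]

# Percolating the document value "wxyz": indexing produces z, zy, zyx, zyxw.
# The stored query text "xyz" is reversed at search time to "zyx", which is
# one of the indexed terms, so the match query below matches.
print(wildcard_suffix_search_term("xyz") in wildcard_suffix_terms("wxyz"))  # True
```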
Python:
resp = client.indices.create(
    index="my_queries2",
    settings={
        "analysis": {
            "analyzer": {
                "wildcard_suffix": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [ "lowercase", "reverse", "wildcard_edge_ngram" ]
                },
                "wildcard_suffix_search_time": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [ "lowercase", "reverse" ]
                }
            },
            "filter": {
                "wildcard_edge_ngram": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 32
                }
            }
        }
    },
    mappings={
        "properties": {
            "query": { "type": "percolator" },
            "my_field": {
                "type": "text",
                "fields": {
                    "suffix": {
                        "type": "text",
                        "analyzer": "wildcard_suffix",
                        "search_analyzer": "wildcard_suffix_search_time"
                    }
                }
            }
        }
    },
)
print(resp)

Ruby:
response = client.indices.create(
  index: 'my_queries2',
  body: {
    settings: {
      analysis: {
        analyzer: {
          wildcard_suffix: {
            type: 'custom',
            tokenizer: 'standard',
            filter: [ 'lowercase', 'reverse', 'wildcard_edge_ngram' ]
          },
          wildcard_suffix_search_time: {
            type: 'custom',
            tokenizer: 'standard',
            filter: [ 'lowercase', 'reverse' ]
          }
        },
        filter: {
          wildcard_edge_ngram: {
            type: 'edge_ngram',
            min_gram: 1,
            max_gram: 32
          }
        }
      }
    },
    mappings: {
      properties: {
        query: { type: 'percolator' },
        my_field: {
          type: 'text',
          fields: {
            suffix: {
              type: 'text',
              analyzer: 'wildcard_suffix',
              search_analyzer: 'wildcard_suffix_search_time'
            }
          }
        }
      }
    }
  }
)
puts response

JavaScript:
const response = await client.indices.create({
  index: "my_queries2",
  settings: {
    analysis: {
      analyzer: {
        wildcard_suffix: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "reverse", "wildcard_edge_ngram"],
        },
        wildcard_suffix_search_time: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "reverse"],
        },
      },
      filter: {
        wildcard_edge_ngram: {
          type: "edge_ngram",
          min_gram: 1,
          max_gram: 32,
        },
      },
    },
  },
  mappings: {
    properties: {
      query: { type: "percolator" },
      my_field: {
        type: "text",
        fields: {
          suffix: {
            type: "text",
            analyzer: "wildcard_suffix",
            search_analyzer: "wildcard_suffix_search_time",
          },
        },
      },
    },
  },
});
console.log(response);

Console:
PUT my_queries2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "wildcard_suffix": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "reverse", "wildcard_edge_ngram" ]
        },
        "wildcard_suffix_search_time": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "reverse" ]
        }
      },
      "filter": {
        "wildcard_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 32
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "query": { "type": "percolator" },
      "my_field": {
        "type": "text",
        "fields": {
          "suffix": {
            "type": "text",
            "analyzer": "wildcard_suffix",
            "search_analyzer": "wildcard_suffix_search_time"
          }
        }
      }
    }
  }
}
Then instead of indexing the following query:
{
  "query": {
    "wildcard": { "my_field": "*xyz" }
  }
}
this query below should be indexed:
Python:
resp = client.index(
    index="my_queries2",
    id="2",
    refresh=True,
    document={ "query": { "match": { "my_field.suffix": "xyz" } } },
)
print(resp)

Ruby:
response = client.index(
  index: 'my_queries2',
  id: 2,
  refresh: true,
  body: { query: { match: { 'my_field.suffix' => 'xyz' } } }
)
puts response

JavaScript:
const response = await client.index({
  index: "my_queries2",
  id: 2,
  refresh: "true",
  document: { query: { match: { "my_field.suffix": "xyz" } } },
});
console.log(response);
The following search request will match with the previously indexed percolator query:
Python:
resp = client.search(
    index="my_queries2",
    query={
        "percolate": {
            "field": "query",
            "document": { "my_field": "wxyz" }
        }
    },
)
print(resp)

Ruby:
response = client.search(
  index: 'my_queries2',
  body: {
    query: {
      percolate: {
        field: 'query',
        document: { my_field: 'wxyz' }
      }
    }
  }
)
puts response

JavaScript:
const response = await client.search({
  index: "my_queries2",
  query: {
    percolate: {
      field: "query",
      document: { my_field: "wxyz" },
    },
  },
});
console.log(response);

Console:
GET /my_queries2/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": { "my_field": "wxyz" }
    }
  }
}
Dedicated percolator index
Percolate queries can be added to any index. Instead of adding percolate queries to the index where the data resides, these queries can also be added to a dedicated index. The advantage of this is that the dedicated percolator index can have its own index settings (for example, the number of primary and replica shards). If you choose to use a dedicated percolate index, you need to make sure that the mappings from the normal index are also available on the percolate index. Otherwise percolate queries can be parsed incorrectly.
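The mapping requirement can be sketched as a simple merge: take the field mappings of the data index and add the percolator-typed query field. A minimal sketch in plain Python (data_index_mappings and the index names in the comment are hypothetical stand-ins; in practice the mappings would be fetched via the get-mapping API):

```python
# Field mappings of a hypothetical data index that documents will be
# percolated against.
data_index_mappings = {
    "properties": {
        "body": {"type": "text"},
        "created_at": {"type": "date"},
    }
}

def dedicated_percolator_mappings(data_mappings):
    """Build mappings for a dedicated percolator index: the data index's
    field mappings plus a percolator-typed query field."""
    properties = dict(data_mappings["properties"])
    properties["query"] = {"type": "percolator"}
    return {"properties": properties}

mappings = dedicated_percolator_mappings(data_index_mappings)
# The dedicated index could then be created with its own settings, e.g.:
# client.indices.create(index="queries-only",
#                       settings={"number_of_replicas": 2},
#                       mappings=mappings)
print(sorted(mappings["properties"]))  # ['body', 'created_at', 'query']
```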
Forcing unmapped fields to be handled as strings
In certain cases it is unknown what kind of percolator queries will be registered, and if no field mapping exists for the fields that a percolator query refers to, adding the percolator query fails. This means the mapping needs to be updated to give the field the appropriate settings before the percolator query can be added. But sometimes it is sufficient if all unmapped fields are handled as if they were default text fields. In those cases one can set index.percolator.map_unmapped_fields_as_text to true (default false), and then if a field referred to in a percolator query does not exist, it will be handled as a default text field so that adding the percolator query doesn't fail.
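In client terms this is just an entry in the index settings body. A minimal sketch (the index name in the comment is hypothetical):

```python
# Index settings enabling lenient handling of unmapped fields in
# percolator queries: unmapped fields are treated as default text fields.
settings = {
    "index.percolator.map_unmapped_fields_as_text": True
}

# With the Python client this would be passed at index creation, e.g.:
# client.indices.create(index="queries", settings=settings,
#                       mappings={"properties": {"query": {"type": "percolator"}}})
print(settings["index.percolator.map_unmapped_fields_as_text"])  # True
```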
Limitations
Parent/child

Because the percolate query is processing one document at a time, it doesn't support queries and filters that run against child documents, such as has_child and has_parent.
Fetching queries

There are a number of queries that fetch data via a get call during query parsing: for example, the terms query when using terms lookup, the template query when using an indexed script, and geo_shape when using a pre-indexed shape. When these queries are indexed by the percolator field type, the get call is executed once. So each time the percolator query evaluates these queries, the fetched terms, shapes, etc. as they were at index time will be used. It is important to note that the fetching of terms that these queries do happens every time the percolator query is indexed, on both primary and replica shards, so the terms that are actually indexed can differ between shard copies if the source index changed while indexing.
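These index-time fetch semantics can be illustrated with a small simulation in plain Python: the lookup is resolved once when the percolator query is stored, so later changes to the source document are not reflected (no client involved; all names are illustrative):

```python
# A stand-in for the document that a terms lookup would fetch from.
lookup_doc = {"tags": ["elasticsearch", "percolator"]}

def store_percolator_query(lookup):
    """Simulate indexing a percolator query containing a terms lookup:
    the fetched terms are resolved once and frozen into the stored query."""
    return {"terms": {"tags": list(lookup["tags"])}}

stored_query = store_percolator_query(lookup_doc)

# The source document changes after the percolator query was indexed...
lookup_doc["tags"].append("newly-added")

# ...but the stored query still evaluates against the terms as they
# were at index time.
print(stored_query["terms"]["tags"])  # ['elasticsearch', 'percolator']
```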
Script query

The script inside a script query can only access doc values fields. The percolate query indexes the provided document into an in-memory index. This in-memory index doesn't support stored fields, and because of that the _source field and other stored fields are not stored. This is why the _source and other stored fields aren't available inside a script query.
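In practice this means a script in a percolated script query must read fields through doc values rather than through params._source. A hedged sketch of such a query body (the field name views is illustrative; it would need to be a mapped field with doc values):

```python
# A script query body that is safe to index as a percolator query:
# the Painless script reads the field via doc values, which the
# in-memory index used by the percolate query supports.
safe_script_query = {
    "script": {
        "script": "doc['views'].value > 10"
    }
}

# By contrast, a script reading params._source would find nothing,
# because the in-memory index stores no _source field.
print("_source" in safe_script_query["script"]["script"])  # False
```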
Field aliases

Percolator queries that contain field aliases may not always behave as expected. In particular, if a percolator query containing a field alias is registered, and the alias is then updated in the mappings to refer to a different field, the stored query will still refer to the original target field. To pick up the change to the field alias, the percolator query must be explicitly reindexed.