Percolator field type


The percolator field type parses a JSON structure into a native query and stores that query, so that the percolate query can use it to match provided documents.

Any field that contains a JSON object can be configured as a percolator field. The percolator field type has no settings; just configuring the percolator field type is sufficient to instruct Elasticsearch to treat a field as a query.

If the following mapping configures the percolator field type for the query field:

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "query": {
                "type": "percolator"
            },
            "field": {
                "type": "text"
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    mappings: {
      properties: {
        query: {
          type: 'percolator'
        },
        field: {
          type: 'text'
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
    properties: {
      query: {
        type: "percolator",
      },
      field: {
        type: "text",
      },
    },
  },
});
console.log(response);
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "query": {
        "type": "percolator"
      },
      "field": {
        "type": "text"
      }
    }
  }
}

then you can index a query:

resp = client.index(
    index="my-index-000001",
    id="match_value",
    document={
        "query": {
            "match": {
                "field": "value"
            }
        }
    },
)
print(resp)
response = client.index(
  index: 'my-index-000001',
  id: 'match_value',
  body: {
    query: {
      match: {
        field: 'value'
      }
    }
  }
)
puts response
const response = await client.index({
  index: "my-index-000001",
  id: "match_value",
  document: {
    query: {
      match: {
        field: "value",
      },
    },
  },
});
console.log(response);
PUT my-index-000001/_doc/match_value
{
  "query": {
    "match": {
      "field": "value"
    }
  }
}

Fields referenced in percolator queries must already exist in the mapping associated with the index used for percolation. To make sure these fields exist, add or update the mapping via the create index or update mapping APIs.
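For example, if a percolator query is going to reference a field that is not yet mapped, the field can be added up front. A minimal sketch of the request body for the update mapping API (`PUT my-index-000001/_mapping`), assuming a hypothetical `tags` field:

```python
# Request body for the update mapping API (PUT my-index-000001/_mapping).
# The "tags" field is illustrative: it must be mapped before a percolator
# query that references it can be indexed.
mapping_update = {
    "properties": {
        "tags": {
            "type": "keyword"
        }
    }
}

# With the Python client this body would be sent as:
#   client.indices.put_mapping(index="my-index-000001", **mapping_update)
print(mapping_update)
```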

Reindexing your percolator queries


Reindexing percolator queries is sometimes required in order to benefit from improvements made to the percolator field type in new releases.

Reindexing percolator queries can be done with the reindex API. Let's take a look at the following index with a percolator field type:

resp = client.indices.create(
    index="index",
    mappings={
        "properties": {
            "query": {
                "type": "percolator"
            },
            "body": {
                "type": "text"
            }
        }
    },
)
print(resp)

resp1 = client.indices.update_aliases(
    actions=[
        {
            "add": {
                "index": "index",
                "alias": "queries"
            }
        }
    ],
)
print(resp1)

resp2 = client.index(
    index="queries",
    id="1",
    refresh=True,
    document={
        "query": {
            "match": {
                "body": "quick brown fox"
            }
        }
    },
)
print(resp2)
response = client.indices.create(
  index: 'index',
  body: {
    mappings: {
      properties: {
        query: {
          type: 'percolator'
        },
        body: {
          type: 'text'
        }
      }
    }
  }
)
puts response

response = client.indices.update_aliases(
  body: {
    actions: [
      {
        add: {
          index: 'index',
          alias: 'queries'
        }
      }
    ]
  }
)
puts response

response = client.index(
  index: 'queries',
  id: 1,
  refresh: true,
  body: {
    query: {
      match: {
        body: 'quick brown fox'
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "index",
  mappings: {
    properties: {
      query: {
        type: "percolator",
      },
      body: {
        type: "text",
      },
    },
  },
});
console.log(response);

const response1 = await client.indices.updateAliases({
  actions: [
    {
      add: {
        index: "index",
        alias: "queries",
      },
    },
  ],
});
console.log(response1);

const response2 = await client.index({
  index: "queries",
  id: 1,
  refresh: "true",
  document: {
    query: {
      match: {
        body: "quick brown fox",
      },
    },
  },
});
console.log(response2);
PUT index
{
  "mappings": {
    "properties": {
      "query" : {
        "type" : "percolator"
      },
      "body" : {
        "type": "text"
      }
    }
  }
}

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "index",
        "alias": "queries" 
      }
    }
  ]
}

PUT queries/_doc/1?refresh
{
  "query" : {
    "match" : {
      "body" : "quick brown fox"
    }
  }
}

It is always recommended to define an alias for your index, so that in case of a reindex, systems and applications don't need to be changed to know that the percolator queries are now in a different index.

Let's say you're going to upgrade to a new major version. In order for the new Elasticsearch version to still be able to read your queries, you need to reindex your queries into a new index on the current Elasticsearch version:

resp = client.indices.create(
    index="new_index",
    mappings={
        "properties": {
            "query": {
                "type": "percolator"
            },
            "body": {
                "type": "text"
            }
        }
    },
)
print(resp)

resp1 = client.reindex(
    refresh=True,
    source={
        "index": "index"
    },
    dest={
        "index": "new_index"
    },
)
print(resp1)

resp2 = client.indices.update_aliases(
    actions=[
        {
            "remove": {
                "index": "index",
                "alias": "queries"
            }
        },
        {
            "add": {
                "index": "new_index",
                "alias": "queries"
            }
        }
    ],
)
print(resp2)
response = client.indices.create(
  index: 'new_index',
  body: {
    mappings: {
      properties: {
        query: {
          type: 'percolator'
        },
        body: {
          type: 'text'
        }
      }
    }
  }
)
puts response

response = client.reindex(
  refresh: true,
  body: {
    source: {
      index: 'index'
    },
    dest: {
      index: 'new_index'
    }
  }
)
puts response

response = client.indices.update_aliases(
  body: {
    actions: [
      {
        remove: {
          index: 'index',
          alias: 'queries'
        }
      },
      {
        add: {
          index: 'new_index',
          alias: 'queries'
        }
      }
    ]
  }
)
puts response
const response = await client.indices.create({
  index: "new_index",
  mappings: {
    properties: {
      query: {
        type: "percolator",
      },
      body: {
        type: "text",
      },
    },
  },
});
console.log(response);

const response1 = await client.reindex({
  refresh: "true",
  source: {
    index: "index",
  },
  dest: {
    index: "new_index",
  },
});
console.log(response1);

const response2 = await client.indices.updateAliases({
  actions: [
    {
      remove: {
        index: "index",
        alias: "queries",
      },
    },
    {
      add: {
        index: "new_index",
        alias: "queries",
      },
    },
  ],
});
console.log(response2);
PUT new_index
{
  "mappings": {
    "properties": {
      "query" : {
        "type" : "percolator"
      },
      "body" : {
        "type": "text"
      }
    }
  }
}

POST /_reindex?refresh
{
  "source": {
    "index": "index"
  },
  "dest": {
    "index": "new_index"
  }
}

POST _aliases
{
  "actions": [ 
    {
      "remove": {
        "index" : "index",
        "alias": "queries"
      }
    },
    {
      "add": {
        "index": "new_index",
        "alias": "queries"
      }
    }
  ]
}

If you have an alias, don't forget to point it to the new index.

Executing the percolate query via the queries alias:

resp = client.search(
    index="queries",
    query={
        "percolate": {
            "field": "query",
            "document": {
                "body": "fox jumps over the lazy dog"
            }
        }
    },
)
print(resp)
response = client.search(
  index: 'queries',
  body: {
    query: {
      percolate: {
        field: 'query',
        document: {
          body: 'fox jumps over the lazy dog'
        }
      }
    }
  }
)
puts response
const response = await client.search({
  index: "queries",
  query: {
    percolate: {
      field: "query",
      document: {
        body: "fox jumps over the lazy dog",
      },
    },
  },
});
console.log(response);
GET /queries/_search
{
  "query": {
    "percolate" : {
      "field" : "query",
      "document" : {
        "body" : "fox jumps over the lazy dog"
      }
    }
  }
}

now returns matches from the new index:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total" : {
        "value": 1,
        "relation": "eq"
    },
    "max_score": 0.13076457,
    "hits": [
      {
        "_index": "new_index", 
        "_id": "1",
        "_score": 0.13076457,
        "_source": {
          "query": {
            "match": {
              "body": "quick brown fox"
            }
          }
        },
        "fields" : {
          "_percolator_document_slot" : [0]
        }
      }
    ]
  }
}

The percolator query hit is now presented from the new index.

Optimizing query-time text analysis


When the percolator verifies a percolator candidate match, it is going to parse the query, perform query-time text analysis, and actually run the percolator query on the document being percolated. This is done for each candidate match, every time the percolate query executes. If your query-time text analysis is a relatively expensive part of query parsing, then text analysis can become the dominant factor in the time spent percolating. This query-parsing overhead becomes noticeable when the percolator ends up verifying many candidate percolator query matches.

To avoid the most expensive part of text analysis at percolate time, one can choose to do that expensive part when indexing the percolator query instead. This requires using two different analyzers. The first analyzer performs the text analysis that actually needs to happen (the expensive part). The second analyzer (usually whitespace) just splits the tokens that the first analyzer generated. Then, before the percolator query is indexed, the analyze API should be used to analyze the query text with the more expensive analyzer. The result of the analyze API (the tokens) should replace the original query text in the percolator query. It is important that the query is now configured to override the analyzer from the mapping and use only the second analyzer. Most text-based queries support an analyzer option (match, query_string, simple_query_string). With this approach, the expensive text analysis is performed once instead of many times.

Let's demonstrate this workflow with a simplified example.

Let's say we want to index the following percolator query:

{
  "query" : {
    "match" : {
      "body" : {
        "query" : "missing bicycles"
      }
    }
  }
}

with these settings and mapping:

resp = client.indices.create(
    index="test_index",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "porter_stem"
                    ]
                }
            }
        }
    },
    mappings={
        "properties": {
            "query": {
                "type": "percolator"
            },
            "body": {
                "type": "text",
                "analyzer": "my_analyzer"
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'test_index',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'standard',
            filter: [
              'lowercase',
              'porter_stem'
            ]
          }
        }
      }
    },
    mappings: {
      properties: {
        query: {
          type: 'percolator'
        },
        body: {
          type: 'text',
          analyzer: 'my_analyzer'
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "test_index",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "standard",
          filter: ["lowercase", "porter_stem"],
        },
      },
    },
  },
  mappings: {
    properties: {
      query: {
        type: "percolator",
      },
      body: {
        type: "text",
        analyzer: "my_analyzer",
      },
    },
  },
});
console.log(response);
PUT /test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer" : {
          "tokenizer": "standard",
          "filter" : ["lowercase", "porter_stem"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "query" : {
        "type": "percolator"
      },
      "body" : {
        "type": "text",
        "analyzer": "my_analyzer" 
      }
    }
  }
}

For the purpose of this example, this analyzer is considered expensive.

First we need to use the analyze API to perform the text analysis prior to indexing:

resp = client.indices.analyze(
    index="test_index",
    analyzer="my_analyzer",
    text="missing bicycles",
)
print(resp)
response = client.indices.analyze(
  index: 'test_index',
  body: {
    analyzer: 'my_analyzer',
    text: 'missing bicycles'
  }
)
puts response
const response = await client.indices.analyze({
  index: "test_index",
  analyzer: "my_analyzer",
  text: "missing bicycles",
});
console.log(response);
POST /test_index/_analyze
{
  "analyzer" : "my_analyzer",
  "text" : "missing bicycles"
}

This results in the following response:

{
  "tokens": [
    {
      "token": "miss",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "bicycl",
      "start_offset": 8,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

All the tokens, in the returned order, need to replace the query text in the percolator query:

resp = client.index(
    index="test_index",
    id="1",
    refresh=True,
    document={
        "query": {
            "match": {
                "body": {
                    "query": "miss bicycl",
                    "analyzer": "whitespace"
                }
            }
        }
    },
)
print(resp)
response = client.index(
  index: 'test_index',
  id: 1,
  refresh: true,
  body: {
    query: {
      match: {
        body: {
          query: 'miss bicycl',
          analyzer: 'whitespace'
        }
      }
    }
  }
)
puts response
const response = await client.index({
  index: "test_index",
  id: 1,
  refresh: "true",
  document: {
    query: {
      match: {
        body: {
          query: "miss bicycl",
          analyzer: "whitespace",
        },
      },
    },
  },
});
console.log(response);
PUT /test_index/_doc/1?refresh
{
  "query" : {
    "match" : {
      "body" : {
        "query" : "miss bicycl",
        "analyzer" : "whitespace" 
      }
    }
  }
}

It is important to select a whitespace analyzer here; otherwise the analyzer defined in the mapping will be used, which defeats the purpose of this workflow. Note that whitespace is a built-in analyzer; if a different analyzer needs to be used, it needs to be configured first in the index's settings.

This analyze API step, prior to indexing, should be performed for each percolator query.
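That per-query flow can be sketched as a small helper: it takes the tokens returned by the analyze API, joins them in order, and produces the percolator query body to index. The tokens below mirror the example above; this is an illustration, not client code:

```python
# Build the percolator query body from tokens returned by the analyze API.
def build_percolator_query(field, analyze_tokens):
    # Join the analyzed tokens in their returned order; the whitespace
    # analyzer at query time simply splits them back into the same tokens.
    analyzed_text = " ".join(t["token"] for t in analyze_tokens)
    return {
        "query": {
            "match": {
                field: {
                    "query": analyzed_text,
                    # Override the mapping's analyzer with the cheap one.
                    "analyzer": "whitespace"
                }
            }
        }
    }

# Tokens as returned by POST /test_index/_analyze for "missing bicycles".
tokens = [{"token": "miss"}, {"token": "bicycl"}]
doc = build_percolator_query("body", tokens)
print(doc)
```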

At percolate time nothing changes, and the percolate query can be defined normally:

resp = client.search(
    index="test_index",
    query={
        "percolate": {
            "field": "query",
            "document": {
                "body": "Bycicles are missing"
            }
        }
    },
)
print(resp)
response = client.search(
  index: 'test_index',
  body: {
    query: {
      percolate: {
        field: 'query',
        document: {
          body: 'Bycicles are missing'
        }
      }
    }
  }
)
puts response
const response = await client.search({
  index: "test_index",
  query: {
    percolate: {
      field: "query",
      document: {
        body: "Bycicles are missing",
      },
    },
  },
});
console.log(response);
GET /test_index/_search
{
  "query": {
    "percolate" : {
      "field" : "query",
      "document" : {
        "body" : "Bycicles are missing"
      }
    }
  }
}

This results in a response like this:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total" : {
        "value": 1,
        "relation": "eq"
    },
    "max_score": 0.13076457,
    "hits": [
      {
        "_index": "test_index",
        "_id": "1",
        "_score": 0.13076457,
        "_source": {
          "query": {
            "match": {
              "body": {
                "query": "miss bicycl",
                "analyzer": "whitespace"
              }
            }
          }
        },
        "fields" : {
          "_percolator_document_slot" : [0]
        }
      }
    ]
  }
}

Optimizing wildcard queries


Wildcard queries are more expensive than other queries for the percolator, especially if the wildcard expressions are large.

In the case of a wildcard query with a prefix wildcard expression, or just a prefix query, the edge_ngram token filter can be used to replace these queries with regular term queries on a field where the edge_ngram token filter is configured.

Creating an index with custom analysis settings:

resp = client.indices.create(
    index="my_queries1",
    settings={
        "analysis": {
            "analyzer": {
                "wildcard_prefix": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "wildcard_edge_ngram"
                    ]
                }
            },
            "filter": {
                "wildcard_edge_ngram": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 32
                }
            }
        }
    },
    mappings={
        "properties": {
            "query": {
                "type": "percolator"
            },
            "my_field": {
                "type": "text",
                "fields": {
                    "prefix": {
                        "type": "text",
                        "analyzer": "wildcard_prefix",
                        "search_analyzer": "standard"
                    }
                }
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'my_queries1',
  body: {
    settings: {
      analysis: {
        analyzer: {
          wildcard_prefix: {
            type: 'custom',
            tokenizer: 'standard',
            filter: [
              'lowercase',
              'wildcard_edge_ngram'
            ]
          }
        },
        filter: {
          wildcard_edge_ngram: {
            type: 'edge_ngram',
            min_gram: 1,
            max_gram: 32
          }
        }
      }
    },
    mappings: {
      properties: {
        query: {
          type: 'percolator'
        },
        my_field: {
          type: 'text',
          fields: {
            prefix: {
              type: 'text',
              analyzer: 'wildcard_prefix',
              search_analyzer: 'standard'
            }
          }
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "my_queries1",
  settings: {
    analysis: {
      analyzer: {
        wildcard_prefix: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "wildcard_edge_ngram"],
        },
      },
      filter: {
        wildcard_edge_ngram: {
          type: "edge_ngram",
          min_gram: 1,
          max_gram: 32,
        },
      },
    },
  },
  mappings: {
    properties: {
      query: {
        type: "percolator",
      },
      my_field: {
        type: "text",
        fields: {
          prefix: {
            type: "text",
            analyzer: "wildcard_prefix",
            search_analyzer: "standard",
          },
        },
      },
    },
  },
});
console.log(response);
PUT my_queries1
{
  "settings": {
    "analysis": {
      "analyzer": {
        "wildcard_prefix": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "wildcard_edge_ngram"
          ]
        }
      },
      "filter": {
        "wildcard_edge_ngram": { 
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 32
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "query": {
        "type": "percolator"
      },
      "my_field": {
        "type": "text",
        "fields": {
          "prefix": { 
            "type": "text",
            "analyzer": "wildcard_prefix",
            "search_analyzer": "standard"
          }
        }
      }
    }
  }
}

The analyzer that generates the prefix tokens at index time only.

Increase the min_gram and decrease the max_gram settings based on your prefix search needs.

This multi-field should be used for the prefix search with a term or match query instead of a prefix or wildcard query.

Then, instead of indexing the following query:

{
  "query": {
    "wildcard": {
      "my_field": "abc*"
    }
  }
}

this query should be indexed:

resp = client.index(
    index="my_queries1",
    id="1",
    refresh=True,
    document={
        "query": {
            "term": {
                "my_field.prefix": "abc"
            }
        }
    },
)
print(resp)
response = client.index(
  index: 'my_queries1',
  id: 1,
  refresh: true,
  body: {
    query: {
      term: {
        'my_field.prefix' => 'abc'
      }
    }
  }
)
puts response
const response = await client.index({
  index: "my_queries1",
  id: 1,
  refresh: "true",
  document: {
    query: {
      term: {
        "my_field.prefix": "abc",
      },
    },
  },
});
console.log(response);
PUT /my_queries1/_doc/1?refresh
{
  "query": {
    "term": {
      "my_field.prefix": "abc"
    }
  }
}

This way, the second query can be handled more efficiently than the first query.

The following search request will match with the previously indexed percolator query:

resp = client.search(
    index="my_queries1",
    query={
        "percolate": {
            "field": "query",
            "document": {
                "my_field": "abcd"
            }
        }
    },
)
print(resp)
response = client.search(
  index: 'my_queries1',
  body: {
    query: {
      percolate: {
        field: 'query',
        document: {
          my_field: 'abcd'
        }
      }
    }
  }
)
puts response
const response = await client.search({
  index: "my_queries1",
  query: {
    percolate: {
      field: "query",
      document: {
        my_field: "abcd",
      },
    },
  },
});
console.log(response);
GET /my_queries1/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "my_field": "abcd"
      }
    }
  }
}
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total" : {
        "value": 1,
        "relation": "eq"
    },
    "max_score": 0.18864399,
    "hits": [
      {
        "_index": "my_queries1",
        "_id": "1",
        "_score": 0.18864399,
        "_source": {
          "query": {
            "term": {
              "my_field.prefix": "abc"
            }
          }
        },
        "fields": {
          "_percolator_document_slot": [
            0
          ]
        }
      }
    ]
  }
}

The same technique can also be used to speed up suffix wildcard searches, by using the reverse token filter before the edge_ngram token filter.

resp = client.indices.create(
    index="my_queries2",
    settings={
        "analysis": {
            "analyzer": {
                "wildcard_suffix": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "reverse",
                        "wildcard_edge_ngram"
                    ]
                },
                "wildcard_suffix_search_time": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "reverse"
                    ]
                }
            },
            "filter": {
                "wildcard_edge_ngram": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 32
                }
            }
        }
    },
    mappings={
        "properties": {
            "query": {
                "type": "percolator"
            },
            "my_field": {
                "type": "text",
                "fields": {
                    "suffix": {
                        "type": "text",
                        "analyzer": "wildcard_suffix",
                        "search_analyzer": "wildcard_suffix_search_time"
                    }
                }
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'my_queries2',
  body: {
    settings: {
      analysis: {
        analyzer: {
          wildcard_suffix: {
            type: 'custom',
            tokenizer: 'standard',
            filter: [
              'lowercase',
              'reverse',
              'wildcard_edge_ngram'
            ]
          },
          wildcard_suffix_search_time: {
            type: 'custom',
            tokenizer: 'standard',
            filter: [
              'lowercase',
              'reverse'
            ]
          }
        },
        filter: {
          wildcard_edge_ngram: {
            type: 'edge_ngram',
            min_gram: 1,
            max_gram: 32
          }
        }
      }
    },
    mappings: {
      properties: {
        query: {
          type: 'percolator'
        },
        my_field: {
          type: 'text',
          fields: {
            suffix: {
              type: 'text',
              analyzer: 'wildcard_suffix',
              search_analyzer: 'wildcard_suffix_search_time'
            }
          }
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "my_queries2",
  settings: {
    analysis: {
      analyzer: {
        wildcard_suffix: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "reverse", "wildcard_edge_ngram"],
        },
        wildcard_suffix_search_time: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "reverse"],
        },
      },
      filter: {
        wildcard_edge_ngram: {
          type: "edge_ngram",
          min_gram: 1,
          max_gram: 32,
        },
      },
    },
  },
  mappings: {
    properties: {
      query: {
        type: "percolator",
      },
      my_field: {
        type: "text",
        fields: {
          suffix: {
            type: "text",
            analyzer: "wildcard_suffix",
            search_analyzer: "wildcard_suffix_search_time",
          },
        },
      },
    },
  },
});
console.log(response);
PUT my_queries2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "wildcard_suffix": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "reverse",
            "wildcard_edge_ngram"
          ]
        },
        "wildcard_suffix_search_time": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "reverse"
          ]
        }
      },
      "filter": {
        "wildcard_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 32
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "query": {
        "type": "percolator"
      },
      "my_field": {
        "type": "text",
        "fields": {
          "suffix": {
            "type": "text",
            "analyzer": "wildcard_suffix",
            "search_analyzer": "wildcard_suffix_search_time" 
          }
        }
      }
    }
  }
}

A custom analyzer is needed at search time too; otherwise the query terms are not reversed and would not match the reversed suffix tokens.

Then, instead of indexing the following query:

{
  "query": {
    "wildcard": {
      "my_field": "*xyz"
    }
  }
}

this query should be indexed:

resp = client.index(
    index="my_queries2",
    id="2",
    refresh=True,
    document={
        "query": {
            "match": {
                "my_field.suffix": "xyz"
            }
        }
    },
)
print(resp)
response = client.index(
  index: 'my_queries2',
  id: 2,
  refresh: true,
  body: {
    query: {
      match: {
        'my_field.suffix' => 'xyz'
      }
    }
  }
)
puts response
const response = await client.index({
  index: "my_queries2",
  id: 2,
  refresh: "true",
  document: {
    query: {
      match: {
        "my_field.suffix": "xyz",
      },
    },
  },
});
console.log(response);
PUT /my_queries2/_doc/2?refresh
{
  "query": {
    "match": { 
      "my_field.suffix": "xyz"
    }
  }
}

The match query should be used instead of the term query, because text analysis needs to reverse the query terms.

The following search request will match with the previously indexed percolator query:

resp = client.search(
    index="my_queries2",
    query={
        "percolate": {
            "field": "query",
            "document": {
                "my_field": "wxyz"
            }
        }
    },
)
print(resp)
response = client.search(
  index: 'my_queries2',
  body: {
    query: {
      percolate: {
        field: 'query',
        document: {
          my_field: 'wxyz'
        }
      }
    }
  }
)
puts response
const response = await client.search({
  index: "my_queries2",
  query: {
    percolate: {
      field: "query",
      document: {
        my_field: "wxyz",
      },
    },
  },
});
console.log(response);
GET /my_queries2/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "my_field": "wxyz"
      }
    }
  }
}

Dedicated percolator index


Percolate queries can be added to any index. Instead of adding percolate queries to the index where the data resides, these queries can also be added to a dedicated index. The advantage of this is that the dedicated percolator index can have its own index settings (for example, the number of primary and replica shards). If you choose to use a dedicated percolate index, you need to make sure that the mappings from the normal index are also available on the percolate index. Otherwise, percolate queries can be parsed incorrectly.
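As a sketch, a dedicated percolator index could be created like this (the index name, shard counts, and the mirrored `body` field are illustrative assumptions):

```python
# Request body for creating a hypothetical dedicated percolator index
# (PUT dedicated-queries). Its settings are independent of the data index,
# but the data index's field mappings must be replicated here so that the
# stored queries parse correctly.
data_index_fields = {
    "body": {"type": "text"}  # mirror of the fields in the data index
}

dedicated_percolator_index = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 2
    },
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            **data_index_fields
        }
    }
}
print(dedicated_percolator_index)
```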

Forcing unmapped fields to be handled as strings


In certain cases it is unknown what kind of percolator queries get registered, and if no field mapping exists for the fields that a percolator query refers to, adding the percolator query fails. This means the mapping needs to be updated to include the fields with the appropriate settings before the percolator query can be added. Sometimes, however, it is sufficient if all unmapped fields are handled as if they were default text fields. In those cases, the index.percolator.map_unmapped_fields_as_text setting can be set to true (default false); then, if a field referred to in a percolator query does not exist, it is handled as a default text field and adding the percolator query does not fail.
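A sketch of an index creation body with this setting enabled (the index layout is illustrative):

```python
# Index creation body for a hypothetical index that lets percolator queries
# refer to unmapped fields: such fields are handled as default text fields
# instead of causing indexing of the query to fail.
index_body = {
    "settings": {
        "index.percolator.map_unmapped_fields_as_text": True
    },
    "mappings": {
        "properties": {
            "query": {"type": "percolator"}
        }
    }
}
print(index_body)
```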

Limitations

Parent/child

Because the percolate query processes one document at a time, it doesn't support queries and filters that run against child documents, such as has_child and has_parent.

Fetching queries

There are a number of queries that fetch data via a get call during query parsing: for example, the terms query when using terms lookup, the template query when using an indexed script, and geo_shape when using a pre-indexed shape. When these queries are indexed by the percolator field type, the get call is executed once. So each time the percolator query evaluates these queries, the terms, shapes, etc. as they were at index time will be used. It is important to note that the fetching of terms that these queries perform happens when the percolator query is indexed on both primary and replica shards, so the terms that are actually indexed can differ between shard copies if the source index changed while indexing.
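For illustration, a percolator query with a terms lookup might look like the following (the lookup index and document id are hypothetical); the lookup document is fetched once, when this query is indexed, not on every percolate execution:

```python
# Percolator query body using a terms lookup. The terms are fetched from
# the lookup document once, at the time the percolator query is indexed.
terms_lookup_query = {
    "query": {
        "terms": {
            "tags": {
                "index": "tag-lists",    # hypothetical lookup index
                "id": "watched-tags",    # hypothetical lookup document
                "path": "tags"           # field inside the lookup document
            }
        }
    }
}
print(terms_lookup_query)
```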

Script query

The script inside a script query can only access doc values fields. The percolate query indexes the provided document into an in-memory index. This in-memory index doesn't support stored fields, and because of that the _source field and other stored fields are not stored. This is why the _source and other stored fields aren't available inside a script query.
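For illustration, a script query that stays within this limitation reads only doc values (the field name and threshold are illustrative):

```python
# Percolator query body with a script query that uses only doc values.
# Accessing _source or stored fields in the script would not work, since
# the in-memory index built by the percolate query has no stored fields.
script_query = {
    "query": {
        "script": {
            "script": {
                "source": "doc['price'].value > params.threshold",
                "params": {"threshold": 100}
            }
        }
    }
}
print(script_query)
```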

Field aliases

Percolator queries that contain field aliases may not always behave as expected. In particular, if a registered percolator query contains a field alias, and that alias is later updated in the mappings to refer to a different field, the stored query will still refer to the original target field. To pick up the change to the field alias, the percolator query must be explicitly reindexed.