Normalizers

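Normalizers are similar to analyzers, except that they may only emit a single token. As a consequence, they do not have a tokenizer and accept only a subset of the available char filters and token filters. Only filters that work on a per-character basis are allowed. For instance, a lowercasing filter is allowed, but a stemming filter is not, because stemming needs to look at the keyword as a whole. The filters that can currently be used in a normalizer definition are: arabic_normalization, asciifolding, bengali_normalization, cjk_width, decimal_digit, elision, german_normalization, hindi_normalization, indic_normalization, lowercase, pattern_replace, persian_normalization, scandinavian_folding, serbian_normalization, sorani_normalization, trim, uppercase.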
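As a rough illustration of the restriction above, the following sketch checks a proposed filter list against the allowed names before an index is created. The helper `unsupported_filters` is hypothetical and not part of any Elasticsearch client. It compares built-in filter names only, so a custom filter defined under its own name in the index settings would need to be checked by its `type` instead.

```python
# Filter names listed above as usable in a normalizer definition.
ALLOWED_NORMALIZER_FILTERS = frozenset({
    "arabic_normalization", "asciifolding", "bengali_normalization",
    "cjk_width", "decimal_digit", "elision", "german_normalization",
    "hindi_normalization", "indic_normalization", "lowercase",
    "pattern_replace", "persian_normalization", "scandinavian_folding",
    "serbian_normalization", "sorani_normalization", "trim", "uppercase",
})


def unsupported_filters(filters):
    """Return, in order, the names that are not in the allowed list."""
    return [name for name in filters if name not in ALLOWED_NORMALIZER_FILTERS]


# Per-character filters pass the check.
print(unsupported_filters(["lowercase", "asciifolding"]))
# A stemming filter is flagged, since it must see the keyword as a whole.
print(unsupported_filters(["lowercase", "stemmer"]))
```

A client-side check like this is only a convenience; Elasticsearch itself decides whether a normalizer definition is accepted.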

Elasticsearch ships with a built-in lowercase normalizer. For other forms of normalization, a custom configuration is required.
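Because the lowercase normalizer is built in, a keyword field can refer to it by name without an `analysis` section in the index settings. The sketch below only builds the mappings body; the field name `foo` is an assumption chosen for illustration.

```python
import json

# Mappings that reference the built-in "lowercase" normalizer by name.
# No custom normalizer definition is needed in the index settings.
# The field name "foo" is a placeholder.
mappings = {
    "properties": {
        "foo": {
            "type": "keyword",
            "normalizer": "lowercase",
        }
    }
}

print(json.dumps(mappings, indent=2))
```

This dictionary could be passed as the `mappings` argument of `client.indices.create`, in the same way as the custom normalizer example on this page.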

Custom normalizers


Custom normalizers take a list of char filters and a list of token filters.

resp = client.indices.create(
    index="index",
    settings={
        "analysis": {
            "char_filter": {
                "quote": {
                    "type": "mapping",
                    "mappings": [
                        "« => \"",
                        "» => \""
                    ]
                }
            },
            "normalizer": {
                "my_normalizer": {
                    "type": "custom",
                    "char_filter": [
                        "quote"
                    ],
                    "filter": [
                        "lowercase",
                        "asciifolding"
                    ]
                }
            }
        }
    },
    mappings={
        "properties": {
            "foo": {
                "type": "keyword",
                "normalizer": "my_normalizer"
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'index',
  body: {
    settings: {
      analysis: {
        char_filter: {
          quote: {
            type: 'mapping',
            mappings: [
              '« => "',
              '» => "'
            ]
          }
        },
        normalizer: {
          my_normalizer: {
            type: 'custom',
            char_filter: [
              'quote'
            ],
            filter: [
              'lowercase',
              'asciifolding'
            ]
          }
        }
      }
    },
    mappings: {
      properties: {
        foo: {
          type: 'keyword',
          normalizer: 'my_normalizer'
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "index",
  settings: {
    analysis: {
      char_filter: {
        quote: {
          type: "mapping",
          mappings: ['« => "', '» => "'],
        },
      },
      normalizer: {
        my_normalizer: {
          type: "custom",
          char_filter: ["quote"],
          filter: ["lowercase", "asciifolding"],
        },
      },
    },
  },
  mappings: {
    properties: {
      foo: {
        type: "keyword",
        normalizer: "my_normalizer",
      },
    },
  },
});
console.log(response);

PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "quote": {
          "type": "mapping",
          "mappings": [
            "« => \"",
            "» => \""
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": ["quote"],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
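To get a feel for what `my_normalizer` does to a keyword value, the following pure-Python sketch applies the same three steps locally: the quote mapping, lowercasing, and a rough stand-in for `asciifolding`. The function `approximate_my_normalizer` is hypothetical, and stripping combining marks after NFKD decomposition is an assumption that only approximates `asciifolding`; the real filter also folds characters that have no Unicode decomposition.

```python
import unicodedata

# Same replacements as the "quote" mapping char filter defined above.
QUOTE_MAPPINGS = {"«": '"', "»": '"'}


def approximate_my_normalizer(value: str) -> str:
    """Approximate my_normalizer locally: quote mapping, lowercase, accent folding."""
    # Step 1: char filter, applied before any token filter.
    for source, target in QUOTE_MAPPINGS.items():
        value = value.replace(source, target)
    # Step 2: lowercase token filter.
    value = value.lower()
    # Step 3: rough stand-in for asciifolding (assumption): decompose
    # accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(approximate_my_normalizer("«Crème Brûlée»"))  # prints "creme brulee" with the quotes
```

The whole value goes in and a single value comes out, which mirrors the single-token behaviour of a normalizer. For authoritative results, run the text through the index itself rather than relying on this approximation.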
