Normalizers

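Normalizers are similar to analyzers, except that they can emit only a single token. As a consequence, they have no tokenizer and accept only a subset of the available character filters and token filters. Only filters that work on a per-character basis are allowed. For example, a lowercase filter is allowed, but a stemming filter is not, because stemming needs to look at the keyword as a whole.

The filters that can currently be used in a normalizer definition are: arabic_normalization, asciifolding, bengali_normalization, cjk_width, decimal_digit, elision, german_normalization, hindi_normalization, indic_normalization, lowercase, pattern_replace, persian_normalization, scandinavian_folding, serbian_normalization, sorani_normalization, trim, uppercase.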

Elasticsearch ships with a built-in lowercase normalizer. Other forms of normalization require a custom configuration.
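
The difference between an analyzer and a normalizer can be illustrated with a small toy model in plain Python. This is only a sketch: the function names analyzer_like and normalizer_like are hypothetical, whitespace splitting stands in for a real tokenizer, and nothing here calls Elasticsearch.

```python
def analyzer_like(text: str) -> list[str]:
    """Toy analyzer: tokenize on whitespace, then lowercase each token."""
    return [token.lower() for token in text.split()]


def normalizer_like(text: str) -> str:
    """Toy normalizer: there is no tokenizer, so the whole value stays one token."""
    return text.lower()


value = "Quick Brown Fox"
print(analyzer_like(value))    # ['quick', 'brown', 'fox']
print(normalizer_like(value))  # quick brown fox
```

The toy normalizer only applies a per-character transformation, which is why the value can remain a single token.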

Custom normalizers

Custom normalizers take a list of character filters and a list of token filters.

Python:

resp = client.indices.create(
    index="index",
    settings={
        "analysis": {
            "char_filter": {
                "quote": {
                    "type": "mapping",
                    "mappings": [
                        "« => \"",
                        "» => \""
                    ]
                }
            },
            "normalizer": {
                "my_normalizer": {
                    "type": "custom",
                    "char_filter": [
                        "quote"
                    ],
                    "filter": [
                        "lowercase",
                        "asciifolding"
                    ]
                }
            }
        }
    },
    mappings={
        "properties": {
            "foo": {
                "type": "keyword",
                "normalizer": "my_normalizer"
            }
        }
    },
)
print(resp)

Ruby:

response = client.indices.create(
  index: 'index',
  body: {
    settings: {
      analysis: {
        char_filter: {
          quote: {
            type: 'mapping',
            mappings: [
              '« => "',
              '» => "'
            ]
          }
        },
        normalizer: {
          my_normalizer: {
            type: 'custom',
            char_filter: [
              'quote'
            ],
            filter: [
              'lowercase',
              'asciifolding'
            ]
          }
        }
      }
    },
    mappings: {
      properties: {
        foo: {
          type: 'keyword',
          normalizer: 'my_normalizer'
        }
      }
    }
  }
)
puts response

JavaScript:

const response = await client.indices.create({
  index: "index",
  settings: {
    analysis: {
      char_filter: {
        quote: {
          type: "mapping",
          mappings: ['« => "', '» => "'],
        },
      },
      normalizer: {
        my_normalizer: {
          type: "custom",
          char_filter: ["quote"],
          filter: ["lowercase", "asciifolding"],
        },
      },
    },
  },
  mappings: {
    properties: {
      foo: {
        type: "keyword",
        normalizer: "my_normalizer",
      },
    },
  },
});
console.log(response);

Console:

PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "quote": {
          "type": "mapping",
          "mappings": [
            "« => \"",
            "» => \""
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": ["quote"],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
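
To get a feel for what my_normalizer would do to a keyword value, its three steps (the quote character filter, then lowercase, then asciifolding) can be approximated locally in Python. This is a rough sketch under stated assumptions, not Elasticsearch's implementation: the helper names are hypothetical, and asciifolding is approximated by Unicode NFKD decomposition plus removal of combining marks, which covers accented Latin letters but not every character the real filter folds.

```python
import unicodedata

# Stand-in for the "quote" mapping character filter defined above.
QUOTE_MAPPINGS = {"«": '"', "»": '"'}


def apply_quote_char_filter(text: str) -> str:
    """Replace each mapped character, as the mapping char filter would."""
    for source, target in QUOTE_MAPPINGS.items():
        text = text.replace(source, target)
    return text


def fold_to_ascii(text: str) -> str:
    """Approximate asciifolding: decompose with NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def my_normalizer_like(text: str) -> str:
    """Run the char filter first, then the token filters in their listed order."""
    text = apply_quote_char_filter(text)
    text = text.lower()
    return fold_to_ascii(text)


print(my_normalizer_like("«Café Olé»"))  # "cafe ole"
```

The value comes out as one normalized string, quotes included, rather than as separate tokens. On a running cluster, the analyze API with its normalizer parameter can be used to check the output of the real normalizer.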