Lowercase token filter

Changes token text to lowercase. For example, you can use the lowercase filter to change THE Lazy DoG to the lazy dog.

In addition to a default filter, the lowercase token filter provides access to Lucene's language-specific lowercase filters for Greek, Irish, and Turkish.

Example

The following analyze API request uses the default lowercase filter to change THE Quick FoX JUMPs to lowercase:

resp = client.indices.analyze(
    tokenizer="standard",
    filter=[
        "lowercase"
    ],
    text="THE Quick FoX JUMPs",
)
print(resp)
response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: [
      'lowercase'
    ],
    text: 'THE Quick FoX JUMPs'
  }
)
puts response
const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: ["lowercase"],
  text: "THE Quick FoX JUMPs",
});
console.log(response);
GET _analyze
{
  "tokenizer" : "standard",
  "filter" : ["lowercase"],
  "text" : "THE Quick FoX JUMPs"
}

The filter produces the following tokens:

[ the, quick, fox, jumps ]
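
To make the transformation concrete, here is a minimal local sketch in Python that mimics the request above without a cluster. It is only an approximation: `lowercase_tokens` is a hypothetical helper, whitespace splitting stands in for the standard tokenizer, and Python's `str.lower()` is assumed to match Lucene's LowerCaseFilter for plain ASCII text like this sample (the two can differ for some non-ASCII characters).

```python
from typing import List


def lowercase_tokens(tokens: List[str]) -> List[str]:
    """Lowercase each token, leaving token boundaries untouched.

    Hypothetical helper: approximates the default lowercase filter
    for ASCII input only.
    """
    return [token.lower() for token in tokens]


# Whitespace splitting is a stand-in for the standard tokenizer here.
tokens = "THE Quick FoX JUMPs".split()
print(lowercase_tokens(tokens))  # ['the', 'quick', 'fox', 'jumps']
```

Note that the filter changes only the text of each token; the number and order of tokens stay the same.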

Add to an analyzer

The following create index API request uses the lowercase filter to configure a new custom analyzer.

resp = client.indices.create(
    index="lowercase_example",
    settings={
        "analysis": {
            "analyzer": {
                "whitespace_lowercase": {
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase"
                    ]
                }
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'lowercase_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          whitespace_lowercase: {
            tokenizer: 'whitespace',
            filter: [
              'lowercase'
            ]
          }
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "lowercase_example",
  settings: {
    analysis: {
      analyzer: {
        whitespace_lowercase: {
          tokenizer: "whitespace",
          filter: ["lowercase"],
        },
      },
    },
  },
});
console.log(response);
PUT lowercase_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_lowercase": {
          "tokenizer": "whitespace",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}

Configurable parameters

language

(Optional, string) Language-specific lowercase token filter to use. Valid values include:

greek
Uses Lucene's GreekLowerCaseFilter
irish
Uses Lucene's IrishLowerCaseFilter
turkish
Uses Lucene's TurkishLowerCaseFilter

If not specified, defaults to Lucene's LowerCaseFilter.
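
To illustrate why a language-specific variant can matter, the sketch below shows the Turkish dotted and dotless "i" distinction in plain Python. It is not Lucene's implementation: `turkish_lower` is a hypothetical helper that covers only the two precomposed capital letters (I and İ) and ignores other cases a real filter may handle, such as a capital I followed by a combining dot above.

```python
def turkish_lower(token: str) -> str:
    """Lowercase a token using Turkish casing for the letter i.

    Hypothetical sketch: in Turkish, capital I lowercases to dotless ı
    (U+0131) and capital İ (U+0130) lowercases to ordinary i.
    """
    # Map the two Turkish-specific capitals first, then lowercase the rest.
    token = token.replace("\u0130", "i").replace("I", "\u0131")
    return token.lower()


print("ISPARTA".lower())          # isparta (default casing)
print(turkish_lower("ISPARTA"))   # ısparta (Turkish casing)
print(turkish_lower("İSTANBUL"))  # istanbul
```

With default lowercasing, a Turkish query term containing ı would not match text that was indexed from its uppercase form, which is the kind of mismatch the `turkish` setting is meant to avoid.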

Customize

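To customize the lowercase filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.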

For example, the following request creates a custom lowercase filter for the Greek language:

resp = client.indices.create(
    index="custom_lowercase_example",
    settings={
        "analysis": {
            "analyzer": {
                "greek_lowercase_example": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "greek_lowercase"
                    ]
                }
            },
            "filter": {
                "greek_lowercase": {
                    "type": "lowercase",
                    "language": "greek"
                }
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'custom_lowercase_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          greek_lowercase_example: {
            type: 'custom',
            tokenizer: 'standard',
            filter: [
              'greek_lowercase'
            ]
          }
        },
        filter: {
          greek_lowercase: {
            type: 'lowercase',
            language: 'greek'
          }
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "custom_lowercase_example",
  settings: {
    analysis: {
      analyzer: {
        greek_lowercase_example: {
          type: "custom",
          tokenizer: "standard",
          filter: ["greek_lowercase"],
        },
      },
      filter: {
        greek_lowercase: {
          type: "lowercase",
          language: "greek",
        },
      },
    },
  },
});
console.log(response);
PUT custom_lowercase_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "greek_lowercase_example": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["greek_lowercase"]
        }
      },
      "filter": {
        "greek_lowercase": {
          "type": "lowercase",
          "language": "greek"
        }
      }
    }
  }
}
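
If you need the same pattern for more than one language, the settings body can be generated rather than written by hand. The following is a minimal sketch, not part of any client library: `build_lowercase_settings` and `VALID_LANGUAGES` are hypothetical names, and the set of accepted values is taken from the parameter list above.

```python
from typing import Any, Dict, Optional

# Values listed for the `language` parameter in this page.
VALID_LANGUAGES = {"greek", "irish", "turkish"}


def build_lowercase_settings(
    analyzer_name: str,
    filter_name: str,
    language: Optional[str] = None,
    tokenizer: str = "standard",
) -> Dict[str, Any]:
    """Build index settings for a custom analyzer with a lowercase filter.

    Hypothetical helper: omitting `language` leaves the filter on the
    default lowercase behavior.
    """
    if language is not None and language not in VALID_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")

    filter_def: Dict[str, Any] = {"type": "lowercase"}
    if language is not None:
        filter_def["language"] = language

    return {
        "analysis": {
            "analyzer": {
                analyzer_name: {
                    "type": "custom",
                    "tokenizer": tokenizer,
                    "filter": [filter_name],
                }
            },
            "filter": {filter_name: filter_def},
        }
    }


settings = build_lowercase_settings(
    "greek_lowercase_example", "greek_lowercase", language="greek"
)
print(settings)
```

The returned dictionary has the same shape as the `settings` argument in the Python request above, so it could be passed to `client.indices.create(index="custom_lowercase_example", settings=settings)`.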