Simple analyzer
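The simple analyzer breaks text into tokens at any non-letter character, such as digits, spaces, hyphens and apostrophes. It discards the non-letter characters and converts uppercase letters to lowercase.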
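To make that rule concrete, here is a minimal pure-Python approximation of it. It is a sketch, not the Lucene implementation: it assumes Python's `str.isalpha()` is a close enough stand-in for the letter test the analyzer applies, and it ignores edge cases such as a maximum token length. The function name `simple_analyze` is hypothetical.

```python
def simple_analyze(text: str) -> list[str]:
    """Approximate the simple analyzer: split on non-letters, then lowercase."""
    tokens: list[str] = []
    current: list[str] = []  # letters of the token being built
    for ch in text:
        if ch.isalpha():
            current.append(ch)
        elif current:
            # Any non-letter ends the current token and is itself discarded.
            tokens.append("".join(current).lower())
            current = []
    if current:  # flush a token that runs to the end of the text
        tokens.append("".join(current).lower())
    return tokens


print(simple_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
# ['the', 'quick', 'brown', 'foxes', 'jumped', 'over', 'the', 'lazy', 'dog', 's', 'bone']
```

Note how the digit `2` vanishes entirely and the apostrophe in `dog's` splits it into two tokens, `dog` and `s`.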
Example
resp = client.indices.analyze(
    analyzer="simple",
    text="The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
)
print(resp)
response = client.indices.analyze(
  body: {
    analyzer: 'simple',
    text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
  }
)
puts response
const response = await client.indices.analyze({
  analyzer: "simple",
  text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
});
console.log(response);
POST _analyze
{
  "analyzer": "simple",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The simple analyzer parses the sentence and produces the following tokens:
[ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
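The `_analyze` response carries more than the token strings: each entry in its `tokens` list also records `start_offset` and `end_offset` into the original text. The sketch below assumes the response body is available as a plain dict (with the Python client, `resp["tokens"]` should be accessible the same way). The sample is hand-written and abbreviated to the first two tokens for illustration, not captured from a cluster, and the function name `tokens_with_source` is hypothetical.

```python
def tokens_with_source(text: str, resp: dict) -> list[tuple[str, str]]:
    """Pair each analyzed token with the slice of the original text it came from."""
    return [
        (t["token"], text[t["start_offset"]:t["end_offset"]])
        for t in resp["tokens"]
    ]


text = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
# Hand-written, abbreviated sample of an _analyze response body (first two tokens only).
sample_resp = {
    "tokens": [
        {"token": "the", "start_offset": 0, "end_offset": 3},
        {"token": "quick", "start_offset": 6, "end_offset": 11},
    ]
}
print(tokens_with_source(text, sample_resp))
# [('the', 'The'), ('quick', 'QUICK')]
```

The pairing shows that lowercasing changes the token text while the offsets still point at the original `QUICK`.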
Customize
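To customize the simple analyzer, duplicate it to create the basis for a custom analyzer. You can then modify this custom analyzer as needed, usually by adding token filters.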
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_custom_simple_analyzer": {
                    "tokenizer": "lowercase",
                    "filter": []
                }
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_custom_simple_analyzer: {
            tokenizer: 'lowercase',
            filter: []
          }
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_custom_simple_analyzer: {
          tokenizer: "lowercase",
          filter: [],
        },
      },
    },
  },
});
console.log(response);
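The custom analyzer above starts with an empty `filter` list, so it behaves like the built-in simple analyzer. The sketch below models what adding token filters does: each filter receives the output of the previous stage, in the order listed. It is a deliberate simplification under stated assumptions: token streams are plain Python lists, position increments are ignored, and `stop_filter` and `min_length_filter` are hypothetical stand-ins, not the Elasticsearch filters themselves. The input is the simple analyzer's output for the example sentence.

```python
from typing import Callable, Iterable

# Simplification: a token stream is modeled as a plain list of strings.
TokenFilter = Callable[[list[str]], list[str]]


def stop_filter(stopwords: Iterable[str]) -> TokenFilter:
    """Hypothetical stand-in for a stop-word filter: drop listed tokens."""
    stop = frozenset(stopwords)
    return lambda tokens: [t for t in tokens if t not in stop]


def min_length_filter(min_len: int) -> TokenFilter:
    """Hypothetical stand-in for a length filter: drop tokens shorter than min_len."""
    return lambda tokens: [t for t in tokens if len(t) >= min_len]


def run_filters(tokens: list[str], filters: list[TokenFilter]) -> list[str]:
    """Apply filters in order, each one seeing the previous filter's output."""
    for f in filters:
        tokens = f(tokens)
    return tokens


# Output of the lowercase tokenizer for the example sentence.
TOKENS = ["the", "quick", "brown", "foxes", "jumped", "over",
          "the", "lazy", "dog", "s", "bone"]

print(run_filters(TOKENS, []))  # empty filter list: tokens pass through unchanged
print(run_filters(TOKENS, [stop_filter({"the", "over"}), min_length_filter(2)]))
# ['quick', 'brown', 'foxes', 'jumped', 'lazy', 'dog', 'bone']
```

Because filters run in sequence, the order of the `filter` array matters whenever one filter changes what a later one sees.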