Stop analyzer
The stop analyzer is the same as the simple analyzer but adds support for removing stop words. It uses the _english_ stop words by default.
Example output
resp = client.indices.analyze(
    analyzer="stop",
    text="The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
)
print(resp)

response = client.indices.analyze(
  body: {
    analyzer: 'stop',
    text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
  }
)
puts response

const response = await client.indices.analyze({
  analyzer: "stop",
  text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
});
console.log(response);

POST _analyze
{
  "analyzer": "stop",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The above sentence would produce the following terms:
[ quick, brown, foxes, jumped, over, lazy, dog, s, bone ]
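To make the behavior concrete without a running cluster, here is a minimal pure-Python sketch of the same pipeline. It assumes the tokenizer splits on every non-letter character and lowercases each token, as the simple analyzer does, and it uses a small hypothetical word set (ENGLISH_STOP_SUBSET), not the real _english_ list, so treat it as an approximation rather than the Elasticsearch implementation.

```python
import re

# Hypothetical subset of English stop words, for illustration only;
# the real _english_ list used by Elasticsearch is longer.
ENGLISH_STOP_SUBSET = {"a", "an", "and", "the", "of", "to", "in", "is"}

# A run of letters: word characters minus digits and underscore (Unicode-aware).
LETTER_RUN = re.compile(r"[^\W\d_]+")


def lowercase_tokenize(text: str) -> list[str]:
    """Split on every non-letter character and lowercase each token."""
    return [match.group().lower() for match in LETTER_RUN.finditer(text)]


def stop_analyze(text: str, stopwords=ENGLISH_STOP_SUBSET) -> list[str]:
    """Tokenize like the simple analyzer, then drop any token in the stop list."""
    return [token for token in lowercase_tokenize(text) if token not in stopwords]


if __name__ == "__main__":
    sentence = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
    print(stop_analyze(sentence))
    # ['quick', 'brown', 'foxes', 'jumped', 'over', 'lazy', 'dog', 's', 'bone']
```

The digit 2 disappears because it is not a letter, and "dog's" splits into dog and s at the apostrophe; only the stop-word check distinguishes this from the simple analyzer.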
Configuration
The stop analyzer accepts the following parameters:
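| Parameter | Description |
| stopwords | A pre-defined stop words list, such as _english_, or an array containing a list of stop words. Defaults to _english_. |
| | The path to a file containing stop words. This path is relative to the Elasticsearch … |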
See the Stop Token Filter for more information about stop word configuration.
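The two ways of supplying the stopwords parameter (a predefined list name or an explicit array) can be captured in a small helper that builds the index settings body. This is a hedged sketch: the function name stop_analyzer_settings is hypothetical, it covers only the stopwords parameter, and it assumes that any string wrapped in underscores (like _english_) names a predefined list.

```python
from typing import List, Union


def stop_analyzer_settings(name: str, stopwords: Union[str, List[str]] = "_english_") -> dict:
    """Build an index settings body that defines one analyzer of type "stop".

    stopwords is either a predefined list name such as "_english_"
    or an explicit list of words (assumption: names are wrapped in underscores).
    """
    if isinstance(stopwords, str):
        is_named_list = len(stopwords) > 2 and stopwords.startswith("_") and stopwords.endswith("_")
        if not is_named_list:
            raise ValueError(f"expected a predefined list name like '_english_', got {stopwords!r}")
    else:
        words = list(stopwords)
        if not all(isinstance(word, str) for word in words):
            raise TypeError("stopwords must be a list of strings")
        stopwords = words

    return {
        "analysis": {
            "analyzer": {
                name: {"type": "stop", "stopwords": stopwords},
            }
        }
    }
```

For example, stop_analyzer_settings("my_stop_analyzer", ["the", "over"]) returns a dict with the same shape as the settings in the example configuration, so it could be passed as the settings argument of client.indices.create.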
Example configuration
In this example, we configure the stop analyzer to use a specified list of words as stop words:
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_stop_analyzer": {
                    "type": "stop",
                    "stopwords": ["the", "over"],
                }
            }
        }
    },
)
print(resp)

resp1 = client.indices.analyze(
    index="my-index-000001",
    analyzer="my_stop_analyzer",
    text="The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
)
print(resp1)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_stop_analyzer: {
            type: 'stop',
            stopwords: ['the', 'over']
          }
        }
      }
    }
  }
)
puts response

response = client.indices.analyze(
  index: 'my-index-000001',
  body: {
    analyzer: 'my_stop_analyzer',
    text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
  }
)
puts response

const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_stop_analyzer: {
          type: "stop",
          stopwords: ["the", "over"],
        },
      },
    },
  },
});
console.log(response);

const response1 = await client.indices.analyze({
  index: "my-index-000001",
  analyzer: "my_stop_analyzer",
  text: "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.",
});
console.log(response1);

PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_stop_analyzer": {
          "type": "stop",
          "stopwords": ["the", "over"]
        }
      }
    }
  }
}

POST my-index-000001/_analyze
{
  "analyzer": "my_stop_analyzer",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The above example produces the following terms:
[ quick, brown, foxes, jumped, lazy, dog, s, bone ]
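One way to picture the configured analyzer is as a tokenizer followed by a chain of token filters, with the stop filter built from the configured word list. In this minimal sketch, make_stop_filter and analyze are hypothetical names and the tokenizer only approximates a lowercase tokenizer; passing ["the", "over"] reproduces the terms above. An explicit list replaces the default _english_ list rather than extending it, so a word such as "and" is kept.

```python
import re
from typing import Callable, Iterable, List

TokenFilter = Callable[[List[str]], List[str]]


def lowercase_tokenizer(text: str) -> List[str]:
    """Approximate a lowercase tokenizer: split on non-letters, lowercase."""
    return [m.group().lower() for m in re.finditer(r"[^\W\d_]+", text)]


def make_stop_filter(stopwords: Iterable[str]) -> TokenFilter:
    """Return a token filter that drops every token found in the given list."""
    stop_set = frozenset(stopwords)  # the configured list is the whole stop list
    return lambda tokens: [t for t in tokens if t not in stop_set]


def analyze(text: str, filters: List[TokenFilter]) -> List[str]:
    """Run the tokenizer, then apply each token filter in order."""
    tokens = lowercase_tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


my_stop_filters = [make_stop_filter(["the", "over"])]
sentence = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
print(analyze(sentence, my_stop_filters))
# ['quick', 'brown', 'foxes', 'jumped', 'lazy', 'dog', 's', 'bone']
```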
Definition
It consists of:
Tokenizer: Lowercase Tokenizer (lowercase)
Token filters: Stop Token Filter (stop)
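If you need to customize the stop analyzer beyond the configuration parameters, you need to recreate it as a custom analyzer and modify it, usually by adding token filters. The following recreates the built-in stop analyzer, and you can use it as a starting point for further customization: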
resp = client.indices.create(
    index="stop_example",
    settings={
        "analysis": {
            "filter": {
                "english_stop": {
                    "type": "stop",
                    "stopwords": "_english_",
                }
            },
            "analyzer": {
                "rebuilt_stop": {
                    "tokenizer": "lowercase",
                    "filter": ["english_stop"],
                }
            },
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'stop_example',
  body: {
    settings: {
      analysis: {
        filter: {
          english_stop: {
            type: 'stop',
            stopwords: '_english_'
          }
        },
        analyzer: {
          rebuilt_stop: {
            tokenizer: 'lowercase',
            filter: ['english_stop']
          }
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "stop_example",
  settings: {
    analysis: {
      filter: {
        english_stop: {
          type: "stop",
          stopwords: "_english_",
        },
      },
      analyzer: {
        rebuilt_stop: {
          tokenizer: "lowercase",
          filter: ["english_stop"],
        },
      },
    },
  },
});
console.log(response);
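Because further customization usually means adding token filters after english_stop, a small helper can generate the rebuilt settings with extra filters appended in order. This is only a sketch: rebuilt_stop_settings is a hypothetical name, and any extra filter names you pass (asciifolding in the usage line) are assumed to be token filters available in your cluster.

```python
import copy
from typing import List, Optional

# Mirrors the rebuilt_stop settings shown above.
BASE_SETTINGS = {
    "analysis": {
        "filter": {
            "english_stop": {"type": "stop", "stopwords": "_english_"},
        },
        "analyzer": {
            "rebuilt_stop": {"tokenizer": "lowercase", "filter": ["english_stop"]},
        },
    }
}


def rebuilt_stop_settings(extra_filters: Optional[List[str]] = None) -> dict:
    """Return a copy of the rebuilt stop analyzer settings with any extra
    token filters appended after english_stop (filters run in list order)."""
    settings = copy.deepcopy(BASE_SETTINGS)  # never mutate the shared template
    settings["analysis"]["analyzer"]["rebuilt_stop"]["filter"].extend(extra_filters or [])
    return settings


print(rebuilt_stop_settings(["asciifolding"])["analysis"]["analyzer"]["rebuilt_stop"])
# {'tokenizer': 'lowercase', 'filter': ['english_stop', 'asciifolding']}
```

Appending rather than inserting keeps english_stop first, so later filters only see tokens that survived stop-word removal.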