Porter stem token filter
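Provides algorithmic stemming for the English language, based on the Porter stemming algorithm.
This filter tends to stem more aggressively than other English stemmer filters, such as the kstem filter.
The porter_stem filter is equivalent to the english variant of the stemmer filter.
The porter_stem filter uses Lucene's PorterStemFilter.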
Example
The following analyze API request uses the porter_stem filter to stem the foxes jumping quickly to the fox jump quickli:
# Python client
resp = client.indices.analyze(
    tokenizer="standard",
    filter=["porter_stem"],
    text="the foxes jumping quickly",
)
print(resp)

# Ruby client
response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: ['porter_stem'],
    text: 'the foxes jumping quickly'
  }
)
puts response

// JavaScript client
const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: ["porter_stem"],
  text: "the foxes jumping quickly",
});
console.log(response);

# Console
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "porter_stem" ],
  "text": "the foxes jumping quickly"
}
The filter produces the following tokens:
[ the, fox, jump, quickli ]
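To compare the stemmed output programmatically, the token strings can be pulled out of the analyze API response. The following is a minimal sketch, assuming the response body has the usual analyze API shape: a tokens list whose entries each carry a token field. The names extract_tokens and sample_response are hypothetical, introduced only for illustration, and the sample dictionary is written by hand to mirror the output above rather than captured from a live cluster.

```python
from typing import Any, List, Mapping


def extract_tokens(response: Mapping[str, Any]) -> List[str]:
    """Return the token strings from an analyze API response body.

    Assumes the body looks like {"tokens": [{"token": "..."}, ...]}.
    A body without a "tokens" key yields an empty list.
    """
    return [entry["token"] for entry in response.get("tokens", [])]


# Hand-written sample shaped like an analyze API response (an assumption,
# not output copied from a real cluster).
sample_response = {
    "tokens": [
        {"token": "the", "start_offset": 0, "end_offset": 3,
         "type": "<ALPHANUM>", "position": 0},
        {"token": "fox", "start_offset": 4, "end_offset": 9,
         "type": "<ALPHANUM>", "position": 1},
        {"token": "jump", "start_offset": 10, "end_offset": 17,
         "type": "<ALPHANUM>", "position": 2},
        {"token": "quickli", "start_offset": 18, "end_offset": 25,
         "type": "<ALPHANUM>", "position": 3},
    ]
}

print(extract_tokens(sample_response))
```

The helper takes a plain dictionary, so it can be exercised without a running cluster. The offsets in the sample still refer to the original words (foxes, jumping, quickly), because stemming changes the token text but not its position in the input.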
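Add to an analyzer
The following create index API request uses the porter_stem filter to configure a new custom analyzer.
To work properly, the porter_stem filter requires lowercase tokens. To ensure tokens are lowercased, add the lowercase filter before the porter_stem filter in the analyzer configuration.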
# Python client
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "porter_stem"],
                }
            }
        }
    },
)
print(resp)

# Ruby client
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'whitespace',
            filter: ['lowercase', 'porter_stem']
          }
        }
      }
    }
  }
)
puts response

// JavaScript client
const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "whitespace",
          filter: ["lowercase", "porter_stem"],
        },
      },
    },
  },
});
console.log(response);

# Console
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "lowercase", "porter_stem" ]
        }
      }
    }
  }
}
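Because the filter order matters, it can be useful to check a settings dictionary before sending it. The following is a rough sketch of such a check, not part of any Elasticsearch client: analyzers_missing_lowercase is a hypothetical helper that flags analyzers in which porter_stem appears without lowercase somewhere ahead of it. It assumes filters are referenced by the literal names lowercase and porter_stem, so a custom filter defined under another name would not be recognized.

```python
from typing import Any, List, Mapping


def analyzers_missing_lowercase(settings: Mapping[str, Any]) -> List[str]:
    """Return names of analyzers where porter_stem is not preceded by lowercase.

    Only the literal filter names "lowercase" and "porter_stem" are
    recognized (an assumption of this sketch).
    """
    analyzers = settings.get("analysis", {}).get("analyzer", {})
    flagged = []
    for name, config in analyzers.items():
        filters = list(config.get("filter", []))
        if "porter_stem" not in filters:
            continue  # nothing to check for this analyzer
        stem_at = filters.index("porter_stem")
        # lowercase must appear somewhere before the first porter_stem
        if "lowercase" not in filters[:stem_at]:
            flagged.append(name)
    return flagged


# Same settings as in the create index request above.
settings = {
    "analysis": {
        "analyzer": {
            "my_analyzer": {
                "tokenizer": "whitespace",
                "filter": ["lowercase", "porter_stem"],
            }
        }
    }
}

print(analyzers_missing_lowercase(settings))  # prints []
```

A configuration that lists porter_stem first, or omits lowercase entirely, would have its analyzer name returned in the list.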