Porter stem token filter
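Provides algorithmic stemming for the English language, based on the Porter stemming algorithm.
This filter tends to stem more aggressively than other English stemmer filters, such as the kstem filter.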
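Algorithmic stemming means suffixes are removed by applying ordered rewrite rules instead of looking words up in a dictionary. As an illustration only, here is a minimal Python sketch of step 1a of Porter's published algorithm, which handles plural endings. The function name porter_step_1a is hypothetical, and this is not the code Lucene runs.

```python
def porter_step_1a(word: str) -> str:
    """Step 1a of Porter's algorithm: rewrite English plural suffixes.

    Rules are tried longest suffix first and the first match wins:
      sses -> ss, ies -> i, ss -> ss, s -> (removed)
    """
    rules = (("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", ""))
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

The ss -> ss rule exists only to stop the final s -> (removed) rule from firing on words such as caress.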
The porter_stem filter is equivalent to the stemmer filter's english variant.
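One way to check this equivalence is to send two analyze API requests that differ only in their filter. The first names porter_stem. The second defines a stemmer filter inline with language set to english. The sketch below assumes the analyze API accepts inline filter definitions. It only builds and prints the two JSON bodies and does not contact a cluster, so identical tokens are the expected result, not a verified one. The variable names are hypothetical.

```python
import json

SAMPLE_TEXT = "the foxes jumping quickly"

# Body that refers to the built-in porter_stem filter by name.
porter_body = {
    "tokenizer": "standard",
    "filter": ["porter_stem"],
    "text": SAMPLE_TEXT,
}

# Same request, but with an inline stemmer filter set to the english variant.
stemmer_english_body = {
    "tokenizer": "standard",
    "filter": [{"type": "stemmer", "language": "english"}],
    "text": SAMPLE_TEXT,
}

# Either body could be sent as GET /_analyze; this script only prints them.
for body in (porter_body, stemmer_english_body):
    print(json.dumps(body))
```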
The porter_stem filter uses Lucene's PorterStemFilter.
Example
The following analyze API request uses the porter_stem filter to stem the foxes jumping quickly to the fox jump quickli:
resp = client.indices.analyze(
    tokenizer="standard",
    filter=[
        "porter_stem"
    ],
    text="the foxes jumping quickly",
)
print(resp)
response = client.indices.analyze(
  body: {
    tokenizer: 'standard',
    filter: [
      'porter_stem'
    ],
    text: 'the foxes jumping quickly'
  }
)
puts response
const response = await client.indices.analyze({
  tokenizer: "standard",
  filter: ["porter_stem"],
  text: "the foxes jumping quickly",
});
console.log(response);
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "porter_stem"
  ],
  "text": "the foxes jumping quickly"
}
The filter produces the following tokens:
[ the, fox, jump, quickli ]
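To show where these tokens come from, here is a simplified Python sketch of a subset of Porter's published rules: step 1a (plurals), step 1b (-ed/-ing), step 1c (y to i) and step 5a (final e). Steps 2 to 4 and 5b are left out because they do not change these four words. For other input, the sketch will diverge from the full algorithm. It is not Lucene's PorterStemFilter, and all function names are hypothetical.

```python
VOWELS = set("aeiou")


def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: y is a consonant only at the start or after a vowel."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Porter's m: number of vowel-to-consonant transitions in [C](VC)^m[V]."""
    m, prev_vowel = 0, False
    for i in range(len(stem)):
        vowel = not is_consonant(stem, i)
        if prev_vowel and not vowel:
            m += 1
        prev_vowel = vowel
    return m


def has_vowel(stem: str) -> bool:
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_cvc(stem: str) -> bool:
    """Porter's *o condition: ends consonant-vowel-consonant, last letter not w, x or y."""
    n = len(stem)
    return (n >= 3 and is_consonant(stem, n - 3) and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1) and stem[-1] not in "wxy")


def step_1a(word: str) -> str:
    """Plurals: sses -> ss, ies -> i, ss -> ss, s -> ''."""
    for suffix, repl in (("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")):
        if word.endswith(suffix):
            return word[: -len(suffix)] + repl
    return word


def step_1b(word: str) -> str:
    """(m>0) eed -> ee; (*v*) ed -> ''; (*v*) ing -> '', then Porter's clean-up rules."""
    if word.endswith("eed"):
        stem = word[:-3]
        return stem + "ee" if measure(stem) > 0 else word
    for suffix in ("ed", "ing"):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and has_vowel(stem):
            if stem.endswith(("at", "bl", "iz")):
                return stem + "e"
            if (len(stem) >= 2 and stem[-1] == stem[-2]
                    and is_consonant(stem, len(stem) - 1) and stem[-1] not in "lsz"):
                return stem[:-1]  # e.g. hopp -> hop
            if measure(stem) == 1 and ends_cvc(stem):
                return stem + "e"  # e.g. fil -> file
            return stem
    return word


def step_1c(word: str) -> str:
    """(*v*) y -> i."""
    if word.endswith("y") and has_vowel(word[:-1]):
        return word[:-1] + "i"
    return word


def step_5a(word: str) -> str:
    """(m>1) e -> ''; (m=1 and not *o) e -> ''."""
    if word.endswith("e"):
        stem = word[:-1]
        m = measure(stem)
        if m > 1 or (m == 1 and not ends_cvc(stem)):
            return stem
    return word


def simplified_porter(word: str) -> str:
    """Steps 1a, 1b, 1c and 5a only; steps 2-4 and 5b are deliberately omitted."""
    return step_5a(step_1c(step_1b(step_1a(word))))


tokens = "the foxes jumping quickly".split()
print([simplified_porter(t) for t in tokens])  # ['the', 'fox', 'jump', 'quickli']
```

The rule for quickly explains its unusual-looking token. Step 1c rewrites a final y to i whenever the rest of the word contains a vowel, so the output is quickli rather than quick. Similarly, foxes loses its s in step 1a and then its trailing e in step 5a.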
Add to an analyzer
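The following create index API request uses the porter_stem filter to configure a new custom analyzer.
To work properly, the porter_stem filter requires lowercase tokens. To ensure that tokens are lowercase, add the lowercase filter before the porter_stem filter in the analyzer configuration.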
resp = client.indices.create(
    index="my-index-000001",
    settings={
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase",
                        "porter_stem"
                    ]
                }
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    settings: {
      analysis: {
        analyzer: {
          my_analyzer: {
            tokenizer: 'whitespace',
            filter: [
              'lowercase',
              'porter_stem'
            ]
          }
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "my-index-000001",
  settings: {
    analysis: {
      analyzer: {
        my_analyzer: {
          tokenizer: "whitespace",
          filter: ["lowercase", "porter_stem"],
        },
      },
    },
  },
});
console.log(response);
PUT /my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "porter_stem"
          ]
        }
      }
    }
  }
}
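The reason the lowercase filter must come before porter_stem can be sketched with a toy analyzer chain in Python. The stemmer below is a hypothetical stand-in, not Porter's algorithm. It mirrors the lowercase requirement described above: it matches only lowercase suffixes, so an uppercase token passes through unstemmed unless lowercasing runs first. The analyze helper is also hypothetical and approximates the whitespace tokenizer with str.split().

```python
from typing import Callable, List

TokenFilter = Callable[[str], str]


def toy_suffix_stem(token: str) -> str:
    """Hypothetical toy stemmer (NOT Porter): strips a trailing lowercase 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)]
    return token


def analyze(text: str, filters: List[TokenFilter]) -> List[str]:
    """Whitespace-tokenize, then apply each token filter to every token, in order."""
    tokens = text.split()
    for token_filter in filters:
        tokens = [token_filter(t) for t in tokens]
    return tokens


text = "FOXES Jumping"
print(analyze(text, [str.lower, toy_suffix_stem]))  # lowercase first: ['foxe', 'jump']
print(analyze(text, [toy_suffix_stem, str.lower]))  # stem first: ['foxes', 'jump']
```

With the stemmer first, the all-caps token FOXES never matches the lowercase suffix rule and comes out as foxes. Putting lowercase first lets both tokens reach the stemmer in the form its rules expect.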