Multiplexer token filter
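A token filter of type multiplexer emits multiple tokens at the same position, each version of the token having been run through a different filter. Identical output tokens at the same position are removed.
If the incoming token stream contains duplicate tokens, the multiplexer removes those as well.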
Options
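- filters: A list of token filters to apply to incoming tokens. These can be any token filters defined elsewhere in the index mappings. Filters can be chained with a comma-delimited string; the settings example below chains lowercase and porter_stem this way, applying lowercase first and then porter_stem to a single token.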
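Shingle or multi-word synonym token filters will not function normally when they are declared in the filters array, because they read ahead internally, which the multiplexer does not support.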
- preserve_original: If true (the default), emit the original token in addition to the filtered tokens.
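The behaviour described above can be approximated in plain Python. This is only a rough sketch of the semantics, not how Elasticsearch implements the filter: the `FILTERS` registry, the `multiplex` function, and the `toy_stem` helper (a crude stand-in for `porter_stem` that only strips a trailing "ing") are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Tuple

Token = Tuple[str, int]  # (text, position)


def toy_stem(text: str) -> str:
    # Hypothetical stand-in for porter_stem: only strips a trailing "ing".
    return text[:-3] if text.endswith("ing") and len(text) > 4 else text


# Hypothetical registry mapping filter names to per-token functions.
FILTERS: Dict[str, Callable[[str], str]] = {
    "lowercase": str.lower,
    "porter_stem": toy_stem,
}


def multiplex(
    tokens: List[Token],
    filters: List[str],
    preserve_original: bool = True,
) -> List[Token]:
    out: List[Token] = []
    seen = set()
    for text, pos in tokens:
        # Optionally keep the unfiltered token as the first candidate.
        candidates = [text] if preserve_original else []
        for chain in filters:
            result = text
            # A comma-delimited entry is a chain applied left to right.
            for name in chain.split(","):
                result = FILTERS[name.strip()](result)
            candidates.append(result)
        # Drop identical tokens that land on the same position.
        for cand in candidates:
            if (cand, pos) not in seen:
                seen.add((cand, pos))
                out.append((cand, pos))
    return out


tokens = [("Going", 0), ("HOME", 1)]
print(multiplex(tokens, ["lowercase", "lowercase, porter_stem"]))
```

With `preserve_original=False` the same call would omit the unfiltered `Going` and `HOME` entries, leaving only the filtered variants.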
Settings example
You can set it up as follows:
Python:

    # Assumes `client` is an already-configured Elasticsearch client.
    resp = client.indices.create(
        index="multiplexer_example",
        settings={
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["my_multiplexer"],
                    }
                },
                "filter": {
                    "my_multiplexer": {
                        "type": "multiplexer",
                        "filters": ["lowercase", "lowercase, porter_stem"],
                    }
                },
            }
        },
    )
    print(resp)

Ruby:

    # Assumes `client` is an already-configured Elasticsearch client.
    response = client.indices.create(
      index: 'multiplexer_example',
      body: {
        settings: {
          analysis: {
            analyzer: {
              my_analyzer: {
                tokenizer: 'standard',
                filter: ['my_multiplexer']
              }
            },
            filter: {
              my_multiplexer: {
                type: 'multiplexer',
                filters: ['lowercase', 'lowercase, porter_stem']
              }
            }
          }
        }
      }
    )
    puts response

JavaScript:

    // Assumes `client` is an already-configured Elasticsearch client.
    const response = await client.indices.create({
      index: "multiplexer_example",
      settings: {
        analysis: {
          analyzer: {
            my_analyzer: {
              tokenizer: "standard",
              filter: ["my_multiplexer"],
            },
          },
          filter: {
            my_multiplexer: {
              type: "multiplexer",
              filters: ["lowercase", "lowercase, porter_stem"],
            },
          },
        },
      },
    });
    console.log(response);

Console:

    PUT /multiplexer_example
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "standard",
              "filter": [ "my_multiplexer" ]
            }
          },
          "filter": {
            "my_multiplexer": {
              "type": "multiplexer",
              "filters": [ "lowercase", "lowercase, porter_stem" ]
            }
          }
        }
      }
    }
And test it like this:
Python:

    # Assumes `client` is an already-configured Elasticsearch client.
    resp = client.indices.analyze(
        index="multiplexer_example",
        analyzer="my_analyzer",
        text="Going HOME",
    )
    print(resp)

Ruby:

    # Assumes `client` is an already-configured Elasticsearch client.
    response = client.indices.analyze(
      index: 'multiplexer_example',
      body: {
        analyzer: 'my_analyzer',
        text: 'Going HOME'
      }
    )
    puts response

JavaScript:

    // Assumes `client` is an already-configured Elasticsearch client.
    const response = await client.indices.analyze({
      index: "multiplexer_example",
      analyzer: "my_analyzer",
      text: "Going HOME",
    });
    console.log(response);

Console:

    POST /multiplexer_example/_analyze
    {
      "analyzer" : "my_analyzer",
      "text" : "Going HOME"
    }
It responds with:
    {
      "tokens": [
        { "token": "Going", "start_offset": 0, "end_offset": 5, "type": "<ALPHANUM>", "position": 0 },
        { "token": "going", "start_offset": 0, "end_offset": 5, "type": "<ALPHANUM>", "position": 0 },
        { "token": "go", "start_offset": 0, "end_offset": 5, "type": "<ALPHANUM>", "position": 0 },
        { "token": "HOME", "start_offset": 6, "end_offset": 10, "type": "<ALPHANUM>", "position": 1 },
        { "token": "home", "start_offset": 6, "end_offset": 10, "type": "<ALPHANUM>", "position": 1 }
      ]
    }
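To make the stacking of tokens easier to see, the response can be grouped by position. The sketch below works on a plain dict copied from the response above rather than on a live client result, and the helper name `tokens_by_position` is an assumption introduced here for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# The analyze response shown above, copied into a plain dict.
response = {
    "tokens": [
        {"token": "Going", "start_offset": 0, "end_offset": 5, "type": "<ALPHANUM>", "position": 0},
        {"token": "going", "start_offset": 0, "end_offset": 5, "type": "<ALPHANUM>", "position": 0},
        {"token": "go", "start_offset": 0, "end_offset": 5, "type": "<ALPHANUM>", "position": 0},
        {"token": "HOME", "start_offset": 6, "end_offset": 10, "type": "<ALPHANUM>", "position": 1},
        {"token": "home", "start_offset": 6, "end_offset": 10, "type": "<ALPHANUM>", "position": 1},
    ]
}


def tokens_by_position(resp: dict) -> Dict[int, List[str]]:
    # Collect token texts under the position they were emitted at.
    grouped: Dict[int, List[str]] = defaultdict(list)
    for tok in resp["tokens"]:
        grouped[tok["position"]].append(tok["token"])
    return dict(grouped)


print(tokens_by_position(response))
# {0: ['Going', 'going', 'go'], 1: ['HOME', 'home']}
```

Position 1 holds only two tokens because both filter chains produce `home` for `HOME`, and the duplicate is removed.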
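The synonym and synonym_graph filters use their preceding analysis chain to parse and analyse their synonym lists, and will throw an exception if that chain contains token filters that produce multiple tokens at the same position. If you want to apply synonyms to a token stream containing a multiplexer, append the synonym filter to each relevant multiplexer filter list, rather than placing it after the multiplexer in the main token chain definition.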
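As a rough illustration of that advice, the settings below append a synonym filter to each multiplexer chain and leave only the multiplexer in the analyzer's main filter list. The filter name `my_synonyms`, the sample synonym rule, and the `build_settings` helper are assumptions made for this sketch; it only builds the settings dict and has not been run against a cluster.

```python
def build_settings(synonym_filter: str = "my_synonyms") -> dict:
    # Chains taken from the settings example earlier on this page.
    chains = ["lowercase", "lowercase, porter_stem"]
    return {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    # Only the multiplexer sits in the main chain;
                    # no synonym filter is placed after it.
                    "filter": ["my_multiplexer"],
                }
            },
            "filter": {
                # Hypothetical synonym filter with an illustrative rule.
                synonym_filter: {
                    "type": "synonym",
                    "synonyms": ["home, house"],
                },
                "my_multiplexer": {
                    "type": "multiplexer",
                    # The synonym filter is appended to every chain.
                    "filters": [f"{chain}, {synonym_filter}" for chain in chains],
                },
            },
        }
    }


settings = build_settings()
print(settings["analysis"]["filter"]["my_multiplexer"]["filters"])
# ['lowercase, my_synonyms', 'lowercase, porter_stem, my_synonyms']
```

The resulting dict could be passed as the `settings` argument of the index creation call shown in the settings example.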