Classic token filter

Performs optional post-processing of terms generated by the classic tokenizer.

This filter removes the English possessive ('s) from the end of words and removes the dots from acronyms. It uses Lucene's ClassicFilter.
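To make the two transformations concrete, here is a minimal Python sketch that approximates them on a list of already-split tokens. It is not Lucene's implementation: ClassicFilter relies on token types assigned by the classic tokenizer, whereas this sketch uses a regular expression as a stand-in for acronym detection. The function name `classic_filter_sketch` and the pattern are assumptions made for illustration.

```python
import re

# Assumption: an "acronym" is two or more single letters, each followed by a dot.
# The real filter uses the token type set by the classic tokenizer instead.
_ACRONYM = re.compile(r"^(?:[A-Za-z]\.){2,}$")


def classic_filter_sketch(tokens):
    """Approximate the classic token filter on a list of token strings."""
    filtered = []
    for token in tokens:
        if token.endswith(("'s", "'S")):
            # Strip the English possessive from the end of the word.
            token = token[:-2]
        elif _ACRONYM.match(token):
            # Remove the dots from an acronym such as Q.U.I.C.K.
            token = token.replace(".", "")
        filtered.append(token)
    return filtered


print(classic_filter_sketch(["The", "2", "Q.U.I.C.K.", "Brown", "Foxes", "dog's", "bone"]))
# prints ['The', '2', 'QUICK', 'Brown', 'Foxes', 'dog', 'bone']
```

The sketch only covers the filter step; splitting the text into tokens is the job of the classic tokenizer.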

Example

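The following analyze API request demonstrates how the classic token filter works. The request is shown for the Python, Ruby, and JavaScript clients, followed by the equivalent console request.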

resp = client.indices.analyze(
    tokenizer="classic",
    filter=[
        "classic"
    ],
    text="The 2 Q.U.I.C.K. Brown-Foxes jumped over the lazy dog's bone.",
)
print(resp)

response = client.indices.analyze(
  body: {
    tokenizer: 'classic',
    filter: [
      'classic'
    ],
    text: "The 2 Q.U.I.C.K. Brown-Foxes jumped over the lazy dog's bone."
  }
)
puts response

const response = await client.indices.analyze({
  tokenizer: "classic",
  filter: ["classic"],
  text: "The 2 Q.U.I.C.K. Brown-Foxes jumped over the lazy dog's bone.",
});
console.log(response);

GET /_analyze
{
  "tokenizer" : "classic",
  "filter" : ["classic"],
  "text" : "The 2 Q.U.I.C.K. Brown-Foxes jumped over the lazy dog's bone."
}

The filter produces the following tokens:

[ The, 2, QUICK, Brown, Foxes, jumped, over, the, lazy, dog, bone ]

Add to an analyzer

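The following create index API request uses the classic token filter to configure a new custom analyzer. The request is shown for the Python, Ruby, and JavaScript clients, followed by the equivalent console request.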

resp = client.indices.create(
    index="classic_example",
    settings={
        "analysis": {
            "analyzer": {
                "classic_analyzer": {
                    "tokenizer": "classic",
                    "filter": [
                        "classic"
                    ]
                }
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'classic_example',
  body: {
    settings: {
      analysis: {
        analyzer: {
          classic_analyzer: {
            tokenizer: 'classic',
            filter: [
              'classic'
            ]
          }
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "classic_example",
  settings: {
    analysis: {
      analyzer: {
        classic_analyzer: {
          tokenizer: "classic",
          filter: ["classic"],
        },
      },
    },
  },
});
console.log(response);

PUT /classic_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "classic_analyzer": {
          "tokenizer": "classic",
          "filter": [ "classic" ]
        }
      }
    }
  }
}
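Once the index exists, the custom analyzer can be exercised through the analyze API by naming the index and the analyzer. The following is a minimal sketch under stated assumptions: the helper name `analyzed_tokens` is hypothetical, `client` is assumed to be an already configured Elasticsearch Python client, and the response is assumed to carry its results under the `tokens` key, as the analyze API does.

```python
def analyzed_tokens(client, text, index="classic_example", analyzer="classic_analyzer"):
    """Run text through an index's analyzer and return only the token strings.

    `client` is assumed to be a connected Elasticsearch Python client.
    """
    resp = client.indices.analyze(index=index, analyzer=analyzer, text=text)
    # Each entry in "tokens" also holds offsets, type, and position; keep the text only.
    return [entry["token"] for entry in resp["tokens"]]


# Usage, assuming a reachable cluster and the index created above:
# from elasticsearch import Elasticsearch
# client = Elasticsearch("http://localhost:9200")  # hypothetical address
# print(analyzed_tokens(client, "The lazy dog's bone."))
```

Passing the client in as a parameter keeps the helper free of connection details, which vary by deployment.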
