from eland.ml.ltr import QueryFeatureExtractor

feature_extractors = [
    # We want to use the BM25 score of the match query for the title field as a feature:
    QueryFeatureExtractor(
        feature_name="title_bm25",
        query={"match": {"title": "{{query}}"}}
    ),
    # We want to use the number of matched terms in the title field as a feature:
    QueryFeatureExtractor(
        feature_name="title_matched_term_count",
        query={
            "script_score": {
                "query": {"match": {"title": "{{query}}"}},
                "script": {"source": "return _termStats.matchedTermsCount();"},
            }
        },
    ),
    # We can use a script_score query to read the value
    # of the popularity field directly as a feature:
    QueryFeatureExtractor(
        feature_name="popularity",
        query={
            "script_score": {
                "query": {"exists": {"field": "popularity"}},
                "script": {"source": "return doc['popularity'].value;"},
            }
        },
    ),
    # We extract the number of unique terms in the query as a feature:
    QueryFeatureExtractor(
        feature_name="query_term_count",
        query={
            "script_score": {
                "query": {"match": {"title": "{{query}}"}},
                "script": {"source": "return _termStats.uniqueTermsCount();"},
            }
        },
    ),
]

Using term statistics as features

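LTR models often use raw term statistics as features. To extract this information, you can use the term statistics feature, which is provided as part of the script_score query.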
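
To make the two term-statistics features used above more concrete, here is a minimal pure-Python illustration of what they count. It is only a sketch: it assumes lowercase whitespace tokenization, whereas Elasticsearch applies the field's analyzer, and the helper names are hypothetical, not part of Eland or Elasticsearch.

```python
def unique_terms_count(query: str) -> int:
    """Distinct terms in the query (compare _termStats.uniqueTermsCount())."""
    return len(set(query.lower().split()))


def matched_terms_count(query: str, field_text: str) -> int:
    """Distinct query terms found in the field (compare _termStats.matchedTermsCount())."""
    query_terms = set(query.lower().split())
    field_terms = set(field_text.lower().split())
    return len(query_terms & field_terms)


# Hypothetical query and title, used only for illustration.
query = "star wars star"
title = "star trek beyond"
n_unique = unique_terms_count(query)
n_matched = matched_terms_count(query, title)
```

In this toy case the repeated term "star" is counted once, and only "star" appears in the title.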

Once the feature extractors have been defined, they are wrapped in an eland.ml.ltr.LTRModelConfig object for use in later training steps:

from eland.ml.ltr import LTRModelConfig

ltr_config = LTRModelConfig(feature_extractors)

Extracting features for training

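Building your dataset is a critical step in the training process. It involves extracting relevant features and adding them to your judgment list. We recommend using Eland's eland.ml.ltr.FeatureLogger helper class for this process.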

from eland.ml.ltr import FeatureLogger

# Create a feature logger that will be used to query Elasticsearch to retrieve the features:
feature_logger = FeatureLogger(es_client, MOVIE_INDEX, ltr_config)

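The FeatureLogger provides an extract_features method that lets you extract features for a list of specific documents from your judgment list. At the same time, you can pass query parameters to the feature extractors defined earlier: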

feature_logger.extract_features(
    query_params={"query": "foo"},
    doc_ids=["doc-1", "doc-2"]
)

Our example notebook explains how to use the FeatureLogger to build a training dataset by adding features to a judgment list.
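
As a rough sketch of that idea, the snippet below groups judgment rows by query, calls extract_features once per query, and attaches the returned values to each row. It makes several assumptions: the judgment rows are plain dicts with hypothetical `query`, `doc_id`, and `grade` keys, and extract_features is assumed to return a mapping from document ID to a list of feature values in the same order as the feature extractors. A stub logger stands in for the real FeatureLogger so that the example runs without a cluster.

```python
from collections import defaultdict
from typing import Any, Dict, List, Mapping


def add_features_to_judgments(
    judgments: List[Dict[str, Any]],
    feature_logger: Any,
    feature_names: List[str],
) -> List[Dict[str, Any]]:
    """Return copies of the judgment rows with one extra key per feature."""
    # Group rows by query so each query needs a single extract_features call.
    rows_by_query: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in judgments:
        rows_by_query[row["query"]].append(row)

    enriched = []
    for query, rows in rows_by_query.items():
        doc_ids = [str(row["doc_id"]) for row in rows]
        doc_features = feature_logger.extract_features(
            query_params={"query": query}, doc_ids=doc_ids
        )
        for row in rows:
            values = doc_features[str(row["doc_id"])]
            enriched.append({**row, **dict(zip(feature_names, values))})
    return enriched


class StubFeatureLogger:
    """Hypothetical stand-in for FeatureLogger; returns fake feature values."""

    def extract_features(
        self, query_params: Mapping[str, Any], doc_ids: List[str]
    ) -> Dict[str, List[float]]:
        # Fake features: [length of the query text, length of the doc ID].
        query_len = float(len(query_params["query"]))
        return {doc_id: [query_len, float(len(doc_id))] for doc_id in doc_ids}


judgments = [
    {"query": "foo", "doc_id": "doc-1", "grade": 3},
    {"query": "foo", "doc_id": "doc-2", "grade": 0},
]
dataset = add_features_to_judgments(
    judgments, StubFeatureLogger(), ["f_query_len", "f_doc_len"]
)
```

With a real cluster, the stub would be replaced by the feature_logger created earlier, and the feature names would come from the LTR configuration.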

Notes on feature extraction

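We strongly advise against implementing feature extraction yourself. It is crucial to keep feature extraction consistent between the training environment and inference in Elasticsearch. By using the Eland tooling, which is developed and tested in tandem with Elasticsearch, you can ensure that the two behave consistently.
Feature extraction is performed by executing queries on the Elasticsearch server. This can put significant stress on your cluster, especially when your judgment list contains many examples or you have many features. Our feature logger implementation is designed to minimize the number of search requests sent to the server and to reduce load. Even so, it is best to build your training dataset using an Elasticsearch cluster that is isolated from any user-facing production traffic.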

Deploying your model into Elasticsearch

Once your model is trained, you can deploy it to your Elasticsearch cluster using Eland's MLModel.import_ltr_model method:

from eland.ml import MLModel

LEARNING_TO_RANK_MODEL_ID = "ltr-model-xgboost"

MLModel.import_ltr_model(
    es_client=es_client,
    model=ranker,
    model_id=LEARNING_TO_RANK_MODEL_ID,
    ltr_model_config=ltr_config,
    es_if_exists="replace",
)

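This method serializes the trained model and the Learning To Rank configuration (including feature extraction) in a format that Elasticsearch can understand. The model is then deployed to Elasticsearch using the Create Trained Models API.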

Currently, the following types of models are supported for LTR with Elasticsearch:

More model types will be supported in the future.

Learning To Rank model management

Once your model is deployed in Elasticsearch, you can manage it using the trained model APIs. You can now use your LTR model as a rescorer at search time.
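
To illustrate what using the model as a rescorer could look like, the sketch below builds a search request body with a `learning_to_rank` rescorer. It is an assumption-laden example rather than a reference: the helper name, the first-pass `match` query on `title`, and the window size of 100 are hypothetical choices, and the `params` key is set to `query` only because the feature extractors above use the `{{query}}` template variable.

```python
from typing import Any, Dict


def build_ltr_rescore_body(
    query_text: str, model_id: str, window_size: int = 100
) -> Dict[str, Any]:
    """Build a search body that rescores the top hits with an LTR model."""
    return {
        # First-pass retrieval query (assumed here to be a simple match on title).
        "query": {"match": {"title": query_text}},
        "rescore": {
            "learning_to_rank": {
                "model_id": model_id,
                # Values passed to the {{query}} template of the feature extractors.
                "params": {"query": query_text},
            },
            # Number of top first-pass hits to rescore.
            "window_size": window_size,
        },
    }


body = build_ltr_rescore_body("star wars", "ltr-model-xgboost")
# With a live cluster, this could be sent for example as:
# es_client.search(index=MOVIE_INDEX, **body)
```

Keeping the request construction in a small function makes it easy to check that the parameter names line up with the templates used during training.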

Deploy and manage Learning To Rank models