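max_chunk_size: (Optional, integer) Specifies the maximum size of a chunk in words. Defaults to 250. This value cannot be higher than 300 or lower than 20 (for the sentence strategy) or 10 (for the word strategy).
overlap: (Optional, integer) Only for the word chunking strategy. Specifies the number of overlapping words for chunks. Defaults to 100. This value cannot be higher than half of max_chunk_size.
sentence_overlap: (Optional, integer) Only for the sentence chunking strategy. Specifies the number of overlapping sentences for chunks. It can be either 1 or 0. Defaults to 1.
strategy: (Optional, string) Specifies the chunking strategy. It can be either sentence or word.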
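The limits above can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the Elasticsearch API: the function name check_chunking_settings is hypothetical, the size setting is spelled max_chunk_size as it appears in the API response, and the strategy is passed explicitly.

```python
def check_chunking_settings(strategy, max_chunk_size=250, overlap=100, sentence_overlap=1):
    """Return a list of problems; an empty list means the values fit the documented limits."""
    if strategy not in ("sentence", "word"):
        return [f"strategy must be 'sentence' or 'word', got {strategy!r}"]

    problems = []
    # The lower bound on max_chunk_size depends on the strategy.
    lower = 20 if strategy == "sentence" else 10
    if not lower <= max_chunk_size <= 300:
        problems.append(
            f"max_chunk_size must be between {lower} and 300 for the {strategy} strategy"
        )
    # overlap only applies to the word strategy.
    if strategy == "word" and overlap > max_chunk_size / 2:
        problems.append("overlap cannot be higher than half of max_chunk_size")
    # sentence_overlap only applies to the sentence strategy.
    if strategy == "sentence" and sentence_overlap not in (0, 1):
        problems.append("sentence_overlap must be 0 or 1")
    return problems


# An overlap of 60 words is more than half of a 100-word chunk, so one problem is reported.
print(check_chunking_settings("word", max_chunk_size=100, overlap=60))
```

Elasticsearch performs its own validation when the endpoint is created; a local check like this only surfaces mistakes earlier.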

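service

(Required, string) The type of service supported for the specified task type. In this case, elasticsearch.

service_settings

(Required, object) Settings used to install the inference model.

These settings are specific to the elasticsearch service.

deployment_id

(Optional, string) The deployment_id of an existing trained model deployment. When deployment_id is used, model_id is optional.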

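adaptive_allocations

(Optional, object) Adaptive allocations configuration object. If enabled, the number of allocations of the model is set based on the current load the process gets. When the load is high, a new model allocation is automatically created (respecting the value of max_number_of_allocations if it is set). When the load is low, a model allocation is automatically removed (respecting the value of min_number_of_allocations if it is set). If adaptive_allocations is enabled, do not set the number of allocations manually.

enabled: (Optional, Boolean) If true, adaptive_allocations is enabled. Defaults to false.
max_number_of_allocations: (Optional, integer) Specifies the maximum number of allocations to scale to. If set, it must be greater than or equal to min_number_of_allocations.
min_number_of_allocations: (Optional, integer) Specifies the minimum number of allocations to scale to. If set, it must be greater than or equal to 0. If not defined, the deployment scales to 0.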
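The constraints on adaptive_allocations can be expressed as a small local check. This is only a sketch under stated assumptions: check_allocation_settings is a hypothetical helper, not an Elasticsearch API, and it treats an undefined min_number_of_allocations as 0 because the deployment scales to 0 in that case.

```python
def check_allocation_settings(service_settings):
    """Return a list of problems found in a service_settings dict (empty if none)."""
    problems = []
    adaptive = service_settings.get("adaptive_allocations") or {}
    enabled = adaptive.get("enabled", False)  # documented default is false
    # Assumption: an undefined minimum behaves like 0 (the deployment scales to 0).
    min_alloc = adaptive.get("min_number_of_allocations", 0)
    max_alloc = adaptive.get("max_number_of_allocations")

    if min_alloc < 0:
        problems.append("min_number_of_allocations must be greater than or equal to 0")
    if max_alloc is not None and max_alloc < min_alloc:
        problems.append(
            "max_number_of_allocations must be greater than or equal to min_number_of_allocations"
        )
    # With adaptive allocations enabled, the allocation count is managed automatically.
    if enabled and "num_allocations" in service_settings:
        problems.append("do not set num_allocations when adaptive_allocations is enabled")
    return problems


settings = {
    "adaptive_allocations": {
        "enabled": True,
        "min_number_of_allocations": 1,
        "max_number_of_allocations": 4,
    },
    "num_threads": 1,
    "model_id": ".elser_model_2",
}
print(check_allocation_settings(settings))
```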

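model_id

(Required, string) The name of the model to use for the inference task. It can be the ID of a built-in model (for example, .multilingual-e5-small for E5) or a text embedding model that has already been uploaded through Eland.

num_allocations

(Required, integer) The total number of allocations assigned to this model across machine learning nodes. Increasing this value generally increases the throughput. If adaptive_allocations is enabled, do not set this value, because it is set automatically.

num_threads

(Required, integer) Sets the number of threads used by each model allocation during inference. Increasing this value generally increases the speed per inference request. The inference process is compute-bound; the value must not exceed the number of available allocated processors per node. Must be a power of 2. The maximum allowed value is 32.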
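The power-of-2 rule for num_threads means that only 1, 2, 4, 8, 16, and 32 are accepted. A minimal sketch of that check follows; is_valid_num_threads is a hypothetical helper name, and the check does not know how many processors a node actually has.

```python
def is_valid_num_threads(value):
    """True if value is an integer power of 2 between 1 and 32 inclusive."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    # A positive power of 2 has exactly one bit set, so value & (value - 1) is 0.
    return 1 <= value <= 32 and (value & (value - 1)) == 0


valid_values = [n for n in range(0, 65) if is_valid_num_threads(n)]
print(valid_values)
```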

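task_settings

(Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified.

task_settings for the rerank task type

return_documents: (Optional, Boolean) Returns the document instead of only the index. Defaults to true.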

ELSER via the `elasticsearch` service

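The following example shows how to create an inference endpoint called my-elser-model to perform a sparse_embedding task type.

If the ELSER model has not been downloaded yet, the API request below automatically downloads it and then deploys the model.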

const response = await client.inference.put({
  task_type: "sparse_embedding",
  inference_id: "my-elser-model",
  inference_config: {
    service: "elasticsearch",
    service_settings: {
      adaptive_allocations: {
        enabled: true,
        min_number_of_allocations: 1,
        max_number_of_allocations: 4,
      },
      num_threads: 1,
      model_id: ".elser_model_2",
    },
  },
});
console.log(response);

PUT _inference/sparse_embedding/my-elser-model
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": { 
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 4
    },
    "num_threads": 1,
    "model_id": ".elser_model_2" 
  }
}

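	Adaptive allocations will be enabled with a minimum of 1 and a maximum of 4 allocations.
	The `model_id` must be the ID of one of the built-in ELSER models. Valid values are `.elser_model_2` and `.elser_model_2_linux-x86_64`. For further details, refer to the ELSER model documentation.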

Elastic Rerank via the `elasticsearch` service

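The following example shows how to create an inference endpoint called my-elastic-rerank to perform a rerank task type using the built-in Elastic Rerank cross-encoder model.

If the Elastic Rerank model has not been downloaded yet, the API request below automatically downloads it and then deploys the model. Once deployed, the model can be used for semantic re-ranking with a text_similarity_reranker retriever.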

const response = await client.inference.put({
  task_type: "rerank",
  inference_id: "my-elastic-rerank",
  inference_config: {
    service: "elasticsearch",
    service_settings: {
      model_id: ".rerank-v1",
      num_threads: 1,
      adaptive_allocations: {
        enabled: true,
        min_number_of_allocations: 1,
        max_number_of_allocations: 4,
      },
    },
  },
});
console.log(response);

PUT _inference/rerank/my-elastic-rerank
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".rerank-v1", 
    "num_threads": 1,
    "adaptive_allocations": { 
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 4
    }
  }
}

	The `model_id` must be the ID of the built-in Elastic Rerank model: `.rerank-v1`.
	Adaptive allocations will be enabled with a minimum of 1 and a maximum of 4 allocations.

E5 via the `elasticsearch` service

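The following example shows how to create an inference endpoint called my-e5-model to perform a text_embedding task type.

If the E5 model has not been downloaded yet, the API request below automatically downloads it and then deploys the model.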

resp = client.inference.put(
    task_type="text_embedding",
    inference_id="my-e5-model",
    inference_config={
        "service": "elasticsearch",
        "service_settings": {
            "num_allocations": 1,
            "num_threads": 1,
            "model_id": ".multilingual-e5-small"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "my-e5-model",
  inference_config: {
    service: "elasticsearch",
    service_settings: {
      num_allocations: 1,
      num_threads: 1,
      model_id: ".multilingual-e5-small",
    },
  },
});
console.log(response);

PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small" 
  }
}

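The model_id must be the ID of one of the built-in E5 models. Valid values are .multilingual-e5-small and .multilingual-e5-small_linux-x86_64. For further details, refer to the E5 model documentation.

You might see a 502 Bad Gateway error in the response when using the Kibana Console. This error usually just reflects a timeout while the model downloads in the background. You can check the download progress in the Machine Learning UI. If you are using the Python client, you can set the timeout parameter to a higher value.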

Models uploaded by Eland via the `elasticsearch` service

The following example shows how to create an inference endpoint called my-msmarco-minilm-model to perform a text_embedding task type.

resp = client.inference.put(
    task_type="text_embedding",
    inference_id="my-msmarco-minilm-model",
    inference_config={
        "service": "elasticsearch",
        "service_settings": {
            "num_allocations": 1,
            "num_threads": 1,
            "model_id": "msmarco-MiniLM-L12-cos-v5"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "my-msmarco-minilm-model",
  inference_config: {
    service: "elasticsearch",
    service_settings: {
      num_allocations: 1,
      num_threads: 1,
      model_id: "msmarco-MiniLM-L12-cos-v5",
    },
  },
});
console.log(response);

PUT _inference/text_embedding/my-msmarco-minilm-model 
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": "msmarco-MiniLM-L12-cos-v5" 
  }
}

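	Provide a unique identifier for the inference endpoint. The `inference_id` must be unique and must not match the `model_id`.
	The `model_id` must be the ID of a text embedding model that has already been uploaded through Eland.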
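The rule that the `inference_id` must not match the `model_id` can be enforced while the request body is assembled. The sketch below is illustrative only: build_text_embedding_body is a hypothetical helper, and it cannot check whether the `inference_id` is unique among endpoints that already exist, because that requires asking the cluster.

```python
import json


def build_text_embedding_body(inference_id, model_id, num_allocations=1, num_threads=1):
    """Build the request body for a text_embedding endpoint on the elasticsearch service."""
    if inference_id == model_id:
        raise ValueError("inference_id must not match model_id")
    return {
        "service": "elasticsearch",
        "service_settings": {
            "num_allocations": num_allocations,
            "num_threads": num_threads,
            "model_id": model_id,
        },
    }


body = build_text_embedding_body("my-msmarco-minilm-model", "msmarco-MiniLM-L12-cos-v5")
print(json.dumps(body, indent=2))
```

The resulting dictionary has the same shape as the inference_config argument in the Python example above.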

Setting adaptive allocations for E5 via the `elasticsearch` service

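The following example shows how to create an inference endpoint called my-e5-model to perform a text_embedding task type and configure adaptive allocations.

If the E5 model has not been downloaded yet, the API request below automatically downloads it and then deploys the model.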

resp = client.inference.put(
    task_type="text_embedding",
    inference_id="my-e5-model",
    inference_config={
        "service": "elasticsearch",
        "service_settings": {
            "adaptive_allocations": {
                "enabled": True,
                "min_number_of_allocations": 3,
                "max_number_of_allocations": 10
            },
            "num_threads": 1,
            "model_id": ".multilingual-e5-small"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "my-e5-model",
  inference_config: {
    service: "elasticsearch",
    service_settings: {
      adaptive_allocations: {
        enabled: true,
        min_number_of_allocations: 3,
        max_number_of_allocations: 10,
      },
      num_threads: 1,
      model_id: ".multilingual-e5-small",
    },
  },
});
console.log(response);

PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 3,
      "max_number_of_allocations": 10
    },
    "num_threads": 1,
    "model_id": ".multilingual-e5-small"
  }
}

Using an existing model deployment with the `elasticsearch` service

The following example shows how to use an already existing model deployment when creating an inference endpoint.

resp = client.inference.put(
    task_type="sparse_embedding",
    inference_id="use_existing_deployment",
    inference_config={
        "service": "elasticsearch",
        "service_settings": {
            "deployment_id": ".elser_model_2"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "sparse_embedding",
  inference_id: "use_existing_deployment",
  inference_config: {
    service: "elasticsearch",
    service_settings: {
      deployment_id: ".elser_model_2",
    },
  },
});
console.log(response);

PUT _inference/sparse_embedding/use_existing_deployment
{
  "service": "elasticsearch",
  "service_settings": {
    "deployment_id": ".elser_model_2" 
  }
}

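The deployment_id of the already existing model deployment.

The API response contains the model_id, and the threads and allocations settings from the model deployment: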

{
  "inference_id": "use_existing_deployment",
  "task_type": "sparse_embedding",
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 2,
    "num_threads": 1,
    "model_id": ".elser_model_2",
    "deployment_id": ".elser_model_2"
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}
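To confirm which settings the endpoint inherited from the deployment, the relevant fields can be read from the response. This is a small sketch that assumes the response is available as JSON text; the variable names are illustrative, and the values are copied from the example response above.

```python
import json

# Example response body, copied from the documentation above.
response_text = """
{
  "inference_id": "use_existing_deployment",
  "task_type": "sparse_embedding",
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 2,
    "num_threads": 1,
    "model_id": ".elser_model_2",
    "deployment_id": ".elser_model_2"
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}
"""

response = json.loads(response_text)
service_settings = response["service_settings"]

# Collect the values that were taken over from the existing deployment.
summary = {
    "model_id": service_settings["model_id"],
    "deployment_id": service_settings["deployment_id"],
    "num_allocations": service_settings["num_allocations"],
    "num_threads": service_settings["num_threads"],
    "chunking_strategy": response["chunking_settings"]["strategy"],
}
print(summary)
```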
