Elasticsearch inference service

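Creates an inference endpoint to perform an inference task with the elasticsearch service.

If you use the ELSER or the E5 model through the elasticsearch service, the API request automatically downloads and deploys the model if it has not been downloaded yet.

Request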

PUT /_inference/<task_type>/<inference_id>

Path parameters

<inference_id>

(Required, string) The unique identifier of the inference endpoint.

<task_type>

(Required, string) The type of inference task that the model will perform.

Available task types:

rerank,
sparse_embedding,
text_embedding.
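
As a minimal sketch of how the two path parameters combine into the request path, the helper below checks the task type against the three values listed above and builds the path. The function name `inference_endpoint_path` is hypothetical and not part of any client library.

```python
# Task types listed above for the elasticsearch service.
SUPPORTED_TASK_TYPES = ("rerank", "sparse_embedding", "text_embedding")


def inference_endpoint_path(task_type: str, inference_id: str) -> str:
    """Return the path for PUT /_inference/<task_type>/<inference_id>.

    Hypothetical helper for illustration only.
    """
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type!r}")
    if not inference_id:
        raise ValueError("inference_id must be a non-empty string")
    return f"/_inference/{task_type}/{inference_id}"
```

For example, `inference_endpoint_path("sparse_embedding", "my-elser-model")` yields the path used in the ELSER example on this page.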

Request body

chunking_settings

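(Optional, object) Chunking configuration object. Refer to Configuring chunking to learn more about chunking.

max_chunk_size: (Optional, integer) Specifies the maximum size of a chunk in words. Defaults to 250. This value cannot be higher than 300 or lower than 20 (for the sentence strategy) or 10 (for the word strategy).
overlap: (Optional, integer) Only for the word chunking strategy. Specifies the number of overlapping words for chunks. Defaults to 100. This value cannot be higher than half of max_chunk_size.
sentence_overlap: (Optional, integer) Only for the sentence chunking strategy. Specifies the number of overlapping sentences for chunks. It can be either 1 or 0. Defaults to 1.
strategy: (Optional, string) Specifies the chunking strategy. It can be either sentence or word.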
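
The limits above interact (the lower bound depends on the strategy, and `overlap` is capped relative to the chunk size), so a small client-side check can catch mistakes before a request is sent. The following is a sketch under assumptions: the helper name `resolve_chunking_settings` is hypothetical, the key `max_chunk_size` follows the spelling used in the API response shown later on this page, the strategy is passed explicitly because no default is stated here, and rejecting negative overlaps is an assumption. The server performs its own validation regardless.

```python
from typing import Optional


def resolve_chunking_settings(
    strategy: str,
    max_chunk_size: int = 250,
    overlap: Optional[int] = None,
    sentence_overlap: Optional[int] = None,
) -> dict:
    """Return a chunking_settings dict with the documented defaults applied.

    Hypothetical client-side helper for illustration only.
    """
    if strategy not in ("sentence", "word"):
        raise ValueError("strategy must be 'sentence' or 'word'")

    # The lower bound depends on the strategy; the upper bound is always 300.
    min_size = 20 if strategy == "sentence" else 10
    if not min_size <= max_chunk_size <= 300:
        raise ValueError(f"max_chunk_size must be between {min_size} and 300")

    settings = {"strategy": strategy, "max_chunk_size": max_chunk_size}

    if strategy == "word":
        if sentence_overlap is not None:
            raise ValueError("sentence_overlap applies only to the sentence strategy")
        overlap = 100 if overlap is None else overlap
        # overlap may not exceed half of max_chunk_size
        # (rejecting negative values is an assumption).
        if overlap < 0 or overlap * 2 > max_chunk_size:
            raise ValueError("overlap must be between 0 and half of max_chunk_size")
        settings["overlap"] = overlap
    else:
        if overlap is not None:
            raise ValueError("overlap applies only to the word strategy")
        sentence_overlap = 1 if sentence_overlap is None else sentence_overlap
        if sentence_overlap not in (0, 1):
            raise ValueError("sentence_overlap must be 0 or 1")
        settings["sentence_overlap"] = sentence_overlap

    return settings
```

Note that with the word strategy, a `max_chunk_size` below 200 combined with the default `overlap` of 100 violates the half-size rule, so the sketch raises an error in that case rather than silently adjusting the overlap.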

service

(Required, string) The type of service supported for the specified task type. In this case, elasticsearch.

service_settings

(Required, object) Settings used to install the inference model.

These settings are specific to the elasticsearch service.

deployment_id

(Optional, string) The deployment_id of an existing trained model deployment. When deployment_id is used, model_id is optional.

adaptive_allocations

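(Optional, object) Adaptive allocations configuration object. If enabled, the number of allocations of the model is set based on the current load the process gets. When the load is high, a new model allocation is automatically created (respecting the value of max_number_of_allocations if it is set). When the load is low, a model allocation is automatically removed (respecting the value of min_number_of_allocations if it is set). If adaptive_allocations is enabled, do not set the number of allocations manually.

enabled: (Optional, Boolean) If true, adaptive_allocations is enabled. Defaults to false.
max_number_of_allocations: (Optional, integer) Specifies the maximum number of allocations to scale to. If set, it must be greater than or equal to min_number_of_allocations.
min_number_of_allocations: (Optional, integer) Specifies the minimum number of allocations to scale to. If set, it must be greater than or equal to 0. If not defined, the deployment scales to 0.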
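
The two bounds constrain each other, which the sketch below makes explicit. The helper name `adaptive_allocations_config` is hypothetical; it only builds the `adaptive_allocations` object and applies the two rules stated above, treating an undefined minimum as 0 because the deployment then scales to 0.

```python
from typing import Optional


def adaptive_allocations_config(
    enabled: bool = False,
    min_number_of_allocations: Optional[int] = None,
    max_number_of_allocations: Optional[int] = None,
) -> dict:
    """Build an adaptive_allocations object, checking the documented bounds.

    Hypothetical helper for illustration only.
    """
    config = {"enabled": enabled}

    if min_number_of_allocations is not None:
        if min_number_of_allocations < 0:
            raise ValueError("min_number_of_allocations must be >= 0")
        config["min_number_of_allocations"] = min_number_of_allocations

    if max_number_of_allocations is not None:
        # An undefined minimum means the deployment may scale down to 0.
        floor = min_number_of_allocations or 0
        if max_number_of_allocations < floor:
            raise ValueError(
                "max_number_of_allocations must be >= min_number_of_allocations"
            )
        config["max_number_of_allocations"] = max_number_of_allocations

    return config
```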

model_id

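(Required, string) The name of the model to use for the inference task. It can be the ID of a built-in model (for example, .multilingual-e5-small for E5) or a text embedding model that has already been uploaded through Eland.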

num_allocations

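(Required, integer) The total number of allocations this model is assigned across machine learning nodes. Increasing this value generally increases the throughput. If adaptive_allocations is enabled, do not set this value, because it is set automatically.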

num_threads

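(Required, integer) Sets the number of threads used by each model allocation during inference. Increasing this value generally increases the speed per inference request. The inference process is compute-bound; this value must not exceed the number of available allocated processors per node. It must be a power of 2. The maximum allowed value is 32.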
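
The rules for `num_threads` and the interplay between `num_allocations` and `adaptive_allocations` can be checked before sending a request. The following is a sketch only: `build_service_settings` and `is_power_of_two` are hypothetical helpers, and the per-node processor limit cannot be verified on the client side, so it is not checked here.

```python
from typing import Optional


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... (a single bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def build_service_settings(
    model_id: str,
    num_threads: int,
    num_allocations: Optional[int] = None,
    adaptive_allocations: Optional[dict] = None,
) -> dict:
    """Assemble service_settings, applying the constraints described above.

    Hypothetical helper for illustration only.
    """
    if not is_power_of_two(num_threads) or num_threads > 32:
        raise ValueError("num_threads must be a power of 2 and at most 32")

    adaptive_enabled = bool(adaptive_allocations and adaptive_allocations.get("enabled"))
    if adaptive_enabled and num_allocations is not None:
        raise ValueError("do not set num_allocations when adaptive_allocations is enabled")
    if not adaptive_enabled and num_allocations is None:
        raise ValueError("num_allocations is required unless adaptive_allocations is enabled")

    settings = {"model_id": model_id, "num_threads": num_threads}
    if num_allocations is not None:
        settings["num_allocations"] = num_allocations
    if adaptive_allocations is not None:
        settings["adaptive_allocations"] = adaptive_allocations
    return settings
```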

task_settings

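(Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified.

task_settings for the rerank task type

return_documents: (Optional, Boolean) Returns the document instead of only the index. Defaults to true.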

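ELSER via the `elasticsearch` service

The following example shows how to create an inference endpoint called my-elser-model to perform a sparse_embedding task type.

If the ELSER model has not been downloaded yet, the API request below downloads it automatically and then deploys it.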

const response = await client.inference.put({
  task_type: "sparse_embedding",
  inference_id: "my-elser-model",
  inference_config: {
    service: "elasticsearch",
    service_settings: {
      adaptive_allocations: {
        enabled: true,
        min_number_of_allocations: 1,
        max_number_of_allocations: 4,
      },
      num_threads: 1,
      model_id: ".elser_model_2",
    },
  },
});
console.log(response);

PUT _inference/sparse_embedding/my-elser-model
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": { 
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 4
    },
    "num_threads": 1,
    "model_id": ".elser_model_2" 
  }
}

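	Adaptive allocations will be enabled with a minimum of 1 and a maximum of 4 allocations.
	`model_id` must be the ID of one of the built-in ELSER models. Valid values are `.elser_model_2` and `.elser_model_2_linux-x86_64`. For further details, refer to the ELSER model documentation.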
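
This section shows the request for the JavaScript client and the Console only. A possible Python counterpart, following the `client.inference.put(...)` call shape used in the Python examples elsewhere on this page, might look like the sketch below. The helper `elser_endpoint_request` is hypothetical; it only assembles the keyword arguments and checks `model_id` against the two valid values named above.

```python
# The two valid built-in ELSER model IDs named above.
ELSER_MODEL_IDS = (".elser_model_2", ".elser_model_2_linux-x86_64")


def elser_endpoint_request(model_id: str = ".elser_model_2") -> dict:
    """Keyword arguments for client.inference.put(), mirroring the request above.

    Hypothetical helper for illustration only.
    """
    if model_id not in ELSER_MODEL_IDS:
        raise ValueError(f"model_id must be one of {ELSER_MODEL_IDS}")
    return {
        "task_type": "sparse_embedding",
        "inference_id": "my-elser-model",
        "inference_config": {
            "service": "elasticsearch",
            "service_settings": {
                "adaptive_allocations": {
                    "enabled": True,
                    "min_number_of_allocations": 1,
                    "max_number_of_allocations": 4,
                },
                "num_threads": 1,
                "model_id": model_id,
            },
        },
    }


# Usage, given an already configured Elasticsearch Python client named `client`:
# resp = client.inference.put(**elser_endpoint_request())
# print(resp)
```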

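Elastic Rerank via the `elasticsearch` service

The following example shows how to create an inference endpoint called my-elastic-rerank to perform a rerank task type using the built-in Elastic Rerank cross-encoder model.

If the Elastic Rerank model has not been downloaded yet, the API request below downloads it automatically and then deploys it. Once deployed, the model can be used for semantic re-ranking with a text_similarity_reranker retriever.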

const response = await client.inference.put({
  task_type: "rerank",
  inference_id: "my-elastic-rerank",
  inference_config: {
    service: "elasticsearch",
    service_settings: {
      model_id: ".rerank-v1",
      num_threads: 1,
      adaptive_allocations: {
        enabled: true,
        min_number_of_allocations: 1,
        max_number_of_allocations: 4,
      },
    },
  },
});
console.log(response);

PUT _inference/rerank/my-elastic-rerank
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".rerank-v1", 
    "num_threads": 1,
    "adaptive_allocations": { 
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 4
    }
  }
}

	`model_id` must be the ID of the built-in Elastic Rerank model: `.rerank-v1`.
	Adaptive allocations will be enabled with a minimum of 1 and a maximum of 4 allocations.
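
To illustrate how the deployed endpoint might then be referenced from a `text_similarity_reranker` retriever, the sketch below builds a search request body as a plain dictionary. The helper name `rerank_search_body`, the use of a `match` query as the first-stage retriever, and the default window size of 100 are illustrative assumptions; the index and field to search are up to the caller.

```python
def rerank_search_body(
    query_text: str,
    field: str,
    inference_id: str = "my-elastic-rerank",
    rank_window_size: int = 100,
) -> dict:
    """Build a search body that re-ranks first-stage hits with the endpoint.

    Hypothetical helper for illustration only.
    """
    return {
        "retriever": {
            "text_similarity_reranker": {
                # First stage: an ordinary lexical match on the same field.
                "retriever": {
                    "standard": {"query": {"match": {field: query_text}}}
                },
                # Second stage: re-rank the top hits with the inference endpoint.
                "field": field,
                "inference_id": inference_id,
                "inference_text": query_text,
                "rank_window_size": rank_window_size,
            }
        }
    }
```

The returned dictionary is meant to be sent as the body of a `_search` request against an index that contains the chosen text field.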

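E5 via the `elasticsearch` service

The following example shows how to create an inference endpoint called my-e5-model to perform a text_embedding task type.

If the E5 model has not been downloaded yet, the API request below downloads it automatically and then deploys it.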

resp = client.inference.put(
    task_type="text_embedding",
    inference_id="my-e5-model",
    inference_config={
        "service": "elasticsearch",
        "service_settings": {
            "num_allocations": 1,
            "num_threads": 1,
            "model_id": ".multilingual-e5-small"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "my-e5-model",
  inference_config: {
    service: "elasticsearch",
    service_settings: {
      num_allocations: 1,
      num_threads: 1,
      model_id: ".multilingual-e5-small",
    },
  },
});
console.log(response);

PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small" 
  }
}

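model_id must be the ID of one of the built-in E5 models. Valid values are .multilingual-e5-small and .multilingual-e5-small_linux-x86_64. For further details, refer to the E5 model documentation.

When using the Kibana Console, you might see a 502 Bad Gateway error in the response. This error usually just reflects a timeout while the model downloads in the background. You can check the download progress in the Machine Learning UI. If you use the Python client, you can set the timeout parameter to a higher value.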
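
As a sketch of raising the timeout from Python, the function below applies a longer per-request timeout before creating the E5 endpoint. It assumes an 8.x `elasticsearch` Python client, where per-request options are set with `client.options(request_timeout=...)`; other client versions may name the parameter differently. The function name `put_e5_endpoint` and the 300-second default are illustrative choices, not recommended values.

```python
def put_e5_endpoint(client, request_timeout_s: float = 300.0):
    """Create the my-e5-model endpoint with a longer request timeout.

    `client` is expected to be an elasticsearch.Elasticsearch instance
    (8.x assumed). Hypothetical helper for illustration only.
    """
    # options() returns a client copy that uses the given per-request timeout.
    patient_client = client.options(request_timeout=request_timeout_s)
    return patient_client.inference.put(
        task_type="text_embedding",
        inference_id="my-e5-model",
        inference_config={
            "service": "elasticsearch",
            "service_settings": {
                "num_allocations": 1,
                "num_threads": 1,
                "model_id": ".multilingual-e5-small",
            },
        },
    )
```

A longer timeout only keeps the client waiting; the model download itself proceeds in the background either way.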

Models uploaded by Eland via the `elasticsearch` service

The following example shows how to create an inference endpoint called my-msmarco-minilm-model to perform a text_embedding task type.

resp = client.inference.put(
    task_type="text_embedding",
    inference_id="my-msmarco-minilm-model",
    inference_config={
        "service": "elasticsearch",
        "service_settings": {
            "num_allocations": 1,
            "num_threads": 1,
            "model_id": "msmarco-MiniLM-L12-cos-v5"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "my-msmarco-minilm-model",
  inference_config: {
    service: "elasticsearch",
    service_settings: {
      num_allocations: 1,
      num_threads: 1,
      model_id: "msmarco-MiniLM-L12-cos-v5",
    },
  },
});
console.log(response);

PUT _inference/text_embedding/my-msmarco-minilm-model 
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": "msmarco-MiniLM-L12-cos-v5" 
  }
}

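	Provide a unique identifier for the inference endpoint. The `inference_id` must be unique and must not match the `model_id`.
	`model_id` must be the ID of a text embedding model that has already been uploaded through Eland.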
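
The rule that `inference_id` must differ from `model_id` is easy to overlook when both are derived from the same model name. A minimal sketch of a guard, using the hypothetical helper `eland_model_endpoint_request`, could look like this. Uniqueness across existing endpoints can only be checked by the server and is not covered here.

```python
def eland_model_endpoint_request(
    inference_id: str,
    model_id: str,
    num_allocations: int = 1,
    num_threads: int = 1,
) -> dict:
    """Keyword arguments for client.inference.put() for an Eland-uploaded model.

    Hypothetical helper for illustration only.
    """
    # The inference_id must not match the model_id.
    if inference_id == model_id:
        raise ValueError("inference_id must not match model_id")
    return {
        "task_type": "text_embedding",
        "inference_id": inference_id,
        "inference_config": {
            "service": "elasticsearch",
            "service_settings": {
                "num_allocations": num_allocations,
                "num_threads": num_threads,
                "model_id": model_id,
            },
        },
    }
```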

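Setting adaptive allocations for E5 via the `elasticsearch` service

The following example shows how to create an inference endpoint called my-e5-model to perform a text_embedding task type and configure adaptive allocations.

If the E5 model has not been downloaded yet, the API request below downloads it automatically and then deploys it.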

resp = client.inference.put(
    task_type="text_embedding",
    inference_id="my-e5-model",
    inference_config={
        "service": "elasticsearch",
        "service_settings": {
            "adaptive_allocations": {
                "enabled": True,
                "min_number_of_allocations": 3,
                "max_number_of_allocations": 10
            },
            "num_threads": 1,
            "model_id": ".multilingual-e5-small"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "my-e5-model",
  inference_config: {
    service: "elasticsearch",
    service_settings: {
      adaptive_allocations: {
        enabled: true,
        min_number_of_allocations: 3,
        max_number_of_allocations: 10,
      },
      num_threads: 1,
      model_id: ".multilingual-e5-small",
    },
  },
});
console.log(response);

PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 3,
      "max_number_of_allocations": 10
    },
    "num_threads": 1,
    "model_id": ".multilingual-e5-small"
  }
}

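Using an existing model deployment with the `elasticsearch` service

The following example shows how to use an already existing model deployment when creating an inference endpoint.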

resp = client.inference.put(
    task_type="sparse_embedding",
    inference_id="use_existing_deployment",
    inference_config={
        "service": "elasticsearch",
        "service_settings": {
            "deployment_id": ".elser_model_2"
        }
    },
)
print(resp)

const response = await client.inference.put({
  task_type: "sparse_embedding",
  inference_id: "use_existing_deployment",
  inference_config: {
    service: "elasticsearch",
    service_settings: {
      deployment_id: ".elser_model_2",
    },
  },
});
console.log(response);

PUT _inference/sparse_embedding/use_existing_deployment
{
  "service": "elasticsearch",
  "service_settings": {
    "deployment_id": ".elser_model_2" 
  }
}

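The deployment_id of the already existing model deployment.

The API response contains the model_id, and the threads and allocation settings from the model deployment: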

{
  "inference_id": "use_existing_deployment",
  "task_type": "sparse_embedding",
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 2,
    "num_threads": 1,
    "model_id": ".elser_model_2",
    "deployment_id": ".elser_model_2"
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}
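
As a short illustration of reading the inherited settings from such a response, the sketch below parses the JSON shown above and extracts the deployment-related fields. The helper name `deployment_summary` is hypothetical, and the response is embedded as a string so the example runs without a cluster.

```python
import json

# The response body shown above, embedded for a self-contained example.
RESPONSE_JSON = """
{
  "inference_id": "use_existing_deployment",
  "task_type": "sparse_embedding",
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 2,
    "num_threads": 1,
    "model_id": ".elser_model_2",
    "deployment_id": ".elser_model_2"
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}
"""


def deployment_summary(response: dict) -> dict:
    """Pick out the settings the endpoint inherited from the deployment.

    Hypothetical helper for illustration only.
    """
    settings = response["service_settings"]
    return {
        "deployment_id": settings["deployment_id"],
        "model_id": settings["model_id"],
        "num_allocations": settings["num_allocations"],
        "num_threads": settings["num_threads"],
    }


summary = deployment_summary(json.loads(RESPONSE_JSON))
```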
