Machine learning

Trained models

Eland allows trained models from the scikit-learn, XGBoost, and LightGBM libraries to be serialized and used as inference models in Elasticsearch.

>>> from sklearn.datasets import make_classification
>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel

# Example data (an assumption: any (X, y) pair with 5 features works here)
>>> training_data = make_classification(n_samples=10, n_features=5)

# Train and exercise an XGBoost ML model locally
>>> xgb_model = XGBClassifier(booster="gbtree")
>>> xgb_model.fit(training_data[0], training_data[1])

# Illustrative output; the exact labels depend on training_data
>>> xgb_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]

# Import the model into Elasticsearch
>>> es_model = MLModel.import_model(
...     es_client="https://127.0.0.1:9200",
...     model_id="xgb-classifier",
...     model=xgb_model,
...     feature_names=["f0", "f1", "f2", "f3", "f4"],
... )

# Exercise the ML model in Elasticsearch with the training data
>>> es_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]

Natural language processing (NLP) with PyTorch

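You need to install the appropriate version of PyTorch to import an NLP model. Run python -m pip install 'eland[pytorch]' to install that version.

For NLP tasks, Eland enables you to import PyTorch models into Elasticsearch. Use the eland_import_hub_model script to download and install supported transformer models from the Hugging Face model hub. For example: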

$ eland_import_hub_model <authentication> \
  --url https://127.0.0.1:9200/ \
  --hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \
  --task-type ner \
  --start

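	`<authentication>`: Use an authentication method to access your cluster. Refer to Authentication methods.
	`--url`: The cluster URL. Alternatively, use `--cloud-id`.
	`--hub-model-id`: Specify the identifier for the model in the Hugging Face model hub.
	`--task-type`: Specify the type of NLP task. Supported values are `fill_mask`, `ner`, `question_answering`, `text_classification`, `text_embedding`, `text_expansion`, `text_similarity`, and `zero_shot_classification`.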
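The same invocation can also be assembled from Python, for example to validate the task type before starting a long download. This is a minimal sketch: `build_import_command` is a hypothetical helper, the supported task types are copied from the list above and may differ between Eland versions, and authentication options are deliberately left out.

```python
import shlex

# Task types listed above; other Eland versions may accept a different set.
SUPPORTED_TASK_TYPES = {
    "fill_mask", "ner", "question_answering", "text_classification",
    "text_embedding", "text_expansion", "text_similarity",
    "zero_shot_classification",
}


def build_import_command(url, hub_model_id, task_type, start=True):
    """Assemble argv for eland_import_hub_model (hypothetical helper).

    Only the flags shown above are used; authentication is left to the caller.
    """
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type!r}")
    argv = [
        "eland_import_hub_model",
        "--url", url,
        "--hub-model-id", hub_model_id,
        "--task-type", task_type,
    ]
    if start:
        argv.append("--start")
    return argv


if __name__ == "__main__":
    cmd = build_import_command(
        "https://127.0.0.1:9200/",
        "elastic/distilbert-base-cased-finetuned-conll03-english",
        "ner",
    )
    # Show the shell-quoted command; to run it, use subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the argument list to `subprocess.run` rather than a single shell string avoids quoting problems with model IDs and URLs.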

For more information about the available options, run eland_import_hub_model with the --help option.

$ eland_import_hub_model --help

Import model with Docker

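To use the Docker container, you need to clone the Eland repository: https://github.com/elastic/eland

If you want to use Eland without installing it, you can use the Docker image.

You can use the container interactively: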

$ docker run -it --rm --network host docker.elastic.co/eland/eland

Running the installed scripts is also possible without an interactive shell, for example:

docker run -it --rm docker.elastic.co/eland/eland \
    eland_import_hub_model \
      --url $ELASTICSEARCH_URL \
      --hub-model-id elastic/distilbert-base-uncased-finetuned-conll03-english \
      --start

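Replace $ELASTICSEARCH_URL with the URL of your Elasticsearch cluster. For authentication purposes, include an administrator username and password in the URL in the following format: https://username:password@host:port.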
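Passwords often contain characters such as `@` or `:` that are reserved in URL syntax, so they have to be percent-encoded before being embedded in `https://username:password@host:port`. The sketch below does this with the standard library; `build_es_url` is a hypothetical helper, and it is an assumption that the client percent-decodes the credentials again. If in doubt, pass the credentials with the `-u` and `-p` options described under Authentication methods instead.

```python
from urllib.parse import quote, unquote, urlsplit


def build_es_url(username, password, host, port, scheme="https"):
    """Embed credentials in an Elasticsearch URL (hypothetical helper).

    quote(..., safe="") percent-encodes reserved characters such as
    '@', ':' and '/', so the result still parses as
    scheme://user:password@host:port.
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


if __name__ == "__main__":
    url = build_es_url("elastic", "p@ss:word", "localhost", 9200)
    parts = urlsplit(url)
    # Host, port and the decoded password all round-trip correctly.
    print(parts.hostname, parts.port, unquote(parts.password))
```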

Install models in an air-gapped environment

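You can install models in a restricted or closed network by pointing the eland_import_hub_model script to local files.

For an offline installation of a Hugging Face model, the model first needs to be cloned locally. Git and Git Large File Storage (LFS) must be installed on your system.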

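Select a model you want to use from Hugging Face. Refer to the compatible third-party model list for more information on the supported architectures.
Clone the selected model from Hugging Face by using the model URL. For example: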
```
git clone https://huggingface.co/dslim/bert-base-NER
```
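This command results in a local copy of the model in the bert-base-NER directory.

Use the eland_import_hub_model script with --hub-model-id set to the directory of the cloned model to install it: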
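If Git LFS was not installed when the model was cloned, the large weight files are replaced by small text pointer files, and the import would not receive the real model data. A quick check, sketched below under the assumption that pointer files start with the standard `version https://git-lfs.github.com/spec/v1` line and stay under 1024 bytes, lists any such stubs. `is_lfs_pointer` and `find_lfs_pointers` are hypothetical helpers, not part of Eland.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
# Pointer files are tiny text files; anything larger holds real data.
MAX_POINTER_SIZE = 1024


def is_lfs_pointer(head):
    """Return True if the leading bytes look like a Git LFS pointer."""
    return head.startswith(LFS_POINTER_PREFIX)


def find_lfs_pointers(model_dir):
    """List files under model_dir that are LFS pointers, not real data."""
    root = Path(model_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {root}")
    stubs = []
    for path in sorted(root.rglob("*")):
        # Skip Git's own metadata and anything that is not a regular file.
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        if path.stat().st_size > MAX_POINTER_SIZE:
            continue
        with path.open("rb") as f:
            if is_lfs_pointer(f.read(len(LFS_POINTER_PREFIX))):
                stubs.append(path)
    return stubs


if __name__ == "__main__":
    model_dir = Path("bert-base-NER")  # directory created by the git clone above
    if model_dir.is_dir():
        for stub in find_lfs_pointers(model_dir):
            print(f"Git LFS pointer, not model data: {stub}")
    else:
        print(f"{model_dir} not found; clone the model first")
```

If stubs are reported, installing Git LFS and running `git lfs pull` inside the model directory should fetch the actual files.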

eland_import_hub_model \
      --url 'XXXX' \
      --hub-model-id /PATH/TO/MODEL \
      --task-type ner \
      --es-username elastic --es-password XXX \
      --es-model-id bert-base-ner

If you use the Docker image to run eland_import_hub_model, you must bind mount the model directory so that the container can read the files:

docker run --mount type=bind,source=/PATH/TO/MODEL,destination=/model,readonly -it --rm docker.elastic.co/eland/eland \
    eland_import_hub_model \
      --url 'XXXX' \
      --hub-model-id /model \
      --task-type ner \
      --es-username elastic --es-password XXX \
      --es-model-id bert-base-ner
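The key detail of the Docker variant is that `--hub-model-id` refers to the path inside the container (`/model`), not the host path. The hypothetical helper below builds the command above from a host directory, minus the credentials, which the caller appends. It resolves the directory to an absolute path (bind mounts require one), rejects paths containing commas (they could be read as field separators in the `--mount` value), and omits `-it`, which needs an interactive terminal and fails when the command is launched from a script.

```python
from pathlib import Path

ELAND_IMAGE = "docker.elastic.co/eland/eland"


def docker_import_command(model_dir, es_url, task_type, es_model_id):
    """Build the `docker run` argv for a local model (hypothetical helper).

    Credentials (--es-username/--es-password) are left for the caller to append.
    """
    # Bind mounts need an absolute host path.
    source = Path(model_dir).resolve()
    if "," in str(source):
        # A comma could be parsed as a field separator in the --mount value.
        raise ValueError(f"comma in model path is not supported here: {source}")
    mount = f"type=bind,source={source},destination=/model,readonly"
    return [
        "docker", "run", "--mount", mount, "--rm", ELAND_IMAGE,
        "eland_import_hub_model",
        "--url", es_url,
        # Inside the container the model lives at the mount destination.
        "--hub-model-id", "/model",
        "--task-type", task_type,
        "--es-model-id", es_model_id,
    ]


if __name__ == "__main__":
    cmd = docker_import_command(
        "bert-base-NER", "https://localhost:9200", "ner", "bert-base-ner"
    )
    # Placeholder credentials, as in the example above.
    cmd += ["--es-username", "elastic", "--es-password", "XXX"]
    print(cmd)
```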

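Once it is uploaded to Elasticsearch, the model has the ID specified by --es-model-id. If --es-model-id is not set, the model ID is derived from --hub-model-id; spaces and path delimiters are converted to double underscores __.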
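As a rough illustration of that naming rule, the sketch below converts whitespace and both `/` and `\` separators to `__`. `default_es_model_id` is a hypothetical helper that implements only the rule stated above; Eland's own implementation may normalize IDs further (for example case or length), so setting `--es-model-id` explicitly is the predictable option.

```python
import re


def default_es_model_id(hub_model_id):
    """Apply the naming rule described above (hypothetical helper).

    Whitespace and path separators (/ and \\) become double underscores.
    Eland's own implementation may normalize further.
    """
    return re.sub(r"[\s/\\]", "__", hub_model_id)


if __name__ == "__main__":
    for hub_id in ("elastic/distilbert-base-cased-finetuned-conll03-english", "/model"):
        print(hub_id, "->", default_es_model_id(hub_id))
```

Under this rule, `elastic/distilbert-base-cased-finetuned-conll03-english` becomes `elastic__distilbert-base-cased-finetuned-conll03-english`, and an absolute path such as `/model` gains a leading `__`.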

Connect to Elasticsearch through a proxy

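Behind the scenes, Eland uses the requests Python library, which allows proxies to be configured through environment variables. For example, to use an HTTP proxy to connect to an HTTPS Elasticsearch cluster, set the HTTPS_PROXY environment variable when invoking Eland: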

HTTPS_PROXY=https://proxy-host:proxy-port eland_import_hub_model ...

If you disabled security on your Elasticsearch cluster, use HTTP_PROXY instead.
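When Eland is launched from a Python script, the proxy variable can be set for the child process only. In this sketch, the hypothetical `proxy_env_for` helper picks `HTTPS_PROXY` or `HTTP_PROXY` from the scheme of the Elasticsearch URL, mirroring how requests selects a proxy by the scheme of the target URL rather than that of the proxy itself.

```python
import os
from urllib.parse import urlsplit


def proxy_env_for(es_url, proxy_url, base_env=None):
    """Return a copy of the environment with the matching proxy variable set.

    Hypothetical helper: HTTPS_PROXY for an https:// cluster, HTTP_PROXY for
    an http:// one (for example, a cluster with security disabled).
    """
    scheme = urlsplit(es_url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got {es_url!r}")
    env = dict(os.environ if base_env is None else base_env)
    env[scheme.upper() + "_PROXY"] = proxy_url
    return env


if __name__ == "__main__":
    env = proxy_env_for("https://127.0.0.1:9200", "https://proxy-host:3128")
    # Then: subprocess.run(["eland_import_hub_model", ...], env=env, check=True)
    print(env["HTTPS_PROXY"])
```

Copying the environment instead of modifying `os.environ` keeps the proxy setting scoped to the Eland process.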

Authentication methods

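The following authentication options are available when using the import script:

Elasticsearch username and password authentication (specified with the -u and -p options)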
```
eland_import_hub_model -u <username> -p <password> --cloud-id <cloud-id> ...
```
These -u and -p options also work when you use --url.

Elasticsearch username and password authentication (embedded in the URL)

eland_import_hub_model --url https://<user>:<password>@<hostname>:<port> ...

Elasticsearch API key authentication

eland_import_hub_model --es-api-key <api-key> --url https://<hostname>:<port> ...

Hugging Face Hub access token (for private models)

eland_import_hub_model --hub-access-token <access-token> ...
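For scripted imports, it can be convenient to collect these options from environment variables instead of hard-coding secrets in the script. The sketch below is one way to do that; `auth_args` is hypothetical, and `ES_API_KEY`, `ES_USERNAME`, `ES_PASSWORD` and `HF_TOKEN` are illustrative variable names chosen for it. Check `eland_import_hub_model --help` to see whether your Eland version already reads credentials from the environment.

```python
import os


def auth_args(env=None):
    """Build authentication options from environment variables (hypothetical).

    An API key wins over username/password; the Hugging Face token is added
    independently, since it authenticates against the hub, not Elasticsearch.
    """
    env = os.environ if env is None else env
    args = []
    if env.get("ES_API_KEY"):
        args += ["--es-api-key", env["ES_API_KEY"]]
    elif env.get("ES_USERNAME") and env.get("ES_PASSWORD"):
        args += ["-u", env["ES_USERNAME"], "-p", env["ES_PASSWORD"]]
    if env.get("HF_TOKEN"):
        args += ["--hub-access-token", env["HF_TOKEN"]]
    return args


if __name__ == "__main__":
    # Print only the option names so that secrets are not echoed.
    print([a for a in auth_args() if a.startswith("-")])
```

Preferring the API key over a username and password is a design choice of this sketch, not an Eland rule.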

TLS/SSL

The following Elasticsearch TLS/SSL options are available when using the import script:

Specify an alternate CA bundle to verify the cluster certificate:

eland_import_hub_model --ca-certs CA_CERTS ...

Disable TLS/SSL verification altogether (strongly discouraged):
```
eland_import_hub_model --insecure ...
```