Elasticsearch open inference API adds support for Azure OpenAI chat completions

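We have integrated Azure OpenAI chat completions into the inference API, which lets our customers build powerful generative AI applications on top of chat completions with large language models such as GPT-4. Azure and Elasticsearch developers can combine the capabilities of the Elasticsearch vector database and the Azure AI ecosystem to power their own generative AI applications with the model of their choice.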

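This blog gives a quick overview of the catalog of supported providers in the open inference API and walks through an example of using Azure's OpenAI chat completions to answer questions.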

The inference API is evolving fast!

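We are rapidly expanding the catalog of supported providers in the open inference API. Check out some of our recent blog posts on Elastic Search Labs to learn about the latest integrations for embeddings, completions, and reranking.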

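Azure OpenAI chat completions support is available through the open inference API in our stateless offering on Elastic Cloud. It will also be available to everyone in an upcoming Elasticsearch release. This complements the existing ability to use the Elasticsearch vector database with the Azure OpenAI service.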

Using Azure's OpenAI chat completions to answer questions

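In my previous blog post on OpenAI chat completions, we learned how to use OpenAI's chat completions to summarize text. In this guide, we will use Azure OpenAI chat completions to answer questions during ingestion, so that the answers are ready before anyone searches. To get started, have your Azure OpenAI API key, deployment ID, and resource name ready by creating a free Azure account and setting up a model suited for chat completions. You can follow Azure's OpenAI Service GPT quickstart guide to get a model up and running. In the following example we use `gpt-4` with API version `2024-02-01`. You can read more about supported models and versions here.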

In Kibana, you have access to a console where you can run the following steps against Elasticsearch without even setting up an IDE.

First, we configure a model that will perform the completions.

PUT _inference/completion/azure_openai_completion
{
    "service": "azureopenai",
    "service_settings": {
        "resource_name":"<resource-name>",
        "deployment_id": "<deployment-id>",
        "api_version": "2024-02-01",
        "api_key": "<api-key>"
    }
}

On successful creation of the inference endpoint, you will receive a response similar to the following with status code `200 OK`.

{
    "model_id": "azure_openai_completion",
    "task_type": "completion",
    "service": "azureopenai",
    "service_settings": {
        "resource_name": "<resource-name>",
        "deployment_id": "<deployment-id>",
        "api_version": "2024-02-01"
    },
    "task_settings": {}
}
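Outside of the Kibana console, the same endpoint-creation request could be assembled from a script. The following is a minimal Python sketch using only the standard library. It builds the request but does not send it. The function names, the `es_url` and `es_api_key` parameters, and the use of Elasticsearch API key authentication are assumptions made for illustration; only the path and the body fields come from the request shown above.

```python
import json
import urllib.request


def build_endpoint_settings(resource_name, deployment_id, azure_api_key,
                            api_version="2024-02-01"):
    """Return the JSON body used to create the completion endpoint."""
    return {
        "service": "azureopenai",
        "service_settings": {
            "resource_name": resource_name,
            "deployment_id": deployment_id,
            "api_version": api_version,
            "api_key": azure_api_key,
        },
    }


def build_create_endpoint_request(es_url, es_api_key, settings,
                                  inference_id="azure_openai_completion"):
    """Build (but do not send) the PUT request for the inference endpoint."""
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/_inference/completion/{inference_id}",
        data=json.dumps(settings).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Assumption: the cluster accepts Elasticsearch API key authentication.
            "Authorization": f"ApiKey {es_api_key}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Keeping the build step separate makes it easy to inspect the URL and body first.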

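You can now call the configured model to perform a completion on any text input. Let's ask the model what inference means in the context of generative AI.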

POST _inference/completion/azure_openai_completion
{
    "input": "What is inference in the context of GenAI?"
}

You should receive a response with status code `200 OK` explaining what inference is.

{
    "completion": [
        {
            "result": "In the context of generative AI, inference refers to the process of generating new data based on the patterns, structures, and relationships the AI has learned from the training data. It involves using a model that has been trained on a lot of data to infer or generate new, similar data. For instance, a generative AI model trained on a collection of paintings might infer or generate new, similar paintings. This is the useful part of machine learning where the actual task is performed."
        }
    ]
}
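If you consume this response from code rather than from the console, the generated text sits under `completion[].result`. A small Python sketch of pulling it out, assuming the response has already been parsed into a dictionary (the helper name is hypothetical):

```python
def completion_results(response):
    """Return the generated texts from a completion response, in order."""
    return [item["result"] for item in response.get("completion", [])]
```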

Now we can set up a small catalog of questions that we want answered during ingestion. We will use the Bulk API to index three questions about Elastic products.

POST _bulk
{ "index" : { "_index" : "questions" } }
{"question": "What is Elasticsearch?"}
{ "index" : { "_index" : "questions" } }
{"question": "What is Kibana?"}
{ "index" : { "_index" : "questions" } }
{"question": "What is Logstash?"}

On successful indexing, you will receive a response similar to the following with status code `200 OK`.

{
    "errors": false,
    "took": 385,
    "items": [
        {
            "index": {
                "_index": "questions",
                "_id": "4RO6YY8Bv2OsAP2iNusn",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 0,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "index": {
                "_index": "questions",
                "_id": "4hO6YY8Bv2OsAP2iNuso",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 1,
                "_primary_term": 1,
                "status": 201
            }
        },
        {
            "index": {
                "_index": "questions",
                "_id": "4xO6YY8Bv2OsAP2iNuso",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 2,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 2,
                "_primary_term": 1,
                "status": 201
            }
        }
    ]
}
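The bulk request body is newline-delimited JSON: one action line followed by one document line per question, ending with a final newline. For a longer list of questions it may be easier to generate that body than to type it. The sketch below shows one way to do so in Python and to check the per-item statuses in the response. Both helper names are hypothetical, and the code only builds and inspects data without contacting a cluster.

```python
import json


def build_bulk_body(index, questions):
    """Return an NDJSON bulk body that indexes one document per question."""
    lines = []
    for question in questions:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps({"question": question}))        # document line
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def failed_bulk_items(bulk_response):
    """Return the per-item results whose status is not a 2xx code."""
    failed = []
    for item in bulk_response.get("items", []):
        for result in item.values():
            if not 200 <= result.get("status", 0) < 300:
                failed.append(result)
    return failed
```

For the response shown above, `failed_bulk_items` would return an empty list, since all three items report status `201`.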

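We now create our question-answering ingest pipeline, which consists of a script processor, an inference processor, and a remove processor.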

PUT _ingest/pipeline/question_answering_pipeline
{
    "processors": [
        {
            "script": {
                "source": "ctx.prompt = 'Please answer the following question: ' + ctx.question"
            }
        },
        {
            "inference": {
                "model_id": "azure_openai_completion",
                "input_output": {
                    "input_field": "prompt",
                    "output_field": "answer"
                }
            }
        },
        {
            "remove": {
                "field": "prompt"
            }
        }
    ]
}

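This pipeline prefixes the content of the `question` field with the text "Please answer the following question: " and stores the result in a temporary field named `prompt`. The content of this temporary `prompt` field is then sent through the inference API to Azure's OpenAI service to perform a completion, and the result is written to the `answer` field. Finally, the remove processor drops the temporary `prompt` field. Using an ingest pipeline allows for great flexibility, since you can change the pre-prompt to anything you like. For example, the same approach also lets you summarize documents. Check out Elasticsearch open inference API adds support for OpenAI chat completions to learn how to build a summarization ingest pipeline!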
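To make the data flow concrete, here is a small local simulation of what the three processors do to a single document. This is only an illustrative Python sketch, not how Elasticsearch executes pipelines: the function name is hypothetical, and the `complete` argument is a stand-in for the call to the inference endpoint.

```python
PROMPT_PREFIX = "Please answer the following question: "


def run_question_answering_pipeline(doc, complete):
    """Mimic the script, inference, and remove processors on one document.

    `complete` is any callable that maps a prompt string to an answer string;
    it stands in for the Azure OpenAI completion call.
    """
    ctx = dict(doc)  # work on a copy so the input document is left untouched

    # script processor: build the temporary prompt field
    ctx["prompt"] = PROMPT_PREFIX + ctx["question"]

    # inference processor: input_field "prompt" -> output_field "answer"
    ctx["answer"] = complete(ctx["prompt"])

    # remove processor: drop the temporary field before the document is indexed
    del ctx["prompt"]
    return ctx
```

Calling it with a stub such as `lambda prompt: "(answer to) " + prompt` shows that the resulting document keeps `question`, gains `answer`, and no longer contains `prompt`. Documents indexed through the real pipeline also carry a `model_id` field, which this sketch leaves out.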

We now send the documents containing our questions through the question-answering pipeline by calling the Reindex API.

POST _reindex
{
  "source": {
    "index": "questions",
    "size": 50
  },
  "dest": {
    "index": "answers",
    "pipeline": "question_answering_pipeline"
  }
}

You will receive a response with status code `200 OK` similar to the following.

{
    "took": 10651,
    "timed_out": false,
    "total": 3,
    "updated": 0,
    "created": 3,
    "deleted": 0,
    "batches": 1,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
        "bulk": 0,
        "search": 0
    },
    "throttled_millis": 0,
    "requests_per_second": -1.0,
    "throttled_until_millis": 0,
    "failures": []
}

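In a real-world setup you would probably use another ingestion mechanism to ingest your documents in an automated way. Check out our Adding data to Elasticsearch guide to learn about the various options Elastic offers for ingesting data into Elasticsearch. We are also committed to showcasing ingestion mechanisms and providing guidance on using third-party tools to bring data into Elasticsearch. For example, take a look at Ingest Data from Snowflake to Elasticsearch using Meltano: A developer's journey to learn how to use Meltano to ingest data.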

You can now search your pre-generated answers using the Search API.

POST answers/_search
{
  "query": {
    "match_all": { }
  }
}

In the response, you will get the pre-generated answers.

{
    "took": 11,
    "timed_out": false,
    "_shards": { ... },
    "hits": {
        "total": { ... },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "answers",
                "_id": "4RO6YY8Bv2OsAP2iNusn",
                "_score": 1.0,
                "_ignored": [
                    "answer.keyword"
                ],
                "_source": {
                    "model_id": "azure_openai_completion",
                    "question": "What is Elasticsearch?",
                    "answer": "Elasticsearch is an open-source, RESTful, distributed search and analytics engine built on Apache Lucene. It can handle a wide variety of data types, including textual, numerical, geospatial, structured, and unstructured data. Elasticsearch is scalable and designed to operate in real-time, making it an ideal choice for use cases such as application search, log and event data analysis, and anomaly detection."
                }
            },
            { ... },
            { ... }
        ]
    }
}
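An application that serves these answers would typically read the `question` and `answer` fields from each hit's `_source`. A minimal Python sketch of that step, assuming the search response has already been parsed into a dictionary (the helper name is hypothetical):

```python
def answers_by_question(search_response):
    """Map each question to its pre-generated answer from a search response."""
    answers = {}
    for hit in search_response["hits"]["hits"]:
        source = hit.get("_source", {})
        # Skip hits that lack either field instead of failing on them.
        if "question" in source and "answer" in source:
            answers[source["question"]] = source["answer"]
    return answers
```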

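Pre-generating answers to frequently asked questions is particularly effective at reducing operational costs. By minimizing the need for on-the-fly response generation, you can significantly cut the computational resources required, such as token usage. In addition, this approach ensures that every user receives the same, precise information. Consistency is critical, especially in domains that demand high reliability and accuracy, such as medical, legal, or technical support.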