Building RAG with Llama 3 open-source and Elastic

This blog walks through implementing RAG using two approaches.

Elastic, LlamaIndex, Llama 3 (8B) version running locally using Ollama.
Elastic, LangChain, ELSER v2, Llama 3 (8B) version running locally using Ollama.

The notebooks can be found at the following GitHub location.

Before we get started, let's take a quick look at Llama 3.

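Llama 3 overview

Llama 3 is an open-source large language model recently launched by Meta. It is the successor to Llama 2 and, based on published metrics, a significant improvement. It scores well on evaluation metrics against other recently released models such as Gemma 7B Instruct and Mistral 7B Instruct. The model comes in two variants, with 8 billion and 70 billion parameters. Notably, at the time of writing this blog, Meta was still training a 400B+ parameter variant of Llama 3.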

Meta Llama 3 Instruct Model Performance. (from https://ai.meta.com/blog/meta-llama-3/)


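The figure above shows Llama 3's performance on different datasets compared with other models. To optimize for real-world scenarios, Llama 3 was also evaluated on a high-quality human evaluation set.

Aggregated results of human evaluations across several categories and prompts (from https://ai.meta.com/blog/meta-llama-3/)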

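How to build RAG with Llama 3 open-source and Elastic

Dataset

For the dataset, we will use a fictional organization policy document in JSON format, available at the following location.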

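Configure Ollama and Llama 3

Since we are using the Llama 3 8B parameter model, we will run it using Ollama. Follow the steps below to install Ollama.

Browse to https://ollama.org.cn/download and download the Ollama installer for your platform.

Note: The Windows version is currently in preview.

Follow the instructions to install and run Ollama on your operating system.
Once installed, use the command below to download the Llama 3 model.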

    ollama run llama3

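This may take some time depending on your network bandwidth. Once the run completes, you should see the following interface.

To test Llama 3, run the following command from a new terminal, or enter text at the prompt itself.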

    curl -X POST http://127.0.0.1:11434/api/generate -d '{ "model": "llama3", "prompt":"Why is the sky blue?" }'

The output at the prompt looks like the following.

    ❯ ollama run llama3
    >>> Why is the sky blue?
    The color of the sky appears blue to our eyes because of a fascinating combination of scientific factors. Here's the short answer:

    **Scattering of Light**: When sunlight enters Earth's atmosphere, it encounters tiny molecules of gases like nitrogen (N2) and oxygen (O2).
    These molecules scatter the light in all directions, but they do so more efficiently for shorter wavelengths (like blue and violet light) than
    longer wavelengths (like red and orange light).

    **Rayleigh Scattering**: This scattering effect is known as Rayleigh scattering, named after the British physicist Lord Rayleigh, who first
    described it in the late 19th century. It's responsible for the blue color we see in the sky.

    **Atmospheric Composition**: The Earth's atmosphere is composed of approximately 78% nitrogen, 21% oxygen, and small amounts of other gases.
    These gases are more abundant at lower altitudes, where they scatter shorter wavelengths (like blue light) more effectively than longer
    wavelengths (like red light).

    **Sunlight's Wavelengths**: When sunlight enters the Earth's atmosphere, it contains a broad spectrum of wavelengths, including visible light
    with colors like red, orange, yellow, green, blue, indigo, and violet. The shorter wavelengths (blue and violet) are scattered more than the
    longer wavelengths (red and orange), due to Rayleigh scattering.

    **What We See**: As our eyes look up at the sky, we see the combined effect of these factors: the shorter wavelengths (blue light) being
    scattered in all directions by the atmospheric gases, while the longer wavelengths (red and orange light) continue to travel in a more direct
    path to our eyes. This results in the blue color we perceive as the sky.

    So, to summarize: the sky appears blue because of the scattering of sunlight's shorter wavelengths (blue light) by the tiny molecules in the
    Earth's atmosphere, combined with the atmospheric composition and the original wavelengths present in sunlight.

    Now, go enjoy that blue sky!

    >>> Send a message (/? for help)

We now have Llama 3 running locally using Ollama.
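The same endpoint can also be called from Python. The following is a minimal sketch using only the standard library. It assumes that Ollama is listening on its default local port 11434 over plain HTTP and that the reply is streamed as one JSON object per line, each carrying a `response` fragment. The helper names `build_generate_request`, `collect_stream`, and `ask_llama3` are illustrative and are not part of the notebooks.

```python
import json
from urllib.request import Request, urlopen

# Assumption: Ollama's default local endpoint (plain HTTP, port 11434).
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def build_generate_request(prompt, model="llama3"):
    """Build the POST request that mirrors the curl command above."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return Request(OLLAMA_URL, data=payload,
                   headers={"Content-Type": "application/json"}, method="POST")


def collect_stream(lines):
    """Join the "response" fragments of a line-delimited JSON stream."""
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def ask_llama3(prompt):
    """Send the prompt to the local model and return the full answer text."""
    with urlopen(build_generate_request(prompt)) as reply:
        return collect_stream(reply)
```

Calling `ask_llama3("Why is the sky blue?")` only works while the Ollama server is running and the `llama3` model has been downloaded.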

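Elasticsearch setup

We will use an Elastic Cloud setup for this. Follow the instructions here. Once the deployment succeeds, note down the API key and the Cloud ID; we will need them during setup.

Application setup

There are two notebooks: one for RAG implemented with LlamaIndex and Llama 3, and one for RAG implemented with LangChain, ELSER v2, and Llama 3. In the first notebook, Llama 3 serves as the local LLM and also provides the embeddings. In the second notebook, ELSER v2 provides the embeddings and Llama 3 serves as the local LLM.

Method 1: Elastic, LlamaIndex, Llama 3 (8B) version running locally using Ollama.

Step 1: Install required dependencies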

    !pip install llama-index
    !pip install llama-index-cli
    !pip install llama-index-core
    !pip install llama-index-embeddings-elasticsearch
    !pip install llama-index-embeddings-ollama
    !pip install llama-index-legacy
    !pip install llama-index-llms-ollama
    !pip install llama-index-readers-elasticsearch
    !pip install llama-index-readers-file
    !pip install llama-index-vector-stores-elasticsearch
    !pip install llamaindex-py-client

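The above section installs the required llamaindex packages.

Step 2: Import required dependencies

We start by importing the packages and classes required for the app.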

    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.core.ingestion import IngestionPipeline
    from llama_index.embeddings.ollama import OllamaEmbedding
    from llama_index.vector_stores.elasticsearch import ElasticsearchStore
    from llama_index.core import VectorStoreIndex, QueryBundle
    from llama_index.llms.ollama import Ollama
    from llama_index.core import Document, Settings
    from getpass import getpass
    from urllib.request import urlopen
    import json

We then prompt the user for the Cloud ID and API key values.

    #https://elastic.ac.cn/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id
    ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID: ")

    #https://elastic.ac.cn/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key
    ELASTIC_API_KEY = getpass("Elastic Api Key: ")

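If you are not familiar with obtaining the Cloud ID and API key, follow the links in the code snippet above.

Step 3: Document processing

We start by downloading the JSON document and building Document objects from the payload.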

    url = "https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/datasets/workplace-documents.json"
    response = urlopen(url)
    workplace_docs = json.loads(response.read())
    documents = [
        Document(text=doc["content"],
                 metadata={"name": doc["name"], "summary": doc["summary"],
                           "rolePermissions": doc["rolePermissions"]})
        for doc in workplace_docs
    ]

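Now we define the Elasticsearch vector store (ElasticsearchStore), the embedding created with Llama 3, and a `pipeline` that processes the payload built above and ingests it into Elasticsearch.

The ingestion pipeline lets us compose a pipeline from different components, one of which generates embeddings using Llama 3.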

    es_vector_store = ElasticsearchStore(index_name="workplace_index",
                                         vector_field='content_vector',
                                         text_field='content',
                                         es_cloud_id=ELASTIC_CLOUD_ID,
                                         es_api_key=ELASTIC_API_KEY)

    # Embedding Model to do local embedding using Ollama.
    ollama_embedding = OllamaEmbedding("llama3")
    # LlamaIndex Pipeline configured to take care of chunking, embedding
    # and storing the embeddings in the vector store.
    pipeline = IngestionPipeline(
        transformations=[
            SentenceSplitter(chunk_size=512, chunk_overlap=100),
            ollama_embedding
        ], vector_store=es_vector_store
    )

The ElasticsearchStore is defined with the name of the index to create, the vector field, and the content field. The index is created when we run the pipeline.
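The `chunk_size=512` and `chunk_overlap=100` arguments passed to `SentenceSplitter` control how long each chunk is and how much text neighboring chunks share. The sketch below illustrates the idea with a plain sliding window over whitespace-separated words. It is a simplification under stated assumptions, not the splitter's actual algorithm, which works on tokens and tries to respect sentence boundaries. The function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(tokens, chunk_size=512, chunk_overlap=100):
    """Slide a fixed-size window over tokens, repeating `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the input
        start += step
    return chunks


# Toy example with small numbers so the overlap is easy to see.
words = "one two three four five six seven eight nine ten".split()
for chunk in chunk_with_overlap(words, chunk_size=4, chunk_overlap=1):
    print(" ".join(chunk))
```

Each chunk starts with the last word of the previous one, so a sentence that straddles a chunk boundary still appears with some of its context in the following chunk.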

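The index mapping created is shown below.

Execute the pipeline using the step below. Once the pipeline run completes, the index `workplace_index` is ready for querying. Note that the vector field `content_vector` is created as a dense vector with `4096` dimensions. The dimension size comes from the size of the embeddings generated by Llama 3.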

    pipeline.run(show_progress=True,documents=documents)

Step 4: LLM configuration

We now set up LlamaIndex to use Llama 3 as the LLM. As covered earlier, this is done with the help of Ollama.

    Settings.embed_model = ollama_embedding
    local_llm = Ollama(model="llama3")

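Step 5: Semantic search

We now configure Elasticsearch as the vector store for the LlamaIndex query engine. The query engine is then used to answer your questions with contextually relevant data from Elasticsearch.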

    index = VectorStoreIndex.from_vector_store(es_vector_store)
    query_engine = index.as_query_engine(local_llm, similarity_top_k=10)

    # Customer Query
    query = "What are the organizations sales goals?"
    bundle = QueryBundle(query_str=query,
    embedding=Settings.embed_model.get_query_embedding(query=query))

    response = query_engine.query(bundle)

    print(response.response)

The response I received with Llama 3 as the LLM and Elasticsearch as the vector database is shown below.

    According to the "Fy2024 Company Sales Strategy" document, the organization's primary goal is to:

    * Increase revenue by 20% compared to fiscal year 2023.
    * Expand market share in key segments by 15%.
    * Retain 95% of existing customers and increase customer satisfaction ratings.
    * Launch at least two new products or services in high-demand market segments.
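Conceptually, `similarity_top_k=10` in Step 5 asks the vector store for the ten chunks whose embeddings are closest to the query embedding. The sketch below shows that ranking as a brute-force loop, assuming cosine similarity as the metric. Elasticsearch performs this search inside the index, so the loop is only an illustration of the idea. The document names and the 3-dimensional vectors are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=10):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical 3-dimensional embeddings; the Llama 3 vectors have 4096 dims.
docs = {
    "sales_strategy": [0.9, 0.1, 0.0],
    "vacation_policy": [0.0, 0.2, 0.9],
    "sales_team_guide": [0.7, 0.3, 0.1],
}
print(top_k([1.0, 0.0, 0.0], docs, k=2))
```

The retrieved chunks are what the query engine hands to Llama 3 as context, so a larger `k` gives the model more material at the cost of a longer prompt.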

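This concludes the RAG setup that uses Llama 3 both as the local LLM and to generate embeddings.

Let's now move to the second method, which also uses Llama 3 as the local LLM but relies on Elastic's ELSER v2 to generate embeddings and perform semantic search.

Method 2: Elastic, LangChain, ELSER v2, Llama 3 (8B) version running locally using Ollama.

Step 1: Install required dependencies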

    !pip install langchain
    !pip install langchain-elasticsearch
    !pip install langchain-community
    !pip install tiktoken

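The above section installs the required langchain packages.

Step 2: Import required dependencies

We start by importing the packages and classes required for the app. This step is similar to Step 2 in Method 1.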

    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_elasticsearch import ElasticsearchStore
    from langchain_community.llms import Ollama
    from langchain.prompts import ChatPromptTemplate
    from langchain.schema.output_parser import StrOutputParser
    from langchain.schema.runnable import RunnablePassthrough
    from langchain_elasticsearch import SparseVectorStrategy
    from getpass import getpass
    from urllib.request import urlopen
    import json

Next, we prompt the user for the Cloud ID and API key values.

    #https://elastic.ac.cn/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id
    ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID: ")

    #https://elastic.ac.cn/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key
    ELASTIC_API_KEY = getpass("Elastic Api Key: ")

Step 3: Document processing

Next, we download the JSON document and build the payload.

    url = "https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/datasets/workplace-documents.json"

    response = urlopen(url)
    workplace_docs = json.loads(response.read())
    metadata = []
    content = []
    for doc in workplace_docs:
        content.append(doc["content"])
        metadata.append(
            {
                "name": doc["name"],
                "summary": doc["summary"],
                "rolePermissions": doc["rolePermissions"],
            }
        )
    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=512, chunk_overlap=256
    )
    docs = text_splitter.create_documents(content, metadatas=metadata)

This step differs from Method 1, where the pipeline provided by LlamaIndex processed the documents. Here we use `RecursiveCharacterTextSplitter` to generate the chunks.
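The idea behind recursive splitting is to try coarse separators first (paragraph breaks), and fall back to finer ones (line breaks, then spaces) only for pieces that are still too long. The sketch below is a simplified illustration of that idea, not LangChain's implementation: it measures length in characters rather than tiktoken tokens, and it omits chunk overlap. The function name `recursive_split` is hypothetical.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Split text into pieces of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate          # keep growing the current chunk
            continue
        if current:
            chunks.append(current)       # flush what fits so far
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "alpha beta\n\ngamma delta epsilon\n\nzeta"
for chunk in recursive_split(sample, chunk_size=12):
    print(repr(chunk))
```

In this toy input, the first and last paragraphs fit within the limit and stay whole, while the middle paragraph is too long and is split again at spaces.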

Now we define the Elasticsearch vector store, ElasticsearchStore.

    es_vector_store = ElasticsearchStore(
        es_cloud_id=ELASTIC_CLOUD_ID,
        es_api_key=ELASTIC_API_KEY,
        index_name="workplace_index_elser",
        strategy=SparseVectorStrategy(
            model_id=".elser_model_2_linux-x86_64"
        )
    )

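The vector store is defined with the index to create and the model to use for embedding and retrieval. You can find the `model_id` by navigating to Trained Models under Machine Learning.

This also creates an ingest pipeline in Elastic that generates and stores the embeddings as documents are ingested into Elastic.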

We now add the documents processed above.

    es_vector_store.add_documents(documents=docs)

Step 4: LLM configuration

We set up the LLM to use with the following. This differs from Method 1, where we also used Llama 3 for embeddings.

    llm = Ollama(model="llama3")

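Step 5: Semantic search

All the necessary building blocks are now in place. We tie them together to perform semantic search using ELSER v2, with Llama 3 as the LLM. Essentially, Elasticsearch with ELSER v2 uses its semantic search capabilities to return contextually relevant passages for the user's question. Those passages enrich the user's question, which is structured with a template and then processed by Llama 3 to generate a relevant response.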

    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)

    retriever = es_vector_store.as_retriever()
    template = """Answer the question based only on the following context:\n

                    {context}
                    
                    Question: {question}
                   """
    prompt = ChatPromptTemplate.from_template(template)
    chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

    chain.invoke("What are the organizations sales goals?")

The response with Llama 3 as the LLM and ELSER v2 for semantic search is shown below.

    According to the provided context, the organization's sales goals for Fiscal Year 2024 are:

    1. Increase revenue by 20% compared to fiscal year 2023.
    2. Expand market share in key segments by 15%.
    3. Retain 95% of existing customers and increase customer satisfaction ratings.

    These goals are outlined under "Objectives for Fiscal Year 2024" in the provided document.
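The chain in Step 5 composes four stages: retrieve chunks, join them into a context string, fill the prompt template, and pass the prompt to the LLM. The sketch below reproduces that data flow in plain Python without LangChain, so the assembled prompt can be inspected. `fake_retriever` and `fake_llm` are hypothetical stand-ins for the Elasticsearch retriever and the Ollama-backed model, and the two context strings are taken from the sample response above.

```python
TEMPLATE = """Answer the question based only on the following context:

{context}

Question: {question}
"""


def fake_retriever(question):
    """Stand-in for es_vector_store.as_retriever(); returns fixed chunks."""
    return ["Increase revenue by 20% compared to fiscal year 2023.",
            "Expand market share in key segments by 15%."]


def fake_llm(prompt):
    """Stand-in for the Ollama-backed LLM; reports the prompt length."""
    return f"(model reply to a {len(prompt)}-character prompt)"


def run_chain(question, retriever=fake_retriever, llm=fake_llm):
    """Retrieve context, fill the template, and pass the prompt to the LLM."""
    context = "\n\n".join(retriever(question))
    prompt = TEMPLATE.format(context=context, question=question)
    return prompt, llm(prompt)


prompt, answer = run_chain("What are the organizations sales goals?")
print(prompt)
```

Printing the prompt makes it clear that the model only sees the retrieved passages and the question, which is why retrieval quality largely determines the quality of the answer.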

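This concludes the RAG setup based on Llama 3 as the local LLM and ELSER v2 for semantic search.

Conclusion

In this blog we looked at two approaches to RAG with Llama 3 and Elastic. We first explored Llama 3 as both the LLM and the embedding generator. Next, we used Llama 3 as the local LLM with ELSER for embeddings and semantic search. We used two different frameworks, LlamaIndex and LangChain. You can implement either approach with either framework. The notebooks were tested with the Llama 3 8B parameter version. Both notebooks are available at this GitHub location.

Elasticsearch has native integrations with industry-leading generative AI tools and providers. Check out our webinars on going Beyond RAG Basics, or on building production-ready apps with the Elastic Vector Database.

To build the best search solutions for your use case, start a free cloud trial or try Elastic on your local machine now.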