Build RAG quickly with minimal code in Elastic 8.15

Learn how to build an end-to-end RAG pipeline with the S3 Connector, semantic_text datatype, and Elastic Playground.

Elastic 8.15 is out, and Semantic Search is easier than ever to pull off.

We're going to cover how to accomplish all of these tasks in 15 minutes:

  1. Store your documents in some data storage service like an AWS S3 Bucket
  2. Set up an Elastic S3 Connector
  3. Upload an embedding model using the eland library and set up an inference endpoint in Elastic
  4. Create an index that uses the semantic_text datatype
  5. Add your inference endpoint to that index
  6. Configure and sync content with the S3 Connector
  7. Use the Elastic Playground immediately

You will need:

  1. An Elastic Cloud Deployment updated to Elastic 8.15
  2. An S3 bucket
  3. An LLM API service (OpenAI, Azure OpenAI, Anthropic, or Gemini)

And that's it! Let's get this done.

Collecting data

To follow along with this specific demo, I've uploaded a zip file containing the data used here. It's the first 60 or so pages of The Silmarillion, each as a separate PDF file. I'm on a Lord of the Rings kick at the moment. Feel free to download it and upload it to your S3 bucket!

Splitting a large document into individual pages like this is sometimes necessary, as the native Elastic S3 Connector will not ingest content from files over 10MB in size.

I use this Python script for splitting a PDF into individual pages:

import os
from PyPDF2 import PdfReader, PdfWriter

def split_pdf(input_pdf_path, output_folder, filename=''):
    # Create the output folder if it doesn't exist
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    # Open the PDF file
    with open(input_pdf_path, 'rb') as file:
        pdf_reader = PdfReader(file)
        
        # Iterate through each page
        for page_num in range(len(pdf_reader.pages)):
            pdf_writer = PdfWriter()
            pdf_writer.add_page(pdf_reader.pages[page_num])
            
            # Generate the output file name
            output_filename = f'{filename}pg_{page_num + 1}.pdf'
            output_path = os.path.join(output_folder, output_filename)
            
            # Save the page as a new PDF
            with open(output_path, 'wb') as output_file:
                pdf_writer.write(output_file)
            
            print(f'Saved {output_filename}')

# Example usage
input_pdf = 'The Silmarillion (Illustrated) - J. R. R. Tolkien; Ted Nasmith;.pdf'
output_folder = './silmarillion_pages/'

split_pdf(input_pdf, output_folder, "Silmarillion_")

Setting up the S3 connector

The connector can ingest a huge variety of data types. Here, we're sticking to an S3 bucket loaded with PDF pages.

I'll just hop onto my Elastic Cloud deployment, go to Search -> Content -> Connectors, and make a new connector called aws-connector with all the default settings. Then I'll open up the configuration and add the name of my bucket, along with the access key and secret key for my AWS user.

Run a quick sync to verify that everything is working. Synchronization ingests every not-yet-ingested file in your data source, extracts its content, and stores it as a unique document in your index. Each document contains its original filename, and files whose names match existing indexed documents won't be reingested, so have no fear! Syncs can also be scheduled to run regularly; the method is described in the documentation. Assuming my AWS credentials and permissions are all in order, the data will land in an index called aws-connector.
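
If you'd rather verify from code than eyeball the UI, a quick document count against the new index does the trick. Here's a minimal sketch using the official Python client; the URL and API key are placeholders for your own deployment's details:

from elasticsearch import Elasticsearch

# Placeholder connection details -- substitute your own deployment URL and API key
es = Elasticsearch(
    "https://<YOUR-DEPLOYMENT-URL>:443",
    api_key="<YOUR-API-KEY>"
)

# Each successfully synced file should show up as one document in the index
print(es.count(index="aws-connector")["count"])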

Looks like it's all good. Let's grab our embedding model!

Uploading an embedding model

Eland is a Python Elasticsearch client that makes it easy to convert numpy, pandas, and scikit-learn workflows to Elasticsearch-powered equivalents. For our purposes, it will be our method of uploading models from Hugging Face for deployment in our Elasticsearch cluster. You can install eland like so:

python -m pip install eland

Now open a text editor and make this little .sh script, filling out each parameter appropriately:

MODEL_ID="sentence-transformers/all-MiniLM-L6-v2"
ELASTIC_USERNAME="<YOUR ELASTIC DEPLOYMENT USERNAME>"
ELASTIC_PASSWORD="<YOUR ELASTIC DEPLOYMENT PASSWORD>"
CLOUD_ID="<YOUR CLOUD ID>"

# Quote the variables so credentials containing special characters don't break the call
eland_import_hub_model \
    --cloud-id "$CLOUD_ID" \
    --es-username "$ELASTIC_USERNAME" \
    --es-password "$ELASTIC_PASSWORD" \
    --hub-model-id "$MODEL_ID" \
    --task-type text_embedding \
    --start

MODEL_ID refers to a model taken from Hugging Face. I'm choosing all-MiniLM-L6-v2 mainly because it is very good, but also very small and easily runnable on a CPU. Run the bash script, and once it's done, your model should appear in your Elastic deployment under Machine Learning -> Model Management -> Trained Models.

Just click the play button next to the model to deploy it, and you're done.
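
If you'd rather script the deployment than click through the UI, the Python client exposes the same operation. A minimal sketch, assuming the model_id that eland generated for this model (double-check yours with the GET request shown in the next section):

from elasticsearch import Elasticsearch

es = Elasticsearch("https://<YOUR-DEPLOYMENT-URL>:443", api_key="<YOUR-API-KEY>")

# Equivalent to pressing the play button: start serving the uploaded model
es.ml.start_trained_model_deployment(
    model_id="sentence-transformers__all-minilm-l6-v2"
)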

Setting up your semantic_text index

Time to set up semantic search. Navigate to Management -> Dev Tools and delete the index the connector created, since its mapping does not use the semantic_text datatype:

DELETE aws-connector

Check the model_id of your uploaded model with:

GET _ml/trained_models

Now create an inference endpoint called minilm-l6, and pass it the correct model_id. Let's not worry about num_allocations and num_threads, because this isn't production and minilm-l6 is not a big-boy.

PUT _inference/text_embedding/minilm-l6
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": "sentence-transformers__all-minilm-l6-v2" 
  }
}
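
Before wiring the endpoint into an index, it's worth a quick smoke test. Here's a minimal sketch with the Python client, assuming a recent 8.x client where the inference helper is available:

from elasticsearch import Elasticsearch

es = Elasticsearch("https://<YOUR-DEPLOYMENT-URL>:443", api_key="<YOUR-API-KEY>")

# Embed a test string through the new endpoint; the response should
# contain one dense vector per input string
response = es.inference.inference(
    inference_id="minilm-l6",
    input=["The Music of the Ainur"]
)
print(response)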

Now recreate the aws-connector index. Set the "body" property to type "semantic_text", and add the ID of your new inference endpoint:

PUT aws-connector
{
  "mappings": {
    "properties": {
      "body": {
        "type": "semantic_text",
        "inference_id": "minilm-l6"
      }
    }
  }
}

Get back to your connector and run another full-content sync (for real this time!). The incoming documents will be automatically chunked into blocks of 250 words, with an overlap of 100 words. You don't have to do anything explicitly. Now that's convenient!

And it's done. Check out your aws-connector index; there'll be 140 documents in there, each of which is now an embedded chunk.
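
You can spot-check the embeddings with a semantic query, which embeds the query text through the field's inference endpoint and searches the stored chunks. A minimal sketch; the question is just an example:

from elasticsearch import Elasticsearch

es = Elasticsearch("https://<YOUR-DEPLOYMENT-URL>:443", api_key="<YOUR-API-KEY>")

# The semantic query targets the semantic_text field directly
response = es.search(
    index="aws-connector",
    query={
        "semantic": {
            "field": "body",
            "query": "How did Melkor fall from grace?"
        }
    }
)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_id"])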

Do RAG with the Elastic Playground

Scurry over to Search -> Build -> Playground and add an LLM connector of your choice. I'm using Azure OpenAI.

Now let's set up a chat experience. Click Add Data Sources and select aws-connector.

Check out the query tab of your new chat experience. Assuming everything was properly set up, it will automatically be set to a hybrid search query that uses the minilm-l6 model.
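
For reference, the generated query combines lexical and semantic retrieval. The sketch below is an illustrative reconstruction using the RRF retriever syntax, not the exact output; your generated query may differ:

# Illustrative sketch of a hybrid query combining a lexical match with a
# semantic retriever via reciprocal rank fusion (RRF); "{query}" marks
# where Playground substitutes the user's question
hybrid_query = {
    "retriever": {
        "rrf": {
            "retrievers": [
                {"standard": {"query": {"match": {"body": "{query}"}}}},
                {"standard": {"query": {"semantic": {"field": "body", "query": "{query}"}}}},
            ]
        }
    }
}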

Let's ask a question! We'll take three documents for the context, and add my special RAG prompt:

Query: Describe the fall from Grace of Melkor

We'll use a relatively open-ended RAG query. Answering it satisfactorily requires drawing information from multiple parts of the text, which makes it a good indicator of whether RAG is working as expected.

Summary
The fall from grace of Melkor is a central theme in the mythology of J.R.R. Tolkien's legendarium. According to the provided context, Melkor's descent into discord and rebellion can be summarized as follows:

1. **Initial Greatness and Ambition**:
   - Melkor was one of the Ainur, the powerful spirits created by Ilúvatar. He was endowed with the greatest gifts of power and knowledge among the Ainur, having a share in all the gifts of his brethren [2].
   - His ambition led him to seek the Imperishable Flame, desiring to bring into Being things of his own. However, he did not find the Flame, as it was with Ilúvatar [2].

2. **Introduction of Discord**:
   - During the Great Music, which was the Ainur's act of creation through song, Melkor began to interweave his own themes that were not in accord with Ilúvatar's original theme. This caused discord and disrupted the harmony of the music [2].
   - Melkor's discord spread, and many Ainur attuned their music to his, leading to a turbulent and chaotic sound. Despite this, Ilúvatar introduced new themes to counter Melkor's discord, but Melkor's influence continued to cause strife [2].

3. **Conflict and Rebellion**:
   - When the Ainur entered the World (Eä), Melkor continued to meddle in their work, turning it to his own desires and purposes. He kindled great fires and sought to claim the Earth as his own kingdom [3].
   - Manwë, who was the chief instrument of Ilúvatar's second theme, opposed Melkor and called upon other spirits to aid him in preventing Melkor from hindering their labor. This led to strife between Melkor and the other Valar [3].

4. **Continued Strife and Withdrawal**:
   - Despite being opposed by Manwë and the other Valar, Melkor did not abandon his desire for dominion over Arda (the Earth). He withdrew to other regions but continued to harbor the desire to rule [3].

In summary, Melkor's fall from grace was marked by his initial ambition and desire for power, his introduction of discord into the Great Music, his continued rebellion and meddling in the creation of the World, and his persistent desire to claim dominion over Arda despite opposition from the other Valar. His actions led to ongoing conflict and strife, setting the stage for much of the subsequent history and mythology of Middle-earth.

Well, I'm convinced. It even has citations! One more for good luck:

Query: Who were the greatest students of Aule the Smith?

This particular query is nothing too difficult; I'm simply looking for a reference to a very specific quote from the text. Let's see how it does!

According to the provided context, the greatest students of Aulë the Smith were the Noldor. The text states, "The Noldor learned most of him, and he was ever their friend" [1]. This indicates that the Noldor, one of the clans of the Elves, were particularly skilled in crafts and knowledge due to their learning from Aulë.

Aulë, being a master of all crafts and a smith, imparted his knowledge and skills to the Noldor, making them renowned for their craftsmanship and artistry in Middle-earth. This relationship between Aulë and the Noldor is significant in the lore, as it highlights the transmission of knowledge and skill from a Vala to the Elves.

Well, that's correct. Looks like RAG is working just fine.

Conclusion

That was incredibly convenient and painless — hot damn! We're truly living in the future. I can definitely work with this. I hope you're as excited to try it as I am to show it off.

Elasticsearch is packed with new features to help you build the best search solutions for your use case. Dive into our sample notebooks to learn more, start a free cloud trial, or try Elastic on your local machine now.

Want to get Elastic certified? Find out when the next Elasticsearch Engineer training is running!

Ready to build state-of-the-art search experiences?

Sufficiently advanced search isn't achieved with the efforts of one. Elasticsearch is powered by data scientists, ML ops, engineers, and many more who are just as passionate about search as you are. Let's connect and work together to build the magical search experience that will get you the results you want.
