Federated SharePoint searches with Azure OpenAI Service On Your Data

Using Azure OpenAI Service On Your Data with Elasticsearch as the vector database.

In this article, we are going to explore the Azure OpenAI service "On Your Data", using Elasticsearch as the data source. We will use the Elastic SharePoint Online native connector to index our SharePoint documents and keep them in sync.

Let's imagine we have a SharePoint site with information about the company and its employees, and we want to chat with it using a custom application. Designing and developing that architecture would usually take some time: you would have to take care of ingestion, then set up a search engine, and then configure a RAG system that reads from the data source and passes the information to an LLM to answer the question. Luckily, we can use Elastic and Azure to make this faster!

Steps

  1. Setting up the SharePoint connector
  2. Setting up Azure OpenAI service
  3. Advanced usage
  4. Document Level Security (DLS)

Setting up the SharePoint connector

We will create a SharePoint site with the following documents:

Planet_Express.docx

This file contains information about the Planet Express company which is intended to be public.

PE_Payslip.docx

This one contains the Planet Express payslip, specifically the CEO's salary. This is information we don't want everybody to have access to.

To ingest our site documents into Elastic, and then keep them in sync when files are added or modified, we will use the Elastic SharePoint Online connector. The first step is getting your SharePoint environment ready. You can find detailed instructions about how to set it up here.

Once you have your SharePoint Online app created and configured, you can go ahead and create the connector index in Elastic:
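
If you prefer Kibana Dev Tools over the UI, a rough equivalent using the Connector API could look like the sketch below (the connector id and name are arbitrary; this article follows the Kibana flow):

PUT _connector/sharepoint-labs-connector
{
  "index_name": "sharepoint-labs",
  "name": "SharePoint Online labs",
  "service_type": "sharepoint_online"
}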

The next step is to vectorize the documents' body field using the Kibana Content UI, so we can run vector search on them:

We are going to use the out-of-the-box multilingual E5 model from Elastic, but you can onboard any compatible embeddings model you want, or use an external provider like OpenAI via the Open Inference Service. You can also repeat the process to add more fields if you want.
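
Before attaching the model, you can check from Dev Tools that it is deployed. This assumes the default E5 model id, which is the same one we reference later in this article:

GET _ml/trained_models/.multilingual-e5-small_linux-x86_64/_stats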

After configuring the connector index we can run a sync to start indexing the documents:

If everything went OK, you should start seeing your documents in the Documents tab:

By default, the connector will ingest different document types like lists, list items, and sites. For this article, we are only interested in documents so let's apply a filter in the connector for that purpose.

This filter will exclude lists and collection-related documents. You have to run a full content sync to apply the filter.
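
As a quick sanity check after the sync, you can confirm that only files remain in the index. This sketch assumes the connector stores the SharePoint object type in an object_type field, with drive_item as the value for files:

GET sharepoint-labs/_count
{
  "query": {
    "term": {
      "object_type": "drive_item"
    }
  }
}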

Setting up Azure OpenAI service

The easiest setup is to go to Azure AI Studio and add Elasticsearch as a chat data source:

Select the customize mappings checkbox to align with the connector settings

We can select between the Keyword and Vector search types; let's start with Keyword.

Now we can align the mappings from the connector with the fields Azure will use for the queries.
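
If you are not sure which field names to map, you can inspect the index mapping from Kibana Dev Tools:

GET sharepoint-labs/_mapping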

Now we can start asking questions about our documents! Let's ask about Planet Express:

What if we ask about the CEO salary?

We probably don't want this information to be public.

Let's fix that!

Advanced usage

The Azure AI Studio chat is not the only way to use this service. Azure OpenAI On Your Data can also be used from Copilot, Teams, or through an API/SDK. We will go with the latter.

Prerequisites:

  • Configure the role assignments from the user to the Azure OpenAI resource. Required role: Cognitive Services OpenAI User.
  • Install the Azure CLI and run az login. A web page will open to authenticate you, and you can then select your subscription.
  • Define the following variables: AzureOpenAIEndpoint, ChatCompletionsDeploymentName, SearchEndpoint, IndexName, Key.

To find AzureOpenAIEndpoint and ChatCompletionsDeploymentName, you can click the View Code tab in the chat:

Copy the endpoint and deployment values from here.

SearchEndpoint, IndexName, and Key are the Elasticsearch URL, index name, and API key respectively.
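
The Key value is the encoded form of an Elasticsearch API key with read access to the index. If you don't have one yet, here is a minimal sketch to create it from Kibana Dev Tools (the key name is arbitrary); use the encoded field from the response as Key:

POST /_security/api_key
{
  "name": "azure-on-your-data",
  "role_descriptors": {
    "sharepoint-labs-read": {
      "index": [
        {
          "names": ["sharepoint-labs"],
          "privileges": ["read", "view_index_metadata"]
        }
      ]
    }
  }
}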

We are going to use Python. First, install the required dependencies:

pip install openai azure-identity

Now add the values you gathered. You can also store them as environment variables:

import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

endpoint = os.getenv("ENDPOINT_URL", "<Endpoint_value>")
deployment = os.getenv("DEPLOYMENT_NAME", "<Deployment_value>")
index_name = os.environ.get("IndexName", "sharepoint-labs")
search_endpoint = os.environ.get("SearchEndpoint", "<Elasticsearch_endpoint>")
key = os.environ.get("Key", "<Elasticsearch_ApiKey>")

Now let's proceed to add the API call to the file:

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

client = AzureOpenAI(
    azure_endpoint=endpoint,
    azure_ad_token_provider=token_provider,
    api_version="2024-02-15-preview",
)

completion = client.chat.completions.create(
    model=deployment,
    messages=[
        {
            "role": "user",
            "content": "What is the CEO salary?",
        },
    ],
    extra_body={
        "data_sources": [
            {
                "type": "elasticsearch",
                "parameters": {
                    "endpoint": search_endpoint,
                    "index_name": index_name,
                    "authentication": {
                        "type": "encoded_api_key",
                        "encoded_api_key": key
                    }
                },
                "query_type": "simple",
                "fields_mapping": {
                    "content_fields_separator": "\n",
                    "content_fields": [
                        "body"
                    ],
                    "filepath_field": "name",
                    "title_field": "Title",
                    "url_field": "webUrl",
                    "vector_fields": [
                        "ml.inference.body.predicted_value"
                    ]
                },
            }
        ]
    }
)

print(completion.model_dump_json(indent=2))

Run the script:

python myscript.py

We got the same answer.

{
    "id": "01b421b9-212c-4a95-b4b8-072bbd2972dc",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The CEO's salary is 1 jillion per month .",
                "refusal": null,
                "role": "assistant",
                "function_call": null,
                "tool_calls": null,
                "end_turn": true,
                "context": {
                    "citations": [
                        {
                            "content": "https://1fbkbs.sharepoint.com/_layouts/15/download.aspx?UniqueId=51621a8b-cede-4396-b42c-7f4bd54607b6&amp;Translate=false&amp;tempauth=v1.eyJzaXRlaWQiOiJiN2Q3NzBjMC03ZTgwLTQ5OTMtOTZjZC1hOGY3YjMxZWUyYmQiLCJhcHBfZGlzcGxheW5hbWUiOiJzcC1sYWJzIiwiYXVkIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwLzFmYmticy5zaGFyZXBvaW50LmNvbUA5MTVkYzNkOS04NTI2LTRhODYtYTc4My02MDc1OTVkMzMxZjUiLCJleHAiOiIxNzI3OTY1MTczIn0.CgoKBHNuaWQSAjQ4EgsI-Jr32YHusT0QBRoNMjAuMTkwLjEzMi40MCosdW02Ym9VYzVMdTZGRXNuc1hrL2UwenA3QW1iWFRlM1BkQUovd2RTakNHbz0wdTgBQhChVktf74AAYI8XRwPrhkUMShBoYXNoZWRwcm9vZnRva2VuegExugE3Z3JvdXAucmVhZCBhbGxzaXRlcy5yZWFkIGFsbGZpbGVzLnJlYWQgYWxscHJvZmlsZXMucmVhZMIBSTIyNjk4YTdkLTRhZmQtNGJhNS1iMzMyLTNiMzA2NGRkYjFiNkA5MTVkYzNkOS04NTI2LTRhODYtYTc4My02MDc1OTVkMzMxZjXIAQE.kCyzpMNSnJKjdCpubfkQ_L7XvMZBFMseOjZQwHl_EEk\n#microsoft.graph.driveItem\nPlanet Express Interplanetary Payslip Employee name: Philip J. Fry Position: CEO Pay period: July 2024 Currency: Jillions PAYMENTS DEDUCTIONS YEAR TO DATE Basic Salary 1 jillion Taxes 0 Total pay to date: 1 jillion Taxable pay to date: 0 Tax paid to date: 0 THIS MONTH Gross pay: 1 jillion Income tax: 0 Total gross payments: 1 jillion Total deductions: 0 Net pay: 1 jillion\n01BV67HZ4LDJRFDXWOSZB3ILD7JPKUMB5W\nPE_payslip.docx\ndrive_item\nhttps://1fbkbs.sharepoint.com/_layouts/15/Doc.aspx?sourcedoc=%7B51621A8B-CEDE-4396-B42C-7F4BD54607B6%7D&amp;file=PE_payslip.docx&amp;action=default&amp;mobileredirect=true",
                            "title": null,
                            "url": null,
                            "filepath": null,
                            "chunk_id": "0"
                        },
                        {
                            "content": "https://1fbkbs.sharepoint.com/_layouts/15/download.aspx?UniqueId=d771ceaa-e47f-4fc1-bda1-64a1c55d6e48&amp;Translate=false&amp;tempauth=v1.eyJzaXRlaWQiOiJiN2Q3NzBjMC03ZTgwLTQ5OTMtOTZjZC1hOGY3YjMxZWUyYmQiLCJhcHBfZGlzcGxheW5hbWUiOiJzcC1sYWJzIiwiYXVkIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwLzFmYmticy5zaGFyZXBvaW50LmNvbUA5MTVkYzNkOS04NTI2LTRhODYtYTc4My02MDc1OTVkMzMxZjUiLCJleHAiOiIxNzI3OTY1MTczIn0.CgoKBHNuaWQSAjQ4EgsI-Jr32YHusT0QBRoNMjAuMTkwLjEzMi40MCosc1NiQlBlQU9sZEVMWUUxMmVodnNTK3NSMmx4blJsOGoybGR0N1Zxeko5Zz0wdTgBQhChVktf74AAYI8XRwPrhkUMShBoYXNoZWRwcm9vZnRva2VuegExugE3Z3JvdXAucmVhZCBhbGxzaXRlcy5yZWFkIGFsbGZpbGVzLnJlYWQgYWxscHJvZmlsZXMucmVhZMIBSTIyNjk4YTdkLTRhZmQtNGJhNS1iMzMyLTNiMzA2NGRkYjFiNkA5MTVkYzNkOS04NTI2LTRhODYtYTc4My02MDc1OTVkMzMxZjXIAQE.eN2threRzN2AZvYmPTCsNsKy1x-MLV_RbDq_yzSexG8\n#microsoft.graph.driveItem\nPlanet Express Interplanetary Our Company Planet Express, Inc. is an intergalactic delivery company owned and operated by Professor Farnsworth to fund his research. Founded in 2961, its headquarters is located in New New York, and its crew includes many important characters of the series. The current crew reached their 100th delivery in September 3010, and to celebrate, Bender threw a 100th-delivery party. The inaugural delivery crew, which disappeared on its first interplanetary mission, was found alive in June 3011. The company scrapes by, in spite of fierce competition from the leader in package delivery, Mom's Friendly Delivery Company. They stay in business thanks to their complete disregard for safety and minimum wage laws, and the Professor's unscrupulous acceptance of the occasional bribe.\n01BV67HZ5KZZY5O77EYFH33ILEUHCV23SI\nPlanet_Express.docx\ndrive_item\nhttps://1fbkbs.sharepoint.com/_layouts/15/Doc.aspx?sourcedoc=%7BD771CEAA-E47F-4FC1-BDA1-64A1C55D6E48%7D&amp;file=Planet_Express.docx&amp;action=default&amp;mobileredirect=true",
                            "title": null,
                            "url": null,
                            "filepath": null,
                            "chunk_id": "0"
                        }
                    ],
                    "intent": "[\"CEO salary\", \"current CEO salary\", \"CEO compensation\"]"
                }
            }
        }
    ],
    "created": 1727985189,
    "model": "gpt-4o",
    "object": "extensions.chat.completion",
    "service_tier": null,
    "system_fingerprint": "fp_67802d9a6d",
    "usage": {
        "completion_tokens": 28,
        "prompt_tokens": 3658,
        "total_tokens": 3686,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    }
}

The difference is that now we can override the Elasticsearch settings on a per-request basis.
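
For example, continuing the script above, we can wrap the data source in a small helper so each request can use a different index, API key, or query type (the helper name is ours; everything else mirrors the payload used before):

def elasticsearch_data_source(index: str, api_key: str, query_type: str = "simple") -> dict:
    # Build the same "On Your Data" Elasticsearch data source used above,
    # letting us swap the index, API key, or query type per request.
    return {
        "type": "elasticsearch",
        "parameters": {
            "endpoint": search_endpoint,
            "index_name": index,
            "authentication": {
                "type": "encoded_api_key",
                "encoded_api_key": api_key,
            },
        },
        "query_type": query_type,
        "fields_mapping": {
            "content_fields_separator": "\n",
            "content_fields": ["body"],
            "filepath_field": "name",
            "title_field": "Title",
            "url_field": "webUrl",
            "vector_fields": ["ml.inference.body.predicted_value"],
        },
    }

# Same question, but the data source is now built per request:
completion = client.chat.completions.create(
    model=deployment,
    messages=[{"role": "user", "content": "What is the CEO salary?"}],
    extra_body={"data_sources": [elasticsearch_data_source(index_name, key)]},
)

This is exactly what we will do in the next section: keep the request the same and only swap the API key depending on who is asking.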

Document Level Security (DLS)

How can we protect the documents? Elastic provides the tools to mirror the SharePoint security permissions, so that, depending on who is asking, a document is only retrieved if that user has access to it in SharePoint. We will use document level security (DLS) for this purpose.

In fact, the payslip information is not shared with site members; it's only available to site owners and administrators:

Let's start by running an Access Control Sync in the connector to populate the security index:

Now the security index holds the access control information for each user.

Let's grab a user. Go to Kibana DevTools and run the following:

GET .search-acl-filter-sharepoint-labs/_search

Response:

{
    "_index": ".search-acl-filter-sharepoint-labs",
    "_id": "[email protected]",
    "_score": 1,
    "_source": {
      "created_at": "2024-07-16T07:28:22",
      "id": "[email protected]",
      "_timestamp": "2024-08-05T00:42:48.058411+00:00",
      "identity": {
        "user_id": "user_id:2f7a1527-da11-4738-ad9d-0f6be1acb6a7",
        "email": "email:[email protected]",
        "username": "user:[email protected]"
      },
      "query": {
        "template": {
          "source": """{
                "bool": {
                    "should": [
                        {
                            "bool": {
                                "must_not": {
                                    "exists": {
                                        "field": "_allow_access_control"
                                    }
                                }
                            }
                        },
                        {
                            "terms": {
                                "_allow_access_control.enum": {{#toJson}}access_control{{/toJson}}
                            }
                        }
                    ]
                }
            }""",
          "params": {
            "access_control": [
              "group:038fae1d-6ea3-485a-83b9-4362b54a14f5",
              "user_id:2f7a1527-da11-4738-ad9d-0f6be1acb6a7",
              "group:d11975c2-4fe8-45fd-9789-cbf37d4f115d",
              "group:c0c350fa-37b0-476a-829d-733800cfbeea",
              "group:70ddf71e-c04e-4202-b0ab-d4fd78921b72",
              "group:829ee542-eb93-40f5-9790-688457a2b0f5",
              "email:[email protected]",
              "user:[email protected]",
              "group:62ab5abe-bac2-4fc7-9b5f-92985b8ae69c"
            ]
          }
        }
      }
    }
  }

This user is a Site Member, so it's a good candidate to test permissions with.

From here we can grab the query.template part from the previous response and create an API key for the user LeeG by executing the following:

POST /_security/api_key
{
  "name": "LeeG-api-key",
  "expiration": "30d",
  "role_descriptors": {
    "sharepoint-online-role": {
      "index": [
        {
          "names": ["sharepoint-labs"],
          "privileges": ["read", "view_index_metadata"],
          "query": {
            "template": {
              "params": {
                "access_control": [
                  "group:038fae1d-6ea3-485a-83b9-4362b54a14f5",
                  "user_id:2f7a1527-da11-4738-ad9d-0f6be1acb6a7",
                  "group:d11975c2-4fe8-45fd-9789-cbf37d4f115d",
                  "group:c0c350fa-37b0-476a-829d-733800cfbeea",
                  "group:70ddf71e-c04e-4202-b0ab-d4fd78921b72",
                  "group:829ee542-eb93-40f5-9790-688457a2b0f5",
                  "email:[email protected]",
                  "user:[email protected]",
                  "group:62ab5abe-bac2-4fc7-9b5f-92985b8ae69c"
                ]
              },
              "source": """{
                "bool": {
                  "should": [
                    {
                      "bool": {
                        "must_not": {
                          "exists": {
                            "field": "_allow_access_control"
                          }
                        }
                      }
                    },
                    {
                      "terms": {
                        "_allow_access_control.enum": {{#toJson}}access_control{{/toJson}}
                      }
                    }
                  ]
                }
              }"""
            }
          }
        }
      ]
    }
  }
}

The response will be the API key with LeeG's group permissions:

{
  "id": "rpgMIJEBvlvLsU6BeL5O",
  "name": "LeeG-api-key",
  "expiration": 1725411573838,
  "api_key": "S3Q4XCNuTeu9fPITZNmLfA",
  "encoded": "cnBnTUlKRUJ2bHZMc1U2QmVMNU86UzNRNFhDTnVUZXU5ZlBJVFpObUxmQQ=="
}

From here, grab the value of encoded to use in your future calls with Azure OpenAI On Your Data. If you use this API key, you will only see the documents the LeeG user has permission to access in the sharepoint-labs connector index.

Let's try again, now using the LeeG-api-key API key:

completion = client.chat.completions.create(
    model=deployment,
    messages=[
        {
            "role": "user",
            "content": "What is the CEO Salary?",
        },
    ],
    extra_body={
        "data_sources": [
            {
                "type": "elasticsearch",
                "parameters": {
                    "endpoint": search_endpoint,
                    "index_name": index_name,
                    "authentication": {
                        "type": "encoded_api_key",
                        "encoded_api_key": key # Our new API Key goes here
                    }
                },
                "query_type": "simple",
                "fields_mapping": {
                    "content_fields_separator": "\n",
                    "content_fields": [
                        "body"
                    ],
                    "filepath_field": "name",
                    "title_field": "Title",
                    "url_field": "webUrl",
                    "vector_fields": [
                        "ml.inference.body.predicted_value"
                    ]
                },
            }
        ]
    }
)
print(completion.model_dump_json(indent=2))

Response:

{
    "id": "564eb9d5-5321-41d9-97c5-5abd9323b2d2",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "The requested information is not available in the retrieved data. Please try another query or topic.",
                "refusal": null,
                "role": "assistant",
                "function_call": null,
                "tool_calls": null,
                "end_turn": true,
                "context": {
                    "citations": [
                        {
                            "content": "https://1fbkbs.sharepoint.com/_layouts/15/download.aspx?UniqueId=d771ceaa-e47f-4fc1-bda1-64a1c55d6e48&amp;Translate=false&amp;tempauth=v1.eyJzaXRlaWQiOiJiN2Q3NzBjMC03ZTgwLTQ5OTMtOTZjZC1hOGY3YjMxZWUyYmQiLCJhcHBfZGlzcGxheW5hbWUiOiJzcC1sYWJzIiwiYXVkIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwLzFmYmticy5zaGFyZXBvaW50LmNvbUA5MTVkYzNkOS04NTI2LTRhODYtYTc4My02MDc1OTVkMzMxZjUiLCJleHAiOiIxNzI3OTY1MTczIn0.CgoKBHNuaWQSAjQ4EgsI-Jr32YHusT0QBRoNMjAuMTkwLjEzMi40MCosc1NiQlBlQU9sZEVMWUUxMmVodnNTK3NSMmx4blJsOGoybGR0N1Zxeko5Zz0wdTgBQhChVktf74AAYI8XRwPrhkUMShBoYXNoZWRwcm9vZnRva2VuegExugE3Z3JvdXAucmVhZCBhbGxzaXRlcy5yZWFkIGFsbGZpbGVzLnJlYWQgYWxscHJvZmlsZXMucmVhZMIBSTIyNjk4YTdkLTRhZmQtNGJhNS1iMzMyLTNiMzA2NGRkYjFiNkA5MTVkYzNkOS04NTI2LTRhODYtYTc4My02MDc1OTVkMzMxZjXIAQE.eN2threRzN2AZvYmPTCsNsKy1x-MLV_RbDq_yzSexG8\n#microsoft.graph.driveItem\nPlanet Express Interplanetary Our Company Planet Express, Inc. is an intergalactic delivery company owned and operated by Professor Farnsworth to fund his research. Founded in 2961, its headquarters is located in New New York, and its crew includes many important characters of the series. The current crew reached their 100th delivery in September 3010, and to celebrate, Bender threw a 100th-delivery party. The inaugural delivery crew, which disappeared on its first interplanetary mission, was found alive in June 3011. The company scrapes by, in spite of fierce competition from the leader in package delivery, Mom's Friendly Delivery Company. They stay in business thanks to their complete disregard for safety and minimum wage laws, and the Professor's unscrupulous acceptance of the occasional bribe.\n01BV67HZ5KZZY5O77EYFH33ILEUHCV23SI\nPlanet_Express.docx\ndrive_item\nhttps://1fbkbs.sharepoint.com/_layouts/15/Doc.aspx?sourcedoc=%7BD771CEAA-E47F-4FC1-BDA1-64A1C55D6E48%7D&amp;file=Planet_Express.docx&amp;action=default&amp;mobileredirect=true",
                            "title": null,
                            "url": null,
                            "filepath": null,
                            "chunk_id": "0"
                        }
                    ],
                    "intent": "[\"CEO salary\", \"current CEO salary\", \"CEO compensation\"]"
                }
            }
        }
    ],
    "created": 1727985521,
    "model": "gpt-4o",
    "object": "extensions.chat.completion",
    "service_tier": null,
    "system_fingerprint": "fp_67802d9a6d",
    "usage": {
        "completion_tokens": 31,
        "prompt_tokens": 2952,
        "total_tokens": 2983,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    }
}

Great! Now the CEO's private documents are protected.

We can try another question, this time about a document Lee can see, like the Planet Express info document.

Try running the Python file again, but now switching the message to:

{
  "role": "user",
  "content": "What is Planet Express?"
}

Answer:

"Planet Express, Inc. is an intergalactic delivery company owned and operated by Professor Farnsworth to fund his research. Founded in 2961, its headquarters is located in New New York. The company has a crew that includes many important characters from the series it is featured in. Despite fierce competition from the leading package delivery company, Mom's Friendly Delivery Company, Planet Express manages to stay in business by disregarding safety and minimum wage laws and occasionally accepting bribes..."

We stored our documents as vectors too, so we can leverage the Azure Vector query type. You can set the query type to Vector in the UI:

As you can see, it will automatically detect the model from Elasticsearch.

It will also detect the vector field automatically. You must fill in the rest of the fields for citations.

Or use the SDK this way:

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

client = AzureOpenAI(
    azure_endpoint=endpoint,
    azure_ad_token_provider=token_provider,
    api_version="2024-02-15-preview",
)

completion = client.chat.completions.create(
    model=deployment,
    messages=[
        {
            "role": "user",
            "content": "What is Planet Express?",
        },
    ],
    extra_body={
        "data_sources": [
            {
                "type": "elasticsearch",
                "parameters": {
                    "endpoint": search_endpoint,
                    "index_name": index_name,
                    "authentication": {
                        "type": "encoded_api_key",
                        "encoded_api_key": key
                    }
                },
                "query_type": "vector",
                "embedding_dependency": {
                  "type": "model_id",
                  "model_id": ".multilingual-e5-small_linux-x86_64"
                },
                "fields_mapping": {
                    "content_fields_separator": "\n",
                    "content_fields": [
                        "body"
                    ],
                    "filepath_field": "name",
                    "title_field": "Title",
                    "url_field": "webUrl",
                    "vector_fields": [
                        "ml.inference.body.predicted_value"
                    ]
                },
            }
        ]
    }
)

print(completion.model_dump_json(indent=2))

Conclusion

The Azure OpenAI "On Your Data" service lets you quickly chat with your data without needing to train or fine-tune models. Together with the Elastic SharePoint Online connector, it lets you keep control of who has access to your data and prevent security breaches, making the two a great combo for grounded chat over your documents.

Elasticsearch has native integrations to industry-leading Gen AI tools and providers. Check out our webinars on going Beyond RAG Basics, or building production-ready apps with the Elastic Vector Database.

To build the best search solutions for your use case, start a free cloud trial or try Elastic on your local machine now.

Ready to build state of the art search experiences?

Sufficiently advanced search isn’t achieved with the efforts of one. Elasticsearch is powered by data scientists, ML ops, engineers, and many more who are just as passionate about search as you are. Let’s connect and work together to build the magical search experience that will get you the results you want.
