Tutorial: semantic search with the inference API
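This tutorial shows you how to use the inference API with various services to perform semantic search on your data. The following examples use Cohere's embed-english-v3.0 model, the all-mpnet-base-v2 model from HuggingFace, and OpenAI's text-embedding-ada-002 second generation embedding model. You can use any Cohere and OpenAI models; they are all supported by the inference API. For a list of supported models available on HuggingFace, refer to the supported model list.
Each step below lists the instructions for each service in turn. Follow the ones for the service you want to use.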
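Requirements
A Cohere account is required to use the inference API with the Cohere service.
A HuggingFace account is required to use the inference API with the HuggingFace service.
An OpenAI account is required to use the inference API with the OpenAI service.
To use the inference API with the Azure OpenAI service, you need:
- An Azure subscription.
- Access granted to Azure OpenAI in the desired Azure subscription. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access.
- An embedding model deployed in Azure OpenAI Studio.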
Create an inference endpoint
Create an inference endpoint by using the Create inference API.
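PUT _inference/text_embedding/cohere_embeddings
{
  "service": "cohere",
  "service_settings": {
    "api_key": "<api_key>",
    "model_id": "embed-english-v3.0",
    "embedding_type": "byte"
  }
}
- The task type is text_embedding in the path.
- The API key of your Cohere account. You can find your API keys in the API keys section of your Cohere dashboard. You need to provide your API key only once. The Get inference API does not return your API key.
- The name of the embedding model to use. You can find the list of Cohere embedding models in the Cohere documentation.
When using this model, the recommended similarity measure to use in the dense_vector field mapping is dot_product. Cohere embeddings are normalized to unit length, in which case the dot_product and cosine measures are equivalent.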
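First, create a new inference endpoint on the Hugging Face endpoint page to get an endpoint URL. Select the model all-mpnet-base-v2 on the new endpoint creation page, then select the Sentence Embeddings task under the Advanced configuration section. Create the endpoint. After the endpoint has finished initializing, copy the URL; you need it in the following inference API call.
PUT _inference/text_embedding/hugging_face_embeddings
{
  "service": "hugging_face",
  "service_settings": {
    "api_key": "<access_token>",
    "url": "<url_endpoint>"
  }
}
- The task type is text_embedding in the path.
- A valid HuggingFace access token. You can find it on the settings page of your account.
- The inference endpoint URL you created on Hugging Face.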
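PUT _inference/text_embedding/openai_embeddings
{
  "service": "openai",
  "service_settings": {
    "api_key": "<api_key>",
    "model_id": "text-embedding-ada-002"
  }
}
- The task type is text_embedding in the path.
- The API key of your OpenAI account. You can find your OpenAI API keys in the API keys section of your OpenAI account. You need to provide your API key only once. The Get inference API does not return your API key.
- The name of the embedding model to use. You can find the list of OpenAI embedding models in the OpenAI documentation.
When using this model, the recommended similarity measure to use in the dense_vector field mapping is dot_product. OpenAI embeddings are normalized to unit length, in which case the dot_product and cosine measures are equivalent.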
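PUT _inference/text_embedding/azure_openai_embeddings
{
  "service": "azureopenai",
  "service_settings": {
    "api_key": "<api_key>",
    "resource_name": "<resource_name>",
    "deployment_id": "<deployment_id>",
    "api_version": "2024-02-01"
  }
}
- The task type is text_embedding in the path.
- The API key for accessing your Azure OpenAI service. Alternatively, you can provide an entra_id here instead of an api_key.
- The name of your Azure resource.
- The ID of your deployed model.
When using this model, the recommended similarity measure to use in the dense_vector field mapping is dot_product. Azure OpenAI embeddings are normalized to unit length, in which case the dot_product and cosine measures are equivalent.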
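The equivalence of dot_product and cosine for unit-length embeddings can be checked numerically. The following is a minimal sketch, not part of the tutorial's setup: the helper names (dot, norm, normalize, cosine) and the 3-dimensional vectors are made up for illustration, and real embeddings have hundreds of dimensions.

```ruby
# Dot product of two equal-length numeric vectors.
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Euclidean (L2) norm of a vector.
def norm(v)
  Math.sqrt(dot(v, v))
end

# Scale a vector to unit length.
def normalize(v)
  n = norm(v)
  v.map { |x| x / n }
end

# Cosine similarity: dot product divided by the product of the norms.
def cosine(a, b)
  dot(a, b) / (norm(a) * norm(b))
end

# Hypothetical vectors, normalized the way the services above return them.
a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])

puts format('dot_product: %.6f', dot(a, b))
puts format('cosine:      %.6f', cosine(a, b))
```

Because both norms equal 1 after normalization, the denominator in the cosine formula drops out and the two measures return the same value, while dot_product skips the extra division.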
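Create the index mapping
You must create the mapping of the destination index, which is the index that contains the embeddings the model creates from your input text. The destination index must have a field with the dense_vector field type to index the output of the model you use.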
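PUT cohere-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "dense_vector",
        "dims": 1024,
        "element_type": "byte"
      },
      "content": {
        "type": "text"
      }
    }
  }
}
- The name of the field to contain the generated embeddings. It must be referenced in the inference pipeline configuration in the next step.
- The field to contain the embeddings is a dense_vector field.
- The output dimensions of the model. Find this value in the Cohere documentation of the model you use.
- The name of the field from which to create the dense vector representation. In this example, the name of the field is content.
- The field type, which is text in this example.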
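PUT hugging-face-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "dense_vector",
        "dims": 768,
        "element_type": "float"
      },
      "content": {
        "type": "text"
      }
    }
  }
}
- The name of the field to contain the generated embeddings. It must be referenced in the inference pipeline configuration in the next step.
- The field to contain the embeddings is a dense_vector field.
- The output dimensions of the model. Find this value in the HuggingFace model documentation of the model you use.
- The name of the field from which to create the dense vector representation. In this example, the name of the field is content.
- The field type, which is text in this example.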
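PUT azure-openai-embeddings
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "dense_vector",
        "dims": 1536,
        "element_type": "float",
        "similarity": "dot_product"
      },
      "content": {
        "type": "text"
      }
    }
  }
}
- The name of the field to contain the generated embeddings. It must be referenced in the inference pipeline configuration in the next step.
- The field to contain the embeddings is a dense_vector field.
- The output dimensions of the model. Find this value in the Azure OpenAI documentation of the model you use.
- For Azure OpenAI embeddings, the dot_product function should be used to calculate similarity.
- The name of the field from which to create the dense vector representation. In this example, the name of the field is content.
- The field type, which is text in this example.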
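The three mappings above differ only in dims, element_type, and the optional similarity setting. As a rough sketch, the body could be generated from those values; embeddings_mapping is a hypothetical helper introduced here for illustration, not part of any Elasticsearch client.

```ruby
require 'json'

# Build the mapping body for an embeddings index.
# dims and element_type must match what the embedding model returns;
# similarity is optional and left out of the mapping when nil.
def embeddings_mapping(dims:, element_type:, similarity: nil)
  vector_field = { type: 'dense_vector', dims: dims, element_type: element_type }
  vector_field[:similarity] = similarity if similarity
  {
    mappings: {
      properties: {
        content_embedding: vector_field,
        content: { type: 'text' }
      }
    }
  }
end

# Values taken from the Azure OpenAI mapping shown above.
azure_mapping = embeddings_mapping(dims: 1536, element_type: 'float', similarity: 'dot_product')
puts JSON.pretty_generate(azure_mapping)
```

With the Ruby client used elsewhere in this tutorial, a hash like this would presumably be passed as the body of an index creation call; that step is left out so the sketch runs without a cluster.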
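Create an ingest pipeline with an inference processor
Create an ingest pipeline with an inference processor, and use the model you created above to infer against the data that is being ingested in the pipeline.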
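PUT _ingest/pipeline/cohere_embeddings
{
  "processors": [
    {
      "inference": {
        "model_id": "cohere_embeddings",
        "input_output": {
          "input_field": "content",
          "output_field": "content_embedding"
        }
      }
    }
  ]
}
- The name of the inference endpoint you created by using the Create inference API; it is referred to as inference_id in that step.
- Configuration object that defines the input_field for the inference process and the output_field that will contain the inference results.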
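PUT _ingest/pipeline/hugging_face_embeddings
{
  "processors": [
    {
      "inference": {
        "model_id": "hugging_face_embeddings",
        "input_output": {
          "input_field": "content",
          "output_field": "content_embedding"
        }
      }
    }
  ]
}
- The name of the inference endpoint you created by using the Create inference API; it is referred to as inference_id in that step.
- Configuration object that defines the input_field for the inference process and the output_field that will contain the inference results.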
response = client.ingest.put_pipeline(
  id: 'openai_embeddings',
  body: {
    processors: [
      {
        inference: {
          model_id: 'openai_embeddings',
          input_output: {
            input_field: 'content',
            output_field: 'content_embedding'
          }
        }
      }
    ]
  }
)
puts response
PUT _ingest/pipeline/openai_embeddings
{
  "processors": [
    {
      "inference": {
        "model_id": "openai_embeddings",
        "input_output": {
          "input_field": "content",
          "output_field": "content_embedding"
        }
      }
    }
  ]
}
- The name of the inference endpoint you created by using the Create inference API; it is referred to as inference_id in that step.
- Configuration object that defines the input_field for the inference process and the output_field that will contain the inference results.
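PUT _ingest/pipeline/azure_openai_embeddings
{
  "processors": [
    {
      "inference": {
        "model_id": "azure_openai_embeddings",
        "input_output": {
          "input_field": "content",
          "output_field": "content_embedding"
        }
      }
    }
  ]
}
- The name of the inference endpoint you created by using the Create inference API; it is referred to as inference_id in that step.
- Configuration object that defines the input_field for the inference process and the output_field that will contain the inference results.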
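To make the role of input_output concrete, the sketch below mimics what the processor does to a single document: it reads input_field, obtains a vector, and writes it to output_field. This is only an illustration under stated assumptions. fake_embed is a made-up stand-in that returns three toy numbers instead of calling a real inference endpoint, and apply_input_output is a hypothetical helper, not an Elasticsearch API.

```ruby
# Stand-in for the inference endpoint: returns a toy 3-number vector.
# A real endpoint would call Cohere, Hugging Face, OpenAI, or Azure OpenAI.
def fake_embed(text)
  [text.length.to_f, text.count('aeiou').to_f, text.split.size.to_f]
end

# Mimic the input_output setting: read input_field, write output_field.
# Returns a new document and leaves the original untouched.
def apply_input_output(doc, input_field:, output_field:)
  doc.merge(output_field => fake_embed(doc.fetch(input_field)))
end

doc = { 'id' => 1, 'content' => 'Muscles in human body' }
enriched = apply_input_output(doc, input_field: 'content', output_field: 'content_embedding')
p enriched
```

The enriched document keeps its original fields and gains content_embedding, which is why the mapping in the previous step needs both a text field and a dense_vector field.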
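Load data
In this step, you load the data that you later use in the inference ingest pipeline to create embeddings from it.
Use the msmarco-passagetest2019-top1000 data set, which is a subset of the MS MARCO Passage Ranking data set. It consists of 200 queries, each accompanied by a list of relevant text passages. All unique passages, along with their IDs, have been extracted from that data set and compiled into a tsv file.
Download the file and upload it to your cluster using the Data Visualizer in the Machine Learning UI. Assign the name id to the first column and content to the second column. The index name is test-data. Once the upload is complete, you can see an index named test-data with 182469 documents.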
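Ingest the data through the inference ingest pipeline
Create the embeddings from the text by reindexing the data through the inference pipeline that uses the chosen model as the inference model.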
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "test-data",
    "size": 50
  },
  "dest": {
    "index": "cohere-embeddings",
    "pipeline": "cohere_embeddings"
  }
}
The rate limit of your Cohere account may affect the throughput of the reindexing process.
response = client.reindex(
  wait_for_completion: false,
  body: {
    source: {
      index: 'test-data',
      size: 50
    },
    dest: {
      index: 'openai-embeddings',
      pipeline: 'openai_embeddings'
    }
  }
)
puts response
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "test-data",
    "size": 50
  },
  "dest": {
    "index": "openai-embeddings",
    "pipeline": "openai_embeddings"
  }
}
The rate limit of your OpenAI account may affect the throughput of the reindexing process. If this happens, change size to 3 or a value of similar magnitude.
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "test-data",
    "size": 50
  },
  "dest": {
    "index": "azure-openai-embeddings",
    "pipeline": "azure_openai_embeddings"
  }
}
The rate limit of your Azure OpenAI account may affect the throughput of the reindexing process. If this happens, change size to 3 or a value of similar magnitude.
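The trade-off behind the size setting can be estimated with simple arithmetic: size is the number of documents reindexed per batch, so a smaller value means many more batches but fewer documents in flight at once. The sketch below assumes the 182469-document test-data index described earlier; batch_count is a hypothetical helper written for this illustration.

```ruby
# Number of batches needed to reindex total_docs documents
# when batch_size documents are processed per batch.
def batch_count(total_docs, batch_size)
  (total_docs + batch_size - 1) / batch_size # integer ceiling division
end

total_docs = 182_469 # document count of test-data reported above

[50, 3].each do |size|
  puts "size=#{size}: #{batch_count(total_docs, size)} batches"
end
```

Dropping size from 50 to 3 multiplies the batch count by roughly 17, which is one reason a rate-limited reindex of the full data set can take hours.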
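The call returns a task ID that you can use to monitor the progress:
GET _tasks/<task_id>
Reindexing a large data set can take hours. If you don't want to wait until the process is fully complete, you can cancel it:
POST _tasks/<task_id>/_cancel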
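The task response can be reduced to a simple progress figure. The sketch below assumes the response of GET _tasks/<task_id> for a reindex task contains a top-level completed flag and a task.status object with total, created, updated, and deleted counters; verify the field names against the response your cluster returns. reindex_progress is a hypothetical helper, and the sample numbers are made up.

```ruby
# Summarize a task response for a reindex task.
# Returns the completed flag, processed and total counts, and percent done.
def reindex_progress(task_response)
  status = task_response.fetch('task').fetch('status')
  total = status.fetch('total')
  # Counters that are absent from the response are treated as zero.
  processed = %w[created updated deleted].sum { |key| status.fetch(key, 0) }
  percent = total.zero? ? 0.0 : (100.0 * processed / total).round(1)
  { completed: task_response.fetch('completed'), processed: processed, total: total, percent: percent }
end

# Hypothetical, abbreviated response; the numbers are invented for illustration.
sample = {
  'completed' => false,
  'task' => {
    'status' => { 'total' => 200, 'created' => 40, 'updated' => 10, 'deleted' => 0 }
  }
}

p reindex_progress(sample)
```

Polling this figure before deciding whether to cancel makes it clearer how much of the data set a subsequent search would actually cover.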
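Semantic search
After the data set has been enriched with the embeddings, you can query the data using semantic search. Pass a query_vector_builder to the k-nearest neighbor (kNN) vector search API, and provide the query text and the model you have used to create the embeddings.
If you cancelled the reindexing process, you run the query only on a part of the data, which affects the quality of your results.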
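Conceptually, the search embeds the query text with the same model and then returns the k documents whose vectors are closest to it. The sketch below shows that idea with an exhaustive (brute-force) scan over a handful of made-up 2-dimensional vectors. It is not how Elasticsearch executes the query: the kNN search API uses an approximate search in which num_candidates bounds the candidates considered. brute_force_knn and the sample data are hypothetical.

```ruby
# Dot product of two equal-length vectors.
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Exhaustive nearest-neighbor search: score every document, keep the best k,
# ordered from highest to lowest score.
def brute_force_knn(query_vector, docs, k:)
  docs.map { |doc| { id: doc[:id], score: dot(query_vector, doc[:embedding]) } }
      .max_by(k) { |hit| hit[:score] }
end

# Hypothetical unit-length embeddings standing in for content_embedding values.
docs = [
  { id: 'a', embedding: [1.0, 0.0] },
  { id: 'b', embedding: [0.6, 0.8] },
  { id: 'c', embedding: [0.0, 1.0] }
]

# Hypothetical unit-length query vector, as query_vector_builder would produce.
p brute_force_knn([0.8, 0.6], docs, k: 2)
```

An exhaustive scan like this grows linearly with the number of documents, which is the cost an approximate kNN search avoids on an index of this size.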
GET cohere-embeddings/_search
{
  "knn": {
    "field": "content_embedding",
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "cohere_embeddings",
        "model_text": "Muscles in human body"
      }
    },
    "k": 10,
    "num_candidates": 100
  },
  "_source": [
    "id",
    "content"
  ]
}
As a result, you receive the top 10 documents that are closest in meaning to the query from the cohere-embeddings index, sorted by their proximity to the query:
"hits": [ { "_index": "cohere-embeddings", "_id": "-eFWCY4BECzWLnMZuI78", "_score": 0.737484, "_source": { "id": 1690948, "content": "Oxygen is supplied to the muscles via red blood cells. Red blood cells carry hemoglobin which oxygen bonds with as the hemoglobin rich blood cells pass through the blood vessels of the lungs.The now oxygen rich blood cells carry that oxygen to the cells that are demanding it, in this case skeletal muscle cells.ther ways in which muscles are supplied with oxygen include: 1 Blood flow from the heart is increased. 2 Blood flow to your muscles in increased. 3 Blood flow from nonessential organs is transported to working muscles." } }, { "_index": "cohere-embeddings", "_id": "HuFWCY4BECzWLnMZuI_8", "_score": 0.7176013, "_source": { "id": 1692482, "content": "The thoracic cavity is separated from the abdominal cavity by the diaphragm. This is a broad flat muscle. (muscular) diaphragm The diaphragm is a muscle that separat…e the thoracic from the abdominal cavity. The pelvis is the lowest part of the abdominal cavity and it has no physical separation from it Diaphragm." } }, { "_index": "cohere-embeddings", "_id": "IOFWCY4BECzWLnMZuI_8", "_score": 0.7154432, "_source": { "id": 1692489, "content": "Muscular Wall Separating the Abdominal and Thoracic Cavities; Thoracic Cavity of a Fetal Pig; In Mammals the Diaphragm Separates the Abdominal Cavity from the" } }, { "_index": "cohere-embeddings", "_id": "C-FWCY4BECzWLnMZuI_8", "_score": 0.695313, "_source": { "id": 1691493, "content": "Burning, aching, tenderness and stiffness are just some descriptors of the discomfort you may feel in the muscles you exercised one to two days ago.For the most part, these sensations you experience after exercise are collectively known as delayed onset muscle soreness.urning, aching, tenderness and stiffness are just some descriptors of the discomfort you may feel in the muscles you exercised one to two days ago." } }, (...) ]
GET hugging-face-embeddings/_search
{
  "knn": {
    "field": "content_embedding",
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "hugging_face_embeddings",
        "model_text": "What's margin of error?"
      }
    },
    "k": 10,
    "num_candidates": 100
  },
  "_source": [
    "id",
    "content"
  ]
}
As a result, you receive the top 10 documents that are closest in meaning to the query from the hugging-face-embeddings index, sorted by their proximity to the query:
"hits": [ { "_index": "hugging-face-embeddings", "_id": "ljEfo44BiUQvMpPgT20E", "_score": 0.8522128, "_source": { "id": 7960255, "content": "The margin of error can be defined by either of the following equations. Margin of error = Critical value x Standard deviation of the statistic. Margin of error = Critical value x Standard error of the statistic. If you know the standard deviation of the statistic, use the first equation to compute the margin of error. Otherwise, use the second equation. Previously, we described how to compute the standard deviation and standard error." } }, { "_index": "hugging-face-embeddings", "_id": "lzEfo44BiUQvMpPgT20E", "_score": 0.7865497, "_source": { "id": 7960259, "content": "1 y ou are told only the size of the sample and are asked to provide the margin of error for percentages which are not (yet) known. 2 This is typically the case when you are computing the margin of error for a survey which is going to be conducted in the future." } }, { "_index": "hugging-face-embeddings1", "_id": "DjEfo44BiUQvMpPgT20E", "_score": 0.6229427, "_source": { "id": 2166183, "content": "1. In general, the point at which gains equal losses. 2. In options, the market price that a stock must reach for option buyers to avoid a loss if they exercise. For a call, it is the strike price plus the premium paid. For a put, it is the strike price minus the premium paid." } }, { "_index": "hugging-face-embeddings1", "_id": "VzEfo44BiUQvMpPgT20E", "_score": 0.6034223, "_source": { "id": 2173417, "content": "How do you find the area of a circle? Can you measure the area of a circle and use that to find a value for Pi?" } }, (...) ]
response = client.search(
  index: 'openai-embeddings',
  body: {
    knn: {
      field: 'content_embedding',
      query_vector_builder: {
        text_embedding: {
          model_id: 'openai_embeddings',
          model_text: 'Calculate fuel cost'
        }
      },
      k: 10,
      num_candidates: 100
    },
    _source: [
      'id',
      'content'
    ]
  }
)
puts response
GET openai-embeddings/_search
{
  "knn": {
    "field": "content_embedding",
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "openai_embeddings",
        "model_text": "Calculate fuel cost"
      }
    },
    "k": 10,
    "num_candidates": 100
  },
  "_source": [
    "id",
    "content"
  ]
}
As a result, you receive the top 10 documents that are closest in meaning to the query from the openai-embeddings index, sorted by their proximity to the query:
"hits": [ { "_index": "openai-embeddings", "_id": "DDd5OowBHxQKHyc3TDSC", "_score": 0.83704096, "_source": { "id": 862114, "body": "How to calculate fuel cost for a road trip. By Tara Baukus Mello • Bankrate.com. Dear Driving for Dollars, My family is considering taking a long road trip to finish off the end of the summer, but I'm a little worried about gas prices and our overall fuel cost.It doesn't seem easy to calculate since we'll be traveling through many states and we are considering several routes.y family is considering taking a long road trip to finish off the end of the summer, but I'm a little worried about gas prices and our overall fuel cost. It doesn't seem easy to calculate since we'll be traveling through many states and we are considering several routes." } }, { "_index": "openai-embeddings", "_id": "ajd5OowBHxQKHyc3TDSC", "_score": 0.8345704, "_source": { "id": 820622, "body": "Home Heating Calculator. Typically, approximately 50% of the energy consumed in a home annually is for space heating. When deciding on a heating system, many factors will come into play: cost of fuel, installation cost, convenience and life style are all important.This calculator can help you estimate the cost of fuel for different heating appliances.hen deciding on a heating system, many factors will come into play: cost of fuel, installation cost, convenience and life style are all important. This calculator can help you estimate the cost of fuel for different heating appliances." } }, { "_index": "openai-embeddings", "_id": "Djd5OowBHxQKHyc3TDSC", "_score": 0.8327426, "_source": { "id": 8202683, "body": "Fuel is another important cost. This cost will depend on your boat, how far you travel, and how fast you travel. A 33-foot sailboat traveling at 7 knots should be able to travel 300 miles on 50 gallons of diesel fuel.If you are paying $4 per gallon, the trip would cost you $200.Most boats have much larger gas tanks than cars.uel is another important cost. This cost will depend on your boat, how far you travel, and how fast you travel. A 33-foot sailboat traveling at 7 knots should be able to travel 300 miles on 50 gallons of diesel fuel." } }, (...) ]
GET azure-openai-embeddings/_search
{
  "knn": {
    "field": "content_embedding",
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "azure_openai_embeddings",
        "model_text": "Calculate fuel cost"
      }
    },
    "k": 10,
    "num_candidates": 100
  },
  "_source": [
    "id",
    "content"
  ]
}
As a result, you receive the top 10 documents that are closest in meaning to the query from the azure-openai-embeddings index, sorted by their proximity to the query:
"hits": [ { "_index": "azure-openai-embeddings", "_id": "DDd5OowBHxQKHyc3TDSC", "_score": 0.83704096, "_source": { "id": 862114, "body": "How to calculate fuel cost for a road trip. By Tara Baukus Mello • Bankrate.com. Dear Driving for Dollars, My family is considering taking a long road trip to finish off the end of the summer, but I'm a little worried about gas prices and our overall fuel cost.It doesn't seem easy to calculate since we'll be traveling through many states and we are considering several routes.y family is considering taking a long road trip to finish off the end of the summer, but I'm a little worried about gas prices and our overall fuel cost. It doesn't seem easy to calculate since we'll be traveling through many states and we are considering several routes." } }, { "_index": "azure-openai-embeddings", "_id": "ajd5OowBHxQKHyc3TDSC", "_score": 0.8345704, "_source": { "id": 820622, "body": "Home Heating Calculator. Typically, approximately 50% of the energy consumed in a home annually is for space heating. When deciding on a heating system, many factors will come into play: cost of fuel, installation cost, convenience and life style are all important.This calculator can help you estimate the cost of fuel for different heating appliances.hen deciding on a heating system, many factors will come into play: cost of fuel, installation cost, convenience and life style are all important. This calculator can help you estimate the cost of fuel for different heating appliances." } }, { "_index": "azure-openai-embeddings", "_id": "Djd5OowBHxQKHyc3TDSC", "_score": 0.8327426, "_source": { "id": 8202683, "body": "Fuel is another important cost. This cost will depend on your boat, how far you travel, and how fast you travel. A 33-foot sailboat traveling at 7 knots should be able to travel 300 miles on 50 gallons of diesel fuel.If you are paying $4 per gallon, the trip would cost you $200.Most boats have much larger gas tanks than cars.uel is another important cost. This cost will depend on your boat, how far you travel, and how fast you travel. A 33-foot sailboat traveling at 7 knots should be able to travel 300 miles on 50 gallons of diesel fuel." } }, (...) ]
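Responses like the ones above are easier to scan once each hit is reduced to its score, ID, and the start of its text. The sketch below is one way to do that on a parsed hits array. summarize_hits is a hypothetical helper; the sample hit is abbreviated from the Cohere response shown earlier, and text_field is a parameter because the sample responses store the passage under either content or body.

```ruby
require 'json'

# Reduce search hits to [score, id, first characters of the text] rows.
def summarize_hits(hits, text_field: 'content', width: 40)
  hits.map do |hit|
    source = hit.fetch('_source')
    [hit.fetch('_score'), source.fetch('id'), source.fetch(text_field, '')[0, width]]
  end
end

# One hit copied from the cohere-embeddings response above, with the text shortened.
raw = <<~JSON
  [
    {
      "_index": "cohere-embeddings",
      "_id": "IOFWCY4BECzWLnMZuI_8",
      "_score": 0.7154432,
      "_source": { "id": 1692489, "content": "Muscular Wall Separating the Abdominal and Thoracic Cavities" }
    }
  ]
JSON

summarize_hits(JSON.parse(raw)).each do |score, id, text|
  puts format('%.4f  %d  %s', score, id, text)
end
```

Passing text_field: 'body' would apply the same summary to the OpenAI and Azure OpenAI sample responses.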