Tutorial: semantic search with semantic_text
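This functionality is in beta and is subject to change. Its design and code are less mature than official GA features, and it is provided as-is with no warranties. Beta features are not subject to the support SLA of official GA features.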
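The recommended way to use semantic search in the Elastic Stack is to follow the semantic_text workflow.
This tutorial uses the elasticsearch service for demonstration, but you can use any service and the supported models it offers through the inference API.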
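This tutorial uses the elasticsearch service for demonstration, and that service is created automatically as needed. To use the semantic_text field type with an inference service other than the elasticsearch service, you must create an inference endpoint using the Create inference API.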
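To make the shape of a Create inference API request concrete, the sketch below only assembles the method, path and body of a PUT _inference/<task_type>/<inference_id> call; it does not contact a cluster. The endpoint name my-elser-endpoint is the one that appears in the search response later in this tutorial, but the elser service name and its service_settings values are assumptions for illustration, so check the inference API documentation for the settings your service actually requires.

```python
from typing import Any


def inference_endpoint_request(
    task_type: str,
    inference_id: str,
    service: str,
    service_settings: dict[str, Any],
) -> tuple[str, str, dict[str, Any]]:
    """Return (method, path, body) for a Create inference API call.

    service and service_settings are provider-specific; the values used
    below are placeholders, not a recommended configuration.
    """
    path = f"/_inference/{task_type}/{inference_id}"
    body = {"service": service, "service_settings": dict(service_settings)}
    return "PUT", path, body


# Hypothetical example: a sparse_embedding endpoint named "my-elser-endpoint".
method, path, body = inference_endpoint_request(
    "sparse_embedding",
    "my-elser-endpoint",
    service="elser",
    service_settings={"num_allocations": 1, "num_threads": 1},
)
print(method, path, body)
```

The resulting request can be sent from Console; the endpoint name is what you later reference from a semantic_text mapping.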
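You must create the mapping of the destination index, that is, the index that will contain the embeddings that the inference endpoint generates from your input text. The destination index must have a field with the semantic_text field type.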
const response = await client.indices.create({
  index: "semantic-embeddings",
  mappings: {
    properties: {
      content: {
        type: "semantic_text",
      },
    },
  },
});
console.log(response);
PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text"
      }
    }
  }
}
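content is the name of the field that will contain the generated embeddings.
The field that contains the embeddings is a semantic_text field.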
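The tutorial shows this mapping in JavaScript and Console only. A Python sketch of the same request follows; it assumes client is an elasticsearch.Elasticsearch instance using 8.x-style keyword arguments, and semantic_text_mapping and create_semantic_index are hypothetical helper names. The optional inference_id key is how a semantic_text field is pointed at an endpoint you created yourself with the Create inference API; leaving it out reproduces the request above.

```python
from typing import Any, Optional


def semantic_text_mapping(
    field: str = "content", inference_id: Optional[str] = None
) -> dict[str, Any]:
    """Build index mappings containing a single semantic_text field."""
    field_def: dict[str, Any] = {"type": "semantic_text"}
    if inference_id is not None:
        # Only needed for an endpoint created beforehand via the inference API.
        field_def["inference_id"] = inference_id
    return {"properties": {field: field_def}}


def create_semantic_index(
    client: Any, index: str = "semantic-embeddings", **kwargs: Any
) -> Any:
    """Create the destination index; client is assumed to be an Elasticsearch client."""
    return client.indices.create(index=index, mappings=semantic_text_mapping(**kwargs))


print(semantic_text_mapping())
```

With a real client, create_semantic_index(client) sends the same body as the PUT semantic-embeddings request.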
If you use web crawlers or connectors to generate indices, you must update the index mappings of those indices to include a semantic_text field.
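For an existing crawler or connector index, the field can be added with the put mapping API instead of recreating the index. This is a minimal sketch assuming an 8.x elasticsearch.Elasticsearch client; add_semantic_field is a hypothetical helper, and the index and field names in the example call are placeholders for whatever your crawler or connector uses.

```python
from typing import Any, Optional


def add_semantic_field(
    client: Any,
    index: str,
    field: str,
    inference_id: Optional[str] = None,
) -> Any:
    """Add a semantic_text field to an existing index via the put mapping API."""
    field_def: dict[str, Any] = {"type": "semantic_text"}
    if inference_id is not None:
        field_def["inference_id"] = inference_id  # endpoint assumed to exist already
    return client.indices.put_mapping(index=index, properties={field: field_def})


# Hypothetical names: "my-connector-index" and "body_semantic" are placeholders.
# add_semantic_field(client, "my-connector-index", "body_semantic")
```

A mapping change does not backfill documents that are already indexed, so the new field is only populated for documents ingested after the update.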
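Use the msmarco-passagetest2019-top1000 data set, which is a subset of the MS MARCO Passage Ranking data set. It consists of 200 queries, each accompanied by a list of relevant text passages. All unique passages, along with their IDs, have been extracted from that data set and compiled into a tsv file.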
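If you want to inspect the tsv file before uploading it, a few lines of Python are enough to read it. This sketch assumes each line holds exactly the two columns that the upload step maps to id and content (a numeric passage ID, a tab, then the passage text) with no header row; read_passages is a hypothetical helper name.

```python
import csv
import io
from typing import Iterator, TextIO


def read_passages(stream: TextIO) -> Iterator[dict]:
    """Yield {"id": int, "content": str} for each two-column TSV row."""
    # QUOTE_NONE keeps quotation marks inside passages as literal text.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        yield {"id": int(row[0]), "content": row[1]}


sample = io.StringIO("8836652\tThere are a few foods...\n8836651\tDuring Your Workout.\n")
print(list(read_passages(sample)))
```

To read the downloaded file, pass an open text handle instead of the StringIO sample.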
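Download the file and upload it to your cluster using the Data Visualizer in the Machine Learning UI. After your data is analyzed, click Override settings. Under Edit field names, assign id to the first column and content to the second. Click Apply, then Import. Name the index test-data, and click Import. After the upload is complete, you will see an index named test-data with 182,469 documents.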
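As an alternative to the Data Visualizer upload, the parsed rows could be sent with the bulk helper of the official Python client. This is a sketch rather than the tutorial's method: it assumes the elasticsearch package is installed and that client is an elasticsearch.Elasticsearch instance, and bulk_actions and load_test_data are hypothetical names. Documents get auto-generated _id values and keep the passage ID in the id field, which matches the search response shown later.

```python
from typing import Any, Iterable, Iterator


def bulk_actions(docs: Iterable[dict], index: str = "test-data") -> Iterator[dict[str, Any]]:
    """Turn {"id", "content"} dicts into actions for elasticsearch.helpers.bulk."""
    for doc in docs:
        yield {"_index": index, "_source": {"id": doc["id"], "content": doc["content"]}}


def load_test_data(client: Any, docs: Iterable[dict], index: str = "test-data") -> tuple:
    """Index all docs; returns the (success_count, errors) tuple from helpers.bulk."""
    # Imported lazily so bulk_actions works without the client package installed.
    from elasticsearch import helpers

    return helpers.bulk(client, bulk_actions(docs, index))
```

Once loading finishes, client.count(index="test-data")["count"] should report 182,469 if the whole file was indexed.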
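Create the embeddings from the text by reindexing the data from the test-data index to the semantic-embeddings index. The data in the content field is reindexed into the content semantic_text field of the destination index. The reindexed data is processed by the inference endpoint associated with the content semantic_text field.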
resp = client.reindex(
    wait_for_completion=False,
    source={
        "index": "test-data",
        "size": 10,
    },
    dest={
        "index": "semantic-embeddings",
    },
)
print(resp)
const response = await client.reindex({
  wait_for_completion: false,
  source: {
    index: "test-data",
    size: 10,
  },
  dest: {
    index: "semantic-embeddings",
  },
});
console.log(response);
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "test-data",
    "size": 10
  },
  "dest": {
    "index": "semantic-embeddings"
  }
}
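The Python request above can be wrapped so the small batch size is reused and the task ID comes back directly. In that request, source.size sets how many documents each reindex batch reads (the Reindex API default is 1000), so a value of 10 makes progress show up in the task status sooner. This is a sketch: reindex_body and start_reindex are hypothetical helper names, and client is assumed to be an elasticsearch.Elasticsearch instance.

```python
from typing import Any


def reindex_body(
    source_index: str = "test-data",
    dest_index: str = "semantic-embeddings",
    batch_size: int = 10,
) -> dict[str, Any]:
    """Source and dest arguments for the Reindex API."""
    return {
        "source": {"index": source_index, "size": batch_size},  # docs per batch
        "dest": {"index": dest_index},
    }


def start_reindex(client: Any, **kwargs: Any) -> str:
    """Start an asynchronous reindex and return its task ID."""
    resp = client.reindex(wait_for_completion=False, **reindex_body(**kwargs))
    return resp["task"]
```

Usage: task_id = start_reindex(client) returns the same ID that the call below uses for monitoring.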
The call returns a task ID that you can use to monitor the progress:
resp = client.tasks.get(
    task_id="<task_id>",
)
print(resp)
const response = await client.tasks.get({
  task_id: "<task_id>",
});
console.log(response);
GET _tasks/<task_id>
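Rather than calling the Tasks API by hand, you can poll it until the reindex finishes. The sketch relies on the completed flag and the task.status counters (total, created) that the Tasks API reports for a reindex task; wait_for_task is a hypothetical helper, and get_task is any callable returning that response, for example lambda tid: client.tasks.get(task_id=tid).

```python
import time
from typing import Any, Callable


def wait_for_task(
    get_task: Callable[[str], Any], task_id: str, interval: float = 10.0
) -> Any:
    """Poll the Tasks API until the task reports completed, printing progress."""
    while True:
        resp = get_task(task_id)
        status = resp["task"]["status"]
        total = status.get("total") or 0
        created = status.get("created", 0)
        pct = 100.0 * created / total if total else 0.0
        print(f"created {created}/{total} ({pct:.1f}%)")
        if resp["completed"]:
            return resp  # final response, including any failures
        time.sleep(interval)  # wait before asking again
```

A shorter interval gives finer progress output at the cost of more requests to the cluster.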
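Reindexing large data sets can take a long time. You can test this workflow using only a subset of the data set. To do this, cancel the reindexing process; embeddings are then generated only for the subset that was already reindexed. The following API request cancels the reindexing task: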
resp = client.tasks.cancel(
    task_id="<task_id>",
)
print(resp)
const response = await client.tasks.cancel({
  task_id: "<task_id>",
});
console.log(response);
POST _tasks/<task_id>/_cancel
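The subset approach can be automated by watching the created counter and cancelling once it passes a threshold. This is a sketch under stated assumptions: reindex_subset is a hypothetical helper, get_task and cancel_task are callables such as lambda tid: client.tasks.get(task_id=tid) and lambda tid: client.tasks.cancel(task_id=tid), and the threshold is arbitrary. Cancellation may not take effect instantly, so a few more documents than the threshold may end up reindexed.

```python
import time
from typing import Any, Callable


def reindex_subset(
    get_task: Callable[[str], Any],
    cancel_task: Callable[[str], Any],
    task_id: str,
    max_created: int,
    interval: float = 10.0,
) -> int:
    """Cancel a running reindex once at least max_created docs are written.

    Returns the created count observed when the loop stopped.
    """
    while True:
        resp = get_task(task_id)
        created = resp["task"]["status"].get("created", 0)
        if resp["completed"]:
            return created  # finished before reaching the threshold
        if created >= max_created:
            cancel_task(task_id)
            return created
        time.sleep(interval)
```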
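After the data set has been enriched with the embeddings, you can query the data using semantic search. Provide the semantic_text field name and the query text in a semantic query. The inference endpoint that was used to generate the embeddings for the semantic_text field is also used to process the query text.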
resp = client.search(
    index="semantic-embeddings",
    query={
        "semantic": {
            "field": "content",
            "query": "How to avoid muscle soreness while running?",
        }
    },
)
print(resp)
const response = await client.search({
  index: "semantic-embeddings",
  query: {
    semantic: {
      field: "content",
      query: "How to avoid muscle soreness while running?",
    },
  },
});
console.log(response);
GET semantic-embeddings/_search
{
  "query": {
    "semantic": {
      "field": "content",
      "query": "How to avoid muscle soreness while running?"
    }
  }
}
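The queries above return 10 hits because the search API's size parameter defaults to 10. A small sketch that builds the same semantic query with an explicit size; semantic_search_body is a hypothetical helper name, and the commented client call assumes an 8.x elasticsearch.Elasticsearch instance.

```python
from typing import Any


def semantic_search_body(field: str, query_text: str, size: int = 10) -> dict[str, Any]:
    """Build search arguments for a semantic query on a semantic_text field."""
    return {
        "query": {"semantic": {"field": field, "query": query_text}},
        "size": size,  # number of hits to return
    }


body = semantic_search_body("content", "How to avoid muscle soreness while running?", size=3)
# With the Python client: client.search(index="semantic-embeddings", **body)
print(body)
```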
As a result, you receive the top 10 documents from the semantic-embeddings index that are closest in meaning to the query:
"hits": [
  {
    "_index": "semantic-embeddings",
    "_id": "Jy5065EBBFPLbFsdh_f9",
    "_score": 21.487484,
    "_source": {
      "id": 8836652,
      "content": {
        "text": "There are a few foods and food groups that will help to fight inflammation and delayed onset muscle soreness (both things that are inevitable after a long, hard workout) when you incorporate them into your postworkout eats, whether immediately after your run or at a meal later in the day. Advertisement. Advertisement.",
        "inference": {
          "inference_id": "my-elser-endpoint",
          "model_settings": {
            "task_type": "sparse_embedding"
          },
          "chunks": [
            {
              "text": "There are a few foods and food groups that will help to fight inflammation and delayed onset muscle soreness (both things that are inevitable after a long, hard workout) when you incorporate them into your postworkout eats, whether immediately after your run or at a meal later in the day. Advertisement. Advertisement.",
              "embeddings": {
                (...)
              }
            }
          ]
        }
      }
    }
  },
  {
    "_index": "semantic-embeddings",
    "_id": "Ji5065EBBFPLbFsdh_f9",
    "_score": 18.211695,
    "_source": {
      "id": 8836651,
      "content": {
        "text": "During Your Workout. There are a few things you can do during your workout to help prevent muscle injury and soreness. According to personal trainer and writer for Iron Magazine, Marc David, doing warm-ups and cool-downs between sets can help keep muscle soreness to a minimum.",
        "inference": {
          "inference_id": "my-elser-endpoint",
          "model_settings": {
            "task_type": "sparse_embedding"
          },
          "chunks": [
            {
              "text": "During Your Workout. There are a few things you can do during your workout to help prevent muscle injury and soreness. According to personal trainer and writer for Iron Magazine, Marc David, doing warm-ups and cool-downs between sets can help keep muscle soreness to a minimum.",
              "embeddings": {
                (...)
              }
            }
          ]
        }
      }
    }
  },
  {
    "_index": "semantic-embeddings",
    "_id": "Wi5065EBBFPLbFsdh_b9",
    "_score": 13.089405,
    "_source": {
      "id": 8800197,
      "content": {
        "text": "This is especially important if the soreness is due to a weightlifting routine. For this time period, do not exert more than around 50% of the level of effort (weight, distance and speed) that caused the muscle groups to be sore.",
        "inference": {
          "inference_id": "my-elser-endpoint",
          "model_settings": {
            "task_type": "sparse_embedding"
          },
          "chunks": [
            {
              "text": "This is especially important if the soreness is due to a weightlifting routine. For this time period, do not exert more than around 50% of the level of effort (weight, distance and speed) that caused the muscle groups to be sore.",
              "embeddings": {
                (...)
              }
            }
          ]
        }
      }
    }
  }
]
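Because each hit carries the full inference block (its embeddings are elided as (...) in the response above), a small helper that keeps only the score, passage ID and a text preview makes results easier to scan. This sketch assumes the _source layout shown above, where content is an object with a text key, and also accepts content stored as a plain string; summarize_hits is a hypothetical name.

```python
from typing import Any


def summarize_hits(response: dict[str, Any], max_chars: int = 60) -> list[tuple[float, int, str]]:
    """Return (score, passage id, text preview) for each hit, skipping embeddings."""
    rows = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        content = source["content"]
        # semantic_text content as shown above is an object; fall back to a plain string.
        text = content["text"] if isinstance(content, dict) else content
        rows.append((hit["_score"], source["id"], text[:max_chars]))
    return rows
```

With the Python client, summarize_hits(resp) can be applied directly to the search response, since it supports dictionary-style access.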