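Join field type
The join data type is a special field that creates parent/child relations within documents of the same index. The relations section defines a set of possible relations within the documents, each relation being a parent name and a child name.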
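We don't recommend using multiple levels of relations to replicate a relational model. Each level of relation adds overhead at query time in terms of memory and computation. For better search performance, denormalize your data instead.
A parent/child relation can be defined as follows: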
resp = client.indices.create( index="my-index-000001", mappings={ "properties": { "my_id": { "type": "keyword" }, "my_join_field": { "type": "join", "relations": { "question": "answer" } } } }, ) print(resp)
response = client.indices.create( index: 'my-index-000001', body: { mappings: { properties: { my_id: { type: 'keyword' }, my_join_field: { type: 'join', relations: { question: 'answer' } } } } } ) puts response
const response = await client.indices.create({ index: "my-index-000001", mappings: { properties: { my_id: { type: "keyword", }, my_join_field: { type: "join", relations: { question: "answer", }, }, }, }, }); console.log(response);
PUT my-index-000001 { "mappings": { "properties": { "my_id": { "type": "keyword" }, "my_join_field": { "type": "join", "relations": { "question": "answer" } } } } }
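As a minimal sketch of how such a mapping could be generated from a relations dictionary, the helper below builds the same mappings body as the request above. The function name build_join_mappings is hypothetical and not part of any client library, and the one-parent-per-child check is an assumption of this sketch about the tree shape that relations describe.

```python
from typing import Dict, List, Union

# A parent name maps to one child name or to a list of child names.
Relations = Dict[str, Union[str, List[str]]]


def build_join_mappings(join_field: str, relations: Relations) -> dict:
    """Build a `mappings` body that declares `join_field` as a join field."""
    parent_of: Dict[str, str] = {}
    for parent, children in relations.items():
        names = [children] if isinstance(children, str) else list(children)
        if not names:
            raise ValueError(f"relation {parent!r} needs at least one child name")
        for child in names:
            # Assumption of this sketch: each child name has a single parent.
            if child in parent_of:
                raise ValueError(
                    f"{child!r} is already a child of {parent_of[child]!r}"
                )
            parent_of[child] = parent
    return {
        "properties": {
            join_field: {"type": "join", "relations": dict(relations)}
        }
    }


mappings = build_join_mappings("my_join_field", {"question": "answer"})
print(mappings)
```

The resulting dictionary has the same shape as the mappings argument used in the Python request above, so it could be passed to client.indices.create in the same way.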
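To index a document with a join, the name of the relation and the optional parent of the document must be provided in the source. For example, the following requests create two parent documents in the question context: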
resp = client.index( index="my-index-000001", id="1", refresh=True, document={ "my_id": "1", "text": "This is a question", "my_join_field": { "name": "question" } }, ) print(resp) resp1 = client.index( index="my-index-000001", id="2", refresh=True, document={ "my_id": "2", "text": "This is another question", "my_join_field": { "name": "question" } }, ) print(resp1)
response = client.index( index: 'my-index-000001', id: 1, refresh: true, body: { my_id: '1', text: 'This is a question', my_join_field: { name: 'question' } } ) puts response response = client.index( index: 'my-index-000001', id: 2, refresh: true, body: { my_id: '2', text: 'This is another question', my_join_field: { name: 'question' } } ) puts response
const response = await client.index({ index: "my-index-000001", id: 1, refresh: "true", document: { my_id: "1", text: "This is a question", my_join_field: { name: "question", }, }, }); console.log(response); const response1 = await client.index({ index: "my-index-000001", id: 2, refresh: "true", document: { my_id: "2", text: "This is another question", my_join_field: { name: "question", }, }, }); console.log(response1);
PUT my-index-000001/_doc/1?refresh { "my_id": "1", "text": "This is a question", "my_join_field": { "name": "question" } } PUT my-index-000001/_doc/2?refresh { "my_id": "2", "text": "This is another question", "my_join_field": { "name": "question" } }
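When indexing parent documents, you can specify just the name of the relation as a shortcut instead of wrapping it in the normal object notation: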
resp = client.index( index="my-index-000001", id="1", refresh=True, document={ "my_id": "1", "text": "This is a question", "my_join_field": "question" }, ) print(resp) resp1 = client.index( index="my-index-000001", id="2", refresh=True, document={ "my_id": "2", "text": "This is another question", "my_join_field": "question" }, ) print(resp1)
const response = await client.index({ index: "my-index-000001", id: 1, refresh: "true", document: { my_id: "1", text: "This is a question", my_join_field: "question", }, }); console.log(response); const response1 = await client.index({ index: "my-index-000001", id: 2, refresh: "true", document: { my_id: "2", text: "This is another question", my_join_field: "question", }, }); console.log(response1);
PUT my-index-000001/_doc/1?refresh { "my_id": "1", "text": "This is a question", "my_join_field": "question" } PUT my-index-000001/_doc/2?refresh { "my_id": "2", "text": "This is another question", "my_join_field": "question" }
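The two notations carry the same information for a parent document. The sketch below illustrates that equivalence on the client side by expanding the shortcut into the object form; expand_join_value is a hypothetical helper written for this example, not something Elasticsearch or its clients provide.

```python
from typing import Union


def expand_join_value(value: Union[str, dict]) -> dict:
    """Return the object notation for a join field value.

    A bare string such as "question" is treated as {"name": "question"}.
    An object is returned as a copy and must carry a "name" key.
    """
    if isinstance(value, str):
        return {"name": value}
    if "name" not in value:
        raise ValueError("a join field object needs a 'name'")
    return dict(value)


print(expand_join_value("question"))
print(expand_join_value({"name": "answer", "parent": "1"}))
```

Only parent documents can use the shortcut, because a child document also has to carry its parent id.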
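When indexing a child, the name of the relation as well as the parent id of the document must be added in the _source.
The lineage of a parent must be indexed in the same shard, so you must always route child documents using their greater parent id.
For example, the following requests index two child documents: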
resp = client.index( index="my-index-000001", id="3", routing="1", refresh=True, document={ "my_id": "3", "text": "This is an answer", "my_join_field": { "name": "answer", "parent": "1" } }, ) print(resp) resp1 = client.index( index="my-index-000001", id="4", routing="1", refresh=True, document={ "my_id": "4", "text": "This is another answer", "my_join_field": { "name": "answer", "parent": "1" } }, ) print(resp1)
response = client.index( index: 'my-index-000001', id: 3, routing: 1, refresh: true, body: { my_id: '3', text: 'This is an answer', my_join_field: { name: 'answer', parent: '1' } } ) puts response response = client.index( index: 'my-index-000001', id: 4, routing: 1, refresh: true, body: { my_id: '4', text: 'This is another answer', my_join_field: { name: 'answer', parent: '1' } } ) puts response
const response = await client.index({ index: "my-index-000001", id: 3, routing: 1, refresh: "true", document: { my_id: "3", text: "This is an answer", my_join_field: { name: "answer", parent: "1", }, }, }); console.log(response); const response1 = await client.index({ index: "my-index-000001", id: 4, routing: 1, refresh: "true", document: { my_id: "4", text: "This is another answer", my_join_field: { name: "answer", parent: "1", }, }, }); console.log(response1);
PUT my-index-000001/_doc/3?routing=1&refresh { "my_id": "3", "text": "This is an answer", "my_join_field": { "name": "answer", "parent": "1" } } PUT my-index-000001/_doc/4?routing=1&refresh { "my_id": "4", "text": "This is another answer", "my_join_field": { "name": "answer", "parent": "1" } }
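To see why the routing value matters, here is a simplified simulation of shard selection. It assumes that a document is placed by hashing its routing value (falling back to its own id when no routing is given) modulo the number of primary shards. The hash function (zlib.crc32) and the shard count are stand-ins chosen for this sketch and do not reproduce the actual hashing Elasticsearch uses.

```python
import zlib
from typing import Optional


def shard_for(doc_id: str, routing: Optional[str], num_shards: int) -> int:
    """Pick a shard from the routing value, or from the id if none is given.

    Simplified stand-in: crc32 replaces the real hash function.
    """
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_shards


NUM_SHARDS = 8  # hypothetical number of primary shards

# Parent "1" is indexed without routing, so its own id decides the shard.
parent_shard = shard_for("1", None, NUM_SHARDS)

# Children "3" and "4" are indexed with routing=1, as in the requests above.
child_shards = [shard_for(doc_id, "1", NUM_SHARDS) for doc_id in ("3", "4")]

print(parent_shard, child_shards)
```

Because the children reuse the parent id as their routing value, they hash to the same shard as the parent regardless of their own ids. Without the routing parameter, each child would be placed according to its own id and could end up on a different shard.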
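Parent-join and performance
The join field shouldn't be used like joins in a relational database. In Elasticsearch, the key to good performance is to denormalize your data into documents. Each join field, has_child query, or has_parent query adds a significant cost to query performance. It can also trigger global ordinals to be built.
The join field only makes sense if your data contains a one-to-many relationship where one entity significantly outnumbers the other. An example is a use case with products and offers for these products. If offers significantly outnumber products, it makes sense to model the product as the parent document and the offer as the child document.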
Parent-join restrictions
Searching with parent-join
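The parent-join creates one field to index the name of the relation within the document (my_parent, my_child, …).
It also creates one field per parent/child relation. The name of this field is the name of the join field followed by # and the name of the parent in the relation. For example, for the my_parent → [my_child, another_child] relation, the join field creates an additional field named my_join_field#my_parent.
This field contains the parent _id that the document links to if the document is a child (my_child or another_child), and the _id of the document itself if it is a parent (my_parent).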
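The naming rule and the stored value described above can be sketched as a small function. This is an illustration for a single level of relations only; parent_id_entry and its parameters are hypothetical names, not an Elasticsearch API.

```python
from typing import Dict, List, Optional, Tuple, Union

Relations = Dict[str, Union[str, List[str]]]

RELATIONS: Relations = {"my_parent": ["my_child", "another_child"]}


def parent_id_entry(
    join_field: str,
    relations: Relations,
    doc_id: str,
    name: str,
    parent: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (field name, value) of the per-relation field for one document.

    Single-level sketch: a relation name is either a parent or a child.
    """
    for parent_name, children in relations.items():
        child_names = [children] if isinstance(children, str) else children
        field = f"{join_field}#{parent_name}"
        if name == parent_name:
            # A parent document stores its own _id.
            return field, doc_id
        if name in child_names:
            if parent is None:
                raise ValueError("a child document needs a parent id")
            # A child document stores the _id of the parent it links to.
            return field, parent
    raise ValueError(f"unknown relation name {name!r}")


print(parent_id_entry("my_join_field", RELATIONS, "1", "my_parent"))
print(parent_id_entry("my_join_field", RELATIONS, "3", "my_child", parent="1"))
```

Both calls yield the field name my_join_field#my_parent with the value "1": the parent stores its own id, and the child stores the id of the parent it links to.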
When searching an index that contains a join field, these two fields are always returned in the search response:
resp = client.search( index="my-index-000001", query={ "match_all": {} }, sort=[ "my_id" ], ) print(resp)
const response = await client.search({ index: "my-index-000001", query: { match_all: {}, }, sort: ["my_id"], }); console.log(response);
GET my-index-000001/_search { "query": { "match_all": {} }, "sort": ["my_id"] }
Will return:
{ ..., "hits": { "total": { "value": 4, "relation": "eq" }, "max_score": null, "hits": [ { "_index": "my-index-000001", "_id": "1", "_score": null, "_source": { "my_id": "1", "text": "This is a question", "my_join_field": "question" }, "sort": [ "1" ] }, { "_index": "my-index-000001", "_id": "2", "_score": null, "_source": { "my_id": "2", "text": "This is another question", "my_join_field": "question" }, "sort": [ "2" ] }, { "_index": "my-index-000001", "_id": "3", "_score": null, "_routing": "1", "_source": { "my_id": "3", "text": "This is an answer", "my_join_field": { "name": "answer", "parent": "1" } }, "sort": [ "3" ] }, { "_index": "my-index-000001", "_id": "4", "_score": null, "_routing": "1", "_source": { "my_id": "4", "text": "This is another answer", "my_join_field": { "name": "answer", "parent": "1" } }, "sort": [ "4" ] } ] } }
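Parent-join queries and aggregations
See the has_child and has_parent queries, the children aggregation, and inner hits for more information.
The value of the join field is accessible in aggregations and scripts, and can be queried with the parent_id query: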
resp = client.search( index="my-index-000001", query={ "parent_id": { "type": "answer", "id": "1" } }, aggs={ "parents": { "terms": { "field": "my_join_field#question", "size": 10 } } }, runtime_mappings={ "parent": { "type": "long", "script": "\n emit(Integer.parseInt(doc['my_join_field#question'].value)) \n " } }, fields=[ { "field": "parent" } ], ) print(resp)
const response = await client.search({ index: "my-index-000001", query: { parent_id: { type: "answer", id: "1", }, }, aggs: { parents: { terms: { field: "my_join_field#question", size: 10, }, }, }, runtime_mappings: { parent: { type: "long", script: "\n emit(Integer.parseInt(doc['my_join_field#question'].value)) \n ", }, }, fields: [ { field: "parent", }, ], }); console.log(response);
GET my-index-000001/_search { "query": { "parent_id": { "type": "answer", "id": "1" } }, "aggs": { "parents": { "terms": { "field": "my_join_field#question", "size": 10 } } }, "runtime_mappings": { "parent": { "type": "long", "script": """ emit(Integer.parseInt(doc['my_join_field#question'].value)) """ } }, "fields": [ { "field": "parent" } ] }
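The request above queries the parent id field (the parent_id query), aggregates on the parent id field (the terms aggregation on my_join_field#question), and accesses the parent id field in a script (the parent runtime field).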
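As a rough sketch, the query and aggregation parts of the request above can be parametrized so that the my_join_field#question field name is derived rather than typed by hand. parent_id_search_body is a hypothetical helper; it only builds a dictionary and does not contact a cluster.

```python
def parent_id_search_body(
    join_field: str,
    parent_relation: str,
    child_relation: str,
    parent_id: str,
    size: int = 10,
) -> dict:
    """Build a search body that finds children of one parent and
    aggregates on the per-relation parent id field."""
    id_field = f"{join_field}#{parent_relation}"
    return {
        "query": {"parent_id": {"type": child_relation, "id": parent_id}},
        "aggs": {"parents": {"terms": {"field": id_field, "size": size}}},
    }


body = parent_id_search_body("my_join_field", "question", "answer", "1")
print(body)
```

With the Python client used in the examples above, the two keys could be passed as the query and aggs arguments of client.search.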
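Global ordinals
The join field uses global ordinals to speed up joins. Global ordinals need to be rebuilt after any change to a shard. The more parent id values are stored in a shard, the longer it takes to rebuild the global ordinals for the join field.
By default, global ordinals are built eagerly: if the index has changed, global ordinals for the join field are rebuilt as part of the refresh. This can add significant time to the refresh. In most cases this is still the right trade-off, because otherwise global ordinals are rebuilt when the first parent-join query or aggregation runs. That can introduce a significant latency spike for your users, and it is usually worse, because several global ordinals for the join field may be rebuilt within a single refresh interval when many writes are occurring.
When the join field is used infrequently and writes occur frequently, it may make sense to disable eager loading: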
resp = client.indices.create( index="my-index-000001", mappings={ "properties": { "my_join_field": { "type": "join", "relations": { "question": "answer" }, "eager_global_ordinals": False } } }, ) print(resp)
response = client.indices.create( index: 'my-index-000001', body: { mappings: { properties: { my_join_field: { type: 'join', relations: { question: 'answer' }, eager_global_ordinals: false } } } } ) puts response
const response = await client.indices.create({ index: "my-index-000001", mappings: { properties: { my_join_field: { type: "join", relations: { question: "answer", }, eager_global_ordinals: false, }, }, }, }); console.log(response);
PUT my-index-000001 { "mappings": { "properties": { "my_join_field": { "type": "join", "relations": { "question": "answer" }, "eager_global_ordinals": false } } } }
The amount of heap used by global ordinals can be checked per parent relation as follows:
resp = client.indices.stats(
    metric="fielddata",
    human=True,
    fields="my_join_field#question",
)
print(resp)

resp1 = client.nodes.stats(
    metric="indices",
    index_metric="fielddata",
    human=True,
    fields="my_join_field#question",
)
print(resp1)

response = client.indices.stats(
  metric: 'fielddata',
  human: true,
  fields: 'my_join_field#question'
)
puts response

response = client.nodes.stats(
  metric: 'indices',
  index_metric: 'fielddata',
  human: true,
  fields: 'my_join_field#question'
)
puts response

const response = await client.indices.stats({
  metric: "fielddata",
  human: "true",
  fields: "my_join_field#question",
});
console.log(response);

const response1 = await client.nodes.stats({
  metric: "indices",
  index_metric: "fielddata",
  human: "true",
  fields: "my_join_field#question",
});
console.log(response1);

# Per-index
GET _stats/fielddata?human&fields=my_join_field#question

# Per-node per-index
GET _nodes/stats/indices/fielddata?human&fields=my_join_field#question
Multiple children per parent
It is also possible to define multiple children for a single parent:
resp = client.indices.create( index="my-index-000001", mappings={ "properties": { "my_join_field": { "type": "join", "relations": { "question": [ "answer", "comment" ] } } } }, ) print(resp)
response = client.indices.create( index: 'my-index-000001', body: { mappings: { properties: { my_join_field: { type: 'join', relations: { question: [ 'answer', 'comment' ] } } } } } ) puts response
const response = await client.indices.create({ index: "my-index-000001", mappings: { properties: { my_join_field: { type: "join", relations: { question: ["answer", "comment"], }, }, }, }, }); console.log(response);
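Multiple levels of parent join
We don't recommend using multiple levels of relations to replicate a relational model. Each level of relation adds overhead at query time in terms of memory and computation. For better search performance, denormalize your data instead.
Multiple levels of parent/child: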
resp = client.indices.create( index="my-index-000001", mappings={ "properties": { "my_join_field": { "type": "join", "relations": { "question": [ "answer", "comment" ], "answer": "vote" } } } }, ) print(resp)
response = client.indices.create( index: 'my-index-000001', body: { mappings: { properties: { my_join_field: { type: 'join', relations: { question: [ 'answer', 'comment' ], answer: 'vote' } } } } } ) puts response
const response = await client.indices.create({ index: "my-index-000001", mappings: { properties: { my_join_field: { type: "join", relations: { question: ["answer", "comment"], answer: "vote", }, }, }, }, }); console.log(response);
PUT my-index-000001 { "mappings": { "properties": { "my_join_field": { "type": "join", "relations": { "question": ["answer", "comment"], "answer": "vote" } } } } }
The mapping above represents the following tree:
   question
    /    \
   /      \
comment  answer
           |
           |
          vote
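The tree can also be derived programmatically from the relations section. The sketch below walks from a relation name up to the root; parent_relation_of and lineage are hypothetical helper names, and the relations dictionary mirrors the mapping shown above.

```python
from typing import Dict, List, Union

Relations = Dict[str, Union[str, List[str]]]

RELATIONS: Relations = {"question": ["answer", "comment"], "answer": "vote"}


def parent_relation_of(relations: Relations) -> Dict[str, str]:
    """Invert the relations section into a child name -> parent name map."""
    parent_of: Dict[str, str] = {}
    for parent, children in relations.items():
        for child in [children] if isinstance(children, str) else children:
            parent_of[child] = parent
    return parent_of


def lineage(name: str, relations: Relations) -> List[str]:
    """Return the relation names from the root of the tree down to `name`."""
    parent_of = parent_relation_of(relations)
    if name not in parent_of and name not in relations:
        raise KeyError(f"unknown relation name {name!r}")
    chain = [name]
    while chain[-1] in parent_of:
        if len(chain) > len(parent_of):
            raise ValueError("relations contain a cycle")
        chain.append(parent_of[chain[-1]])
    return chain[::-1]


print(lineage("vote", RELATIONS))
print(lineage("comment", RELATIONS))
```

The length of the returned list is the depth of the relation in the tree, which gives a quick way to spot how many join levels a query on that relation has to cross.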
Indexing a grandchild document requires a routing value equal to the id of the grandparent (the greater parent of the lineage):
resp = client.index( index="my-index-000001", id="3", routing="1", refresh=True, document={ "text": "This is a vote", "my_join_field": { "name": "vote", "parent": "2" } }, ) print(resp)
response = client.index( index: 'my-index-000001', id: 3, routing: 1, refresh: true, body: { text: 'This is a vote', my_join_field: { name: 'vote', parent: '2' } } ) puts response
const response = await client.index({ index: "my-index-000001", id: 3, routing: 1, refresh: "true", document: { text: "This is a vote", my_join_field: { name: "vote", parent: "2", }, }, }); console.log(response);
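One way to find the routing value for a document at any depth is to follow its parent ids up to the topmost ancestor. The sketch below does this over an in-memory map of document id to parent id. The map is an assumption made for illustration: it supposes that document 2 is an answer whose parent is question 1, which matches the routing used in the request above.

```python
from typing import Dict


def routing_for(doc_id: str, parent_of: Dict[str, str]) -> str:
    """Return the id of the topmost ancestor, used as the routing value.

    `parent_of` maps a document id to its parent id; root documents
    do not appear as keys.
    """
    current = doc_id
    seen = set()
    while current in parent_of:
        if current in seen:
            raise ValueError("parent ids form a cycle")
        seen.add(current)
        current = parent_of[current]
    return current


# Hypothetical lineage: vote 3 -> answer 2 -> question 1
PARENT_OF = {"2": "1", "3": "2"}

print(routing_for("3", PARENT_OF))
print(routing_for("2", PARENT_OF))
print(routing_for("1", PARENT_OF))
```

Every document in the lineage resolves to the same routing value, the id of the question, so all three levels are placed on the same shard.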