Join field type

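The join data type is a special field that creates parent/child relations between documents of the same index. The relations section defines the set of possible relations within the documents; each relation consists of a parent name and a child name.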

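We don't recommend using multiple levels of relations to replicate a relational model. Each level of relation adds overhead at query time in terms of memory and computation. For better search performance, denormalize your data instead.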

A parent/child relation can be defined as follows:

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "my_id": {
                "type": "keyword"
            },
            "my_join_field": {
                "type": "join",
                "relations": {
                    "question": "answer"
                }
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    mappings: {
      properties: {
        my_id: {
          type: 'keyword'
        },
        my_join_field: {
          type: 'join',
          relations: {
            question: 'answer'
          }
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
    properties: {
      my_id: {
        type: "keyword",
      },
      my_join_field: {
        type: "join",
        relations: {
          question: "answer",
        },
      },
    },
  },
});
console.log(response);
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_id": {
        "type": "keyword"
      },
      "my_join_field": { 
        "type": "join",
        "relations": {
          "question": "answer" 
        }
      }
    }
  }
}

my_join_field is the name of the field.

The relations entry defines a single relation in which question is the parent of answer.
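
As a rough sketch of how the relations section is shaped, the helper below assembles the mappings body from a dictionary of parent names to child names. The function build_join_mapping is hypothetical and not part of any Elasticsearch client; its result has the same shape as the mappings argument passed to client.indices.create in the Python example above.

```python
from typing import Dict, List, Union

# A parent name maps to one child name or to a list of child names.
Relations = Dict[str, Union[str, List[str]]]


def build_join_mapping(join_field: str, relations: Relations) -> dict:
    """Return a mappings body that declares one join field (hypothetical helper)."""
    if not relations:
        raise ValueError("a join field needs at least one relation")
    return {
        "properties": {
            join_field: {
                "type": "join",
                # Copy the relations so the caller's dictionary is not shared.
                "relations": dict(relations),
            }
        }
    }


mapping = build_join_mapping("my_join_field", {"question": "answer"})
print(mapping)
```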

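To index a document with a join, the name of the relation and the optional parent of the document must be provided in the _source. For example, the following creates two parent documents in the question context: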

resp = client.index(
    index="my-index-000001",
    id="1",
    refresh=True,
    document={
        "my_id": "1",
        "text": "This is a question",
        "my_join_field": {
            "name": "question"
        }
    },
)
print(resp)

resp1 = client.index(
    index="my-index-000001",
    id="2",
    refresh=True,
    document={
        "my_id": "2",
        "text": "This is another question",
        "my_join_field": {
            "name": "question"
        }
    },
)
print(resp1)
response = client.index(
  index: 'my-index-000001',
  id: 1,
  refresh: true,
  body: {
    my_id: '1',
    text: 'This is a question',
    my_join_field: {
      name: 'question'
    }
  }
)
puts response

response = client.index(
  index: 'my-index-000001',
  id: 2,
  refresh: true,
  body: {
    my_id: '2',
    text: 'This is another question',
    my_join_field: {
      name: 'question'
    }
  }
)
puts response
const response = await client.index({
  index: "my-index-000001",
  id: 1,
  refresh: "true",
  document: {
    my_id: "1",
    text: "This is a question",
    my_join_field: {
      name: "question",
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "my-index-000001",
  id: 2,
  refresh: "true",
  document: {
    my_id: "2",
    text: "This is another question",
    my_join_field: {
      name: "question",
    },
  },
});
console.log(response1);
PUT my-index-000001/_doc/1?refresh
{
  "my_id": "1",
  "text": "This is a question",
  "my_join_field": {
    "name": "question" 
  }
}

PUT my-index-000001/_doc/2?refresh
{
  "my_id": "2",
  "text": "This is another question",
  "my_join_field": {
    "name": "question"
  }
}

This document is a question document.

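When indexing parent documents, you can specify just the name of the relation as a shortcut instead of wrapping it in the normal object notation: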

resp = client.index(
    index="my-index-000001",
    id="1",
    refresh=True,
    document={
        "my_id": "1",
        "text": "This is a question",
        "my_join_field": "question"
    },
)
print(resp)

resp1 = client.index(
    index="my-index-000001",
    id="2",
    refresh=True,
    document={
        "my_id": "2",
        "text": "This is another question",
        "my_join_field": "question"
    },
)
print(resp1)
const response = await client.index({
  index: "my-index-000001",
  id: 1,
  refresh: "true",
  document: {
    my_id: "1",
    text: "This is a question",
    my_join_field: "question",
  },
});
console.log(response);

const response1 = await client.index({
  index: "my-index-000001",
  id: 2,
  refresh: "true",
  document: {
    my_id: "2",
    text: "This is another question",
    my_join_field: "question",
  },
});
console.log(response1);
PUT my-index-000001/_doc/1?refresh
{
  "my_id": "1",
  "text": "This is a question",
  "my_join_field": "question" 
}

PUT my-index-000001/_doc/2?refresh
{
  "my_id": "2",
  "text": "This is another question",
  "my_join_field": "question"
}

The simpler notation for a parent document uses just the relation name.
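
The two notations for a parent document carry the same information. The sketch below illustrates this with a hypothetical helper, normalize_join_value, that expands the shortcut into the object notation; it only models the document shape and does not talk to Elasticsearch.

```python
from typing import Union


def normalize_join_value(value: Union[str, dict]) -> dict:
    """Expand the shortcut notation into the object notation (hypothetical helper)."""
    if isinstance(value, str):
        # Shortcut: only the relation name was given.
        return {"name": value}
    if "name" not in value:
        raise ValueError("the object notation requires a 'name' key")
    return dict(value)


shortcut = normalize_join_value("question")
explicit = normalize_join_value({"name": "question"})
print(shortcut == explicit)
```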

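When indexing a child document, the name of the relation as well as the parent id of the document must be added in the _source.

The whole lineage of a parent must be indexed on the same shard, so you must always route child documents using their greater parent id.

For example, the following shows how to index two child documents: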

resp = client.index(
    index="my-index-000001",
    id="3",
    routing="1",
    refresh=True,
    document={
        "my_id": "3",
        "text": "This is an answer",
        "my_join_field": {
            "name": "answer",
            "parent": "1"
        }
    },
)
print(resp)

resp1 = client.index(
    index="my-index-000001",
    id="4",
    routing="1",
    refresh=True,
    document={
        "my_id": "4",
        "text": "This is another answer",
        "my_join_field": {
            "name": "answer",
            "parent": "1"
        }
    },
)
print(resp1)
response = client.index(
  index: 'my-index-000001',
  id: 3,
  routing: 1,
  refresh: true,
  body: {
    my_id: '3',
    text: 'This is an answer',
    my_join_field: {
      name: 'answer',
      parent: '1'
    }
  }
)
puts response

response = client.index(
  index: 'my-index-000001',
  id: 4,
  routing: 1,
  refresh: true,
  body: {
    my_id: '4',
    text: 'This is another answer',
    my_join_field: {
      name: 'answer',
      parent: '1'
    }
  }
)
puts response
const response = await client.index({
  index: "my-index-000001",
  id: 3,
  routing: 1,
  refresh: "true",
  document: {
    my_id: "3",
    text: "This is an answer",
    my_join_field: {
      name: "answer",
      parent: "1",
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "my-index-000001",
  id: 4,
  routing: 1,
  refresh: "true",
  document: {
    my_id: "4",
    text: "This is another answer",
    my_join_field: {
      name: "answer",
      parent: "1",
    },
  },
});
console.log(response1);
PUT my-index-000001/_doc/3?routing=1&refresh 
{
  "my_id": "3",
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer", 
    "parent": "1" 
  }
}

PUT my-index-000001/_doc/4?routing=1&refresh
{
  "my_id": "4",
  "text": "This is another answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}

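The routing value is mandatory because parent and child documents must be indexed on the same shard.

answer is the name of the join for this document.

1 is the parent id of this child document.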
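
Because the routing value and the parent id must agree for a direct child of a top-level parent, it can help to derive both from one argument. The sketch below does that with a hypothetical helper, child_index_request, whose keys mirror the keyword arguments used in the Python examples above.

```python
def child_index_request(index: str, doc_id: str, relation: str, parent_id: str,
                        source: dict, join_field: str = "my_join_field") -> dict:
    """Build keyword arguments for indexing a direct child (hypothetical helper)."""
    document = dict(source)
    document[join_field] = {"name": relation, "parent": parent_id}
    return {
        "index": index,
        "id": doc_id,
        # Route by the parent id so both documents land on the same shard.
        # This is only valid when the parent is the top of the lineage.
        "routing": parent_id,
        "document": document,
    }


request = child_index_request(
    "my-index-000001", "3", "answer", "1",
    {"my_id": "3", "text": "This is an answer"},
)
print(request)
```

Assuming a configured client, the dictionary could then be passed on as client.index(**request).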

Parent-join and performance

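The join field shouldn't be used like joins in a relational database. In Elasticsearch, the key to good performance is to denormalize your data into documents. Each join field, has_child query, or has_parent query adds a significant tax to your query performance. It can also trigger global ordinals to be built.

The join field only makes sense if your data contains a one-to-many relationship in which one entity significantly outnumbers the other. An example is a use case with products and offers for those products. If offers significantly outnumber products, it makes sense to model the product as a parent document and the offer as a child document.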
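
To make denormalization concrete, the sketch below embeds the child data in the parent document instead of indexing it separately. The helper denormalize and the answers field name are assumptions chosen for illustration; how such a field should be mapped depends on the queries you need and is outside this sketch.

```python
from typing import List


def denormalize(question: dict, answers: List[dict]) -> dict:
    """Embed the answers in the question document instead of joining them."""
    document = dict(question)
    # Copy each answer so the original dictionaries are left untouched.
    document["answers"] = [dict(answer) for answer in answers]
    return document


doc = denormalize(
    {"my_id": "1", "text": "This is a question"},
    [{"text": "This is an answer"}, {"text": "This is another answer"}],
)
print(doc)
```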

Parent-join restrictions

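  • Only one join field mapping is allowed per index.
  • Parent and child documents must be indexed on the same shard. This means that the same routing value must be provided when getting, deleting, or updating a child document.
  • An element can have multiple children but only one parent.
  • It is possible to add a new relation to an existing join field.
  • It is also possible to add a child to an existing element, but only if the element is already a parent.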
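
The "only one parent" restriction can be checked before a mapping is sent. The sketch below inverts a relations dictionary into a child-to-parent lookup and rejects a child that is listed under two parents; child_to_parent is a hypothetical helper, not an Elasticsearch API.

```python
from typing import Dict, List, Union


def child_to_parent(relations: Dict[str, Union[str, List[str]]]) -> Dict[str, str]:
    """Invert a relations mapping, rejecting children that have two parents."""
    parents: Dict[str, str] = {}
    for parent, children in relations.items():
        if isinstance(children, str):
            children = [children]
        for child in children:
            if child in parents:
                raise ValueError(
                    f"{child!r} already has parent {parents[child]!r}"
                )
            parents[child] = parent
    return parents


lookup = child_to_parent({"my_parent": ["my_child", "another_child"]})
print(lookup)
```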

Searching with parent-join

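The parent-join creates one field to index the name of the relation within the document (my_parent, my_child, …).

It also creates one field per parent/child relation. The name of this field is the name of the join field, followed by # and the name of the parent in the relation. For example, for the my_parent → [my_child, another_child] relation, the join field creates an additional field named my_join_field#my_parent.

This field contains the parent _id that the document links to if the document is a child (my_child or another_child), and the _id of the document itself if it is a parent (my_parent).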
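
The naming rule and the stored value described above can be modelled in a few lines. This sketch covers a single level of relations only; both function names are hypothetical, and the code mirrors the description rather than reading anything from an index.

```python
from typing import Union


def parent_id_field(join_field: str, parent_relation: str) -> str:
    """Name of the extra field created for one parent/child relation."""
    return f"{join_field}#{parent_relation}"


def parent_id_field_value(doc_id: str, join_value: Union[str, dict]) -> str:
    """Value held by that field for one document, per the description above."""
    if isinstance(join_value, dict) and "parent" in join_value:
        # Child document: the field holds the _id of the linked parent.
        return join_value["parent"]
    # Parent document: the field holds the document's own _id.
    return doc_id


field_name = parent_id_field("my_join_field", "my_parent")
child_value = parent_id_field_value("3", {"name": "my_child", "parent": "1"})
parent_value = parent_id_field_value("1", "my_parent")
print(field_name, child_value, parent_value)
```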

When searching an index that contains a join field, these two fields are always returned in the search response:

resp = client.search(
    index="my-index-000001",
    query={
        "match_all": {}
    },
    sort=[
        "my_id"
    ],
)
print(resp)
const response = await client.search({
  index: "my-index-000001",
  query: {
    match_all: {},
  },
  sort: ["my_id"],
});
console.log(response);
GET my-index-000001/_search
{
  "query": {
    "match_all": {}
  },
  "sort": ["my_id"]
}

Will return:

{
  ...,
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "my-index-000001",
        "_id": "1",
        "_score": null,
        "_source": {
          "my_id": "1",
          "text": "This is a question",
          "my_join_field": "question"         
        },
        "sort": [
          "1"
        ]
      },
      {
        "_index": "my-index-000001",
        "_id": "2",
        "_score": null,
        "_source": {
          "my_id": "2",
          "text": "This is another question",
          "my_join_field": "question"          
        },
        "sort": [
          "2"
        ]
      },
      {
        "_index": "my-index-000001",
        "_id": "3",
        "_score": null,
        "_routing": "1",
        "_source": {
          "my_id": "3",
          "text": "This is an answer",
          "my_join_field": {
            "name": "answer",                 
            "parent": "1"                     
          }
        },
        "sort": [
          "3"
        ]
      },
      {
        "_index": "my-index-000001",
        "_id": "4",
        "_score": null,
        "_routing": "1",
        "_source": {
          "my_id": "4",
          "text": "This is another answer",
          "my_join_field": {
            "name": "answer",
            "parent": "1"
          }
        },
        "sort": [
          "4"
        ]
      }
    ]
  }
}

Documents 1 and 2 belong to the question join.

Document 3 belongs to the answer join.

The parent value is the linked parent id for the child document.

Parent-join queries and aggregations

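See the has_child and has_parent queries, the children aggregation, and inner hits for more information.

The value of the join field is accessible in aggregations and scripts, and may be queried with the parent_id query: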

resp = client.search(
    index="my-index-000001",
    query={
        "parent_id": {
            "type": "answer",
            "id": "1"
        }
    },
    aggs={
        "parents": {
            "terms": {
                "field": "my_join_field#question",
                "size": 10
            }
        }
    },
    runtime_mappings={
        "parent": {
            "type": "long",
            "script": "\n        emit(Integer.parseInt(doc['my_join_field#question'].value)) \n      "
        }
    },
    fields=[
        {
            "field": "parent"
        }
    ],
)
print(resp)
const response = await client.search({
  index: "my-index-000001",
  query: {
    parent_id: {
      type: "answer",
      id: "1",
    },
  },
  aggs: {
    parents: {
      terms: {
        field: "my_join_field#question",
        size: 10,
      },
    },
  },
  runtime_mappings: {
    parent: {
      type: "long",
      script:
        "\n        emit(Integer.parseInt(doc['my_join_field#question'].value)) \n      ",
    },
  },
  fields: [
    {
      field: "parent",
    },
  ],
});
console.log(response);
GET my-index-000001/_search
{
  "query": {
    "parent_id": { 
      "type": "answer",
      "id": "1"
    }
  },
  "aggs": {
    "parents": {
      "terms": {
        "field": "my_join_field#question", 
        "size": 10
      }
    }
  },
  "runtime_mappings": {
    "parent": {
      "type": "long",
      "script": """
        emit(Integer.parseInt(doc['my_join_field#question'].value)) 
      """
    }
  },
  "fields": [
    { "field": "parent" }
  ]
}

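The parent_id query targets the parent id field (see also the has_parent query and the has_child query).

The terms aggregation runs on the parent id field (see also the children aggregation).

The runtime field script accesses the parent id field.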

Global ordinals

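The join field uses global ordinals to speed up joins. Global ordinals need to be rebuilt after any change to a shard. The more parent id values are stored in a shard, the longer it takes to rebuild the global ordinals for the join field.

Global ordinals are built eagerly by default: if the index has changed, global ordinals for the join field are rebuilt as part of the refresh. This can add significant time to the refresh. In most cases this is the right trade-off, because otherwise global ordinals are rebuilt when the first parent-join query or aggregation is used. That can introduce a significant latency spike for your users, and it is usually worse, because under heavy writes several global ordinals for the join field may need to be rebuilt within a single refresh interval.

When the join field is used infrequently and writes occur frequently, it may make sense to disable eager loading: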

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "my_join_field": {
                "type": "join",
                "relations": {
                    "question": "answer"
                },
                "eager_global_ordinals": False
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    mappings: {
      properties: {
        my_join_field: {
          type: 'join',
          relations: {
            question: 'answer'
          },
          eager_global_ordinals: false
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
    properties: {
      my_join_field: {
        type: "join",
        relations: {
          question: "answer",
        },
        eager_global_ordinals: false,
      },
    },
  },
});
console.log(response);
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
           "question": "answer"
        },
        "eager_global_ordinals": false
      }
    }
  }
}

The amount of heap used by global ordinals can be checked per parent relation as follows:

resp = client.indices.stats(
    metric="fielddata",
    human=True,
    fields="my_join_field#question",
)
print(resp)

resp1 = client.nodes.stats(
    metric="indices",
    index_metric="fielddata",
    human=True,
    fields="my_join_field#question",
)
print(resp1)
response = client.indices.stats(
  metric: 'fielddata',
  human: true,
  fields: 'my_join_field#question'
)
puts response

response = client.nodes.stats(
  metric: 'indices',
  index_metric: 'fielddata',
  human: true,
  fields: 'my_join_field#question'
)
puts response
const response = await client.indices.stats({
  metric: "fielddata",
  human: "true",
  fields: "my_join_field#question",
});
console.log(response);

const response1 = await client.nodes.stats({
  metric: "indices",
  index_metric: "fielddata",
  human: "true",
  fields: "my_join_field#question",
});
console.log(response1);
# Per-index
GET _stats/fielddata?human&fields=my_join_field#question

# Per-node per-index
GET _nodes/stats/indices/fielddata?human&fields=my_join_field#question
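
If you build the request URL yourself rather than using the console, note that a raw # would start a URL fragment and the relation name would never reach the server. The sketch below, which assumes the per-index path shown above, percent-encodes the field name with the standard library; fielddata_stats_path is a hypothetical helper.

```python
from urllib.parse import quote


def fielddata_stats_path(join_field: str, parent_relation: str) -> str:
    """Build the per-index fielddata stats path for one parent relation."""
    field = f"{join_field}#{parent_relation}"
    # Percent-encode the field name so that '#' becomes %23.
    return f"/_stats/fielddata?human&fields={quote(field, safe='')}"


path = fielddata_stats_path("my_join_field", "question")
print(path)
```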

Multiple children per parent

It is also possible to define multiple children for a single parent:

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "my_join_field": {
                "type": "join",
                "relations": {
                    "question": [
                        "answer",
                        "comment"
                    ]
                }
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    mappings: {
      properties: {
        my_join_field: {
          type: 'join',
          relations: {
            question: [
              'answer',
              'comment'
            ]
          }
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
    properties: {
      my_join_field: {
        type: "join",
        relations: {
          question: ["answer", "comment"],
        },
      },
    },
  },
});
console.log(response);
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
          "question": ["answer", "comment"]  
        }
      }
    }
  }
}

question is the parent of answer and comment.

Multiple levels of parent join

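We don't recommend using multiple levels of relations to replicate a relational model. Each level of relation adds overhead at query time in terms of memory and computation. For better search performance, denormalize your data instead.

Multiple levels of parent/child can be defined as follows: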

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "my_join_field": {
                "type": "join",
                "relations": {
                    "question": [
                        "answer",
                        "comment"
                    ],
                    "answer": "vote"
                }
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    mappings: {
      properties: {
        my_join_field: {
          type: 'join',
          relations: {
            question: [
              'answer',
              'comment'
            ],
            answer: 'vote'
          }
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
    properties: {
      my_join_field: {
        type: "join",
        relations: {
          question: ["answer", "comment"],
          answer: "vote",
        },
      },
    },
  },
});
console.log(response);
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
          "question": ["answer", "comment"],  
          "answer": "vote" 
        }
      }
    }
  }
}

question is the parent of answer and comment.

answer is the parent of vote.

The mapping above represents the following tree:

   question
    /    \
   /      \
comment  answer
           |
           |
          vote

Indexing a grandchild document requires a routing value equal to the id of the grandparent (the greater parent of the lineage):

resp = client.index(
    index="my-index-000001",
    id="3",
    routing="1",
    refresh=True,
    document={
        "text": "This is a vote",
        "my_join_field": {
            "name": "vote",
            "parent": "2"
        }
    },
)
print(resp)
response = client.index(
  index: 'my-index-000001',
  id: 3,
  routing: 1,
  refresh: true,
  body: {
    text: 'This is a vote',
    my_join_field: {
      name: 'vote',
      parent: '2'
    }
  }
)
puts response
const response = await client.index({
  index: "my-index-000001",
  id: 3,
  routing: 1,
  refresh: "true",
  document: {
    text: "This is a vote",
    my_join_field: {
      name: "vote",
      parent: "2",
    },
  },
});
console.log(response);
PUT my-index-000001/_doc/3?routing=1&refresh 
{
  "text": "This is a vote",
  "my_join_field": {
    "name": "vote",
    "parent": "2" 
  }
}

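This child document must be on the same shard as its grandparent and parent.

2 is the parent id of this document (it must point to an answer document).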
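
With more than one level, the routing value is no longer the direct parent id but the id at the top of the lineage. The sketch below finds it by walking a lookup of document id to parent id; the lookup and the routing_for helper are hypothetical, and in practice your application would need to keep track of these links itself.

```python
from typing import Dict, Optional


def routing_for(doc_id: str, parent_of: Dict[str, Optional[str]]) -> str:
    """Walk up the lineage and return the _id of the topmost ancestor."""
    current = doc_id
    seen = set()
    while parent_of.get(current) is not None:
        if current in seen:
            raise ValueError("cycle detected in parent links")
        seen.add(current)
        current = parent_of[current]
    return current


# Hypothetical lineage: question 1 <- answer 2 <- vote 3.
parent_of = {"1": None, "2": "1", "3": "2"}
routing = routing_for("3", parent_of)
print(routing)
```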