Flattened field type

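By default, each subfield in an object is mapped and indexed separately. If the names or types of the subfields are not known in advance, they are mapped dynamically.

The flattened type provides an alternative approach, where the entire object is mapped as a single field. Given an object, the flattened mapping parses out its leaf values and indexes them into one field as keywords. The object's contents can then be searched through simple queries and aggregations.

This data type is useful for indexing objects with a large or unknown number of unique keys. Only one field mapping is created for the whole JSON object, which helps prevent a mapping explosion caused by too many distinct field mappings.

On the other hand, flattened object fields present a trade-off in terms of search functionality. Only basic queries are allowed, with no support for numeric range queries or highlighting. Further information on the limitations can be found in the Supported operations section.

The flattened mapping type should not be used for indexing all document content, as it treats all values as keywords and does not provide full search functionality. The default approach, where each subfield has its own entry in the mappings, works well in the majority of cases.

A flattened object field can be created as follows: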

resp = client.indices.create(
    index="bug_reports",
    mappings={
        "properties": {
            "title": {
                "type": "text"
            },
            "labels": {
                "type": "flattened"
            }
        }
    },
)
print(resp)

resp1 = client.index(
    index="bug_reports",
    id="1",
    document={
        "title": "Results are not sorted correctly.",
        "labels": {
            "priority": "urgent",
            "release": [
                "v1.2.5",
                "v1.3.0"
            ],
            "timestamp": {
                "created": 1541458026,
                "closed": 1541457010
            }
        }
    },
)
print(resp1)
response = client.indices.create(
  index: 'bug_reports',
  body: {
    mappings: {
      properties: {
        title: {
          type: 'text'
        },
        labels: {
          type: 'flattened'
        }
      }
    }
  }
)
puts response

response = client.index(
  index: 'bug_reports',
  id: 1,
  body: {
    title: 'Results are not sorted correctly.',
    labels: {
      priority: 'urgent',
      release: [
        'v1.2.5',
        'v1.3.0'
      ],
      timestamp: {
        created: 1_541_458_026,
        closed: 1_541_457_010
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "bug_reports",
  mappings: {
    properties: {
      title: {
        type: "text",
      },
      labels: {
        type: "flattened",
      },
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "bug_reports",
  id: 1,
  document: {
    title: "Results are not sorted correctly.",
    labels: {
      priority: "urgent",
      release: ["v1.2.5", "v1.3.0"],
      timestamp: {
        created: 1541458026,
        closed: 1541457010,
      },
    },
  },
});
console.log(response1);
PUT bug_reports
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "labels": {
        "type": "flattened"
      }
    }
  }
}

POST bug_reports/_doc/1
{
  "title": "Results are not sorted correctly.",
  "labels": {
    "priority": "urgent",
    "release": ["v1.2.5", "v1.3.0"],
    "timestamp": {
      "created": 1541458026,
      "closed": 1541457010
    }
  }
}

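During indexing, tokens are created for each leaf value in the JSON object. The values are indexed as string keywords, without analysis or special handling for numbers or dates.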
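
To make the indexing step concrete, here is a minimal Python sketch of how leaf values could be pulled out of a nested object and turned into string keywords. It is an illustrative model only, not Elasticsearch's implementation, and the helper name flatten_leaves is hypothetical.

```python
from typing import Any, Iterator, Tuple


def flatten_leaves(obj: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_key, keyword) pairs for every leaf value in obj.

    Toy model: every leaf is converted to a string, mirroring the
    "no special handling for numbers or dates" behaviour described above.
    Booleans and nulls are not treated specially in this sketch.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten_leaves(value, path)
    elif isinstance(obj, list):
        # Array elements share the key of the array itself.
        for item in obj:
            yield from flatten_leaves(item, prefix)
    else:
        yield prefix, str(obj)


labels = {
    "priority": "urgent",
    "release": ["v1.2.5", "v1.3.0"],
    "timestamp": {"created": 1541458026, "closed": 1541457010},
}

leaves = list(flatten_leaves(labels))
for key, keyword in leaves:
    print(key, "->", repr(keyword))
```

In this model the two timestamps come out as the strings '1541458026' and '1541457010', which is why later comparisons on them behave like string comparisons.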

Querying the top-level flattened field searches all leaf values in the object:

resp = client.search(
    index="bug_reports",
    query={
        "term": {
            "labels": "urgent"
        }
    },
)
print(resp)
response = client.search(
  index: 'bug_reports',
  body: {
    query: {
      term: {
        labels: 'urgent'
      }
    }
  }
)
puts response
const response = await client.search({
  index: "bug_reports",
  query: {
    term: {
      labels: "urgent",
    },
  },
});
console.log(response);
POST bug_reports/_search
{
  "query": {
    "term": {"labels": "urgent"}
  }
}

To query on a specific key in the flattened object, use object dot notation:

resp = client.search(
    index="bug_reports",
    query={
        "term": {
            "labels.release": "v1.3.0"
        }
    },
)
print(resp)
response = client.search(
  index: 'bug_reports',
  body: {
    query: {
      term: {
        'labels.release' => 'v1.3.0'
      }
    }
  }
)
puts response
const response = await client.search({
  index: "bug_reports",
  query: {
    term: {
      "labels.release": "v1.3.0",
    },
  },
});
console.log(response);
POST bug_reports/_search
{
  "query": {
    "term": {"labels.release": "v1.3.0"}
  }
}
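
The difference between querying the root field and querying a dotted key can be sketched with a small in-memory model. This is a toy illustration under stated assumptions, not how Elasticsearch stores or matches terms; leaf_index and term_matches are hypothetical helpers.

```python
from typing import Any, Dict, Set


def leaf_index(obj: Any, prefix: str, index: Dict[str, Set[str]]) -> None:
    """Record each leaf keyword under its dotted path (toy model)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            leaf_index(value, f"{prefix}.{key}", index)
    elif isinstance(obj, list):
        for item in obj:
            leaf_index(item, prefix, index)
    else:
        index.setdefault(prefix, set()).add(str(obj))


def term_matches(index: Dict[str, Set[str]], root: str, field: str, value: str) -> bool:
    """Emulate a term query against the root field or one dotted key."""
    if field == root:
        # Root field: any leaf value anywhere in the object can match.
        return any(value in keywords for keywords in index.values())
    # Dotted key: only leaf values recorded under that exact path can match.
    return value in index.get(field, set())


doc_index: Dict[str, Set[str]] = {}
leaf_index(
    {
        "priority": "urgent",
        "release": ["v1.2.5", "v1.3.0"],
        "timestamp": {"created": 1541458026, "closed": 1541457010},
    },
    "labels",
    doc_index,
)

root_hit = term_matches(doc_index, "labels", "labels", "urgent")
key_hit = term_matches(doc_index, "labels", "labels.release", "v1.3.0")
wrong_key = term_matches(doc_index, "labels", "labels.priority", "v1.3.0")
print(root_hit, key_hit, wrong_key)  # True True False
```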

Supported operations

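Because of the similarities in the way values are indexed, flattened fields share much of the same mapping and search functionality as keyword fields.

Currently, flattened object fields can be used with the following query types: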

  • term, terms, and terms_set
  • prefix
  • range
  • match and multi_match
  • query_string and simple_query_string
  • exists

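When querying, it is not possible to refer to field keys using wildcards, as in { "term": {"labels.time*": 1541457010}}. Note that all queries, including range, treat the values as string keywords. Highlighting is not supported on flattened fields.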

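It is possible to sort on a flattened object field, as well as perform simple keyword-style aggregations such as terms. As with queries, there is no special support for numerics: all values in the JSON object are treated as keywords. When sorting, this means that values are compared lexicographically.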
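
The practical effect of lexicographic comparison is easy to see with plain Python strings. The snippet below is only an analogy for keyword-style ordering and range checks; it does not call Elasticsearch.

```python
values = ["9", "10", "200", "1541457010", "1541458026"]

# Keyword-style ordering compares strings character by character.
keyword_order = sorted(values)
# Numeric ordering, for contrast, compares magnitudes.
numeric_order = sorted(values, key=int)

print(keyword_order)  # ['10', '1541457010', '1541458026', '200', '9']
print(numeric_order)  # ['9', '10', '200', '1541457010', '1541458026']

# A keyword-style range check for "10" <= v <= "20".
in_range = [v for v in values if "10" <= v <= "20"]
print(in_range)  # ['10', '1541457010', '1541458026']
```

Note that both timestamps fall inside the string range "10" to "20", while "200" and "9" do not. This is the kind of result to expect when a range query is applied to numeric-looking values in a flattened field.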

Flattened object fields currently cannot be stored. It is not possible to specify the store parameter in the mapping.

Retrieving flattened fields

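Field values and concrete subfields can be retrieved using the fields parameter. Since the flattened field maps an entire object with potentially many subfields as a single field, the response contains the unaltered structure from _source.

Single subfields, however, can be fetched by specifying them explicitly in the request. This only works for concrete paths, not for wildcards: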

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "flattened_field": {
                "type": "flattened"
            }
        }
    },
)
print(resp)

resp1 = client.index(
    index="my-index-000001",
    id="1",
    refresh=True,
    document={
        "flattened_field": {
            "subfield": "value"
        }
    },
)
print(resp1)

resp2 = client.search(
    index="my-index-000001",
    fields=[
        "flattened_field.subfield"
    ],
    source=False,
)
print(resp2)
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    mappings: {
      properties: {
        flattened_field: {
          type: 'flattened'
        }
      }
    }
  }
)
puts response

response = client.index(
  index: 'my-index-000001',
  id: 1,
  refresh: true,
  body: {
    flattened_field: {
      subfield: 'value'
    }
  }
)
puts response

response = client.search(
  index: 'my-index-000001',
  body: {
    fields: [
      'flattened_field.subfield'
    ],
    _source: false
  }
)
puts response
const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
    properties: {
      flattened_field: {
        type: "flattened",
      },
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "my-index-000001",
  id: 1,
  refresh: "true",
  document: {
    flattened_field: {
      subfield: "value",
    },
  },
});
console.log(response1);

const response2 = await client.search({
  index: "my-index-000001",
  fields: ["flattened_field.subfield"],
  _source: false,
});
console.log(response2);
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "flattened_field": {
        "type": "flattened"
      }
    }
  }
}

PUT my-index-000001/_doc/1?refresh=true
{
  "flattened_field" : {
    "subfield" : "value"
  }
}

POST my-index-000001/_search
{
  "fields": ["flattened_field.subfield"],
  "_source": false
}
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [{
      "_index": "my-index-000001",
      "_id": "1",
      "_score": 1.0,
      "fields": {
        "flattened_field.subfield" : [ "value" ]
      }
    }]
  }
}
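
As a small follow-up, the response above can be read in Python as an ordinary dictionary. The sketch below hard-codes a trimmed copy of that response instead of calling a cluster, so the variable names are illustrative only. Values under fields come back as arrays, even when there is a single value.

```python
# Trimmed copy of the search response shown above.
response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_index": "my-index-000001",
                "_id": "1",
                "_score": 1.0,
                "fields": {"flattened_field.subfield": ["value"]},
            }
        ],
    }
}

# Collect the requested subfield for every hit, keyed by document ID.
subfield_by_id = {
    hit["_id"]: hit["fields"]["flattened_field.subfield"]
    for hit in response["hits"]["hits"]
}
print(subfield_by_id)  # {'1': ['value']}
```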

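You can also use a Painless script to retrieve values from subfields of flattened fields. Instead of including doc['<field_name>'].value in your Painless script, use doc['<field_name>.<sub-field_name>'].value. For example, if you have a flattened field called labels with a release subfield, your Painless script would be doc['labels.release'].value.

For example, suppose your mapping contains two fields, one of which is of the flattened type: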

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "title": {
                "type": "text"
            },
            "labels": {
                "type": "flattened"
            }
        }
    },
)
print(resp)
response = client.indices.create(
  index: 'my-index-000001',
  body: {
    mappings: {
      properties: {
        title: {
          type: 'text'
        },
        labels: {
          type: 'flattened'
        }
      }
    }
  }
)
puts response
const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
    properties: {
      title: {
        type: "text",
      },
      labels: {
        type: "flattened",
      },
    },
  },
});
console.log(response);
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "labels": {
        "type": "flattened"
      }
    }
  }
}

Index a few documents containing your mapped fields. The labels field has three subfields:

resp = client.bulk(
    index="my-index-000001",
    refresh=True,
    operations=[
        {
            "index": {}
        },
        {
            "title": "Something really urgent",
            "labels": {
                "priority": "urgent",
                "release": [
                    "v1.2.5",
                    "v1.3.0"
                ],
                "timestamp": {
                    "created": 1541458026,
                    "closed": 1541457010
                }
            }
        },
        {
            "index": {}
        },
        {
            "title": "Somewhat less urgent",
            "labels": {
                "priority": "high",
                "release": [
                    "v1.3.0"
                ],
                "timestamp": {
                    "created": 1541458026,
                    "closed": 1541457010
                }
            }
        },
        {
            "index": {}
        },
        {
            "title": "Not urgent",
            "labels": {
                "priority": "low",
                "release": [
                    "v1.2.0"
                ],
                "timestamp": {
                    "created": 1541458026,
                    "closed": 1541457010
                }
            }
        }
    ],
)
print(resp)
response = client.bulk(
  index: 'my-index-000001',
  refresh: true,
  body: [
    {
      index: {}
    },
    {
      title: 'Something really urgent',
      labels: {
        priority: 'urgent',
        release: [
          'v1.2.5',
          'v1.3.0'
        ],
        timestamp: {
          created: 1_541_458_026,
          closed: 1_541_457_010
        }
      }
    },
    {
      index: {}
    },
    {
      title: 'Somewhat less urgent',
      labels: {
        priority: 'high',
        release: [
          'v1.3.0'
        ],
        timestamp: {
          created: 1_541_458_026,
          closed: 1_541_457_010
        }
      }
    },
    {
      index: {}
    },
    {
      title: 'Not urgent',
      labels: {
        priority: 'low',
        release: [
          'v1.2.0'
        ],
        timestamp: {
          created: 1_541_458_026,
          closed: 1_541_457_010
        }
      }
    }
  ]
)
puts response
const response = await client.bulk({
  index: "my-index-000001",
  refresh: "true",
  operations: [
    {
      index: {},
    },
    {
      title: "Something really urgent",
      labels: {
        priority: "urgent",
        release: ["v1.2.5", "v1.3.0"],
        timestamp: {
          created: 1541458026,
          closed: 1541457010,
        },
      },
    },
    {
      index: {},
    },
    {
      title: "Somewhat less urgent",
      labels: {
        priority: "high",
        release: ["v1.3.0"],
        timestamp: {
          created: 1541458026,
          closed: 1541457010,
        },
      },
    },
    {
      index: {},
    },
    {
      title: "Not urgent",
      labels: {
        priority: "low",
        release: ["v1.2.0"],
        timestamp: {
          created: 1541458026,
          closed: 1541457010,
        },
      },
    },
  ],
});
console.log(response);
POST /my-index-000001/_bulk?refresh
{"index":{}}
{"title":"Something really urgent","labels":{"priority":"urgent","release":["v1.2.5","v1.3.0"],"timestamp":{"created":1541458026,"closed":1541457010}}}
{"index":{}}
{"title":"Somewhat less urgent","labels":{"priority":"high","release":["v1.3.0"],"timestamp":{"created":1541458026,"closed":1541457010}}}
{"index":{}}
{"title":"Not urgent","labels":{"priority":"low","release":["v1.2.0"],"timestamp":{"created":1541458026,"closed":1541457010}}}

Because labels is a flattened field type, the entire object is mapped as a single field. To retrieve values from this subfield in a Painless script, use the doc['<field_name>.<sub-field_name>'].value format.

"script": {
  "source": """
    if (doc['labels.release'].value.equals('v1.3.0'))
    {emit(doc['labels.release'].value)}
    else{emit('Version mismatch')}
  """
}

Parameters for flattened object fields

The following mapping parameters are accepted:

depth_limit

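The maximum allowed depth of the flattened object field, in terms of nested inner objects. If a flattened object field exceeds this limit, an error is thrown. Defaults to 20. Note that depth_limit can be updated dynamically through the update mapping API.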
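
To get a feel for what a depth limit guards against, the following Python sketch measures how deeply an object is nested and rejects it above a limit. The counting rule is an assumption made for illustration (a scalar is depth 0, each enclosing object adds 1, arrays add nothing); Elasticsearch's exact rule may differ, and nesting_depth and check_depth are hypothetical helpers.

```python
from typing import Any


def nesting_depth(obj: Any) -> int:
    """Return how many levels of inner objects obj contains.

    Assumed rule: scalars are depth 0, each enclosing dict adds 1,
    and lists do not add a level.
    """
    if isinstance(obj, dict):
        return 1 + max((nesting_depth(v) for v in obj.values()), default=0)
    if isinstance(obj, list):
        return max((nesting_depth(v) for v in obj), default=0)
    return 0


def check_depth(obj: Any, depth_limit: int = 20) -> None:
    """Raise ValueError when obj is nested more deeply than depth_limit."""
    depth = nesting_depth(obj)
    if depth > depth_limit:
        raise ValueError(f"depth {depth} exceeds depth_limit {depth_limit}")


labels = {"timestamp": {"created": 1541458026, "closed": 1541457010}}
print(nesting_depth(labels))  # 2 under the assumed counting rule
check_depth(labels)           # passes with the default limit of 20

try:
    check_depth(labels, depth_limit=1)
except ValueError as err:
    print(err)  # depth 2 exceeds depth_limit 1
```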

doc_values

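Should the field be stored on disk in a column-stride fashion, so that it can later be used for sorting, aggregations, or scripting? Accepts true (default) or false.

eager_global_ordinals

Should global ordinals be loaded eagerly on refresh? Accepts true or false (default). Enabling this is a good idea on fields that are frequently used for terms aggregations.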

ignore_above

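Leaf values longer than this limit will not be indexed. By default, there is no limit and all values will be indexed. Note that this limit applies to the leaf values within the flattened object field, and not to the length of the entire field.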
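
The per-leaf nature of this limit can be sketched in a few lines of Python. This is a simplified model: the helper indexable_leaves is hypothetical, and measuring length with Python's len is an assumption about the unit.

```python
from typing import Iterable, List, Optional


def indexable_leaves(leaves: Iterable[str], ignore_above: Optional[int] = None) -> List[str]:
    """Keep only leaf values that are not longer than ignore_above.

    With ignore_above=None (the default in this sketch) nothing is dropped.
    """
    if ignore_above is None:
        return list(leaves)
    return [leaf for leaf in leaves if len(leaf) <= ignore_above]


leaves = ["urgent", "v1.2.5", "x" * 30]

kept_default = indexable_leaves(leaves)
kept_limited = indexable_leaves(leaves, ignore_above=10)
print(len(kept_default))  # 3
print(kept_limited)       # ['urgent', 'v1.2.5']
```

Each leaf is checked on its own: the two short values are kept even though the combined length of all leaves is well above 10.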

index

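Determines if the field should be searchable. Accepts true (default) or false.

index_options

What information should be stored in the index for scoring purposes. Defaults to docs, but can also be set to freqs to take term frequency into account when computing scores.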

null_value

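A string value that is substituted for any explicit null values within the flattened object field. Defaults to null, which means null fields are treated as if they were missing.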
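
The substitution can be modelled in Python as follows. This is a sketch of the indexed view only, under the assumption that a missing null_value simply drops explicit nulls; apply_null_value is a hypothetical helper and the placeholder string "NULL" is an arbitrary choice.

```python
from typing import Any, Optional


def apply_null_value(obj: Any, null_value: Optional[str] = None) -> Any:
    """Return a copy of obj with explicit None leaves replaced or dropped.

    When null_value is None, explicit nulls are removed, which models
    treating them as missing.
    """
    if isinstance(obj, dict):
        result = {}
        for key, value in obj.items():
            replaced = apply_null_value(value, null_value)
            if replaced is not None:
                result[key] = replaced
        return result
    if isinstance(obj, list):
        items = [apply_null_value(v, null_value) for v in obj]
        return [v for v in items if v is not None]
    if obj is None:
        return null_value
    return obj


labels = {"priority": None, "release": ["v1.3.0", None]}

without_substitute = apply_null_value(labels)
with_substitute = apply_null_value(labels, null_value="NULL")
print(without_substitute)  # {'release': ['v1.3.0']}
print(with_substitute)     # {'priority': 'NULL', 'release': ['v1.3.0', 'NULL']}
```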

similarity

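Which scoring algorithm or similarity should be used. Defaults to BM25.

split_queries_on_whitespace

Whether full text queries should split the input on whitespace when building a query for this field. Accepts true or false (default).

time_series_dimensions

(Optional, array of strings) A list of fields inside the flattened object, where each field is a dimension of the time series. Each field is specified using the relative path from the root field and does not include the root field name.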

Synthetic _source

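Synthetic _source is generally available only for TSDB indices (indices that have index.mode set to time_series). For other indices, synthetic _source is in technical preview. Features in technical preview may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

Flattened fields support synthetic _source in their default configuration.

Synthetic source may sort flattened field values and remove duplicates. For example: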

resp = client.indices.create(
    index="idx",
    settings={
        "index": {
            "mapping": {
                "source": {
                    "mode": "synthetic"
                }
            }
        }
    },
    mappings={
        "properties": {
            "flattened": {
                "type": "flattened"
            }
        }
    },
)
print(resp)

resp1 = client.index(
    index="idx",
    id="1",
    document={
        "flattened": {
            "field": [
                "apple",
                "apple",
                "banana",
                "avocado",
                "10",
                "200",
                "AVOCADO",
                "Banana",
                "Tangerine"
            ]
        }
    },
)
print(resp1)
const response = await client.indices.create({
  index: "idx",
  settings: {
    index: {
      mapping: {
        source: {
          mode: "synthetic",
        },
      },
    },
  },
  mappings: {
    properties: {
      flattened: {
        type: "flattened",
      },
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "idx",
  id: 1,
  document: {
    flattened: {
      field: [
        "apple",
        "apple",
        "banana",
        "avocado",
        "10",
        "200",
        "AVOCADO",
        "Banana",
        "Tangerine",
      ],
    },
  },
});
console.log(response1);
PUT idx
{
  "settings": {
    "index": {
      "mapping": {
        "source": {
          "mode": "synthetic"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "flattened": { "type": "flattened" }
    }
  }
}
PUT idx/_doc/1
{
  "flattened": {
    "field": [ "apple", "apple", "banana", "avocado", "10", "200", "AVOCADO", "Banana", "Tangerine" ]
  }
}

Will become:

{
  "flattened": {
    "field": [ "10", "200", "AVOCADO", "Banana", "Tangerine", "apple", "avocado", "banana" ]
  }
}
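
The sort-and-deduplicate effect can be reproduced with plain Python as an analogy. Python compares strings by code point, which for these ASCII values gives the same order as the result above; this is not a statement about how Elasticsearch sorts internally.

```python
values = [
    "apple", "apple", "banana", "avocado",
    "10", "200", "AVOCADO", "Banana", "Tangerine",
]

# Remove duplicates with a set, then sort the remaining strings.
synthetic = sorted(set(values))
print(synthetic)
# ['10', '200', 'AVOCADO', 'Banana', 'Tangerine', 'apple', 'avocado', 'banana']
```

Digits sort before uppercase letters, and uppercase letters before lowercase ones, so the original order and the duplicate "apple" are both lost.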

Synthetic source always uses nested objects instead of arrays of objects. For example:

resp = client.indices.create(
    index="idx",
    settings={
        "index": {
            "mapping": {
                "source": {
                    "mode": "synthetic"
                }
            }
        }
    },
    mappings={
        "properties": {
            "flattened": {
                "type": "flattened"
            }
        }
    },
)
print(resp)

resp1 = client.index(
    index="idx",
    id="1",
    document={
        "flattened": {
            "field": [
                {
                    "id": 1,
                    "name": "foo"
                },
                {
                    "id": 2,
                    "name": "bar"
                },
                {
                    "id": 3,
                    "name": "baz"
                }
            ]
        }
    },
)
print(resp1)
const response = await client.indices.create({
  index: "idx",
  settings: {
    index: {
      mapping: {
        source: {
          mode: "synthetic",
        },
      },
    },
  },
  mappings: {
    properties: {
      flattened: {
        type: "flattened",
      },
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "idx",
  id: 1,
  document: {
    flattened: {
      field: [
        {
          id: 1,
          name: "foo",
        },
        {
          id: 2,
          name: "bar",
        },
        {
          id: 3,
          name: "baz",
        },
      ],
    },
  },
});
console.log(response1);
PUT idx
{
  "settings": {
    "index": {
      "mapping": {
        "source": {
          "mode": "synthetic"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "flattened": { "type": "flattened" }
    }
  }
}
PUT idx/_doc/1
{
  "flattened": {
      "field": [
        { "id": 1, "name": "foo" },
        { "id": 2, "name": "bar" },
        { "id": 3, "name": "baz" }
      ]
  }
}

Will become (note the nested objects instead of the "flattened" array):

{
    "flattened": {
      "field": {
          "id": [ "1", "2", "3" ],
          "name": [ "bar", "baz", "foo" ]
      }
    }
}
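
The reshaping above can be imitated with a short Python function that merges an array of objects into one object whose keys hold sorted, de-duplicated string values. It is a sketch that handles only one level of scalar values, and merge_objects is a hypothetical helper, not part of any Elasticsearch client.

```python
from typing import Any, Dict, List, Set


def merge_objects(objects: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Merge a list of flat objects into one object of sorted string lists."""
    merged: Dict[str, Set[str]] = {}
    for obj in objects:
        for key, value in obj.items():
            # Every value becomes a string keyword; duplicates collapse in the set.
            merged.setdefault(key, set()).add(str(value))
    return {key: sorted(keywords) for key, keywords in merged.items()}


field = [
    {"id": 1, "name": "foo"},
    {"id": 2, "name": "bar"},
    {"id": 3, "name": "baz"},
]

merged_field = merge_objects(field)
print(merged_field)
# {'id': ['1', '2', '3'], 'name': ['bar', 'baz', 'foo']}
```

The pairing between each id and its name is lost in the merged form, which is worth keeping in mind when the association between sibling keys matters.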

Synthetic source always uses single-valued fields for one-element arrays. For example:

resp = client.indices.create(
    index="idx",
    settings={
        "index": {
            "mapping": {
                "source": {
                    "mode": "synthetic"
                }
            }
        }
    },
    mappings={
        "properties": {
            "flattened": {
                "type": "flattened"
            }
        }
    },
)
print(resp)

resp1 = client.index(
    index="idx",
    id="1",
    document={
        "flattened": {
            "field": [
                "foo"
            ]
        }
    },
)
print(resp1)
const response = await client.indices.create({
  index: "idx",
  settings: {
    index: {
      mapping: {
        source: {
          mode: "synthetic",
        },
      },
    },
  },
  mappings: {
    properties: {
      flattened: {
        type: "flattened",
      },
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "idx",
  id: 1,
  document: {
    flattened: {
      field: ["foo"],
    },
  },
});
console.log(response1);
PUT idx
{
  "settings": {
    "index": {
      "mapping": {
        "source": {
          "mode": "synthetic"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "flattened": { "type": "flattened" }
    }
  }
}
PUT idx/_doc/1
{
  "flattened": {
    "field": [ "foo" ]
  }
}

Will become (note the single value instead of the one-element array):

{
  "flattened": {
    "field": "foo"
  }
}
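
Client code that reads such a field back therefore needs to accept either a single value or a list. A minimal Python sketch of both directions is shown below; collapse_single and as_list are hypothetical helper names used only for illustration.

```python
from typing import Any, List


def collapse_single(values: List[Any]) -> Any:
    """Model the collapse: a one-element list becomes its only element."""
    return values[0] if len(values) == 1 else values


def as_list(value: Any) -> List[Any]:
    """Normalize a field read from synthetic _source back to a list."""
    return value if isinstance(value, list) else [value]


collapsed = collapse_single(["foo"])
unchanged = collapse_single(["foo", "bar"])
print(collapsed)           # foo
print(unchanged)           # ['foo', 'bar']
print(as_list(collapsed))  # ['foo']
```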