
Keyword type family


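The keyword family includes the following field types:

keyword, which is used for structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags.
constant_keyword, for keyword fields that always contain the same value.
wildcard, for unstructured machine-generated content. The wildcard type is optimized for fields with large values or high cardinality.

Keyword fields are often used in sorting, aggregations, and term-level queries, such as term.

Avoid using keyword fields for full-text search. Use the text field type instead.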

Keyword field type


Below is an example of a mapping for a basic keyword field:

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "tags": {
                "type": "keyword"
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    mappings: {
      properties: {
        tags: {
          type: 'keyword'
        }
      }
    }
  }
)
puts response

res, err := es.Indices.Create(
	"my-index-000001",
	es.Indices.Create.WithBody(strings.NewReader(`{
	  "mappings": {
	    "properties": {
	      "tags": {
	        "type": "keyword"
	      }
	    }
	  }
	}`)),
)
fmt.Println(res, err)

const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
    properties: {
      tags: {
        type: "keyword",
      },
    },
  },
});
console.log(response);

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "tags": {
        "type":  "keyword"
      }
    }
  }
}

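Mapping numeric identifiers

Not all numeric data should be mapped as a numeric field data type. Elasticsearch optimizes numeric fields, such as integer or long, for range queries. However, keyword fields are better for term and other term-level queries.

Identifiers, such as an ISBN or a product ID, are rarely used in range queries. However, they are often retrieved using term-level queries.

Consider mapping a numeric identifier as a keyword if:

You don't plan to search for the identifier data using range queries.
Fast retrieval is important. term query searches on keyword fields are often faster than term searches on numeric fields.

If you're unsure which to use, you can use a multi-field to map the data as both a keyword and a numeric data type.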
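
As a rough illustration of the multi-field approach, the sketch below builds a mapping body in which an identifier is indexed as a numeric type with a keyword sub-field. The helper name `numeric_id_mapping`, the field name `product_id`, and the sub-field name `raw` are hypothetical and chosen only for this example.

```python
def numeric_id_mapping(field_name, numeric_type="long", keyword_subfield="raw"):
    """Build a mapping that indexes one field as a numeric type and as a keyword.

    All names here are illustrative; adapt them to your own index.
    """
    return {
        "properties": {
            field_name: {
                # Main field: numeric, suited to range queries.
                "type": numeric_type,
                "fields": {
                    # Sub-field: keyword, suited to term-level queries.
                    keyword_subfield: {"type": "keyword"}
                },
            }
        }
    }


mappings = numeric_id_mapping("product_id")
print(mappings)
```

With a mapping like this, range queries would target `product_id` and term-level queries would target `product_id.raw`. The dictionary could be passed as the `mappings` argument of `client.indices.create`, in the same way as the basic keyword example above.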

Parameters for basic keyword fields


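The following parameters are accepted by keyword fields:

doc_values

Should the field be stored on disk in a column-stride fashion, so that it can later be used for sorting, aggregations, or scripting? Accepts true (default) or false.

eager_global_ordinals

Should global ordinals be loaded eagerly on refresh? Accepts true or false (default). Enabling this is a good idea on fields that are frequently used for terms aggregations.

fields

Multi-fields allow the same string value to be indexed in multiple ways for different purposes, such as one field for search and a multi-field for sorting and aggregations.

ignore_above

Do not index any string longer than this value. Defaults to 2147483647 so that all values are accepted. Note, however, that the default dynamic mapping rules create a sub keyword field that overrides this default by setting ignore_above: 256.

index

Should the field be quickly searchable? Accepts true (default) and false. keyword fields that only have doc_values enabled can still be queried, albeit more slowly.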

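index_options

What information should be stored in the index for scoring purposes. Defaults to docs but can also be set to freqs to take term frequency into account when computing scores.

meta

Metadata about the field.

norms

Whether field length should be taken into account when scoring queries. Accepts true or false (default).

null_value

Accepts a string value that is substituted for any explicit null values. Defaults to null, which means the field is treated as missing. Note that this cannot be set if the script value is used.

on_script_error

Defines what to do if the script defined by the script parameter throws an error at indexing time. Accepts fail (default), which causes the entire document to be rejected, and continue, which registers the field in the document's _ignored metadata field and continues indexing. This parameter can only be set if the script field is also set.

script

If this parameter is set, the field indexes values generated by this script rather than reading the values directly from the source. If a value is set for this field on the input document, the document is rejected with an error. Scripts are in the same format as their runtime equivalent. Values emitted by the script are normalized as usual, and are ignored if they are longer than the value set on ignore_above.

store

Whether the field value should be stored and retrievable separately from the _source field. Accepts true or false (default).

similarity

Which scoring algorithm or similarity should be used. Defaults to BM25.

normalizer

How to pre-process the keyword prior to indexing. Defaults to null, meaning the keyword is kept as-is.

split_queries_on_whitespace

Whether full text queries should split the input on whitespace when building a query for this field. Accepts true or false (default).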

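time_series_dimension

(Optional, Boolean)

Marks the field as a time series dimension. Defaults to false.

The index.mapping.dimension_fields.limit index setting limits the number of dimensions in an index.

Dimension fields have the following constraints:

The doc_values and index mapping parameters must be true.
Dimension values are used to identify a document's time series. If dimension values are altered in any way during indexing, the document is stored as belonging to a different time series than intended. As a result, there are additional constraints:
- The field cannot use a normalizer.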
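
To make the interaction of ignore_above and null_value more concrete, here is a minimal pure-Python sketch of which values would end up as indexed terms. It is a simplified model, not Elasticsearch code: the function name `indexed_terms` is hypothetical, length is measured in characters with `len`, and normalizers and scripts are left out.

```python
def indexed_terms(values, ignore_above=2147483647, null_value=None):
    """Return the values that would be indexed under this simplified model."""
    terms = []
    for value in values:
        if value is None:
            if null_value is None:
                # No null_value configured: the explicit null is treated as missing.
                continue
            # An explicit null is replaced by the configured null_value.
            value = null_value
        if len(value) > ignore_above:
            # Strings longer than ignore_above are not indexed.
            continue
        terms.append(value)
    return terms


print(indexed_terms(["ok", None, "a-very-long-tag"], ignore_above=5, null_value="NULL"))
# prints ['ok', 'NULL']
```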

Synthetic `_source`


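Synthetic _source is Generally Available only for TSDB indices (indices that have index.mode set to time_series). For other indices, synthetic _source is in technical preview. Features in technical preview may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

Synthetic source may sort keyword fields and remove duplicates. For example: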

resp = client.indices.create(
    index="idx",
    settings={
        "index": {
            "mapping": {
                "source": {
                    "mode": "synthetic"
                }
            }
        }
    },
    mappings={
        "properties": {
            "kwd": {
                "type": "keyword"
            }
        }
    },
)
print(resp)

resp1 = client.index(
    index="idx",
    id="1",
    document={
        "kwd": [
            "foo",
            "foo",
            "bar",
            "baz"
        ]
    },
)
print(resp1)

const response = await client.indices.create({
  index: "idx",
  settings: {
    index: {
      mapping: {
        source: {
          mode: "synthetic",
        },
      },
    },
  },
  mappings: {
    properties: {
      kwd: {
        type: "keyword",
      },
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "idx",
  id: 1,
  document: {
    kwd: ["foo", "foo", "bar", "baz"],
  },
});
console.log(response1);

PUT idx
{
  "settings": {
    "index": {
      "mapping": {
        "source": {
          "mode": "synthetic"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "kwd": { "type": "keyword" }
    }
  }
}
PUT idx/_doc/1
{
  "kwd": ["foo", "foo", "bar", "baz"]
}

Will become:

{
  "kwd": ["bar", "baz", "foo"]
}

If a keyword field sets store to true, then order and duplicates are preserved. For example:

resp = client.indices.create(
    index="idx",
    settings={
        "index": {
            "mapping": {
                "source": {
                    "mode": "synthetic"
                }
            }
        }
    },
    mappings={
        "properties": {
            "kwd": {
                "type": "keyword",
                "store": True
            }
        }
    },
)
print(resp)

resp1 = client.index(
    index="idx",
    id="1",
    document={
        "kwd": [
            "foo",
            "foo",
            "bar",
            "baz"
        ]
    },
)
print(resp1)

const response = await client.indices.create({
  index: "idx",
  settings: {
    index: {
      mapping: {
        source: {
          mode: "synthetic",
        },
      },
    },
  },
  mappings: {
    properties: {
      kwd: {
        type: "keyword",
        store: true,
      },
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "idx",
  id: 1,
  document: {
    kwd: ["foo", "foo", "bar", "baz"],
  },
});
console.log(response1);

PUT idx
{
  "settings": {
    "index": {
      "mapping": {
        "source": {
          "mode": "synthetic"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "kwd": { "type": "keyword", "store": true }
    }
  }
}
PUT idx/_doc/1
{
  "kwd": ["foo", "foo", "bar", "baz"]
}

Will become:

{
  "kwd": ["foo", "foo", "bar", "baz"]
}

Values longer than ignore_above are preserved but sorted to the end. For example:

resp = client.indices.create(
    index="idx",
    settings={
        "index": {
            "mapping": {
                "source": {
                    "mode": "synthetic"
                }
            }
        }
    },
    mappings={
        "properties": {
            "kwd": {
                "type": "keyword",
                "ignore_above": 3
            }
        }
    },
)
print(resp)

resp1 = client.index(
    index="idx",
    id="1",
    document={
        "kwd": [
            "foo",
            "foo",
            "bang",
            "bar",
            "baz"
        ]
    },
)
print(resp1)

const response = await client.indices.create({
  index: "idx",
  settings: {
    index: {
      mapping: {
        source: {
          mode: "synthetic",
        },
      },
    },
  },
  mappings: {
    properties: {
      kwd: {
        type: "keyword",
        ignore_above: 3,
      },
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "idx",
  id: 1,
  document: {
    kwd: ["foo", "foo", "bang", "bar", "baz"],
  },
});
console.log(response1);

PUT idx
{
  "settings": {
    "index": {
      "mapping": {
        "source": {
          "mode": "synthetic"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "kwd": { "type": "keyword", "ignore_above": 3 }
    }
  }
}
PUT idx/_doc/1
{
  "kwd": ["foo", "foo", "bang", "bar", "baz"]
}

Will become:

{
  "kwd": ["bar", "baz", "foo", "bang"]
}
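
The three cases above can be summarized in a small pure-Python model. This is only a sketch that reproduces the documented outputs, not how Elasticsearch rebuilds the source internally. The function name `synthetic_keyword_values` is hypothetical, and two details are assumptions of the sketch: over-length values keep their original relative order, and store takes precedence when both store and ignore_above are set.

```python
def synthetic_keyword_values(values, store=False, ignore_above=None):
    """Model the value list that synthetic _source returns for a keyword field."""
    if store:
        # store: true preserves order and duplicates.
        return list(values)
    if ignore_above is None:
        kept, too_long = list(values), []
    else:
        kept = [v for v in values if len(v) <= ignore_above]
        too_long = [v for v in values if len(v) > ignore_above]
    # Indexed values come back sorted and de-duplicated;
    # values longer than ignore_above are preserved at the end.
    return sorted(set(kept)) + too_long


print(synthetic_keyword_values(["foo", "foo", "bar", "baz"]))
print(synthetic_keyword_values(["foo", "foo", "bar", "baz"], store=True))
print(synthetic_keyword_values(["foo", "foo", "bang", "bar", "baz"], ignore_above=3))
```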

Constant keyword field type


Constant keyword is a specialization of the keyword field for the case where all documents in the index have the same value.

resp = client.indices.create(
    index="logs-debug",
    mappings={
        "properties": {
            "@timestamp": {
                "type": "date"
            },
            "message": {
                "type": "text"
            },
            "level": {
                "type": "constant_keyword",
                "value": "debug"
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'logs-debug',
  body: {
    mappings: {
      properties: {
        "@timestamp": {
          type: 'date'
        },
        message: {
          type: 'text'
        },
        level: {
          type: 'constant_keyword',
          value: 'debug'
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "logs-debug",
  mappings: {
    properties: {
      "@timestamp": {
        type: "date",
      },
      message: {
        type: "text",
      },
      level: {
        type: "constant_keyword",
        value: "debug",
      },
    },
  },
});
console.log(response);

PUT logs-debug
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "message": {
        "type": "text"
      },
      "level": {
        "type": "constant_keyword",
        "value": "debug"
      }
    }
  }
}

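constant_keyword supports the same queries and aggregations as keyword fields do, but takes advantage of the fact that all documents in an index have the same value to execute queries more efficiently.

It is allowed to submit documents that either have no value for the field or have a value equal to the value configured in mappings. The two indexing requests below are equivalent: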

resp = client.index(
    index="logs-debug",
    document={
        "date": "2019-12-12",
        "message": "Starting up Elasticsearch",
        "level": "debug"
    },
)
print(resp)

resp1 = client.index(
    index="logs-debug",
    document={
        "date": "2019-12-12",
        "message": "Starting up Elasticsearch"
    },
)
print(resp1)

response = client.index(
  index: 'logs-debug',
  body: {
    date: '2019-12-12',
    message: 'Starting up Elasticsearch',
    level: 'debug'
  }
)
puts response

response = client.index(
  index: 'logs-debug',
  body: {
    date: '2019-12-12',
    message: 'Starting up Elasticsearch'
  }
)
puts response

const response = await client.index({
  index: "logs-debug",
  document: {
    date: "2019-12-12",
    message: "Starting up Elasticsearch",
    level: "debug",
  },
});
console.log(response);

const response1 = await client.index({
  index: "logs-debug",
  document: {
    date: "2019-12-12",
    message: "Starting up Elasticsearch",
  },
});
console.log(response1);

POST logs-debug/_doc
{
  "date": "2019-12-12",
  "message": "Starting up Elasticsearch",
  "level": "debug"
}

POST logs-debug/_doc
{
  "date": "2019-12-12",
  "message": "Starting up Elasticsearch"
}

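However, providing a value that is different from the one configured in the mapping is disallowed.

If no value is provided in the mappings, the field configures itself automatically based on the value contained in the first indexed document. While this behavior can be convenient, note that it means a single poisonous document can cause all other documents to be rejected if it has a wrong value.

Before a value has been provided (either through the mappings or from a document), queries on the field will not match any documents. This includes exists queries.

The value of the field cannot be changed after it has been set.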
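
The acceptance rules described above can be modeled in a few lines of plain Python. This is a hedged sketch of the documented behavior, not the Elasticsearch implementation; the class name `ConstantKeywordField` and its `accept` method are hypothetical.

```python
class ConstantKeywordField:
    """Toy model of how a constant_keyword field accepts or rejects documents."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value  # None means no value has been configured yet

    def accept(self, document):
        supplied = document.get(self.name)
        if supplied is None:
            # Documents without the field are always allowed.
            return True
        if self.value is None:
            # The first document that carries a value configures the field.
            self.value = supplied
            return True
        # Afterwards only the configured value is allowed.
        return supplied == self.value


level = ConstantKeywordField("level", value="debug")
print(level.accept({"message": "Starting up Elasticsearch", "level": "debug"}))
print(level.accept({"message": "Starting up Elasticsearch"}))
print(level.accept({"message": "Starting up Elasticsearch", "level": "info"}))
```

The first two calls model the two equivalent indexing requests shown earlier, and the third models a rejected document. Creating the field without a value and feeding it a document with a wrong value first shows how a single bad document can cause later, correct documents to be rejected.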

Parameters for constant keyword fields


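The following mapping parameters are accepted:

`meta`	Metadata about the field.
`value`	The value to associate with all documents in the index. If this parameter is not provided, it is set based on the first document that gets indexed.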

Wildcard field type


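The wildcard field type is a specialized keyword field for unstructured machine-generated content that you plan to search using grep-like wildcard and regexp queries. The wildcard type is optimized for fields with large values or high cardinality.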

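Mapping unstructured content

You can map a field containing unstructured content to either a text or a keyword family field. The best field type depends on the nature of the content and how you plan to search the field.

Use the text field type if:

The content is human-readable, such as an email body or product description.
You plan to search the field for individual words or phrases, such as the brown fox jumped, using full text queries. Elasticsearch analyzes text fields to return the most relevant results for these queries.

Use a keyword family field type if:

The content is machine-generated, such as a log message or HTTP request information.
You plan to search the field for exact full values, such as org.foo.bar, or partial character sequences, such as org.foo.*, using term-level queries.

Choosing a keyword family field type

If you choose a keyword family field type, you can map the field as a keyword or wildcard field depending on the cardinality and size of the field's values. Use the wildcard type if you plan to regularly search the field using a wildcard or regexp query and meet one of the following criteria:

The field contains more than a million unique values.
AND
You plan to regularly search the field using a pattern with leading wildcards, such as *foo or *baz.

The field contains values larger than 32KB.
AND
You plan to regularly search the field using any wildcard pattern.

Otherwise, use the keyword field type for faster searches, faster indexing, and lower storage costs. For an in-depth comparison and decision flowchart, see our related blog post.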
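
The criteria above can be written down as a small decision helper. This is only a sketch of the rule of thumb, with hypothetical names (`choose_keyword_family_type` and its parameters). It assumes 32KB means 32 * 1024 bytes and, as a simplification, treats "any wildcard pattern" as covered by the regular pattern search flag.

```python
ONE_MILLION = 1_000_000
LARGE_VALUE_BYTES = 32 * 1024  # assumption: 32KB taken as 32 * 1024 bytes


def choose_keyword_family_type(regular_pattern_search, unique_values,
                               max_value_bytes, leading_wildcards=False):
    """Return "wildcard" or "keyword" following the criteria listed above.

    regular_pattern_search: the field is regularly searched with wildcard or regexp queries.
    leading_wildcards: those patterns regularly start with a wildcard, such as *foo.
    """
    if not regular_pattern_search:
        return "keyword"
    high_cardinality = unique_values > ONE_MILLION and leading_wildcards
    large_values = max_value_bytes > LARGE_VALUE_BYTES
    return "wildcard" if high_cardinality or large_values else "keyword"


print(choose_keyword_family_type(True, 5_000_000, 200, leading_wildcards=True))
print(choose_keyword_family_type(True, 10_000, 200, leading_wildcards=True))
```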

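Switching from a text field to a keyword field

If you previously used a text field to index unstructured machine-generated content, you can reindex to update the mapping to a keyword or wildcard field. We also recommend that you update your application or workflow to replace any word-based full text queries on the field with equivalent term-level queries.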

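Internally, the wildcard field indexes the whole field value using n-grams and stores the full string. The index is used as a rough filter to cut down the number of values that are then checked by retrieving and checking the full value. This field is especially well suited to running grep-like queries on log lines. Storage costs are typically lower than those of keyword fields, but search speeds for exact matches on full terms are slower. If the field values share many prefixes, such as URLs for the same website, storage costs for a wildcard field may be higher than for an equivalent keyword field.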
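
The "rough filter, then verify" idea can be illustrated with a toy index in plain Python. This is a conceptual sketch only and does not mirror the actual Elasticsearch or Lucene implementation. The class `NgramIndex` is hypothetical, the n-gram size of 3 is an assumption made for the example, and only the `*` and `?` wildcards are considered.

```python
import re
from fnmatch import fnmatchcase

N = 3  # assumed n-gram size for this sketch


def ngrams(text, n=N):
    """Return the set of all substrings of length n."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramIndex:
    def __init__(self):
        self.postings = {}  # n-gram -> set of document ids
        self.values = {}    # document id -> full stored value

    def add(self, doc_id, value):
        self.values[doc_id] = value
        for gram in ngrams(value):
            self.postings.setdefault(gram, set()).add(doc_id)

    def candidates(self, pattern):
        """Rough filter: documents containing every n-gram of the literal parts."""
        literals = [part for part in re.split(r"[*?]", pattern) if len(part) >= N]
        result = set(self.values)
        for literal in literals:
            for gram in ngrams(literal):
                result &= self.postings.get(gram, set())
        return result

    def search(self, pattern):
        """Verify each candidate against its full stored value."""
        return sorted(d for d in self.candidates(pattern)
                      if fnmatchcase(self.values[d], pattern))


index = NgramIndex()
index.add(1, "This string can be quite lengthy")
index.add(2, "A short one")
index.add(3, "lengthy but quite")
print(index.candidates("*quite*lengthy"))
print(index.search("*quite*lengthy"))
```

In this sketch, document 3 passes the n-gram filter because it contains all the required fragments, but it is discarded when the full value is checked, since the fragments appear in the wrong order.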

You index and search a wildcard field as follows:

resp = client.indices.create(
    index="my-index-000001",
    mappings={
        "properties": {
            "my_wildcard": {
                "type": "wildcard"
            }
        }
    },
)
print(resp)

resp1 = client.index(
    index="my-index-000001",
    id="1",
    document={
        "my_wildcard": "This string can be quite lengthy"
    },
)
print(resp1)

resp2 = client.search(
    index="my-index-000001",
    query={
        "wildcard": {
            "my_wildcard": {
                "value": "*quite*lengthy"
            }
        }
    },
)
print(resp2)

response = client.indices.create(
  index: 'my-index-000001',
  body: {
    mappings: {
      properties: {
        my_wildcard: {
          type: 'wildcard'
        }
      }
    }
  }
)
puts response

response = client.index(
  index: 'my-index-000001',
  id: 1,
  body: {
    my_wildcard: 'This string can be quite lengthy'
  }
)
puts response

response = client.search(
  index: 'my-index-000001',
  body: {
    query: {
      wildcard: {
        my_wildcard: {
          value: '*quite*lengthy'
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index-000001",
  mappings: {
    properties: {
      my_wildcard: {
        type: "wildcard",
      },
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "my-index-000001",
  id: 1,
  document: {
    my_wildcard: "This string can be quite lengthy",
  },
});
console.log(response1);

const response2 = await client.search({
  index: "my-index-000001",
  query: {
    wildcard: {
      my_wildcard: {
        value: "*quite*lengthy",
      },
    },
  },
});
console.log(response2);

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_wildcard": {
        "type": "wildcard"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "my_wildcard" : "This string can be quite lengthy"
}

GET my-index-000001/_search
{
  "query": {
    "wildcard": {
      "my_wildcard": {
        "value": "*quite*lengthy"
      }
    }
  }
}

Parameters for wildcard fields


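The following parameters are accepted by wildcard fields:

`null_value`	Accepts a string value that is substituted for any explicit `null` values. Defaults to `null`, which means the field is treated as missing.
`ignore_above`	Do not index any string longer than this value. Defaults to `2147483647` so that all values are accepted.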

Limitations


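wildcard fields are untokenized like keyword fields, so they do not support queries that rely on word positions, such as phrase queries.
When running wildcard queries, any rewrite parameter is ignored. The scoring is always a constant score.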

Synthetic `_source`


Synthetic source may sort wildcard field values. For example:

resp = client.indices.create(
    index="idx",
    settings={
        "index": {
            "mapping": {
                "source": {
                    "mode": "synthetic"
                }
            }
        }
    },
    mappings={
        "properties": {
            "card": {
                "type": "wildcard"
            }
        }
    },
)
print(resp)

resp1 = client.index(
    index="idx",
    id="1",
    document={
        "card": [
            "king",
            "ace",
            "ace",
            "jack"
        ]
    },
)
print(resp1)

const response = await client.indices.create({
  index: "idx",
  settings: {
    index: {
      mapping: {
        source: {
          mode: "synthetic",
        },
      },
    },
  },
  mappings: {
    properties: {
      card: {
        type: "wildcard",
      },
    },
  },
});
console.log(response);

const response1 = await client.index({
  index: "idx",
  id: 1,
  document: {
    card: ["king", "ace", "ace", "jack"],
  },
});
console.log(response1);

PUT idx
{
  "settings": {
    "index": {
      "mapping": {
        "source": {
          "mode": "synthetic"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "card": { "type": "wildcard" }
    }
  }
}
PUT idx/_doc/1
{
  "card": ["king", "ace", "ace", "jack"]
}

Will become:

{
  "card": ["ace", "jack", "king"]
}
