Find text structure API

Finds the structure of text. The text must contain data that is suitable to be ingested into the Elastic Stack.

Request

POST _text_structure/find_structure

Prerequisites

  • If the Elasticsearch security features are enabled, you must have the monitor_text_structure or monitor cluster privilege to use this API. See Security privileges.

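Description

This API provides a starting point for ingesting data into Elasticsearch in a format that is suitable for subsequent use with other Elastic Stack functionality.

Unlike other Elasticsearch endpoints, the data that is posted to this endpoint does not need to be UTF-8 encoded and in JSON format. It must, however, be text; binary text formats are not currently supported.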

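The response from the API contains:

  • A couple of messages from the beginning of the text.
  • Statistics that reveal the most common values for all fields detected within the text and basic numeric statistics for numeric fields.
  • Information about the structure of the text, which is useful when you write ingest configurations to index it or similarly formatted text.
  • Appropriate mappings for an Elasticsearch index, which you could use to ingest the text.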

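The structure finder can calculate all of this information with no guidance. However, you can optionally override some of its decisions about the text structure by specifying one or more query parameters.

Details of the output can be seen in the Examples section.

If the structure finder produces unexpected results for some text, specify the explain query parameter. It causes an explanation to appear in the response, which should help in determining why the returned structure was chosen.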
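As a rough illustration of overriding the structure finder's decisions, the sketch below assembles the endpoint URL with a few query parameters. The helper name `find_structure_url` and the `http://localhost:9200` address are assumptions made for this example; only the endpoint path and the parameter names come from this page.

```python
from urllib.parse import urlencode


def find_structure_url(base_url: str, **overrides) -> str:
    """Build the find_structure URL with optional query-parameter overrides."""
    params = {}
    for name, value in overrides.items():
        if value is None:
            continue  # not specified: leave the decision to the structure finder
        if isinstance(value, bool):
            value = "true" if value else "false"  # lowercase booleans in the URL
        params[name] = value
    url = f"{base_url.rstrip('/')}/_text_structure/find_structure"
    return f"{url}?{urlencode(params)}" if params else url


url = find_structure_url(
    "http://localhost:9200",  # assumed local cluster address
    format="delimited",
    delimiter="|",
    explain=True,
)
print(url)
```

Parameters that are left out are simply absent from the URL, so the structure finder keeps making those decisions itself.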

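Query parameters

charset
(Optional, string) The text's character set. It must be a character set that is supported by the JVM that Elasticsearch uses. For example, UTF-8, UTF-16LE, windows-1252, or EUC-JP. If this parameter is not specified, the structure finder chooses an appropriate character set.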
column_names
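(Optional, string) If you have set format to delimited, you can specify the column names in a comma-separated list. If this parameter is not specified, the structure finder uses the column names from the header row of the text. If the text does not have a header row, columns are named "column1", "column2", "column3", and so on.
delimiter
(Optional, string) If you have set format to delimited, you can specify the character used to delimit the values in each row. Only a single character is supported; the delimiter cannot have multiple characters. By default, the API considers the following possibilities: comma, tab, semicolon, and pipe (|). In this default scenario, all rows must have the same number of fields for the delimited format to be detected. If you specify a delimiter, up to 10% of the rows can have a different number of columns than the first row.
explain
(Optional, Boolean) If true, the response includes a field named explanation, which is an array of strings that indicate how the structure finder produced its result. The default value is false.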
format
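(Optional, string) The high-level structure of the text. Valid values are ndjson, xml, delimited, and semi_structured_text. By default, the API chooses the format. In this default scenario, all rows must have the same number of fields for a delimited format to be detected. If format is set to delimited and delimiter is not set, however, the API tolerates up to 5% of rows that have a different number of columns than the first row.
grok_pattern
(Optional, string) If you have set format to semi_structured_text, you can specify a Grok pattern that is used to extract fields from every message in the text. The name of the timestamp field in the Grok pattern must match what is specified in the timestamp_field parameter. If that parameter is not specified, the name of the timestamp field in the Grok pattern must match "timestamp". If grok_pattern is not specified, the structure finder creates a Grok pattern.
ecs_compatibility
(Optional, string) The mode of compatibility with ECS-compliant Grok patterns. Use this parameter to specify whether to use ECS Grok patterns instead of legacy ones when the structure finder creates a Grok pattern. Valid values are disabled and v1. The default value is disabled. This setting primarily has an impact when a whole-message Grok pattern such as %{CATALINALOG} matches the input. If the structure finder identifies a common structure but has no idea of its meaning, generic field names such as path, ipaddress, field1, and field2 are used in the grok_pattern output, with the intention that a user who knows the meanings renames these fields before using it.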
has_header_row
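(Optional, Boolean) If you have set format to delimited, you can use this parameter to indicate whether the column names are in the first row of the text. If this parameter is not specified, the structure finder guesses based on the similarity of the first row of the text to other rows.
line_merge_size_limit
(Optional, unsigned integer) The maximum number of characters in a message when lines are merged to form messages while analyzing semi-structured text. The default is 10000. If you have extremely long messages you may need to increase this value, but be aware that this may lead to very long processing times if the way to group lines into messages is misdetected.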
lines_to_sample

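(Optional, unsigned integer) The number of lines to include in the structural analysis, starting from the beginning of the text. The minimum is 2; the default is 1000. If the value of this parameter is greater than the number of lines in the text, the analysis proceeds (as long as there are at least two lines in the text) for all of the lines.

The number of lines and the variation of the lines affect the speed of the analysis. For example, if you upload text where the first 1000 lines are all variations on the same message, the analysis will find more commonality than would be seen with a bigger sample. If possible, however, it is more efficient to upload sample text with more variety in the first 1000 lines than to request analysis of 100000 lines to achieve some variety.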

quote
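(Optional, string) If you have set format to delimited, you can specify the character used to quote the values in each row if they contain newlines or the delimiter character. Only a single character is supported. If this parameter is not specified, the default value is a double quote ("). If your delimited text format does not use quoting, a workaround is to set this parameter to a character that does not appear anywhere in the sample.
should_trim_fields
(Optional, Boolean) If you have set format to delimited, you can specify whether values between delimiters should have whitespace trimmed from them. If this parameter is not specified and the delimiter is pipe (|), the default value is true. Otherwise, the default value is false.
timeout
(Optional, time units) Sets the maximum amount of time that the structure analysis may take. If the analysis is still running when the timeout expires, it is stopped. The default value is 25 seconds.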
timestamp_field

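(Optional, string) The name of the field that contains the primary timestamp of each record in the text. In particular, if the text were ingested into an index, this is the field that would be used to populate the @timestamp field.

If the format is semi_structured_text, this field must match the name of the appropriate extraction in the grok_pattern. Therefore, for semi-structured text, it is best not to specify this parameter unless grok_pattern is also specified.

For structured text, if you specify this parameter, the field must exist within the text.

If this parameter is not specified, the structure finder decides which field (if any) is the primary timestamp field. For structured text, it is not compulsory to have a timestamp in the text.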

timestamp_format

(Optional, string) The Java time format of the timestamp field in the text.

Only a subset of Java time format letter groups are supported:

  • a
  • d
  • dd
  • EEE
  • EEEE
  • H
  • HH
  • h
  • M
  • MM
  • MMM
  • MMMM
  • mm
  • ss
  • XX
  • XXX
  • yy
  • yyyy
  • zzz

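Additionally, S letter groups (fractional seconds) of length one to nine are supported, provided they occur after ss and are separated from the ss by a period (.), comma (,), or colon (:). Spacing and punctuation are also permitted, with the exception of the question mark (?), newline, and carriage return, together with literal text enclosed in single quotes. For example, MM/dd HH.mm.ss,SSSSSS 'in' yyyy is a valid override format.

One valuable use case for this parameter is when the format is semi-structured text, there are multiple timestamp formats in the text, and you know which format corresponds to the primary timestamp, but you do not want to specify the full grok_pattern. Another is when the timestamp format is one that the structure finder does not consider by default.

If this parameter is not specified, the structure finder chooses the best format from a built-in set.

If the special value null is specified, the structure finder will not look for a primary timestamp in the text. When the format is semi-structured text, this results in the structure finder treating the text as single-line messages.

The following table provides the appropriate timeformat values for some example timestamps: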

Timeformat                    Presentation
yyyy-MM-dd HH:mm:ssZ          2019-04-20 13:15:22+0000
EEE, d MMM yyyy HH:mm:ss Z    Sat, 20 Apr 2019 13:15:22 +0000
dd.MM.yy HH:mm:ss.SSS         20.04.19 13:15:22.285

For more information about date and time format syntax, see the Java date/time format documentation.
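To make the three example timestamps concrete, the sketch below parses each one with Python's `datetime.strptime`. The strptime patterns are hand-written approximations of the Java formats, an assumption made for this example and not something the API produces or accepts. The `%a` and `%b` directives also assume an English (C) locale.

```python
from datetime import datetime

# (Java time format, assumed Python strptime equivalent, example timestamp)
EXAMPLES = [
    ("yyyy-MM-dd HH:mm:ssZ", "%Y-%m-%d %H:%M:%S%z", "2019-04-20 13:15:22+0000"),
    ("EEE, d MMM yyyy HH:mm:ss Z", "%a, %d %b %Y %H:%M:%S %z", "Sat, 20 Apr 2019 13:15:22 +0000"),
    ("dd.MM.yy HH:mm:ss.SSS", "%d.%m.%y %H:%M:%S.%f", "20.04.19 13:15:22.285"),
]

# Parse every example with its Python pattern.
parsed = [datetime.strptime(sample, py_format) for _, py_format, sample in EXAMPLES]

for (java_format, _, sample), value in zip(EXAMPLES, parsed):
    print(f"{java_format!r}: {sample!r} -> {value.isoformat()}")
```

The first two examples carry a UTC offset and so describe the same instant. The third has no time zone information, so whoever parses it has to supply one.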

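Request body

The text that you want to analyze. It must contain data that is suitable to be ingested into Elasticsearch. It does not need to be in JSON format and it does not need to be UTF-8 encoded. The size is limited to the Elasticsearch HTTP receive buffer size, which defaults to 100 MB.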
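When the source data is a list of records, one way to produce a suitable body is to serialize it as newline-delimited JSON. The following is a minimal sketch under stated assumptions: the helper `to_ndjson` and the constant `MAX_BODY_BYTES` are hypothetical names, the 100 MB default is read as 100 * 1024 * 1024 bytes, and the two sample records are taken from the book example in the Examples section.

```python
import json

MAX_BODY_BYTES = 100 * 1024 * 1024  # assumed interpretation of the 100 MB default


def to_ndjson(records, charset="utf-8") -> bytes:
    """Serialize records as newline-delimited JSON, one object per line."""
    lines = [json.dumps(record, ensure_ascii=False) for record in records]
    body = ("\n".join(lines) + "\n").encode(charset)
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("body exceeds the assumed HTTP receive buffer size")
    return body


books = [
    {"name": "Leviathan Wakes", "author": "James S.A. Corey", "release_date": "2011-06-02", "page_count": 561},
    {"name": "Hyperion", "author": "Dan Simmons", "release_date": "1989-05-26", "page_count": 482},
]
body = to_ndjson(books)
print(body.decode("utf-8"), end="")
```

Because the endpoint does not require UTF-8, the `charset` argument can be changed, for example to `utf-16-le`. In that case it may help to pass the matching charset query parameter.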

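Examples

Ingesting newline-delimited JSON

Suppose you have newline-delimited JSON text that contains information about some books. You can send the contents to the find_structure endpoint: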

resp = client.text_structure.find_structure(
    text_files=[
        {
            "name": "Leviathan Wakes",
            "author": "James S.A. Corey",
            "release_date": "2011-06-02",
            "page_count": 561
        },
        {
            "name": "Hyperion",
            "author": "Dan Simmons",
            "release_date": "1989-05-26",
            "page_count": 482
        },
        {
            "name": "Dune",
            "author": "Frank Herbert",
            "release_date": "1965-06-01",
            "page_count": 604
        },
        {
            "name": "Dune Messiah",
            "author": "Frank Herbert",
            "release_date": "1969-10-15",
            "page_count": 331
        },
        {
            "name": "Children of Dune",
            "author": "Frank Herbert",
            "release_date": "1976-04-21",
            "page_count": 408
        },
        {
            "name": "God Emperor of Dune",
            "author": "Frank Herbert",
            "release_date": "1981-05-28",
            "page_count": 454
        },
        {
            "name": "Consider Phlebas",
            "author": "Iain M. Banks",
            "release_date": "1987-04-23",
            "page_count": 471
        },
        {
            "name": "Pandora's Star",
            "author": "Peter F. Hamilton",
            "release_date": "2004-03-02",
            "page_count": 768
        },
        {
            "name": "Revelation Space",
            "author": "Alastair Reynolds",
            "release_date": "2000-03-15",
            "page_count": 585
        },
        {
            "name": "A Fire Upon the Deep",
            "author": "Vernor Vinge",
            "release_date": "1992-06-01",
            "page_count": 613
        },
        {
            "name": "Ender's Game",
            "author": "Orson Scott Card",
            "release_date": "1985-06-01",
            "page_count": 324
        },
        {
            "name": "1984",
            "author": "George Orwell",
            "release_date": "1985-06-01",
            "page_count": 328
        },
        {
            "name": "Fahrenheit 451",
            "author": "Ray Bradbury",
            "release_date": "1953-10-15",
            "page_count": 227
        },
        {
            "name": "Brave New World",
            "author": "Aldous Huxley",
            "release_date": "1932-06-01",
            "page_count": 268
        },
        {
            "name": "Foundation",
            "author": "Isaac Asimov",
            "release_date": "1951-06-01",
            "page_count": 224
        },
        {
            "name": "The Giver",
            "author": "Lois Lowry",
            "release_date": "1993-04-26",
            "page_count": 208
        },
        {
            "name": "Slaughterhouse-Five",
            "author": "Kurt Vonnegut",
            "release_date": "1969-06-01",
            "page_count": 275
        },
        {
            "name": "The Hitchhiker's Guide to the Galaxy",
            "author": "Douglas Adams",
            "release_date": "1979-10-12",
            "page_count": 180
        },
        {
            "name": "Snow Crash",
            "author": "Neal Stephenson",
            "release_date": "1992-06-01",
            "page_count": 470
        },
        {
            "name": "Neuromancer",
            "author": "William Gibson",
            "release_date": "1984-07-01",
            "page_count": 271
        },
        {
            "name": "The Handmaid's Tale",
            "author": "Margaret Atwood",
            "release_date": "1985-06-01",
            "page_count": 311
        },
        {
            "name": "Starship Troopers",
            "author": "Robert A. Heinlein",
            "release_date": "1959-12-01",
            "page_count": 335
        },
        {
            "name": "The Left Hand of Darkness",
            "author": "Ursula K. Le Guin",
            "release_date": "1969-06-01",
            "page_count": 304
        },
        {
            "name": "The Moon is a Harsh Mistress",
            "author": "Robert A. Heinlein",
            "release_date": "1966-04-01",
            "page_count": 288
        }
    ],
)
print(resp)
response = client.text_structure.find_structure(
  body: [
    {
      name: 'Leviathan Wakes',
      author: 'James S.A. Corey',
      release_date: '2011-06-02',
      page_count: 561
    },
    {
      name: 'Hyperion',
      author: 'Dan Simmons',
      release_date: '1989-05-26',
      page_count: 482
    },
    {
      name: 'Dune',
      author: 'Frank Herbert',
      release_date: '1965-06-01',
      page_count: 604
    },
    {
      name: 'Dune Messiah',
      author: 'Frank Herbert',
      release_date: '1969-10-15',
      page_count: 331
    },
    {
      name: 'Children of Dune',
      author: 'Frank Herbert',
      release_date: '1976-04-21',
      page_count: 408
    },
    {
      name: 'God Emperor of Dune',
      author: 'Frank Herbert',
      release_date: '1981-05-28',
      page_count: 454
    },
    {
      name: 'Consider Phlebas',
      author: 'Iain M. Banks',
      release_date: '1987-04-23',
      page_count: 471
    },
    {
      name: "Pandora's Star",
      author: 'Peter F. Hamilton',
      release_date: '2004-03-02',
      page_count: 768
    },
    {
      name: 'Revelation Space',
      author: 'Alastair Reynolds',
      release_date: '2000-03-15',
      page_count: 585
    },
    {
      name: 'A Fire Upon the Deep',
      author: 'Vernor Vinge',
      release_date: '1992-06-01',
      page_count: 613
    },
    {
      name: "Ender's Game",
      author: 'Orson Scott Card',
      release_date: '1985-06-01',
      page_count: 324
    },
    {
      name: '1984',
      author: 'George Orwell',
      release_date: '1985-06-01',
      page_count: 328
    },
    {
      name: 'Fahrenheit 451',
      author: 'Ray Bradbury',
      release_date: '1953-10-15',
      page_count: 227
    },
    {
      name: 'Brave New World',
      author: 'Aldous Huxley',
      release_date: '1932-06-01',
      page_count: 268
    },
    {
      name: 'Foundation',
      author: 'Isaac Asimov',
      release_date: '1951-06-01',
      page_count: 224
    },
    {
      name: 'The Giver',
      author: 'Lois Lowry',
      release_date: '1993-04-26',
      page_count: 208
    },
    {
      name: 'Slaughterhouse-Five',
      author: 'Kurt Vonnegut',
      release_date: '1969-06-01',
      page_count: 275
    },
    {
      name: "The Hitchhiker's Guide to the Galaxy",
      author: 'Douglas Adams',
      release_date: '1979-10-12',
      page_count: 180
    },
    {
      name: 'Snow Crash',
      author: 'Neal Stephenson',
      release_date: '1992-06-01',
      page_count: 470
    },
    {
      name: 'Neuromancer',
      author: 'William Gibson',
      release_date: '1984-07-01',
      page_count: 271
    },
    {
      name: "The Handmaid's Tale",
      author: 'Margaret Atwood',
      release_date: '1985-06-01',
      page_count: 311
    },
    {
      name: 'Starship Troopers',
      author: 'Robert A. Heinlein',
      release_date: '1959-12-01',
      page_count: 335
    },
    {
      name: 'The Left Hand of Darkness',
      author: 'Ursula K. Le Guin',
      release_date: '1969-06-01',
      page_count: 304
    },
    {
      name: 'The Moon is a Harsh Mistress',
      author: 'Robert A. Heinlein',
      release_date: '1966-04-01',
      page_count: 288
    }
  ]
)
puts response
const response = await client.textStructure.findStructure({
  text_files: [
    {
      name: "Leviathan Wakes",
      author: "James S.A. Corey",
      release_date: "2011-06-02",
      page_count: 561,
    },
    {
      name: "Hyperion",
      author: "Dan Simmons",
      release_date: "1989-05-26",
      page_count: 482,
    },
    {
      name: "Dune",
      author: "Frank Herbert",
      release_date: "1965-06-01",
      page_count: 604,
    },
    {
      name: "Dune Messiah",
      author: "Frank Herbert",
      release_date: "1969-10-15",
      page_count: 331,
    },
    {
      name: "Children of Dune",
      author: "Frank Herbert",
      release_date: "1976-04-21",
      page_count: 408,
    },
    {
      name: "God Emperor of Dune",
      author: "Frank Herbert",
      release_date: "1981-05-28",
      page_count: 454,
    },
    {
      name: "Consider Phlebas",
      author: "Iain M. Banks",
      release_date: "1987-04-23",
      page_count: 471,
    },
    {
      name: "Pandora's Star",
      author: "Peter F. Hamilton",
      release_date: "2004-03-02",
      page_count: 768,
    },
    {
      name: "Revelation Space",
      author: "Alastair Reynolds",
      release_date: "2000-03-15",
      page_count: 585,
    },
    {
      name: "A Fire Upon the Deep",
      author: "Vernor Vinge",
      release_date: "1992-06-01",
      page_count: 613,
    },
    {
      name: "Ender's Game",
      author: "Orson Scott Card",
      release_date: "1985-06-01",
      page_count: 324,
    },
    {
      name: "1984",
      author: "George Orwell",
      release_date: "1985-06-01",
      page_count: 328,
    },
    {
      name: "Fahrenheit 451",
      author: "Ray Bradbury",
      release_date: "1953-10-15",
      page_count: 227,
    },
    {
      name: "Brave New World",
      author: "Aldous Huxley",
      release_date: "1932-06-01",
      page_count: 268,
    },
    {
      name: "Foundation",
      author: "Isaac Asimov",
      release_date: "1951-06-01",
      page_count: 224,
    },
    {
      name: "The Giver",
      author: "Lois Lowry",
      release_date: "1993-04-26",
      page_count: 208,
    },
    {
      name: "Slaughterhouse-Five",
      author: "Kurt Vonnegut",
      release_date: "1969-06-01",
      page_count: 275,
    },
    {
      name: "The Hitchhiker's Guide to the Galaxy",
      author: "Douglas Adams",
      release_date: "1979-10-12",
      page_count: 180,
    },
    {
      name: "Snow Crash",
      author: "Neal Stephenson",
      release_date: "1992-06-01",
      page_count: 470,
    },
    {
      name: "Neuromancer",
      author: "William Gibson",
      release_date: "1984-07-01",
      page_count: 271,
    },
    {
      name: "The Handmaid's Tale",
      author: "Margaret Atwood",
      release_date: "1985-06-01",
      page_count: 311,
    },
    {
      name: "Starship Troopers",
      author: "Robert A. Heinlein",
      release_date: "1959-12-01",
      page_count: 335,
    },
    {
      name: "The Left Hand of Darkness",
      author: "Ursula K. Le Guin",
      release_date: "1969-06-01",
      page_count: 304,
    },
    {
      name: "The Moon is a Harsh Mistress",
      author: "Robert A. Heinlein",
      release_date: "1966-04-01",
      page_count: 288,
    },
  ],
});
console.log(response);
POST _text_structure/find_structure
{"name": "Leviathan Wakes", "author": "James S.A. Corey", "release_date": "2011-06-02", "page_count": 561}
{"name": "Hyperion", "author": "Dan Simmons", "release_date": "1989-05-26", "page_count": 482}
{"name": "Dune", "author": "Frank Herbert", "release_date": "1965-06-01", "page_count": 604}
{"name": "Dune Messiah", "author": "Frank Herbert", "release_date": "1969-10-15", "page_count": 331}
{"name": "Children of Dune", "author": "Frank Herbert", "release_date": "1976-04-21", "page_count": 408}
{"name": "God Emperor of Dune", "author": "Frank Herbert", "release_date": "1981-05-28", "page_count": 454}
{"name": "Consider Phlebas", "author": "Iain M. Banks", "release_date": "1987-04-23", "page_count": 471}
{"name": "Pandora's Star", "author": "Peter F. Hamilton", "release_date": "2004-03-02", "page_count": 768}
{"name": "Revelation Space", "author": "Alastair Reynolds", "release_date": "2000-03-15", "page_count": 585}
{"name": "A Fire Upon the Deep", "author": "Vernor Vinge", "release_date": "1992-06-01", "page_count": 613}
{"name": "Ender's Game", "author": "Orson Scott Card", "release_date": "1985-06-01", "page_count": 324}
{"name": "1984", "author": "George Orwell", "release_date": "1985-06-01", "page_count": 328}
{"name": "Fahrenheit 451", "author": "Ray Bradbury", "release_date": "1953-10-15", "page_count": 227}
{"name": "Brave New World", "author": "Aldous Huxley", "release_date": "1932-06-01", "page_count": 268}
{"name": "Foundation", "author": "Isaac Asimov", "release_date": "1951-06-01", "page_count": 224}
{"name": "The Giver", "author": "Lois Lowry", "release_date": "1993-04-26", "page_count": 208}
{"name": "Slaughterhouse-Five", "author": "Kurt Vonnegut", "release_date": "1969-06-01", "page_count": 275}
{"name": "The Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams", "release_date": "1979-10-12", "page_count": 180}
{"name": "Snow Crash", "author": "Neal Stephenson", "release_date": "1992-06-01", "page_count": 470}
{"name": "Neuromancer", "author": "William Gibson", "release_date": "1984-07-01", "page_count": 271}
{"name": "The Handmaid's Tale", "author": "Margaret Atwood", "release_date": "1985-06-01", "page_count": 311}
{"name": "Starship Troopers", "author": "Robert A. Heinlein", "release_date": "1959-12-01", "page_count": 335}
{"name": "The Left Hand of Darkness", "author": "Ursula K. Le Guin", "release_date": "1969-06-01", "page_count": 304}
{"name": "The Moon is a Harsh Mistress", "author": "Robert A. Heinlein", "release_date": "1966-04-01", "page_count": 288}

If the request does not encounter errors, you receive the following result:

{
  "num_lines_analyzed" : 24, 
  "num_messages_analyzed" : 24, 
  "sample_start" : "{\"name\": \"Leviathan Wakes\", \"author\": \"James S.A. Corey\", \"release_date\": \"2011-06-02\", \"page_count\": 561}\n{\"name\": \"Hyperion\", \"author\": \"Dan Simmons\", \"release_date\": \"1989-05-26\", \"page_count\": 482}\n", 
  "charset" : "UTF-8", 
  "has_byte_order_marker" : false, 
  "format" : "ndjson", 
  "ecs_compatibility" : "disabled", 
  "timestamp_field" : "release_date", 
  "joda_timestamp_formats" : [ 
    "ISO8601"
  ],
  "java_timestamp_formats" : [ 
    "ISO8601"
  ],
  "need_client_timezone" : true, 
  "mappings" : { 
    "properties" : {
      "@timestamp" : {
        "type" : "date"
      },
      "author" : {
        "type" : "keyword"
      },
      "name" : {
        "type" : "keyword"
      },
      "page_count" : {
        "type" : "long"
      },
      "release_date" : {
        "type" : "date",
        "format" : "iso8601"
      }
    }
  },
  "ingest_pipeline" : {
    "description" : "Ingest pipeline created by text structure finder",
    "processors" : [
      {
        "date" : {
          "field" : "release_date",
          "timezone" : "{{ event.timezone }}",
          "formats" : [
            "ISO8601"
          ]
        }
      }
    ]
  },
  "field_stats" : { 
    "author" : {
      "count" : 24,
      "cardinality" : 20,
      "top_hits" : [
        {
          "value" : "Frank Herbert",
          "count" : 4
        },
        {
          "value" : "Robert A. Heinlein",
          "count" : 2
        },
        {
          "value" : "Alastair Reynolds",
          "count" : 1
        },
        {
          "value" : "Aldous Huxley",
          "count" : 1
        },
        {
          "value" : "Dan Simmons",
          "count" : 1
        },
        {
          "value" : "Douglas Adams",
          "count" : 1
        },
        {
          "value" : "George Orwell",
          "count" : 1
        },
        {
          "value" : "Iain M. Banks",
          "count" : 1
        },
        {
          "value" : "Isaac Asimov",
          "count" : 1
        },
        {
          "value" : "James S.A. Corey",
          "count" : 1
        }
      ]
    },
    "name" : {
      "count" : 24,
      "cardinality" : 24,
      "top_hits" : [
        {
          "value" : "1984",
          "count" : 1
        },
        {
          "value" : "A Fire Upon the Deep",
          "count" : 1
        },
        {
          "value" : "Brave New World",
          "count" : 1
        },
        {
          "value" : "Children of Dune",
          "count" : 1
        },
        {
          "value" : "Consider Phlebas",
          "count" : 1
        },
        {
          "value" : "Dune",
          "count" : 1
        },
        {
          "value" : "Dune Messiah",
          "count" : 1
        },
        {
          "value" : "Ender's Game",
          "count" : 1
        },
        {
          "value" : "Fahrenheit 451",
          "count" : 1
        },
        {
          "value" : "Foundation",
          "count" : 1
        }
      ]
    },
    "page_count" : {
      "count" : 24,
      "cardinality" : 24,
      "min_value" : 180,
      "max_value" : 768,
      "mean_value" : 387.0833333333333,
      "median_value" : 329.5,
      "top_hits" : [
        {
          "value" : 180,
          "count" : 1
        },
        {
          "value" : 208,
          "count" : 1
        },
        {
          "value" : 224,
          "count" : 1
        },
        {
          "value" : 227,
          "count" : 1
        },
        {
          "value" : 268,
          "count" : 1
        },
        {
          "value" : 271,
          "count" : 1
        },
        {
          "value" : 275,
          "count" : 1
        },
        {
          "value" : 288,
          "count" : 1
        },
        {
          "value" : 304,
          "count" : 1
        },
        {
          "value" : 311,
          "count" : 1
        }
      ]
    },
    "release_date" : {
      "count" : 24,
      "cardinality" : 20,
      "earliest" : "1932-06-01",
      "latest" : "2011-06-02",
      "top_hits" : [
        {
          "value" : "1985-06-01",
          "count" : 3
        },
        {
          "value" : "1969-06-01",
          "count" : 2
        },
        {
          "value" : "1992-06-01",
          "count" : 2
        },
        {
          "value" : "1932-06-01",
          "count" : 1
        },
        {
          "value" : "1951-06-01",
          "count" : 1
        },
        {
          "value" : "1953-10-15",
          "count" : 1
        },
        {
          "value" : "1959-12-01",
          "count" : 1
        },
        {
          "value" : "1965-06-01",
          "count" : 1
        },
        {
          "value" : "1966-04-01",
          "count" : 1
        },
        {
          "value" : "1969-10-15",
          "count" : 1
        }
      ]
    }
  }
}

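num_lines_analyzed indicates how many lines of the text were analyzed.

num_messages_analyzed indicates how many distinct messages the lines contained. For NDJSON, this value is the same as num_lines_analyzed. For other text formats, messages can span several lines.

sample_start reproduces the first two messages in the text verbatim. This may help diagnose parse errors or accidental uploads of the wrong text.

charset indicates the character encoding used to parse the text.

For UTF character encodings, has_byte_order_marker indicates whether the text begins with a byte order marker.

format is one of ndjson, xml, delimited, or semi_structured_text.

ecs_compatibility is either disabled or v1, and defaults to disabled.

timestamp_field names the field considered most likely to be the primary timestamp of each document.

joda_timestamp_formats are used to tell Logstash how to parse timestamps.

java_timestamp_formats are the Java time formats recognized in the time fields. Elasticsearch mappings and ingest pipelines use this format.

If a timestamp format is detected that does not include a time zone, need_client_timezone is true. The server that parses the text must therefore be told the correct time zone by the client.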

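mappings contains some suitable mappings for an index into which the data could be ingested. In the response shown here, the release_date field is mapped as a date type with the iso8601 format, and the @timestamp field is added as a date.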

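field_stats contains the most common values of each field, plus basic numeric statistics for the numeric page_count field. This information may provide clues that the data needs to be cleaned or transformed prior to use by other Elastic Stack functionality.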
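The numeric statistics reported for page_count can be reproduced locally as a sanity check. The sketch below is not the structure finder's implementation. `numeric_field_stats` is a hypothetical helper, and the top_hits ordering (highest count first, ties broken by ascending value) is an assumption inferred from the sample response.

```python
from collections import Counter
from statistics import mean, median


def numeric_field_stats(values, top_n=10):
    """Summarize a numeric field in the shape of a field_stats entry."""
    counts = Counter(values)
    # Assumed ordering: highest count first, ties broken by ascending value.
    top = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top_n]
    return {
        "count": len(values),
        "cardinality": len(counts),
        "min_value": min(values),
        "max_value": max(values),
        "mean_value": mean(values),
        "median_value": median(values),
        "top_hits": [{"value": value, "count": count} for value, count in top],
    }


# page_count values of the 24 books in the request above
page_counts = [561, 482, 604, 331, 408, 454, 471, 768, 585, 613, 324, 328,
               227, 268, 224, 208, 275, 180, 470, 271, 311, 335, 304, 288]
stats = numeric_field_stats(page_counts)
print(stats["mean_value"], stats["median_value"])
```

For these values, the count, cardinality, minimum, maximum, mean, and median agree with the page_count entry in the response above.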

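Finding the structure of NYC yellow cab example data

The next example shows how to find the structure of some New York City yellow cab trip data. The first curl command downloads the data, the first 20000 lines of which are then piped into the find_structure endpoint. The lines_to_sample query parameter of the endpoint is set to 20000 to match what is specified in the head command.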

curl -s "s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2018-06.csv" | head -20000 | curl -s -H "Content-Type: application/json" -XPOST "localhost:9200/_text_structure/find_structure?pretty&lines_to_sample=20000" -T -

The Content-Type: application/json header must be set even though in this case the data is not JSON. (Alternatively, the Content-Type can be set to any other type supported by Elasticsearch, but it must be set.)
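For readers who prefer a script to a shell pipeline, the same request can be prepared with Python's standard library. This is only a sketch: `build_request` is a hypothetical helper, the base URL assumes a local cluster, and the tiny in-memory CSV is a shortened, made-up stand-in for the downloaded trip data. The request is constructed but not sent.

```python
import io
import itertools
import urllib.request
from urllib.parse import urlencode


def build_request(lines, lines_to_sample, base_url="http://localhost:9200"):
    """Prepare (but do not send) a find_structure request for the first N lines."""
    # Equivalent of `head -N`: keep only the first lines_to_sample lines.
    sample = "".join(itertools.islice(lines, lines_to_sample)).encode("utf-8")
    query = urlencode({"pretty": "true", "lines_to_sample": lines_to_sample})
    return urllib.request.Request(
        f"{base_url}/_text_structure/find_structure?{query}",
        data=sample,
        method="POST",
        # A Content-Type is required even though the body is CSV, not JSON.
        headers={"Content-Type": "application/json"},
    )


# Shortened, hypothetical stand-in for the yellow cab CSV.
csv_text = io.StringIO(
    "VendorID,tpep_pickup_datetime,passenger_count\n"
    "1,2018-06-01 00:15:40,1\n"
    "2,2018-06-01 00:04:18,1\n"
)
request = build_request(csv_text, lines_to_sample=2)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running cluster.
```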

If the request does not encounter errors, you receive the following result:

{
  "num_lines_analyzed" : 20000,
  "num_messages_analyzed" : 19998, 
  "sample_start" : "VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount\n\n1,2018-06-01 00:15:40,2018-06-01 00:16:46,1,.00,1,N,145,145,2,3,0.5,0.5,0,0,0.3,4.3\n",
  "charset" : "UTF-8",
  "has_byte_order_marker" : false,
  "format" : "delimited", 
  "multiline_start_pattern" : "^.*?,\"?\\d{4}-\\d{2}-\\d{2}[T ]\\d{2}:\\d{2}",
  "exclude_lines_pattern" : "^\"?VendorID\"?,\"?tpep_pickup_datetime\"?,\"?tpep_dropoff_datetime\"?,\"?passenger_count\"?,\"?trip_distance\"?,\"?RatecodeID\"?,\"?store_and_fwd_flag\"?,\"?PULocationID\"?,\"?DOLocationID\"?,\"?payment_type\"?,\"?fare_amount\"?,\"?extra\"?,\"?mta_tax\"?,\"?tip_amount\"?,\"?tolls_amount\"?,\"?improvement_surcharge\"?,\"?total_amount\"?",
  "column_names" : [ 
    "VendorID",
    "tpep_pickup_datetime",
    "tpep_dropoff_datetime",
    "passenger_count",
    "trip_distance",
    "RatecodeID",
    "store_and_fwd_flag",
    "PULocationID",
    "DOLocationID",
    "payment_type",
    "fare_amount",
    "extra",
    "mta_tax",
    "tip_amount",
    "tolls_amount",
    "improvement_surcharge",
    "total_amount"
  ],
  "has_header_row" : true, 
  "delimiter" : ",", 
  "quote" : "\"", 
  "timestamp_field" : "tpep_pickup_datetime", 
  "joda_timestamp_formats" : [ 
    "YYYY-MM-dd HH:mm:ss"
  ],
  "java_timestamp_formats" : [ 
    "yyyy-MM-dd HH:mm:ss"
  ],
  "need_client_timezone" : true, 
  "mappings" : {
    "properties" : {
      "@timestamp" : {
        "type" : "date"
      },
      "DOLocationID" : {
        "type" : "long"
      },
      "PULocationID" : {
        "type" : "long"
      },
      "RatecodeID" : {
        "type" : "long"
      },
      "VendorID" : {
        "type" : "long"
      },
      "extra" : {
        "type" : "double"
      },
      "fare_amount" : {
        "type" : "double"
      },
      "improvement_surcharge" : {
        "type" : "double"
      },
      "mta_tax" : {
        "type" : "double"
      },
      "passenger_count" : {
        "type" : "long"
      },
      "payment_type" : {
        "type" : "long"
      },
      "store_and_fwd_flag" : {
        "type" : "keyword"
      },
      "tip_amount" : {
        "type" : "double"
      },
      "tolls_amount" : {
        "type" : "double"
      },
      "total_amount" : {
        "type" : "double"
      },
      "tpep_dropoff_datetime" : {
        "type" : "date",
        "format" : "yyyy-MM-dd HH:mm:ss"
      },
      "tpep_pickup_datetime" : {
        "type" : "date",
        "format" : "yyyy-MM-dd HH:mm:ss"
      },
      "trip_distance" : {
        "type" : "double"
      }
    }
  },
  "ingest_pipeline" : {
    "description" : "Ingest pipeline created by text structure finder",
    "processors" : [
      {
        "csv" : {
          "field" : "message",
          "target_fields" : [
            "VendorID",
            "tpep_pickup_datetime",
            "tpep_dropoff_datetime",
            "passenger_count",
            "trip_distance",
            "RatecodeID",
            "store_and_fwd_flag",
            "PULocationID",
            "DOLocationID",
            "payment_type",
            "fare_amount",
            "extra",
            "mta_tax",
            "tip_amount",
            "tolls_amount",
            "improvement_surcharge",
            "total_amount"
          ]
        }
      },
      {
        "date" : {
          "field" : "tpep_pickup_datetime",
          "timezone" : "{{ event.timezone }}",
          "formats" : [
            "yyyy-MM-dd HH:mm:ss"
          ]
        }
      },
      {
        "convert" : {
          "field" : "DOLocationID",
          "type" : "long"
        }
      },
      {
        "convert" : {
          "field" : "PULocationID",
          "type" : "long"
        }
      },
      {
        "convert" : {
          "field" : "RatecodeID",
          "type" : "long"
        }
      },
      {
        "convert" : {
          "field" : "VendorID",
          "type" : "long"
        }
      },
      {
        "convert" : {
          "field" : "extra",
          "type" : "double"
        }
      },
      {
        "convert" : {
          "field" : "fare_amount",
          "type" : "double"
        }
      },
      {
        "convert" : {
          "field" : "improvement_surcharge",
          "type" : "double"
        }
      },
      {
        "convert" : {
          "field" : "mta_tax",
          "type" : "double"
        }
      },
      {
        "convert" : {
          "field" : "passenger_count",
          "type" : "long"
        }
      },
      {
        "convert" : {
          "field" : "payment_type",
          "type" : "long"
        }
      },
      {
        "convert" : {
          "field" : "tip_amount",
          "type" : "double"
        }
      },
      {
        "convert" : {
          "field" : "tolls_amount",
          "type" : "double"
        }
      },
      {
        "convert" : {
          "field" : "total_amount",
          "type" : "double"
        }
      },
      {
        "convert" : {
          "field" : "trip_distance",
          "type" : "double"
        }
      },
      {
        "remove" : {
          "field" : "message"
        }
      }
    ]
  },
  "field_stats" : {
    "DOLocationID" : {
      "count" : 19998,
      "cardinality" : 240,
      "min_value" : 1,
      "max_value" : 265,
      "mean_value" : 150.26532653265312,
      "median_value" : 148,
      "top_hits" : [
        {
          "value" : 79,
          "count" : 760
        },
        {
          "value" : 48,
          "count" : 683
        },
        {
          "value" : 68,
          "count" : 529
        },
        {
          "value" : 170,
          "count" : 506
        },
        {
          "value" : 107,
          "count" : 468
        },
        {
          "value" : 249,
          "count" : 457
        },
        {
          "value" : 230,
          "count" : 441
        },
        {
          "value" : 186,
          "count" : 432
        },
        {
          "value" : 141,
          "count" : 409
        },
        {
          "value" : 263,
          "count" : 386
        }
      ]
    },
    "PULocationID" : {
      "count" : 19998,
      "cardinality" : 154,
      "min_value" : 1,
      "max_value" : 265,
      "mean_value" : 153.4042404240424,
      "median_value" : 148,
      "top_hits" : [
        {
          "value" : 79,
          "count" : 1067
        },
        {
          "value" : 230,
          "count" : 949
        },
        {
          "value" : 148,
          "count" : 940
        },
        {
          "value" : 132,
          "count" : 897
        },
        {
          "value" : 48,
          "count" : 853
        },
        {
          "value" : 161,
          "count" : 820
        },
        {
          "value" : 234,
          "count" : 750
        },
        {
          "value" : 249,
          "count" : 722
        },
        {
          "value" : 164,
          "count" : 663
        },
        {
          "value" : 114,
          "count" : 646
        }
      ]
    },
    "RatecodeID" : {
      "count" : 19998,
      "cardinality" : 5,
      "min_value" : 1,
      "max_value" : 5,
      "mean_value" : 1.0656565656565653,
      "median_value" : 1,
      "top_hits" : [
        {
          "value" : 1,
          "count" : 19311
        },
        {
          "value" : 2,
          "count" : 468
        },
        {
          "value" : 5,
          "count" : 195
        },
        {
          "value" : 4,
          "count" : 17
        },
        {
          "value" : 3,
          "count" : 7
        }
      ]
    },
    "VendorID" : {
      "count" : 19998,
      "cardinality" : 2,
      "min_value" : 1,
      "max_value" : 2,
      "mean_value" : 1.59005900590059,
      "median_value" : 2,
      "top_hits" : [
        {
          "value" : 2,
          "count" : 11800
        },
        {
          "value" : 1,
          "count" : 8198
        }
      ]
    },
    "extra" : {
      "count" : 19998,
      "cardinality" : 3,
      "min_value" : -0.5,
      "max_value" : 0.5,
      "mean_value" : 0.4815981598159816,
      "median_value" : 0.5,
      "top_hits" : [
        {
          "value" : 0.5,
          "count" : 19281
        },
        {
          "value" : 0,
          "count" : 698
        },
        {
          "value" : -0.5,
          "count" : 19
        }
      ]
    },
    "fare_amount" : {
      "count" : 19998,
      "cardinality" : 208,
      "min_value" : -100,
      "max_value" : 300,
      "mean_value" : 13.937719771977209,
      "median_value" : 9.5,
      "top_hits" : [
        {
          "value" : 6,
          "count" : 1004
        },
        {
          "value" : 6.5,
          "count" : 935
        },
        {
          "value" : 5.5,
          "count" : 909
        },
        {
          "value" : 7,
          "count" : 903
        },
        {
          "value" : 5,
          "count" : 889
        },
        {
          "value" : 7.5,
          "count" : 854
        },
        {
          "value" : 4.5,
          "count" : 802
        },
        {
          "value" : 8.5,
          "count" : 790
        },
        {
          "value" : 8,
          "count" : 789
        },
        {
          "value" : 9,
          "count" : 711
        }
      ]
    },
    "improvement_surcharge" : {
      "count" : 19998,
      "cardinality" : 3,
      "min_value" : -0.3,
      "max_value" : 0.3,
      "mean_value" : 0.29915991599159913,
      "median_value" : 0.3,
      "top_hits" : [
        {
          "value" : 0.3,
          "count" : 19964
        },
        {
          "value" : -0.3,
          "count" : 22
        },
        {
          "value" : 0,
          "count" : 12
        }
      ]
    },
    "mta_tax" : {
      "count" : 19998,
      "cardinality" : 3,
      "min_value" : -0.5,
      "max_value" : 0.5,
      "mean_value" : 0.4962246224622462,
      "median_value" : 0.5,
      "top_hits" : [
        {
          "value" : 0.5,
          "count" : 19868
        },
        {
          "value" : 0,
          "count" : 109
        },
        {
          "value" : -0.5,
          "count" : 21
        }
      ]
    },
    "passenger_count" : {
      "count" : 19998,
      "cardinality" : 7,
      "min_value" : 0,
      "max_value" : 6,
      "mean_value" : 1.6201620162016201,
      "median_value" : 1,
      "top_hits" : [
        {
          "value" : 1,
          "count" : 14219
        },
        {
          "value" : 2,
          "count" : 2886
        },
        {
          "value" : 5,
          "count" : 1047
        },
        {
          "value" : 3,
          "count" : 804
        },
        {
          "value" : 6,
          "count" : 523
        },
        {
          "value" : 4,
          "count" : 406
        },
        {
          "value" : 0,
          "count" : 113
        }
      ]
    },
    "payment_type" : {
      "count" : 19998,
      "cardinality" : 4,
      "min_value" : 1,
      "max_value" : 4,
      "mean_value" : 1.315631563156316,
      "median_value" : 1,
      "top_hits" : [
        {
          "value" : 1,
          "count" : 13936
        },
        {
          "value" : 2,
          "count" : 5857
        },
        {
          "value" : 3,
          "count" : 160
        },
        {
          "value" : 4,
          "count" : 45
        }
      ]
    },
    "store_and_fwd_flag" : {
      "count" : 19998,
      "cardinality" : 2,
      "top_hits" : [
        {
          "value" : "N",
          "count" : 19910
        },
        {
          "value" : "Y",
          "count" : 88
        }
      ]
    },
    "tip_amount" : {
      "count" : 19998,
      "cardinality" : 717,
      "min_value" : 0,
      "max_value" : 128,
      "mean_value" : 2.010959095909593,
      "median_value" : 1.45,
      "top_hits" : [
        {
          "value" : 0,
          "count" : 6917
        },
        {
          "value" : 1,
          "count" : 1178
        },
        {
          "value" : 2,
          "count" : 624
        },
        {
          "value" : 3,
          "count" : 248
        },
        {
          "value" : 1.56,
          "count" : 206
        },
        {
          "value" : 1.46,
          "count" : 205
        },
        {
          "value" : 1.76,
          "count" : 196
        },
        {
          "value" : 1.45,
          "count" : 195
        },
        {
          "value" : 1.36,
          "count" : 191
        },
        {
          "value" : 1.5,
          "count" : 187
        }
      ]
    },
    "tolls_amount" : {
      "count" : 19998,
      "cardinality" : 26,
      "min_value" : 0,
      "max_value" : 35,
      "mean_value" : 0.2729697969796978,
      "median_value" : 0,
      "top_hits" : [
        {
          "value" : 0,
          "count" : 19107
        },
        {
          "value" : 5.76,
          "count" : 791
        },
        {
          "value" : 10.5,
          "count" : 36
        },
        {
          "value" : 2.64,
          "count" : 21
        },
        {
          "value" : 11.52,
          "count" : 8
        },
        {
          "value" : 5.54,
          "count" : 4
        },
        {
          "value" : 8.5,
          "count" : 4
        },
        {
          "value" : 17.28,
          "count" : 4
        },
        {
          "value" : 2,
          "count" : 2
        },
        {
          "value" : 2.16,
          "count" : 2
        }
      ]
    },
    "total_amount" : {
      "count" : 19998,
      "cardinality" : 1267,
      "min_value" : -100.3,
      "max_value" : 389.12,
      "mean_value" : 17.499898989898995,
      "median_value" : 12.35,
      "top_hits" : [
        {
          "value" : 7.3,
          "count" : 478
        },
        {
          "value" : 8.3,
          "count" : 443
        },
        {
          "value" : 8.8,
          "count" : 420
        },
        {
          "value" : 6.8,
          "count" : 406
        },
        {
          "value" : 7.8,
          "count" : 405
        },
        {
          "value" : 6.3,
          "count" : 371
        },
        {
          "value" : 9.8,
          "count" : 368
        },
        {
          "value" : 5.8,
          "count" : 362
        },
        {
          "value" : 9.3,
          "count" : 332
        },
        {
          "value" : 10.3,
          "count" : 332
        }
      ]
    },
    "tpep_dropoff_datetime" : {
      "count" : 19998,
      "cardinality" : 9066,
      "earliest" : "2018-05-31 06:18:15",
      "latest" : "2018-06-02 02:25:44",
      "top_hits" : [
        {
          "value" : "2018-06-01 01:12:12",
          "count" : 10
        },
        {
          "value" : "2018-06-01 00:32:15",
          "count" : 9
        },
        {
          "value" : "2018-06-01 00:44:27",
          "count" : 9
        },
        {
          "value" : "2018-06-01 00:46:42",
          "count" : 9
        },
        {
          "value" : "2018-06-01 01:03:22",
          "count" : 9
        },
        {
          "value" : "2018-06-01 01:05:13",
          "count" : 9
        },
        {
          "value" : "2018-06-01 00:11:20",
          "count" : 8
        },
        {
          "value" : "2018-06-01 00:16:03",
          "count" : 8
        },
        {
          "value" : "2018-06-01 00:19:47",
          "count" : 8
        },
        {
          "value" : "2018-06-01 00:25:17",
          "count" : 8
        }
      ]
    },
    "tpep_pickup_datetime" : {
      "count" : 19998,
      "cardinality" : 8760,
      "earliest" : "2018-05-31 06:08:31",
      "latest" : "2018-06-02 01:21:21",
      "top_hits" : [
        {
          "value" : "2018-06-01 00:01:23",
          "count" : 12
        },
        {
          "value" : "2018-06-01 00:04:31",
          "count" : 10
        },
        {
          "value" : "2018-06-01 00:05:38",
          "count" : 10
        },
        {
          "value" : "2018-06-01 00:09:50",
          "count" : 10
        },
        {
          "value" : "2018-06-01 00:12:01",
          "count" : 10
        },
        {
          "value" : "2018-06-01 00:14:17",
          "count" : 10
        },
        {
          "value" : "2018-06-01 00:00:34",
          "count" : 9
        },
        {
          "value" : "2018-06-01 00:00:40",
          "count" : 9
        },
        {
          "value" : "2018-06-01 00:02:53",
          "count" : 9
        },
        {
          "value" : "2018-06-01 00:05:40",
          "count" : 9
        }
      ]
    },
    "trip_distance" : {
      "count" : 19998,
      "cardinality" : 1687,
      "min_value" : 0,
      "max_value" : 64.63,
      "mean_value" : 3.6521062106210715,
      "median_value" : 2.16,
      "top_hits" : [
        {
          "value" : 0.9,
          "count" : 335
        },
        {
          "value" : 0.8,
          "count" : 320
        },
        {
          "value" : 1.1,
          "count" : 316
        },
        {
          "value" : 0.7,
          "count" : 304
        },
        {
          "value" : 1.2,
          "count" : 303
        },
        {
          "value" : 1,
          "count" : 296
        },
        {
          "value" : 1.3,
          "count" : 280
        },
        {
          "value" : 1.5,
          "count" : 268
        },
        {
          "value" : 1.6,
          "count" : 268
        },
        {
          "value" : 0.6,
          "count" : 256
        }
      ]
    }
  }
}

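num_messages_analyzed is 2 lower than num_lines_analyzed because only data records count as messages. The first line contains the column names, and in this sample the second line is blank.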

Unlike the first example, in this case the format has been identified as delimited.

Because the format is delimited, the column_names field in the output lists the column names in the order they appear in the sample.

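has_header_row indicates that for this sample the column names are in the first row of the sample. (If they were not, it would be a good idea to specify them in the column_names query parameter.)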

The delimiter for this sample is a comma, because the text is in CSV format.

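The quote character is the default double quote. (The structure finder does not attempt to deduce any other quote character, so if you have delimited text that is quoted with some other character you must specify it using the quote query parameter.)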

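The timestamp_field has been chosen to be tpep_pickup_datetime. tpep_dropoff_datetime would work just as well, but tpep_pickup_datetime was chosen because it comes first in the column order. If you prefer tpep_dropoff_datetime, force it to be chosen using the timestamp_field query parameter.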

joda_timestamp_formats are used to tell Logstash how to parse timestamps.

java_timestamp_formats are the Java time formats recognized in the time fields. Elasticsearch mappings and ingest pipelines use this format.

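The timestamp format in this sample does not specify a timezone, so to convert the timestamps accurately to UTC for storage in Elasticsearch you must supply the timezone they relate to. need_client_timezone will be false for timestamp formats that include the timezone.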
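The overrides mentioned in the notes above (quote, timestamp_field, and similar) are ordinary query parameters on the endpoint. As a minimal sketch, assuming a cluster at localhost:9200 and a hypothetical helper named find_structure_url (it is not part of any Elastic client), the request URL could be assembled like this:

```python
from urllib.parse import urlencode

# Hypothetical helper for illustration only; not part of any Elastic client.
def find_structure_url(base="http://localhost:9200", **overrides):
    """Return the find_structure URL with the given query-parameter overrides."""
    path = f"{base}/_text_structure/find_structure"
    # Skip overrides left as None so the structure finder decides them itself.
    params = {key: value for key, value in overrides.items() if value is not None}
    return f"{path}?{urlencode(params)}" if params else path

# Assumed scenario: a delimited file quoted with single quotes, where the
# drop-off time should be used as the timestamp field.
url = find_structure_url(
    format="delimited",
    quote="'",
    timestamp_field="tpep_dropoff_datetime",
)
print(url)
```

urlencode takes care of escaping the quote character, so the override values can be passed as plain strings.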

Setting the timeout parameter

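If you try to analyze a lot of data, the analysis will take a long time. If you want to limit the amount of processing your Elasticsearch cluster performs for a request, use the timeout query parameter. When the timeout expires, the analysis is aborted and an error is returned. For example, you can replace the 20000 lines in the previous example with 200000 and set a 1 second timeout on the analysis: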

curl -s "s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2018-06.csv" | head -200000 | curl -s -H "Content-Type: application/json" -XPOST "localhost:9200/_text_structure/find_structure?pretty&lines_to_sample=200000&timeout=1s" -T -

Unless you are using an extremely fast computer, you will receive a timeout error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "timeout_exception",
        "reason" : "Aborting structure analysis during [delimited record parsing] as it has taken longer than the timeout of [1s]"
      }
    ],
    "type" : "timeout_exception",
    "reason" : "Aborting structure analysis during [delimited record parsing] as it has taken longer than the timeout of [1s]"
  },
  "status" : 500
}

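If you try the example above yourself, you will notice that the overall running time of the curl commands is considerably longer than 1 second. This is because it takes a while to download 200000 lines of CSV from the internet, and the timeout is measured from the time this endpoint starts to process the data.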
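A script that calls the endpoint may want to tell a timeout apart from other failures. The following is a minimal sketch that checks the error type in a response body shaped like the one shown above; the function name is_timeout is an assumption made for this illustration.

```python
import json

# Trimmed copy of the timeout error body shown above.
error_body = """
{
  "error" : {
    "type" : "timeout_exception",
    "reason" : "Aborting structure analysis during [delimited record parsing] as it has taken longer than the timeout of [1s]"
  },
  "status" : 500
}
"""

def is_timeout(response_text):
    """Return True if the response body reports a timeout_exception."""
    body = json.loads(response_text)
    error = body.get("error")
    return isinstance(error, dict) and error.get("type") == "timeout_exception"

print(is_timeout(error_body))
```

A caller could react to a timeout by retrying with a smaller lines_to_sample value or a longer timeout, depending on how much cluster time it is willing to spend.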

Analyzing Elasticsearch log files

This is an example of analyzing an Elasticsearch log file:

curl -s -H "Content-Type: application/json" -XPOST "localhost:9200/_text_structure/find_structure?pretty&ecs_compatibility=disabled" -T "$ES_HOME/logs/elasticsearch.log"

If the request does not encounter errors, the result will look something like this:

{
  "num_lines_analyzed" : 53,
  "num_messages_analyzed" : 53,
  "sample_start" : "[2018-09-27T14:39:28,518][INFO ][o.e.e.NodeEnvironment    ] [node-0] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [165.4gb], net total_space [464.7gb], types [hfs]\n[2018-09-27T14:39:28,521][INFO ][o.e.e.NodeEnvironment    ] [node-0] heap size [494.9mb], compressed ordinary object pointers [true]\n",
  "charset" : "UTF-8",
  "has_byte_order_marker" : false,
  "format" : "semi_structured_text", 
  "multiline_start_pattern" : "^\\[\\b\\d{4}-\\d{2}-\\d{2}[T ]\\d{2}:\\d{2}", 
  "grok_pattern" : "\\[%{TIMESTAMP_ISO8601:timestamp}\\]\\[%{LOGLEVEL:loglevel}.*", 
  "ecs_compatibility" : "disabled", 
  "timestamp_field" : "timestamp",
  "joda_timestamp_formats" : [
    "ISO8601"
  ],
  "java_timestamp_formats" : [
    "ISO8601"
  ],
  "need_client_timezone" : true,
  "mappings" : {
    "properties" : {
      "@timestamp" : {
        "type" : "date"
      },
      "loglevel" : {
        "type" : "keyword"
      },
      "message" : {
        "type" : "text"
      }
    }
  },
  "ingest_pipeline" : {
    "description" : "Ingest pipeline created by text structure finder",
    "processors" : [
      {
        "grok" : {
          "field" : "message",
          "patterns" : [
            "\\[%{TIMESTAMP_ISO8601:timestamp}\\]\\[%{LOGLEVEL:loglevel}.*"
          ]
        }
      },
      {
        "date" : {
          "field" : "timestamp",
          "timezone" : "{{ event.timezone }}",
          "formats" : [
            "ISO8601"
          ]
        }
      },
      {
        "remove" : {
          "field" : "timestamp"
        }
      }
    ]
  },
  "field_stats" : {
    "loglevel" : {
      "count" : 53,
      "cardinality" : 3,
      "top_hits" : [
        {
          "value" : "INFO",
          "count" : 51
        },
        {
          "value" : "DEBUG",
          "count" : 1
        },
        {
          "value" : "WARN",
          "count" : 1
        }
      ]
    },
    "timestamp" : {
      "count" : 53,
      "cardinality" : 28,
      "earliest" : "2018-09-27T14:39:28,518",
      "latest" : "2018-09-27T14:39:37,012",
      "top_hits" : [
        {
          "value" : "2018-09-27T14:39:29,859",
          "count" : 10
        },
        {
          "value" : "2018-09-27T14:39:29,860",
          "count" : 9
        },
        {
          "value" : "2018-09-27T14:39:29,858",
          "count" : 6
        },
        {
          "value" : "2018-09-27T14:39:28,523",
          "count" : 3
        },
        {
          "value" : "2018-09-27T14:39:34,234",
          "count" : 2
        },
        {
          "value" : "2018-09-27T14:39:28,518",
          "count" : 1
        },
        {
          "value" : "2018-09-27T14:39:28,521",
          "count" : 1
        },
        {
          "value" : "2018-09-27T14:39:28,522",
          "count" : 1
        },
        {
          "value" : "2018-09-27T14:39:29,861",
          "count" : 1
        },
        {
          "value" : "2018-09-27T14:39:32,786",
          "count" : 1
        }
      ]
    }
  }
}

This time the format has been identified as semi_structured_text.

The multiline_start_pattern is set on the basis that the timestamp appears in the first line of each multi-line log message.

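A very simple grok_pattern has been created, which extracts the timestamp and the recognizable fields that appear in every analyzed message. In this case the only field recognized beyond the timestamp is the log level.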

The ECS Grok pattern compatibility mode used can be either disabled (the default if not specified in the request) or v1.
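To see how the multiline_start_pattern and the simple grok_pattern fit together, the sketch below groups raw lines into messages and then pulls out the two recognized fields. The multiline pattern is copied from the response above with the JSON escaping removed. The second regular expression is only a rough stand-in for the Grok pattern (the real TIMESTAMP_ISO8601 and LOGLEVEL definitions are broader), and the second log message with its stack trace is made up for illustration.

```python
import re

# multiline_start_pattern from the response above, JSON escaping removed.
MULTILINE_START = re.compile(r"^\[\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}")

# Rough regex approximation of the generated grok_pattern (an assumption,
# not the actual Grok definitions).
SIMPLE_PATTERN = re.compile(
    r"\[(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\]"
    r"\[(?P<loglevel>[A-Z]+).*",
    re.DOTALL,
)

def group_messages(lines):
    """Start a new message at each line matching the multiline start pattern."""
    messages = []
    for line in lines:
        if MULTILINE_START.match(line) or not messages:
            messages.append(line)
        else:
            # Continuation line (for example a stack trace): append to the last message.
            messages[-1] += "\n" + line
    return messages

lines = [
    "[2018-09-27T14:39:28,521][INFO ][o.e.e.NodeEnvironment    ] [node-0] heap size [494.9mb], compressed ordinary object pointers [true]",
    # The next three lines are invented to show a multi-line message.
    "[2018-09-27T14:39:28,522][WARN ][o.e.x.Example            ] [node-0] made-up failure",
    "java.lang.IllegalStateException: made-up stack trace line",
    "\tat org.example.Hypothetical.run(Hypothetical.java:1)",
]

messages = group_messages(lines)
fields = [SIMPLE_PATTERN.match(message).groupdict() for message in messages]
for entry in fields:
    print(entry)
```

Four input lines become two messages here, which is the same distinction the response draws between num_lines_analyzed and num_messages_analyzed.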

Specifying grok_pattern as a query parameter

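If you recognize more fields than the simple grok_pattern that the structure finder produced unaided, you can resubmit the request with a more advanced grok_pattern specified as a query parameter, and the structure finder will calculate field_stats for your additional fields.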

In the case of the Elasticsearch log, a more complete Grok pattern is \[%{TIMESTAMP_ISO8601:timestamp}\]\[%{LOGLEVEL:loglevel} *\]\[%{JAVACLASS:class} *\] \[%{HOSTNAME:node}\] %{JAVALOGMESSAGE:message}. You can analyze the same text again, submitting this grok_pattern as a query parameter (appropriately URL escaped):

curl -s -H "Content-Type: application/json" -XPOST "localhost:9200/_text_structure/find_structure?pretty&format=semi_structured_text&grok_pattern=%5C%5B%25%7BTIMESTAMP_ISO8601:timestamp%7D%5C%5D%5C%5B%25%7BLOGLEVEL:loglevel%7D%20*%5C%5D%5C%5B%25%7BJAVACLASS:class%7D%20*%5C%5D%20%5C%5B%25%7BHOSTNAME:node%7D%5C%5D%20%25%7BJAVALOGMESSAGE:message%7D" -T "$ES_HOME/logs/elasticsearch.log"

If the request does not encounter errors, the result will look something like this:

{
  "num_lines_analyzed" : 53,
  "num_messages_analyzed" : 53,
  "sample_start" : "[2018-09-27T14:39:28,518][INFO ][o.e.e.NodeEnvironment    ] [node-0] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [165.4gb], net total_space [464.7gb], types [hfs]\n[2018-09-27T14:39:28,521][INFO ][o.e.e.NodeEnvironment    ] [node-0] heap size [494.9mb], compressed ordinary object pointers [true]\n",
  "charset" : "UTF-8",
  "has_byte_order_marker" : false,
  "format" : "semi_structured_text",
  "multiline_start_pattern" : "^\\[\\b\\d{4}-\\d{2}-\\d{2}[T ]\\d{2}:\\d{2}",
  "grok_pattern" : "\\[%{TIMESTAMP_ISO8601:timestamp}\\]\\[%{LOGLEVEL:loglevel} *\\]\\[%{JAVACLASS:class} *\\] \\[%{HOSTNAME:node}\\] %{JAVALOGMESSAGE:message}", 
  "ecs_compatibility" : "disabled", 
  "timestamp_field" : "timestamp",
  "joda_timestamp_formats" : [
    "ISO8601"
  ],
  "java_timestamp_formats" : [
    "ISO8601"
  ],
  "need_client_timezone" : true,
  "mappings" : {
    "properties" : {
      "@timestamp" : {
        "type" : "date"
      },
      "class" : {
        "type" : "keyword"
      },
      "loglevel" : {
        "type" : "keyword"
      },
      "message" : {
        "type" : "text"
      },
      "node" : {
        "type" : "keyword"
      }
    }
  },
  "ingest_pipeline" : {
    "description" : "Ingest pipeline created by text structure finder",
    "processors" : [
      {
        "grok" : {
          "field" : "message",
          "patterns" : [
            "\\[%{TIMESTAMP_ISO8601:timestamp}\\]\\[%{LOGLEVEL:loglevel} *\\]\\[%{JAVACLASS:class} *\\] \\[%{HOSTNAME:node}\\] %{JAVALOGMESSAGE:message}"
          ]
        }
      },
      {
        "date" : {
          "field" : "timestamp",
          "timezone" : "{{ event.timezone }}",
          "formats" : [
            "ISO8601"
          ]
        }
      },
      {
        "remove" : {
          "field" : "timestamp"
        }
      }
    ]
  },
  "field_stats" : { 
    "class" : {
      "count" : 53,
      "cardinality" : 14,
      "top_hits" : [
        {
          "value" : "o.e.p.PluginsService",
          "count" : 26
        },
        {
          "value" : "o.e.c.m.MetadataIndexTemplateService",
          "count" : 8
        },
        {
          "value" : "o.e.n.Node",
          "count" : 7
        },
        {
          "value" : "o.e.e.NodeEnvironment",
          "count" : 2
        },
        {
          "value" : "o.e.a.ActionModule",
          "count" : 1
        },
        {
          "value" : "o.e.c.s.ClusterApplierService",
          "count" : 1
        },
        {
          "value" : "o.e.c.s.MasterService",
          "count" : 1
        },
        {
          "value" : "o.e.d.DiscoveryModule",
          "count" : 1
        },
        {
          "value" : "o.e.g.GatewayService",
          "count" : 1
        },
        {
          "value" : "o.e.l.LicenseService",
          "count" : 1
        }
      ]
    },
    "loglevel" : {
      "count" : 53,
      "cardinality" : 3,
      "top_hits" : [
        {
          "value" : "INFO",
          "count" : 51
        },
        {
          "value" : "DEBUG",
          "count" : 1
        },
        {
          "value" : "WARN",
          "count" : 1
        }
      ]
    },
    "message" : {
      "count" : 53,
      "cardinality" : 53,
      "top_hits" : [
        {
          "value" : "Using REST wrapper from plugin org.elasticsearch.xpack.security.Security",
          "count" : 1
        },
        {
          "value" : "adding template [.monitoring-alerts] for index patterns [.monitoring-alerts-6]",
          "count" : 1
        },
        {
          "value" : "adding template [.monitoring-beats] for index patterns [.monitoring-beats-6-*]",
          "count" : 1
        },
        {
          "value" : "adding template [.monitoring-es] for index patterns [.monitoring-es-6-*]",
          "count" : 1
        },
        {
          "value" : "adding template [.monitoring-kibana] for index patterns [.monitoring-kibana-6-*]",
          "count" : 1
        },
        {
          "value" : "adding template [.monitoring-logstash] for index patterns [.monitoring-logstash-6-*]",
          "count" : 1
        },
        {
          "value" : "adding template [.triggered_watches] for index patterns [.triggered_watches*]",
          "count" : 1
        },
        {
          "value" : "adding template [.watch-history-9] for index patterns [.watcher-history-9*]",
          "count" : 1
        },
        {
          "value" : "adding template [.watches] for index patterns [.watches*]",
          "count" : 1
        },
        {
          "value" : "starting ...",
          "count" : 1
        }
      ]
    },
    "node" : {
      "count" : 53,
      "cardinality" : 1,
      "top_hits" : [
        {
          "value" : "node-0",
          "count" : 53
        }
      ]
    },
    "timestamp" : {
      "count" : 53,
      "cardinality" : 28,
      "earliest" : "2018-09-27T14:39:28,518",
      "latest" : "2018-09-27T14:39:37,012",
      "top_hits" : [
        {
          "value" : "2018-09-27T14:39:29,859",
          "count" : 10
        },
        {
          "value" : "2018-09-27T14:39:29,860",
          "count" : 9
        },
        {
          "value" : "2018-09-27T14:39:29,858",
          "count" : 6
        },
        {
          "value" : "2018-09-27T14:39:28,523",
          "count" : 3
        },
        {
          "value" : "2018-09-27T14:39:34,234",
          "count" : 2
        },
        {
          "value" : "2018-09-27T14:39:28,518",
          "count" : 1
        },
        {
          "value" : "2018-09-27T14:39:28,521",
          "count" : 1
        },
        {
          "value" : "2018-09-27T14:39:28,522",
          "count" : 1
        },
        {
          "value" : "2018-09-27T14:39:29,861",
          "count" : 1
        },
        {
          "value" : "2018-09-27T14:39:32,786",
          "count" : 1
        }
      ]
    }
  }
}

The grok_pattern in the output is now the overridden one supplied in the query parameter.

The ECS Grok pattern compatibility mode used can be either disabled (the default if not specified in the request) or v1.

The returned field_stats include entries for the fields from the overridden grok_pattern.
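As a small illustration of reading these statistics, the sketch below takes a fragment of the field_stats object from the response above and reports, for each field, the most common value and the share of messages it accounts for. The function name summarize is an assumption made for this example.

```python
# Fragment of the field_stats object from the response above.
field_stats = {
    "loglevel": {
        "count": 53,
        "cardinality": 3,
        "top_hits": [
            {"value": "INFO", "count": 51},
            {"value": "DEBUG", "count": 1},
            {"value": "WARN", "count": 1},
        ],
    },
    "node": {
        "count": 53,
        "cardinality": 1,
        "top_hits": [{"value": "node-0", "count": 53}],
    },
}

def summarize(stats):
    """Map each field to (most common value, share of messages with that value)."""
    summary = {}
    for name, field in stats.items():
        top = field["top_hits"][0]  # top_hits is ordered by descending count
        summary[name] = (top["value"], round(top["count"] / field["count"], 3))
    return summary

for name, (value, share) in summarize(field_stats).items():
    print(f"{name}: {value!r} appears in {share:.1%} of messages")
```

A field such as node, where a single value covers every message, is a hint that it carries little information in this particular sample.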

URL escaping is hard, so if you are working interactively it is best to use the UI!
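When the request has to be scripted rather than sent from the UI, the escaping can be delegated to a library. The sketch below uses Python's urllib.parse.quote to reproduce the escaped grok_pattern value from the curl command above; the localhost:9200 address is taken from that command, and the variable names are only illustrative.

```python
from urllib.parse import quote, unquote

# The more complete Grok pattern from the text above, unescaped.
GROK_PATTERN = (
    r"\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{LOGLEVEL:loglevel} *\]"
    r"\[%{JAVACLASS:class} *\] \[%{HOSTNAME:node}\] %{JAVALOGMESSAGE:message}"
)

# Keep ":" and "*" unescaped so the result matches the escaping used in the
# curl command above; everything else outside letters, digits and "_.-~"
# is percent-encoded.
escaped = quote(GROK_PATTERN, safe=":*")

url = (
    "localhost:9200/_text_structure/find_structure"
    f"?pretty&format=semi_structured_text&grok_pattern={escaped}"
)
print(url)

# Decoding the escaped value gives back the original pattern.
print(unquote(escaped) == GROK_PATTERN)
```

Escaping only the parameter value, rather than the whole URL, keeps the "?" and "&" separators intact.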