Find message structure API
Finds the structure of a list of text messages.
Request
GET _text_structure/find_message_structure
POST _text_structure/find_message_structure
Prerequisites
- If the Elasticsearch security features are enabled, you must have the monitor_text_structure or monitor cluster privilege to use this API. See Security privileges.
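Description
This API provides a starting point for ingesting data into Elasticsearch in a format that is suitable for subsequent use with other Elastic Stack functionality. Use this API in preference to find_structure when your input text has already been split into separate messages by some other process.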
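The response from the API contains:
- Sample messages.
- Statistics that reveal the most common values for all fields detected within the text, and basic numeric statistics for numeric fields.
- Information about the structure of the text, which is useful when you write ingest configurations to index it or similarly formatted text.
- Appropriate mappings for an Elasticsearch index, which you could use to ingest the text.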
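The structure finder can calculate all of this information with no guidance. However, you can optionally override some of its decisions about the text structure by specifying one or more query parameters.
Details of the output can be seen in the examples.
If the structure finder produces unexpected results, specify the explain query parameter and an explanation field will appear in the response. It helps determine why the returned structure was chosen.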
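As an illustration of overriding decisions through query parameters, the following minimal sketch assembles a request path with optional overrides such as explain and format. The helper name build_request_path is hypothetical and not part of any client library; only the endpoint and parameter names come from this page.

```python
from urllib.parse import urlencode

ENDPOINT = "_text_structure/find_message_structure"


def build_request_path(**overrides):
    """Return the endpoint path with the given query parameters appended."""
    params = {}
    for name, value in overrides.items():
        if value is None:
            continue  # leave this decision to the structure finder
        # Booleans are written as lowercase true/false in the query string.
        params[name] = str(value).lower() if isinstance(value, bool) else str(value)
    query = urlencode(params)
    return f"{ENDPOINT}?{query}" if query else ENDPOINT


print(build_request_path(explain=True, format="semi_structured_text"))
# prints: _text_structure/find_message_structure?explain=true&format=semi_structured_text
```

Passing None for a parameter omits it, so the structure finder keeps making that decision itself.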
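Query parameters
- column_names: (Optional, string) If you have set format to delimited, you can specify the column names in a comma-separated list. If this parameter is not specified, the structure finder uses the column names from the header row of the text. If the text does not have a header row, columns are named "column1", "column2", "column3", and so on.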
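- delimiter: (Optional, string) If you have set format to delimited, you can specify the character used to delimit the values in each row. Only a single character is supported; the delimiter cannot have multiple characters. By default, the API considers the following possibilities: comma, tab, semicolon, and pipe (|). In this default scenario, all rows must have the same number of fields for the delimited format to be detected. If you specify a delimiter, up to 10% of the rows can have a different number of columns than the first row.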
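- explain: (Optional, Boolean) If true, the response includes a field named explanation, which is an array of strings that indicate how the structure finder produced its result. The default value is false.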
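- format: (Optional, string) The high-level structure of the text. Valid values are ndjson, xml, delimited, and semi_structured_text. By default, the API chooses the format. In this default scenario, all rows must have the same number of fields for a delimited format to be detected. If format is set to delimited and delimiter is not set, however, the API tolerates up to 5% of rows that have a different number of columns than the first row.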
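- grok_pattern: (Optional, string) If you have set format to semi_structured_text, you can specify a Grok pattern that is used to extract fields from every message in the text. The name of the timestamp field in the Grok pattern must match the name specified in the timestamp_field parameter. If that parameter is not specified, the name of the timestamp field in the Grok pattern must be "timestamp". If grok_pattern is not specified, the structure finder creates a Grok pattern.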
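- ecs_compatibility: (Optional, string) The mode of compatibility with ECS-compliant Grok patterns. Use this parameter to specify whether the structure finder uses ECS Grok patterns instead of legacy ones when it creates a Grok pattern. Valid values are disabled and v1. The default value is disabled. This setting primarily has an impact when a whole-message Grok pattern such as %{CATALINALOG} matches the input. If the structure finder identifies a common structure but does not know its meaning, it uses generic field names such as path, ipaddress, field1, and field2 in the grok_pattern output, with the intention that a user who knows the meanings renames these fields before using the pattern.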
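- quote: (Optional, string) If you have set format to delimited, you can specify the character used to quote the values in each row if they contain newlines or the delimiter character. Only a single character is supported. If this parameter is not specified, the default value is a double quote ("). If your delimited text format does not use quoting, a workaround is to set this parameter to a character that does not appear anywhere in the text.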
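- should_trim_fields: (Optional, Boolean) If you have set format to delimited, you can specify whether values between delimiters should have whitespace trimmed from them. If this parameter is not specified and the delimiter is pipe (|), the default value is true. Otherwise, the default value is false.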
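- timeout: (Optional, time units) Sets the maximum amount of time that the structure analysis may take. If the analysis is still running when the timeout expires, it is stopped. The default value is 25 seconds.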
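- timestamp_field: (Optional, string) The name of the field that contains the primary timestamp of each record in the text. In particular, if the text were ingested into an index, this is the field that would be used to populate the @timestamp field. If format is semi_structured_text, this field must match the name of the appropriate extraction in the grok_pattern. Therefore, for semi-structured text, it is best not to specify this parameter unless grok_pattern is also specified. For structured text, if you specify this parameter, the field must exist within the text.
  If this parameter is not specified, the structure finder decides which field (if any) is the primary timestamp field. For structured text, it is not compulsory to have a timestamp in the text.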
- timestamp_format: (Optional, string) The Java time format of the timestamp field in the text.
  Only a subset of Java time format letter groups are supported:
  - a
  - d
  - dd
  - EEE
  - EEEE
  - H
  - HH
  - h
  - M
  - MM
  - MMM
  - MMMM
  - mm
  - ss
  - XX
  - XXX
  - yy
  - yyyy
  - zzz
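  Additionally, S letter groups (fractional seconds) of length one to nine are supported, provided they occur after ss and are separated from the ss by a period (.), comma (,), or colon (:). Spacing and punctuation are also permitted, with the exception of ?, newline, and carriage return, together with literal text enclosed in single quotes. For example, MM/dd HH.mm.ss,SSSSSS 'in' yyyy is a valid override format.
  One valuable use case for this parameter is when the format is semi-structured text, there are multiple timestamp formats in the text, and you know which format corresponds to the primary timestamp, but you do not want to specify the full grok_pattern. Another is when the timestamp format is one that the structure finder does not consider by default.
  If this parameter is not specified, the structure finder chooses the best format from a built-in set.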
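  If the special value null is specified, the structure finder does not look for a primary timestamp in the text. When the format is semi-structured text, this results in the structure finder treating the text as single-line messages.
  The following table provides the appropriate timestamp_format values for some example timestamps:

  Time format                  | Example timestamp
  yyyy-MM-dd HH:mm:ssZ         | 2019-04-20 13:15:22+0000
  EEE, d MMM yyyy HH:mm:ss Z   | Sat, 20 Apr 2019 13:15:22 +0000
  dd.MM.yy HH:mm:ss.SSS        | 20.04.19 13:15:22.285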
  For more information about date and time format syntax, see the Java date/time format documentation.
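The letter-group rules for timestamp_format can be hard to apply by eye, so the sketch below checks a candidate override format against them before a request is sent. It is only a literal reading of the rules listed above, not the structure finder's own validation, and the function names tokenize and is_supported_override are hypothetical.

```python
SUPPORTED_GROUPS = {
    "a", "d", "dd", "EEE", "EEEE", "H", "HH", "h", "M", "MM", "MMM",
    "MMMM", "mm", "ss", "XX", "XXX", "yy", "yyyy", "zzz",
}


def tokenize(fmt):
    """Split a format into ('letters', run), ('literal', text) and ('other', char)."""
    tokens = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "'":
            # Literal text enclosed in single quotes.
            end = fmt.find("'", i + 1)
            if end == -1:
                raise ValueError("unterminated quoted literal")
            tokens.append(("literal", fmt[i + 1:end]))
            i = end + 1
        elif ch.isalpha():
            # A letter group is a run of the same letter.
            j = i
            while j < len(fmt) and fmt[j] == ch:
                j += 1
            tokens.append(("letters", fmt[i:j]))
            i = j
        else:
            tokens.append(("other", ch))
            i += 1
    return tokens


def is_supported_override(fmt):
    """Return True if fmt uses only the letter groups and separators listed above."""
    try:
        tokens = tokenize(fmt)
    except ValueError:
        return False
    for idx, (kind, text) in enumerate(tokens):
        if kind == "other" and text in "?\n\r":
            return False
        if kind != "letters":
            continue
        if text[0] == "S":
            # Fractional seconds: 1 to 9 letters, directly after ss plus one of . , :
            after_ss = (
                idx >= 2
                and tokens[idx - 2] == ("letters", "ss")
                and tokens[idx - 1][0] == "other"
                and tokens[idx - 1][1] in ".,:"
            )
            if len(text) > 9 or not after_ss:
                return False
        elif text not in SUPPORTED_GROUPS:
            return False
    return True


print(is_supported_override("MM/dd HH.mm.ss,SSSSSS 'in' yyyy"))  # prints: True
print(is_supported_override("HH:mm:ss?"))                        # prints: False
```

Because the sketch reads the list literally, a pattern containing a letter group outside it (for instance Z) is reported as unsupported here; the API itself may be more permissive.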
Request body
- messages: (Required, array of strings) The list of messages you want to analyze.
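Since the only documented body field is messages, preparing a request mostly means turning already-split text into a JSON array of strings. The sketch below shows one way to do that; the helper name build_request_body is hypothetical, and dropping blank entries and trailing line endings is an assumption about what a caller would want, not a requirement stated by the API.

```python
import json


def build_request_body(raw_messages):
    """Return the JSON request body for a list of already-split messages."""
    # Assumption: trailing line endings and blank entries are not wanted.
    messages = [m.rstrip("\r\n") for m in raw_messages if m.strip()]
    if not messages:
        raise ValueError("at least one non-empty message is needed")
    return json.dumps({"messages": messages})


log_lines = [
    "[2024-03-05T10:52:49,188][INFO ][o.e.n.Node ] [laptop] initialized\n",
    "\n",
    "[2024-03-05T10:52:49,199][INFO ][o.e.n.Node ] [laptop] starting ...\n",
]
print(build_request_body(log_lines))
```

The resulting string can be sent as the body of the POST request shown in the Request section.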
Examples
Analyzing Elasticsearch log files
Suppose you have a list of Elasticsearch log messages. You can send it to the find_message_structure endpoint as follows:
resp = client.text_structure.find_message_structure(
    body={
        "messages": [
            "[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled; uses preferredBitSize=128",
            "[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-url]",
            "[2024-03-05T10:52:41,042][INFO ][o.e.p.PluginsService ] [laptop] loaded module [rest-root]",
            "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-core]",
            "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-redact]",
            "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [ingest-user-agent]",
            "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-monitoring]",
            "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-s3]",
            "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-analytics]",
            "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-ent-search]",
            "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-autoscaling]",
            "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-painless]]",
            "[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-expression]",
            "[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-eql]",
            "[2024-03-05T10:52:43,291][INFO ][o.e.e.NodeEnvironment ] [laptop] heap size [16gb], compressed ordinary object pointers [true]",
            "[2024-03-05T10:52:46,098][INFO ][o.e.x.s.Security ] [laptop] Security is enabled",
            "[2024-03-05T10:52:47,227][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] Profiling is enabled",
            "[2024-03-05T10:52:47,259][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] profiling index templates will not be installed or reinstalled",
            "[2024-03-05T10:52:47,755][INFO ][o.e.i.r.RecoverySettings ] [laptop] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]",
            "[2024-03-05T10:52:47,787][INFO ][o.e.d.DiscoveryModule ] [laptop] using discovery type [multi-node] and seed hosts providers [settings]",
            "[2024-03-05T10:52:49,188][INFO ][o.e.n.Node ] [laptop] initialized",
            "[2024-03-05T10:52:49,199][INFO ][o.e.n.Node ] [laptop] starting ..."
        ]
    },
)
print(resp)
response = client.text_structure.find_message_structure(
  body: {
    messages: [
      '[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled; uses preferredBitSize=128',
      '[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-url]',
      '[2024-03-05T10:52:41,042][INFO ][o.e.p.PluginsService ] [laptop] loaded module [rest-root]',
      '[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-core]',
      '[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-redact]',
      '[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [ingest-user-agent]',
      '[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-monitoring]',
      '[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-s3]',
      '[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-analytics]',
      '[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-ent-search]',
      '[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-autoscaling]',
      '[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-painless]]',
      '[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-expression]',
      '[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-eql]',
      '[2024-03-05T10:52:43,291][INFO ][o.e.e.NodeEnvironment ] [laptop] heap size [16gb], compressed ordinary object pointers [true]',
      '[2024-03-05T10:52:46,098][INFO ][o.e.x.s.Security ] [laptop] Security is enabled',
      '[2024-03-05T10:52:47,227][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] Profiling is enabled',
      '[2024-03-05T10:52:47,259][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] profiling index templates will not be installed or reinstalled',
      '[2024-03-05T10:52:47,755][INFO ][o.e.i.r.RecoverySettings ] [laptop] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]',
      '[2024-03-05T10:52:47,787][INFO ][o.e.d.DiscoveryModule ] [laptop] using discovery type [multi-node] and seed hosts providers [settings]',
      '[2024-03-05T10:52:49,188][INFO ][o.e.n.Node ] [laptop] initialized',
      '[2024-03-05T10:52:49,199][INFO ][o.e.n.Node ] [laptop] starting ...'
    ]
  }
)
puts response
const response = await client.textStructure.findMessageStructure({
  body: {
    messages: [
      "[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled; uses preferredBitSize=128",
      "[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-url]",
      "[2024-03-05T10:52:41,042][INFO ][o.e.p.PluginsService ] [laptop] loaded module [rest-root]",
      "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-core]",
      "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-redact]",
      "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [ingest-user-agent]",
      "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-monitoring]",
      "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-s3]",
      "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-analytics]",
      "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-ent-search]",
      "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-autoscaling]",
      "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-painless]]",
      "[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-expression]",
      "[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-eql]",
      "[2024-03-05T10:52:43,291][INFO ][o.e.e.NodeEnvironment ] [laptop] heap size [16gb], compressed ordinary object pointers [true]",
      "[2024-03-05T10:52:46,098][INFO ][o.e.x.s.Security ] [laptop] Security is enabled",
      "[2024-03-05T10:52:47,227][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] Profiling is enabled",
      "[2024-03-05T10:52:47,259][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] profiling index templates will not be installed or reinstalled",
      "[2024-03-05T10:52:47,755][INFO ][o.e.i.r.RecoverySettings ] [laptop] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]",
      "[2024-03-05T10:52:47,787][INFO ][o.e.d.DiscoveryModule ] [laptop] using discovery type [multi-node] and seed hosts providers [settings]",
      "[2024-03-05T10:52:49,188][INFO ][o.e.n.Node ] [laptop] initialized",
      "[2024-03-05T10:52:49,199][INFO ][o.e.n.Node ] [laptop] starting ...",
    ],
  },
});
console.log(response);
POST _text_structure/find_message_structure
{
  "messages": [
    "[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled; uses preferredBitSize=128",
    "[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-url]",
    "[2024-03-05T10:52:41,042][INFO ][o.e.p.PluginsService ] [laptop] loaded module [rest-root]",
    "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-core]",
    "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-redact]",
    "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [ingest-user-agent]",
    "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-monitoring]",
    "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-s3]",
    "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-analytics]",
    "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-ent-search]",
    "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-autoscaling]",
    "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-painless]]",
    "[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-expression]",
    "[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-eql]",
    "[2024-03-05T10:52:43,291][INFO ][o.e.e.NodeEnvironment ] [laptop] heap size [16gb], compressed ordinary object pointers [true]",
    "[2024-03-05T10:52:46,098][INFO ][o.e.x.s.Security ] [laptop] Security is enabled",
    "[2024-03-05T10:52:47,227][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] Profiling is enabled",
    "[2024-03-05T10:52:47,259][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] profiling index templates will not be installed or reinstalled",
    "[2024-03-05T10:52:47,755][INFO ][o.e.i.r.RecoverySettings ] [laptop] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]",
    "[2024-03-05T10:52:47,787][INFO ][o.e.d.DiscoveryModule ] [laptop] using discovery type [multi-node] and seed hosts providers [settings]",
    "[2024-03-05T10:52:49,188][INFO ][o.e.n.Node ] [laptop] initialized",
    "[2024-03-05T10:52:49,199][INFO ][o.e.n.Node ] [laptop] starting ..."
  ]
}
If the request does not encounter errors, you receive the following result:
{
  "num_lines_analyzed" : 22,
  "num_messages_analyzed" : 22,
  "sample_start" : "[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled; uses preferredBitSize=128\n[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-url]\n",
  "charset" : "UTF-8",
  "format" : "semi_structured_text",
  "multiline_start_pattern" : "^\\[\\b\\d{4}-\\d{2}-\\d{2}[T ]\\d{2}:\\d{2}",
  "grok_pattern" : "\\[%{TIMESTAMP_ISO8601:timestamp}\\]\\[%{LOGLEVEL:loglevel} \\]\\[.*",
  "ecs_compatibility" : "disabled",
  "timestamp_field" : "timestamp",
  "joda_timestamp_formats" : [ "ISO8601" ],
  "java_timestamp_formats" : [ "ISO8601" ],
  "need_client_timezone" : true,
  "mappings" : {
    "properties" : {
      "@timestamp" : { "type" : "date" },
      "loglevel" : { "type" : "keyword" },
      "message" : { "type" : "text" }
    }
  },
  "ingest_pipeline" : {
    "description" : "Ingest pipeline created by text structure finder",
    "processors" : [
      {
        "grok" : {
          "field" : "message",
          "patterns" : [ "\\[%{TIMESTAMP_ISO8601:timestamp}\\]\\[%{LOGLEVEL:loglevel} \\]\\[.*" ],
          "ecs_compatibility" : "disabled"
        }
      },
      {
        "date" : {
          "field" : "timestamp",
          "timezone" : "{{ event.timezone }}",
          "formats" : [ "ISO8601" ]
        }
      },
      {
        "remove" : { "field" : "timestamp" }
      }
    ]
  },
  "field_stats" : {
    "loglevel" : {
      "count" : 22,
      "cardinality" : 1,
      "top_hits" : [
        { "value" : "INFO", "count" : 22 }
      ]
    },
    "message" : {
      "count" : 22,
      "cardinality" : 22,
      "top_hits" : [
        { "value" : "[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled; uses preferredBitSize=128", "count" : 1 },
        { "value" : "[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-url]", "count" : 1 },
        { "value" : "[2024-03-05T10:52:41,042][INFO ][o.e.p.PluginsService ] [laptop] loaded module [rest-root]", "count" : 1 },
        { "value" : "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [ingest-user-agent]", "count" : 1 },
        { "value" : "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-core]", "count" : 1 },
        { "value" : "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-redact]", "count" : 1 },
        { "value" : "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-painless]]", "count" : 1 },
        { "value" : "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-s3]", "count" : 1 },
        { "value" : "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-analytics]", "count" : 1 },
        { "value" : "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-autoscaling]", "count" : 1 }
      ]
    },
    "timestamp" : {
      "count" : 22,
      "cardinality" : 14,
      "earliest" : "2024-03-05T10:52:36,256",
      "latest" : "2024-03-05T10:52:49,199",
      "top_hits" : [
        { "value" : "2024-03-05T10:52:41,044", "count" : 6 },
        { "value" : "2024-03-05T10:52:41,043", "count" : 3 },
        { "value" : "2024-03-05T10:52:41,059", "count" : 2 },
        { "value" : "2024-03-05T10:52:36,256", "count" : 1 },
        { "value" : "2024-03-05T10:52:41,038", "count" : 1 },
        { "value" : "2024-03-05T10:52:41,042", "count" : 1 },
        { "value" : "2024-03-05T10:52:43,291", "count" : 1 },
        { "value" : "2024-03-05T10:52:46,098", "count" : 1 },
        { "value" : "2024-03-05T10:52:47,227", "count" : 1 },
        { "value" : "2024-03-05T10:52:47,259", "count" : 1 }
      ]
    }
  }
}
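For a detailed description of the response format, or for additional examples of ingesting delimited text (such as CSV) or newline-delimited JSON, refer to the examples from the find text structure endpoint.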
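The mappings and ingest_pipeline sections of the response are the parts most directly reusable for ingestion. The following sketch pulls them out of a response that has already been parsed into a Python dict, and reads the most frequent value of a field from field_stats. The helper names are hypothetical, and the sample dict is a trimmed copy of the response shown above rather than a full one.

```python
def ingest_config_from(structure):
    """Pick the parts of a find_message_structure response that describe ingestion."""
    return {
        # Candidate body for creating an index that will hold the messages.
        "index_body": {"mappings": structure["mappings"]},
        # Candidate body for an ingest pipeline that parses each message.
        "pipeline_body": structure["ingest_pipeline"],
    }


def most_common_value(structure, field):
    """Return the value with the highest count in a field's top_hits."""
    hits = structure["field_stats"][field]["top_hits"]
    return max(hits, key=lambda hit: hit["count"])["value"]


# Trimmed copy of the example response (only the keys used here).
sample = {
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date"},
            "loglevel": {"type": "keyword"},
            "message": {"type": "text"},
        }
    },
    "ingest_pipeline": {
        "description": "Ingest pipeline created by text structure finder",
        "processors": [{"remove": {"field": "timestamp"}}],
    },
    "field_stats": {
        "loglevel": {"count": 22, "cardinality": 1,
                     "top_hits": [{"value": "INFO", "count": 22}]},
    },
}

config = ingest_config_from(sample)
print(sorted(config["index_body"]["mappings"]["properties"]))
print(most_common_value(sample, "loglevel"))
```

The two bodies could then be reviewed and passed to the create index and put ingest pipeline APIs under an index name and pipeline ID of your choosing.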