Find messages structure API
Finds the structure of a list of text messages.
Request
GET _text_structure/find_message_structure
POST _text_structure/find_message_structure
Prerequisites
- If the Elasticsearch security features are enabled, you must have the monitor_text_structure or monitor cluster privilege to use this API. See Security privileges.
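Description
This API provides a starting point for ingesting data into Elasticsearch in a format that is suitable for subsequent use with other Elastic Stack functionality. Use this API in preference to find_structure when your input text has already been split up into separate messages by some other process.
The response from the API contains:
- Sample messages.
- Statistics that reveal the most common values for all fields detected within the text, and basic numeric statistics for numeric fields.
- Information about the structure of the text, which is useful when you write ingest configurations to index it or similarly formatted text.
- Appropriate mappings for an Elasticsearch index, which you could use to ingest the text.
All of this information can be calculated by the structure finder with no guidance. However, you can optionally override some of the decisions about the text structure by specifying one or more query parameters.
Details of the output can be seen in the examples.
If the structure finder produces unexpected results, specify the explain query parameter and an explanation will appear in the response. It helps determine why the returned structure was chosen.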
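As a minimal sketch of how query parameters such as explain attach to the endpoint, the following Python builds the request path using only the standard library. The helper name build_request_path is hypothetical, and nothing is sent to a cluster.

```python
from urllib.parse import urlencode

ENDPOINT = "_text_structure/find_message_structure"


def build_request_path(**params):
    """Return the endpoint path with the given query parameters appended."""
    cleaned = {}
    for key, value in params.items():
        if value is None:
            continue  # leave unset parameters out so the API applies its defaults
        if isinstance(value, bool):
            value = "true" if value else "false"  # URL form of a Boolean
        cleaned[key] = value
    if not cleaned:
        return ENDPOINT
    return f"{ENDPOINT}?{urlencode(cleaned)}"


path = build_request_path(explain=True, format="semi_structured_text")
print(path)
```

Leaving a parameter out keeps the structure finder's own decision for it, so only the overrides you actually need appear in the path.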
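Query parameters
column_names
- (Optional, string) If you have set format to delimited, you can specify the column names in a comma-separated list. If this parameter is not specified, the structure finder uses the column names from the header row of the text. If the text does not have a header row, columns are named "column1", "column2", "column3", and so on.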
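delimiter
- (Optional, string) If you have set format to delimited, you can specify the character used to delimit the values in each row. Only a single character is supported; the delimiter cannot have multiple characters. By default, the API considers the following possibilities: comma, tab, semi-colon, and pipe (|). In this default scenario, all rows must have the same number of fields for the delimited format to be detected. If you specify a delimiter, up to 10% of the rows can have a different number of columns than the first row.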
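explain
- (Optional, Boolean) If true, the response includes a field named explanation, which is an array of strings that indicate how the structure finder produced its result. The default value is false.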
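format
- (Optional, string) The high-level structure of the text. Valid values are ndjson, xml, delimited, and semi_structured_text. By default, the API chooses the format. In this default scenario, all rows must have the same number of fields for a delimited format to be detected. If format is set to delimited and delimiter is not set, however, the API tolerates up to 5% of rows that have a different number of columns than the first row.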
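grok_pattern
- (Optional, string) If you have set format to semi_structured_text, you can specify a Grok pattern that is used to extract fields from every message in the text. The name of the timestamp field in the Grok pattern must match what is specified in the timestamp_field parameter. If that parameter is not specified, the name of the timestamp field in the Grok pattern must match "timestamp". If grok_pattern is not specified, the structure finder creates a Grok pattern.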
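ecs_compatibility
- (Optional, string) The mode of compatibility with ECS-compliant Grok patterns. Use this parameter to specify whether to use ECS Grok patterns instead of legacy ones when the structure finder creates a Grok pattern. Valid values are disabled and v1. The default value is disabled. This setting primarily has an impact when a whole-message Grok pattern such as %{CATALINALOG} matches the input. If the structure finder identifies a common structure but has no idea of its meaning, generic field names such as path, ipaddress, field1, and field2 are used in the grok_pattern output, with the intention that a user who knows the meanings renames these fields before using them.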
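quote
- (Optional, string) If you have set format to delimited, you can specify the character used to quote the values in each row if they contain newlines or the delimiter character. Only a single character is supported. If this parameter is not specified, the default value is a double quote ("). If your delimited text format does not use quoting, a workaround is to set this parameter to a character that does not appear anywhere in the sample.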
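should_trim_fields
- (Optional, Boolean) If you have set format to delimited, you can specify whether values between delimiters should have whitespace trimmed from them. If this parameter is not specified and the delimiter is pipe (|), the default value is true. Otherwise, the default value is false.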
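timeout
- (Optional, time units) Sets the maximum amount of time that the structure analysis may take. If the analysis is still running when the timeout expires, it is stopped. The default value is 25 seconds.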
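timestamp_field
- (Optional, string) The name of the field that contains the primary timestamp of each record in the text. In particular, if the text were ingested into an index, this is the field that would be used to populate the @timestamp field.
  If format is semi_structured_text, this field must match the name of the appropriate extraction in the grok_pattern. Therefore, for semi-structured text, it is best not to specify this parameter unless grok_pattern is also specified.
  For structured text, if you specify this parameter, the field must exist within the text.
  If this parameter is not specified, the structure finder decides which field (if any) is the primary timestamp field. For structured text, it is not compulsory to have a timestamp in the text.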
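timestamp_format
- (Optional, string) The Java time format of the timestamp field in the text.
  Only a subset of Java time format letter groups are supported:
  - a
  - d
  - dd
  - EEE
  - EEEE
  - H
  - HH
  - h
  - M
  - MM
  - MMM
  - MMMM
  - mm
  - ss
  - XX
  - XXX
  - yy
  - yyyy
  - zzz
  Additionally, S letter groups (fractional seconds) of length one to nine are supported, provided they occur after ss and are separated from the ss by a period (.), comma (,), or colon (:). Spacing and punctuation are also permitted, with the exception of a question mark (?), newline, and carriage return, together with literal text enclosed in single quotes. For example, MM/dd HH.mm.ss,SSSSSS 'in' yyyy is a valid override format.
  One valuable use case for this parameter is when the format is semi-structured text, there are multiple timestamp formats in the text, and you know which format corresponds to the primary timestamp, but you do not want to specify the full grok_pattern. Another is when the timestamp format is one that the structure finder does not consider by default.
  If this parameter is not specified, the structure finder chooses the best format from a built-in set.
  If the special value null is specified, the structure finder does not look for a primary timestamp in the text. When the format is semi-structured text, this results in the structure finder treating the text as single-line messages.
  The following table provides the appropriate time format values for some example timestamps:

  Time format                    Presentation
  yyyy-MM-dd HH:mm:ssZ           2019-04-20 13:15:22+0000
  EEE, d MMM yyyy HH:mm:ss Z     Sat, 20 Apr 2019 13:15:22 +0000
  dd.MM.yy HH:mm:ss.SSS          20.04.19 13:15:22.285

  For more information about date and time format syntax, see the Java date/time format documentation.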
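The rules for timestamp_format overrides can be checked mechanically. The following Python is a rough sketch that follows only the rules as listed for this parameter; it is not the structure finder's own validation. The function name is hypothetical, the contents of quoted literals are not inspected, and letters outside the listed groups (for example Z) are rejected.

```python
SUPPORTED_GROUPS = {
    "a", "d", "dd", "EEE", "EEEE", "H", "HH", "h", "M", "MM", "MMM", "MMMM",
    "mm", "ss", "XX", "XXX", "yy", "yyyy", "zzz",
}
FORBIDDEN_CHARS = {"?", "\n", "\r"}
FRACTION_SEPARATORS = {".", ",", ":"}


def is_supported_timestamp_format(fmt):
    """Return True if fmt uses only the letter groups and punctuation listed above."""
    previous_group = None  # most recent letter group
    gap = ""               # text seen since previous_group
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "'":
            # Quoted literal text: skip to the closing quote (assumption: contents unchecked).
            end = fmt.find("'", i + 1)
            if end == -1:
                return False
            gap += fmt[i:end + 1]
            i = end + 1
        elif ch.isascii() and ch.isalpha():
            # A letter group is a run of the same letter.
            j = i
            while j < len(fmt) and fmt[j] == ch:
                j += 1
            group = fmt[i:j]
            if ch == "S":
                # Fractional seconds: 1 to 9 letters, directly after ss plus one separator.
                if not (len(group) <= 9 and previous_group == "ss"
                        and gap in FRACTION_SEPARATORS):
                    return False
            elif group not in SUPPORTED_GROUPS:
                return False
            previous_group, gap, i = group, "", j
        elif ch in FORBIDDEN_CHARS:
            return False
        else:
            gap += ch  # spacing or punctuation
            i += 1
    return True


print(is_supported_timestamp_format("MM/dd HH.mm.ss,SSSSSS 'in' yyyy"))
```

A check like this can run before a request is sent, so that an unsupported override is caught locally rather than by the API.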
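The column-count tolerances described for the delimiter and format parameters can be illustrated with a small Python sketch. It only restates the documented percentages (10% when a delimiter is specified, 5% when format is delimited without a delimiter, none otherwise). The function names are hypothetical, and how the structure finder itself counts or rounds rows is an assumption here.

```python
def allowed_mismatch_percent(format_is_delimited, delimiter_given):
    """Percentage of rows allowed to differ in column count from the first row."""
    if delimiter_given:
        return 10  # delimiter specified explicitly
    if format_is_delimited:
        return 5   # format=delimited, delimiter left to the API
    return 0       # full auto-detection: every row must agree


def column_counts_within_tolerance(rows, format_is_delimited=False, delimiter_given=False):
    """rows is a list of already-split rows (lists of field values)."""
    if not rows:
        return False
    expected = len(rows[0])
    mismatched = sum(1 for row in rows if len(row) != expected)
    percent = allowed_mismatch_percent(format_is_delimited, delimiter_given)
    # Integer comparison avoids floating-point rounding at the boundary.
    return mismatched * 100 <= percent * len(rows)


# 20 rows, 2 of which have an extra column (10% mismatch).
rows = [["a", "b", "c"]] * 18 + [["a", "b", "c", "d"]] * 2
print(column_counts_within_tolerance(rows, delimiter_given=True))      # within 10%
print(column_counts_within_tolerance(rows, format_is_delimited=True))  # exceeds 5%
```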
Request body
messages
- (Required, array of strings) The list of messages you want to analyze.
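As a minimal sketch of the request body, the following Python serializes a list of already-split messages into the JSON shape described above. The helper name build_request_body is hypothetical, and rejecting an empty list is a choice made for this sketch rather than documented API behavior.

```python
import json


def build_request_body(messages):
    """Serialize already-split messages into the JSON body for the endpoint."""
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    if not all(isinstance(message, str) for message in messages):
        raise TypeError("every message must be a string")
    return json.dumps({"messages": messages})


sample = [
    "[2024-03-05T10:52:49,188][INFO ][o.e.n.Node ] [laptop] initialized",
    "[2024-03-05T10:52:49,199][INFO ][o.e.n.Node ] [laptop] starting ...",
]
body = build_request_body(sample)
print(body)
```

Each list element is treated as one message, so a message that spans several lines stays a single string with its newlines escaped by the JSON encoder.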
Examples
Analyzing Elasticsearch log files
Suppose you have a list of Elasticsearch log messages. You can send it to the find_message_structure endpoint as follows:
resp = client.text_structure.find_message_structure(
    body={
        "messages": [
            "[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled; uses preferredBitSize=128",
            "[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-url]",
            "[2024-03-05T10:52:41,042][INFO ][o.e.p.PluginsService ] [laptop] loaded module [rest-root]",
            "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-core]",
            "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-redact]",
            "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [ingest-user-agent]",
            "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-monitoring]",
            "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-s3]",
            "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-analytics]",
            "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-ent-search]",
            "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-autoscaling]",
            "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-painless]]",
            "[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-expression]",
            "[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-eql]",
            "[2024-03-05T10:52:43,291][INFO ][o.e.e.NodeEnvironment ] [laptop] heap size [16gb], compressed ordinary object pointers [true]",
            "[2024-03-05T10:52:46,098][INFO ][o.e.x.s.Security ] [laptop] Security is enabled",
            "[2024-03-05T10:52:47,227][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] Profiling is enabled",
            "[2024-03-05T10:52:47,259][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] profiling index templates will not be installed or reinstalled",
            "[2024-03-05T10:52:47,755][INFO ][o.e.i.r.RecoverySettings ] [laptop] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]",
            "[2024-03-05T10:52:47,787][INFO ][o.e.d.DiscoveryModule ] [laptop] using discovery type [multi-node] and seed hosts providers [settings]",
            "[2024-03-05T10:52:49,188][INFO ][o.e.n.Node ] [laptop] initialized",
            "[2024-03-05T10:52:49,199][INFO ][o.e.n.Node ] [laptop] starting ..."
        ]
    },
)
print(resp)
response = client.text_structure.find_message_structure(
  body: {
    messages: [
      '[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled; uses preferredBitSize=128',
      '[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-url]',
      '[2024-03-05T10:52:41,042][INFO ][o.e.p.PluginsService ] [laptop] loaded module [rest-root]',
      '[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-core]',
      '[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-redact]',
      '[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [ingest-user-agent]',
      '[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-monitoring]',
      '[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-s3]',
      '[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-analytics]',
      '[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-ent-search]',
      '[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-autoscaling]',
      '[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-painless]]',
      '[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-expression]',
      '[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-eql]',
      '[2024-03-05T10:52:43,291][INFO ][o.e.e.NodeEnvironment ] [laptop] heap size [16gb], compressed ordinary object pointers [true]',
      '[2024-03-05T10:52:46,098][INFO ][o.e.x.s.Security ] [laptop] Security is enabled',
      '[2024-03-05T10:52:47,227][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] Profiling is enabled',
      '[2024-03-05T10:52:47,259][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] profiling index templates will not be installed or reinstalled',
      '[2024-03-05T10:52:47,755][INFO ][o.e.i.r.RecoverySettings ] [laptop] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]',
      '[2024-03-05T10:52:47,787][INFO ][o.e.d.DiscoveryModule ] [laptop] using discovery type [multi-node] and seed hosts providers [settings]',
      '[2024-03-05T10:52:49,188][INFO ][o.e.n.Node ] [laptop] initialized',
      '[2024-03-05T10:52:49,199][INFO ][o.e.n.Node ] [laptop] starting ...'
    ]
  }
)
puts response
const response = await client.textStructure.findMessageStructure({
  body: {
    messages: [
      "[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled; uses preferredBitSize=128",
      "[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-url]",
      "[2024-03-05T10:52:41,042][INFO ][o.e.p.PluginsService ] [laptop] loaded module [rest-root]",
      "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-core]",
      "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-redact]",
      "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [ingest-user-agent]",
      "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-monitoring]",
      "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-s3]",
      "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-analytics]",
      "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-ent-search]",
      "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-autoscaling]",
      "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-painless]]",
      "[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-expression]",
      "[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-eql]",
      "[2024-03-05T10:52:43,291][INFO ][o.e.e.NodeEnvironment ] [laptop] heap size [16gb], compressed ordinary object pointers [true]",
      "[2024-03-05T10:52:46,098][INFO ][o.e.x.s.Security ] [laptop] Security is enabled",
      "[2024-03-05T10:52:47,227][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] Profiling is enabled",
      "[2024-03-05T10:52:47,259][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] profiling index templates will not be installed or reinstalled",
      "[2024-03-05T10:52:47,755][INFO ][o.e.i.r.RecoverySettings ] [laptop] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]",
      "[2024-03-05T10:52:47,787][INFO ][o.e.d.DiscoveryModule ] [laptop] using discovery type [multi-node] and seed hosts providers [settings]",
      "[2024-03-05T10:52:49,188][INFO ][o.e.n.Node ] [laptop] initialized",
      "[2024-03-05T10:52:49,199][INFO ][o.e.n.Node ] [laptop] starting ...",
    ],
  },
});
console.log(response);
POST _text_structure/find_message_structure
{
  "messages": [
    "[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled; uses preferredBitSize=128",
    "[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-url]",
    "[2024-03-05T10:52:41,042][INFO ][o.e.p.PluginsService ] [laptop] loaded module [rest-root]",
    "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-core]",
    "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-redact]",
    "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [ingest-user-agent]",
    "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-monitoring]",
    "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-s3]",
    "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-analytics]",
    "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-ent-search]",
    "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-autoscaling]",
    "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-painless]]",
    "[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-expression]",
    "[2024-03-05T10:52:41,059][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-eql]",
    "[2024-03-05T10:52:43,291][INFO ][o.e.e.NodeEnvironment ] [laptop] heap size [16gb], compressed ordinary object pointers [true]",
    "[2024-03-05T10:52:46,098][INFO ][o.e.x.s.Security ] [laptop] Security is enabled",
    "[2024-03-05T10:52:47,227][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] Profiling is enabled",
    "[2024-03-05T10:52:47,259][INFO ][o.e.x.p.ProfilingPlugin ] [laptop] profiling index templates will not be installed or reinstalled",
    "[2024-03-05T10:52:47,755][INFO ][o.e.i.r.RecoverySettings ] [laptop] using rate limit [40mb] with [default=40mb, read=0b, write=0b, max=0b]",
    "[2024-03-05T10:52:47,787][INFO ][o.e.d.DiscoveryModule ] [laptop] using discovery type [multi-node] and seed hosts providers [settings]",
    "[2024-03-05T10:52:49,188][INFO ][o.e.n.Node ] [laptop] initialized",
    "[2024-03-05T10:52:49,199][INFO ][o.e.n.Node ] [laptop] starting ..."
  ]
}
If the request does not encounter errors, you receive the following result:
{ "num_lines_analyzed" : 22, "num_messages_analyzed" : 22, "sample_start" : "[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled; uses preferredBitSize=128\n[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-url]\n", "charset" : "UTF-8", "format" : "semi_structured_text", "multiline_start_pattern" : "^\\[\\b\\d{4}-\\d{2}-\\d{2}[T ]\\d{2}:\\d{2}", "grok_pattern" : "\\[%{TIMESTAMP_ISO8601:timestamp}\\]\\[%{LOGLEVEL:loglevel} \\]\\[.*", "ecs_compatibility" : "disabled", "timestamp_field" : "timestamp", "joda_timestamp_formats" : [ "ISO8601" ], "java_timestamp_formats" : [ "ISO8601" ], "need_client_timezone" : true, "mappings" : { "properties" : { "@timestamp" : { "type" : "date" }, "loglevel" : { "type" : "keyword" }, "message" : { "type" : "text" } } }, "ingest_pipeline" : { "description" : "Ingest pipeline created by text structure finder", "processors" : [ { "grok" : { "field" : "message", "patterns" : [ "\\[%{TIMESTAMP_ISO8601:timestamp}\\]\\[%{LOGLEVEL:loglevel} \\]\\[.*" ], "ecs_compatibility" : "disabled" } }, { "date" : { "field" : "timestamp", "timezone" : "{{ event.timezone }}", "formats" : [ "ISO8601" ] } }, { "remove" : { "field" : "timestamp" } } ] }, "field_stats" : { "loglevel" : { "count" : 22, "cardinality" : 1, "top_hits" : [ { "value" : "INFO", "count" : 22 } ] }, "message" : { "count" : 22, "cardinality" : 22, "top_hits" : [ { "value" : "[2024-03-05T10:52:36,256][INFO ][o.a.l.u.VectorUtilPanamaProvider] [laptop] Java vector incubator API enabled; uses preferredBitSize=128", "count" : 1 }, { "value" : "[2024-03-05T10:52:41,038][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-url]", "count" : 1 }, { "value" : "[2024-03-05T10:52:41,042][INFO ][o.e.p.PluginsService ] [laptop] loaded module [rest-root]", "count" : 1 }, { "value" : "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [ingest-user-agent]", "count" : 
1 }, { "value" : "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-core]", "count" : 1 }, { "value" : "[2024-03-05T10:52:41,043][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-redact]", "count" : 1 }, { "value" : "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [lang-painless]]", "count" : 1 }, { "value" : "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [repository-s3]", "count" : 1 }, { "value" : "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-analytics]", "count" : 1 }, { "value" : "[2024-03-05T10:52:41,044][INFO ][o.e.p.PluginsService ] [laptop] loaded module [x-pack-autoscaling]", "count" : 1 } ] }, "timestamp" : { "count" : 22, "cardinality" : 14, "earliest" : "2024-03-05T10:52:36,256", "latest" : "2024-03-05T10:52:49,199", "top_hits" : [ { "value" : "2024-03-05T10:52:41,044", "count" : 6 }, { "value" : "2024-03-05T10:52:41,043", "count" : 3 }, { "value" : "2024-03-05T10:52:41,059", "count" : 2 }, { "value" : "2024-03-05T10:52:36,256", "count" : 1 }, { "value" : "2024-03-05T10:52:41,038", "count" : 1 }, { "value" : "2024-03-05T10:52:41,042", "count" : 1 }, { "value" : "2024-03-05T10:52:43,291", "count" : 1 }, { "value" : "2024-03-05T10:52:46,098", "count" : 1 }, { "value" : "2024-03-05T10:52:47,227", "count" : 1 }, { "value" : "2024-03-05T10:52:47,259", "count" : 1 } ] } } }
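One way to use the multiline_start_pattern value from the response is to group raw log lines into messages before sending them to this API. The following Python is a rough sketch under two assumptions: that the pattern marks the first line of each message, and that this particular expression behaves the same in Python's re module as in the Java regular expression engine. The function name and the continuation line in the sample are hypothetical.

```python
import re

# The multiline_start_pattern value from the response, with JSON escaping removed.
MULTILINE_START_PATTERN = r"^\[\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}"


def group_lines_into_messages(lines, start_pattern=MULTILINE_START_PATTERN):
    """Join continuation lines onto the preceding line that matches start_pattern."""
    start = re.compile(start_pattern)
    messages = []
    for line in lines:
        if start.match(line) or not messages:
            messages.append(line)         # first line of a new message
        else:
            messages[-1] += "\n" + line   # continuation of the previous message
    return messages


lines = [
    "[2024-03-05T10:52:49,188][INFO ][o.e.n.Node ] [laptop] initialized",
    "    a hypothetical continuation line",
    "[2024-03-05T10:52:49,199][INFO ][o.e.n.Node ] [laptop] starting ...",
]
messages = group_lines_into_messages(lines)
print(len(messages))
```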
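For a detailed description of the response format, or for additional examples of ingesting delimited text (such as CSV) or newline-delimited JSON, refer to the examples for the find text structure endpoint.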