Dissecting data

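Dissect matches a single text field against a defined pattern. A dissect pattern is defined by the parts of the string you want to discard. Paying close attention to each part of a string helps you build a successful dissect pattern.

If you don't need the power of regular expressions, use dissect patterns instead of grok. Dissect uses a simpler syntax than grok and is typically faster overall. The syntax is transparent: tell dissect what you want and it returns those results to you.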

Dissect patterns

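Dissect patterns consist of variables and separators. Anything defined by a percent sign and curly braces, %{}, is considered a variable, such as %{clientip}. You can assign variables to any part of the data in a field, and then return only the parts that you want. Separators are any values between variables, which could be spaces, dashes, or other delimiters.

For example, suppose you have log data with a message field that looks like this: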
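
To make the distinction between variables and separators concrete, here is a minimal Python sketch that splits a pattern into its two kinds of tokens. The `split_pattern` helper and the `VARIABLE` regex are illustrative names invented for this example; they are not part of Elasticsearch or Painless.

```python
import re

# Matches a dissect variable such as %{clientip}; the name is captured.
VARIABLE = re.compile(r"%\{([^}]*)\}")


def split_pattern(pattern):
    """Split a dissect pattern into ("var", name) and ("delim", text) tokens.

    Hypothetical helper for illustration only.
    """
    tokens = []
    pos = 0
    for match in VARIABLE.finditer(pattern):
        if match.start() > pos:
            # Literal text between two variables is a separator.
            tokens.append(("delim", pattern[pos:match.start()]))
        tokens.append(("var", match.group(1)))
        pos = match.end()
    if pos < len(pattern):
        # Trailing literal text after the last variable.
        tokens.append(("delim", pattern[pos:]))
    return tokens


tokens = split_pattern("%{clientip} %{ident} %{auth} [%{@timestamp}]")
for kind, text in tokens:
    print(kind, repr(text))
```

Note that the separator before the timestamp is the two-character string " [" (a space plus the opening bracket), which is why the brackets have to appear literally in the pattern.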

"message" : "247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"

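You assign variables to each part of the data to build a successful dissect pattern. Remember to tell dissect exactly what you want to match on.

The first part of the data looks like an IP address, so you can assign a variable like %{clientip}. The next two characters are dashes with a space on either side. You can assign a variable to each dash, or a single variable to represent the dashes and spaces. Next is a set of square brackets containing a timestamp. The brackets are separators, so you include them in the dissect pattern. So far, the data and the matching dissect pattern look like this: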

247.37.0.0 - - [30/Apr/2020:14:31:22 -0500]  

%{clientip} %{ident} %{auth} [%{@timestamp}]

The first line is the first chunk of data from the `message` field. The second line is the dissect pattern that matches that chunk.

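Using the same logic, you can create variables for the remaining chunks of data. Double quotation marks are separators, so include them in your dissect pattern. The pattern replaces GET with a %{verb} variable, but keeps HTTP as part of the pattern.

"GET /images/hm_nbg.jpg HTTP/1.0" 304 0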

"%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}

Combining the two patterns results in a dissect pattern that looks like this:

%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}
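
As a rough way to see what the combined pattern captures, the following Python sketch translates a dissect-style pattern into a regular expression and applies it to the sample message. This is only an approximation written for illustration: the `dissect` function here is a hypothetical Python helper, and Elasticsearch's dissect implementation does not work this way internally and may differ in edge cases.

```python
import re


def dissect(pattern, text):
    """Return a dict of variable name -> value, or None when text does not match.

    Hypothetical stand-in for illustration; not the Elasticsearch implementation.
    """
    names = []
    regex = ""
    pos = 0
    for match in re.finditer(r"%\{([^}]*)\}", pattern):
        regex += re.escape(pattern[pos:match.start()])  # separator, matched literally
        regex += "(.*?)"  # variable: take as little text as possible
        names.append(match.group(1))
        pos = match.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, text)
    return dict(zip(names, found.groups())) if found else None


PATTERN = (
    '%{clientip} %{ident} %{auth} [%{@timestamp}] '
    '"%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}'
)
MESSAGE = (
    '247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] '
    '"GET /images/hm_nbg.jpg HTTP/1.0" 304 0'
)

fields = dissect(PATTERN, MESSAGE)
for name, value in fields.items():
    print(f"{name} = {value}")
```

In this sketch, a string that does not fit the pattern, such as "not a valid apache log", yields None rather than a partial result.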

Now that you have a dissect pattern, how do you test and use it?

Test dissect patterns with Painless

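You can incorporate dissect patterns into Painless scripts to extract data. To test your script, either use the field contexts of the Painless execute API or create a runtime field that includes the script. Runtime fields offer greater flexibility and accept multiple documents, but the Painless execute API is a good option if you don't have write access on the cluster where you're testing the script.

For example, test your dissect pattern with the Painless execute API by including your Painless script and a single document that matches your data. Start by creating an index that maps the message field as a wildcard data type. Each request in this section is shown first for the language clients (Python, Ruby where available, and JavaScript) and then in Console syntax: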

resp = client.indices.create(
    index="my-index",
    mappings={
        "properties": {
            "message": {
                "type": "wildcard"
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index',
  body: {
    mappings: {
      properties: {
        message: {
          type: 'wildcard'
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index",
  mappings: {
    properties: {
      message: {
        type: "wildcard",
      },
    },
  },
});
console.log(response);

PUT my-index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "wildcard"
      }
    }
  }
}

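If you want to retrieve the HTTP response code, add your dissect pattern to a Painless script that extracts the response value. To extract values from a field, use this function:

`.extract(doc["<field_name>"].value)?.<field_value>`

In this example, message is the <field_name> and response is the <field_value>: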

resp = client.scripts_painless_execute(
    script={
        "source": "\n      String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] \"%{verb} %{request} HTTP/%{httpversion}\" %{response} %{size}').extract(doc[\"message\"].value)?.response;\n        if (response != null) emit(Integer.parseInt(response)); \n    "
    },
    context="long_field",
    context_setup={
        "index": "my-index",
        "document": {
            "message": "247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"
        }
    },
)
print(resp)

const response = await client.scriptsPainlessExecute({
  script: {
    source:
      '\n      String response=dissect(\'%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}\').extract(doc["message"].value)?.response;\n        if (response != null) emit(Integer.parseInt(response)); \n    ',
  },
  context: "long_field",
  context_setup: {
    index: "my-index",
    document: {
      message:
        '247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0',
    },
  },
});
console.log(response);

POST /_scripts/painless/_execute
{
  "script": {
    "source": """
      String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}').extract(doc["message"].value)?.response;
        if (response != null) emit(Integer.parseInt(response)); 
    """
  },
  "context": "long_field", 
  "context_setup": {
    "index": "my-index",
    "document": {          
      "message": """247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0"""
    }
  }
}

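Runtime fields require the `emit` method to return values.
Because the response code is an integer, use the `long_field` context.
Include a sample document that matches your data.

The result includes the HTTP response code: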

{
  "result" : [
    304
  ]
}
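
The control flow of the script above can be mirrored in a few lines of Python. This is a sketch under stated assumptions: `LOG_LINE`, `extract`, and `run_script` are hypothetical names, the regular expression is a hand-written approximation of the dissect pattern, and the list stands in for the values passed to `emit`.

```python
import re

# Hand-written approximation of the dissect pattern, for illustration only.
LOG_LINE = re.compile(
    r'(?P<clientip>.*?) (?P<ident>.*?) (?P<auth>.*?) \[(?P<timestamp>.*?)\] '
    r'"(?P<verb>.*?) (?P<request>.*?) HTTP/(?P<httpversion>.*?)" '
    r'(?P<response>.*?) (?P<size>.*)'
)


def extract(message):
    """Stand-in for dissect(...).extract(...): a dict, or None on no match."""
    match = LOG_LINE.fullmatch(message)
    return match.groupdict() if match else None


def run_script(message):
    """Mimic the Painless script and collect the emitted values in a list."""
    emitted = []
    fields = extract(message)
    # Equivalent of `?.response`: only read the key when extract() matched.
    response = fields["response"] if fields is not None else None
    if response is not None:
        emitted.append(int(response))  # emit(Integer.parseInt(response))
    return emitted


print(run_script(
    '247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] '
    '"GET /images/hm_nbg.jpg HTTP/1.0" 304 0'
))
print(run_script("not a valid apache log"))
```

The first call prints [304] and the second prints an empty list: the null check means that a message which does not fit the pattern emits nothing instead of raising an error.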

Use dissect patterns and scripts in runtime fields

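If you have a working dissect pattern, you can add it to a runtime field to manipulate data. Because runtime fields don't require you to index the fields, you remain free to modify your script and how it functions. If you already tested your dissect pattern with the Painless execute API, you can use that exact Painless script in your runtime field.

To start, add the message field as a wildcard type as in the previous section, but also add @timestamp as a date in case you want to operate on that field for other use cases. Because this request creates my-index again, delete the index from the previous section first if it still exists: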

resp = client.indices.create(
    index="my-index",
    mappings={
        "properties": {
            "@timestamp": {
                "format": "strict_date_optional_time||epoch_second",
                "type": "date"
            },
            "message": {
                "type": "wildcard"
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-index',
  body: {
    mappings: {
      properties: {
        "@timestamp": {
          format: 'strict_date_optional_time||epoch_second',
          type: 'date'
        },
        message: {
          type: 'wildcard'
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-index",
  mappings: {
    properties: {
      "@timestamp": {
        format: "strict_date_optional_time||epoch_second",
        type: "date",
      },
      message: {
        type: "wildcard",
      },
    },
  },
});
console.log(response);

PUT /my-index/
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "format": "strict_date_optional_time||epoch_second",
        "type": "date"
      },
      "message": {
        "type": "wildcard"
      }
    }
  }
}
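
If you do want to work with the date, note that the bracketed timestamp inside the log line is not in an ISO 8601 form. The following Python sketch shows one way to convert the value captured by %{@timestamp} into such a form. It only illustrates the format conversion and is not something the mapping above does for you; the variable names are made up for this example.

```python
from datetime import datetime

# Value captured by %{@timestamp} in the sample log line.
raw = "30/Apr/2020:14:31:22 -0500"

# %b parses the abbreviated month name and assumes an English/C locale.
parsed = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")

iso = parsed.isoformat()
print(iso)  # 2020-04-30T14:31:22-05:00
```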

If you want to extract the HTTP response code using your dissect pattern, you can create a runtime field such as http.response:

resp = client.indices.put_mapping(
    index="my-index",
    runtime={
        "http.response": {
            "type": "long",
            "script": "\n        String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] \"%{verb} %{request} HTTP/%{httpversion}\" %{response} %{size}').extract(doc[\"message\"].value)?.response;\n        if (response != null) emit(Integer.parseInt(response));\n      "
        }
    },
)
print(resp)

const response = await client.indices.putMapping({
  index: "my-index",
  runtime: {
    "http.response": {
      type: "long",
      script:
        '\n        String response=dissect(\'%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}\').extract(doc["message"].value)?.response;\n        if (response != null) emit(Integer.parseInt(response));\n      ',
    },
  },
});
console.log(response);

PUT my-index/_mappings
{
  "runtime": {
    "http.response": {
      "type": "long",
      "script": """
        String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}').extract(doc["message"].value)?.response;
        if (response != null) emit(Integer.parseInt(response));
      """
    }
  }
}

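After mapping the fields you want to retrieve, index a few records from your log data into Elasticsearch. The following request uses the bulk API to index raw log data into my-index: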

resp = client.bulk(
    index="my-index",
    refresh=True,
    operations=[
        {
            "index": {}
        },
        {
            "timestamp": "2020-04-30T14:30:17-05:00",
            "message": "40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
        },
        {
            "index": {}
        },
        {
            "timestamp": "2020-04-30T14:30:53-05:00",
            "message": "232.0.0.0 - - [30/Apr/2020:14:30:53 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
        },
        {
            "index": {}
        },
        {
            "timestamp": "2020-04-30T14:31:12-05:00",
            "message": "26.1.0.0 - - [30/Apr/2020:14:31:12 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
        },
        {
            "index": {}
        },
        {
            "timestamp": "2020-04-30T14:31:19-05:00",
            "message": "247.37.0.0 - - [30/Apr/2020:14:31:19 -0500] \"GET /french/splash_inet.html HTTP/1.0\" 200 3781"
        },
        {
            "index": {}
        },
        {
            "timestamp": "2020-04-30T14:31:22-05:00",
            "message": "247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"
        },
        {
            "index": {}
        },
        {
            "timestamp": "2020-04-30T14:31:27-05:00",
            "message": "252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
        },
        {
            "index": {}
        },
        {
            "timestamp": "2020-04-30T14:31:28-05:00",
            "message": "not a valid apache log"
        }
    ],
)
print(resp)

response = client.bulk(
  index: 'my-index',
  refresh: true,
  body: [
    {
      index: {}
    },
    {
      timestamp: '2020-04-30T14:30:17-05:00',
      message: '40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736'
    },
    {
      index: {}
    },
    {
      timestamp: '2020-04-30T14:30:53-05:00',
      message: '232.0.0.0 - - [30/Apr/2020:14:30:53 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736'
    },
    {
      index: {}
    },
    {
      timestamp: '2020-04-30T14:31:12-05:00',
      message: '26.1.0.0 - - [30/Apr/2020:14:31:12 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736'
    },
    {
      index: {}
    },
    {
      timestamp: '2020-04-30T14:31:19-05:00',
      message: '247.37.0.0 - - [30/Apr/2020:14:31:19 -0500] "GET /french/splash_inet.html HTTP/1.0" 200 3781'
    },
    {
      index: {}
    },
    {
      timestamp: '2020-04-30T14:31:22-05:00',
      message: '247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0'
    },
    {
      index: {}
    },
    {
      timestamp: '2020-04-30T14:31:27-05:00',
      message: '252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736'
    },
    {
      index: {}
    },
    {
      timestamp: '2020-04-30T14:31:28-05:00',
      message: 'not a valid apache log'
    }
  ]
)
puts response

const response = await client.bulk({
  index: "my-index",
  refresh: "true",
  operations: [
    {
      index: {},
    },
    {
      timestamp: "2020-04-30T14:30:17-05:00",
      message:
        '40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736',
    },
    {
      index: {},
    },
    {
      timestamp: "2020-04-30T14:30:53-05:00",
      message:
        '232.0.0.0 - - [30/Apr/2020:14:30:53 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736',
    },
    {
      index: {},
    },
    {
      timestamp: "2020-04-30T14:31:12-05:00",
      message:
        '26.1.0.0 - - [30/Apr/2020:14:31:12 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736',
    },
    {
      index: {},
    },
    {
      timestamp: "2020-04-30T14:31:19-05:00",
      message:
        '247.37.0.0 - - [30/Apr/2020:14:31:19 -0500] "GET /french/splash_inet.html HTTP/1.0" 200 3781',
    },
    {
      index: {},
    },
    {
      timestamp: "2020-04-30T14:31:22-05:00",
      message:
        '247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0',
    },
    {
      index: {},
    },
    {
      timestamp: "2020-04-30T14:31:27-05:00",
      message:
        '252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736',
    },
    {
      index: {},
    },
    {
      timestamp: "2020-04-30T14:31:28-05:00",
      message: "not a valid apache log",
    },
  ],
});
console.log(response);

POST /my-index/_bulk?refresh=true
{"index":{}}
{"timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:30:53-05:00","message":"232.0.0.0 - - [30/Apr/2020:14:30:53 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:12-05:00","message":"26.1.0.0 - - [30/Apr/2020:14:31:12 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:19-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:19 -0500] \"GET /french/splash_inet.html HTTP/1.0\" 200 3781"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:22-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:27-05:00","message":"252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:28-05:00","message":"not a valid apache log"}
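
The Console form of the bulk request is newline-delimited JSON: one action line followed by one document line, repeated. As a minimal sketch, the snippet below builds such a body in plain Python from a list of documents. The `to_bulk_body` helper is a hypothetical name, and only two of the sample documents are included to keep the example short.

```python
import json

docs = [
    {
        "timestamp": "2020-04-30T14:31:22-05:00",
        "message": '247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] '
                   '"GET /images/hm_nbg.jpg HTTP/1.0" 304 0',
    },
    {
        "timestamp": "2020-04-30T14:31:28-05:00",
        "message": "not a valid apache log",
    },
]


def to_bulk_body(documents):
    """Build an NDJSON bulk body: an action line, then the document line."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {}}))  # index into the target in the URL
        lines.append(json.dumps(doc))            # json.dumps escapes the inner quotes
    # The bulk API expects the final line to end with a newline.
    return "\n".join(lines) + "\n"


body = to_bulk_body(docs)
print(body, end="")
```

Letting a JSON serializer produce each line avoids hand-escaping the double quotation marks inside the log messages.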

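You can define a simple query to search for a specific HTTP response and return all related fields. Use the fields parameter of the search API to retrieve the http.response runtime field.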

resp = client.search(
    index="my-index",
    query={
        "match": {
            "http.response": "304"
        }
    },
    fields=[
        "http.response"
    ],
)
print(resp)

response = client.search(
  index: 'my-index',
  body: {
    query: {
      match: {
        'http.response' => '304'
      }
    },
    fields: [
      'http.response'
    ]
  }
)
puts response

const response = await client.search({
  index: "my-index",
  query: {
    match: {
      "http.response": "304",
    },
  },
  fields: ["http.response"],
});
console.log(response);

GET my-index/_search
{
  "query": {
    "match": {
      "http.response": "304"
    }
  },
  "fields" : ["http.response"]
}

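Alternatively, you can define the same runtime field in the context of a search request. The runtime definition and the script are exactly the same as the ones defined previously in the index mapping. Copy that definition into the search request under the runtime_mappings section and include a query that matches on the runtime field. This query returns the same results as the earlier search against the http.response runtime field defined in the index mapping, but only within the context of this specific search: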

resp = client.search(
    index="my-index",
    runtime_mappings={
        "http.response": {
            "type": "long",
            "script": "\n        String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] \"%{verb} %{request} HTTP/%{httpversion}\" %{response} %{size}').extract(doc[\"message\"].value)?.response;\n        if (response != null) emit(Integer.parseInt(response));\n      "
        }
    },
    query={
        "match": {
            "http.response": "304"
        }
    },
    fields=[
        "http.response"
    ],
)
print(resp)

const response = await client.search({
  index: "my-index",
  runtime_mappings: {
    "http.response": {
      type: "long",
      script:
        '\n        String response=dissect(\'%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}\').extract(doc["message"].value)?.response;\n        if (response != null) emit(Integer.parseInt(response));\n      ',
    },
  },
  query: {
    match: {
      "http.response": "304",
    },
  },
  fields: ["http.response"],
});
console.log(response);

GET my-index/_search
{
  "runtime_mappings": {
    "http.response": {
      "type": "long",
      "script": """
        String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}').extract(doc["message"].value)?.response;
        if (response != null) emit(Integer.parseInt(response));
      """
    }
  },
  "query": {
    "match": {
      "http.response": "304"
    }
  },
  "fields" : ["http.response"]
}

{
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index",
        "_id" : "D47UqXkBByC8cgZrkbOm",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : "2020-04-30T14:31:22-05:00",
          "message" : "247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"
        },
        "fields" : {
          "http.response" : [
            304
          ]
        }
      }
    ]
  }
}
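
When you consume a response like the one above in client code, keep in mind that values under fields are returned as arrays, even when there is a single value. The following sketch reads the runtime field from a trimmed copy of that response; the `response` dictionary is typed in by hand here rather than fetched from a cluster, and `codes` is an illustrative name.

```python
# Trimmed, hand-copied version of the search response shown above.
response = {
    "hits": {
        "hits": [
            {
                "_index": "my-index",
                "_id": "D47UqXkBByC8cgZrkbOm",
                "fields": {"http.response": [304]},
            }
        ]
    }
}

codes = []
for hit in response["hits"]["hits"]:
    # A hit has no "fields" entry for this key when the script emitted nothing,
    # so fall back to an empty list.
    for code in hit.get("fields", {}).get("http.response", []):
        codes.append((hit["_id"], code))

print(codes)
```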
