AWS CloudWatch metricset

The cloudwatch metricset of the aws module lets you monitor various services on AWS. It periodically fetches metrics from the given namespaces by calling the GetMetricData API.

AWS Permissions

The IAM user needs the following AWS permissions to collect AWS CloudWatch metrics:

ec2:DescribeRegions
cloudwatch:GetMetricData
cloudwatch:ListMetrics
tag:GetResources
sts:GetCallerIdentity
iam:ListAccountAliases
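
One way to grant the permissions listed above is a single IAM managed policy. The fragment below is a minimal sketch written as a CloudFormation template in YAML. The use of CloudFormation, the logical ID MetricbeatCloudWatchPolicy, and the wildcard Resource are assumptions made for illustration; this page does not prescribe how the policy is created.

```yaml
Resources:
  MetricbeatCloudWatchPolicy:        # hypothetical logical ID
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:                  # the six permissions listed above
              - ec2:DescribeRegions
              - cloudwatch:GetMetricData
              - cloudwatch:ListMetrics
              - tag:GetResources
              - sts:GetCallerIdentity
              - iam:ListAccountAliases
            Resource: "*"            # assumption: no resource-level scoping
```

The policy would then be attached to the IAM user or role whose credentials Metricbeat uses.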

Metricset-specific configuration notes

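  • namespace: The namespace that the ListMetrics API filters on. For example, AWS/EC2 or AWS/S3. If the namespace is given as the wildcard *, metrics from all namespaces are collected automatically.
  • name: The metric names to filter on. For example, CPUUtilization for EC2 instances.
  • dimensions: The dimensions to filter on. For example, InstanceId=i-123.
  • resource_type: A constraint on the resources to be returned. Each resource type has the format service[:resourceType]. For example, specifying ec2 returns all Amazon EC2 resources (including EC2 instances), while specifying ec2:instance returns only EC2 instances.
  • statistic: A statistic is an aggregation of metric data over a specified period of time. By default, the statistics are Average, Sum, Count, Maximum, and Minimum.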
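
As a rough illustration of how these options combine, the sketch below sets all five on a single metrics entry. The instance ID i-123 is a placeholder, and the choice of two statistics is only an example.

```yaml
- module: aws
  period: 300s
  metricsets:
    - cloudwatch
  metrics:
    - namespace: AWS/EC2               # namespace to list metrics from
      name: ["CPUUtilization"]         # keep only this metric name
      resource_type: ec2:instance      # service[:resourceType]
      statistic: ["Average", "Maximum"]  # omit to get the default statistics
      dimensions:
        - name: InstanceId
          value: i-123                 # placeholder instance ID
```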

Configuration examples

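To keep the focus on cloudwatch metricset use cases, the examples below do not include AWS credentials configuration. For details on setting AWS credentials in the configuration so that this metricset can make AWS API calls, see AWS credentials options.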

Example 1

- module: aws
  period: 300s
  metricsets:
    - cloudwatch
  tags_filter: 
    - key: "Organization"
      value: "Engineering"
  metrics:
    - namespace: AWS/EBS 
    - namespace: AWS/ELB 
      resource_type: elasticloadbalancing
    - namespace: AWS/EC2 
      name: CPUUtilization
      statistic: ["Average"]
      dimensions:
        - name: InstanceId
          value: i-0686946e22cf9494a

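Users can configure the cloudwatch metricset to collect all metrics from one specific namespace, such as AWS/EBS.

The cloudwatch metricset can also collect tags from AWS resources. If resource_type is specified, tags are collected and stored as part of the event. For more details on resource_type, see the AWS API GetResources.

When tags are collected (for metrics entries that specify resource_type), events can also be filtered by tag using the tags_filter field in the module-specific configuration.

If users know exactly which CloudWatch metrics they want to collect, they can use the more specific configuration format shown for AWS/EC2. The namespace and the metric name must be specified, and dimensions can be used to filter the CloudWatch metrics. For more details, see AWS List Metrics.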

Example 2

- module: aws
  period: 300s
  metricsets:
    - cloudwatch
  metrics:
    - namespace: "*"

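With this configuration, metrics from all namespaces are collected from CloudWatch. The limitation is that the collection period is the same for every namespace, 300 seconds in this case. Depending on the namespace, this leads either to lost data or to extra API cost. For example, metrics in the AWS/Usage namespace are sent to CloudWatch every minute, so with a 300-second collection period the data points in between are lost. Metrics in the AWS/Billing namespace are sent to CloudWatch only every few hours, so querying that namespace every 300 seconds incurs extra cost.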
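
One possible way around this limitation is to list the namespaces explicitly and give each its own collection period by configuring the aws module more than once. The sketch below assumes that approach; the 12h period for AWS/Billing is an illustrative value chosen to match "every few hours", not a documented recommendation.

```yaml
# Assumption: one module block per collection period.
- module: aws
  period: 1m                 # AWS/Usage reports every minute
  metricsets:
    - cloudwatch
  metrics:
    - namespace: AWS/Usage
- module: aws
  period: 12h                # illustrative; AWS/Billing reports every few hours
  metricsets:
    - cloudwatch
  metrics:
    - namespace: AWS/Billing
```

The trade-off is that new namespaces are no longer picked up automatically, as they would be with the wildcard.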

Example 3

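Depending on the configuration and the number of services in the AWS account, the number of API calls can become very large and make the API cost high. To reduce the number of API calls, we recommend using the following configuration options, as in the example below.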

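  • metrics.name: Collect only the subset of metrics that is useful for your use case.
  • metrics.statistic: By default, the cloudwatch metricset makes API calls to get all statistics, such as average, maximum, minimum, and sum. If you know which statistic is most useful, specify it in the configuration.
  • metrics.dimensions: Different AWS services report different dimensions in their CloudWatch metrics. For example, EMR metrics can have either a JobFlowId dimension or a JobId dimension. If you know which specific dimension is useful, specify it in this configuration option.
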
- module: aws
  period: 5m
  metricsets:
    - cloudwatch
  regions: us-east-1
  metrics:
    - namespace: AWS/ElasticMapReduce
      name: ["S3BytesWritten", "S3BytesRead", "HDFSUtilization", "TotalLoad"]
      resource_type: elasticmapreduce
      statistic: ["Average"]
      dimensions:
        - name: JobId
          value: "*"

More examples

With the configuration below, users can collect CloudWatch metrics from EBS, ELB, and EC2 without tag information.

- module: aws
  period: 300s
  metricsets:
    - cloudwatch
  metrics:
    - namespace: AWS/EBS
    - namespace: AWS/ELB
    - namespace: AWS/EC2

With the configuration below, users can collect CloudWatch metrics from EBS, ELB, and EC2, including the tags of these services.

- module: aws
  period: 300s
  metricsets:
    - cloudwatch
  metrics:
    - namespace: AWS/EBS
      resource_type: ebs
    - namespace: AWS/ELB
      resource_type: elasticloadbalancing
    - namespace: AWS/EC2
      resource_type: ec2:instance

With the configuration below, users can collect specific CloudWatch metrics: for example, the CPUUtilization metric (average) from EC2 instance i-123 and the NetworkIn metric (average) from EC2 instance i-456.

- module: aws
  period: 300s
  metricsets:
    - cloudwatch
  metrics:
    - namespace: AWS/EC2
      name: ["CPUUtilization"]
      resource_type: ec2:instance
      dimensions:
        - name: InstanceId
          value: i-123
      statistic: ["Average"]
    - namespace: AWS/EC2
      name: ["NetworkIn"]
      dimensions:
        - name: InstanceId
          value: i-456
      statistic: ["Average"]

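With the configuration below, users can keep only the metrics that have the dimensions LoadBalancer and TargetGroup and the metric name UnHealthyHostCount. The values of LoadBalancer and TargetGroup can be anything.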

- module: aws
  period: 300s
  metricsets:
    - cloudwatch
  metrics:
    - namespace: AWS/ApplicationELB
      statistic: ['Maximum']
      name: ['UnHealthyHostCount']
      dimensions:
        - name: LoadBalancer
          value: "*"
        - name: TargetGroup
          value: "*"
      resource_type: elasticloadbalancing

This is a default metricset. If the host module is unconfigured, this metricset is enabled by default.

Fields

For a description of each field in the metricset, see the exported fields section.

Here is an example document generated by this metricset:

{
    "@timestamp": "2017-10-12T08:05:34.853Z",
    "aws": {
        "cloudwatch": {
            "namespace": "AWS/RDS"
        },
        "dimensions": {
            "DBClusterIdentifier": "database-1",
            "Role": "READER"
        },
        "rds": {
            "metrics": {
                "AbortedClients": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "ActiveTransactions": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "AuroraBinlogReplicaLag": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "AuroraReplicaLag": {
                    "avg": 18.4158,
                    "count": 5,
                    "max": 23.787,
                    "min": 10.634,
                    "sum": 92.07900000000001
                },
                "AuroraVolumeBytesLeftTotal": {
                    "avg": 70007366615040,
                    "count": 5,
                    "max": 70007366615040,
                    "min": 70007366615040,
                    "sum": 350036833075200
                },
                "Aurora_pq_request_attempted": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "Aurora_pq_request_executed": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "Aurora_pq_request_failed": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "Aurora_pq_request_in_progress": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "Aurora_pq_request_not_chosen": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "Aurora_pq_request_not_chosen_below_min_rows": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "Aurora_pq_request_not_chosen_few_pages_outside_buffer_pool": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "Aurora_pq_request_not_chosen_long_trx": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "Aurora_pq_request_not_chosen_pq_high_buffer_pool_pct": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "Aurora_pq_request_not_chosen_small_table": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "Aurora_pq_request_not_chosen_unsupported_access": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "Aurora_pq_request_throttled": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "BlockedTransactions": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "BufferCacheHitRatio": {
                    "avg": 100,
                    "count": 5,
                    "max": 100,
                    "min": 100,
                    "sum": 500
                },
                "CPUUtilization": {
                    "avg": 6.051666111792592,
                    "count": 5,
                    "max": 6.216563057282379,
                    "min": 5.808333333333334,
                    "sum": 30.25833055896296
                },
                "CommitLatency": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "CommitThroughput": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "ConnectionAttempts": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "DDLLatency": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "DDLThroughput": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "DMLLatency": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "DMLThroughput": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "DatabaseConnections": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "Deadlocks": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "DeleteLatency": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "DeleteThroughput": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "EBSByteBalance%": {
                    "avg": 99,
                    "count": 1,
                    "max": 99,
                    "min": 99,
                    "sum": 99
                },
                "EBSIOBalance%": {
                    "avg": 99,
                    "count": 1,
                    "max": 99,
                    "min": 99,
                    "sum": 99
                },
                "EngineUptime": {
                    "avg": 20800826,
                    "count": 5,
                    "max": 20800946,
                    "min": 20800706,
                    "sum": 104004130
                },
                "FreeLocalStorage": {
                    "avg": 29682751078.4,
                    "count": 5,
                    "max": 29682819072,
                    "min": 29682675712,
                    "sum": 148413755392
                },
                "FreeableMemory": {
                    "avg": 4639068160,
                    "count": 5,
                    "max": 4639838208,
                    "min": 4638638080,
                    "sum": 23195340800
                },
                "InsertLatency": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "InsertThroughput": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "LoginFailures": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "NetworkReceiveThroughput": {
                    "avg": 0.8399323667305664,
                    "count": 5,
                    "max": 1.399556807011113,
                    "min": 0.6999533364442371,
                    "sum": 4.199661833652832
                },
                "NetworkThroughput": {
                    "avg": 1.6798647334611327,
                    "count": 5,
                    "max": 2.799113614022226,
                    "min": 1.3999066728884741,
                    "sum": 8.399323667305664
                },
                "NetworkTransmitThroughput": {
                    "avg": 0.8399323667305664,
                    "count": 5,
                    "max": 1.399556807011113,
                    "min": 0.6999533364442371,
                    "sum": 4.199661833652832
                },
                "NumBinaryLogFiles": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "Queries": {
                    "avg": 6.3836833181909265,
                    "count": 5,
                    "max": 6.53289780681288,
                    "min": 6.184260972479205,
                    "sum": 31.91841659095463
                },
                "ReadLatency": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "ResultSetCacheHitRatio": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "RollbackSegmentHistoryListLength": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "RowLockTime": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "SelectLatency": {
                    "avg": 0.2519199153394592,
                    "count": 5,
                    "max": 0.2609050632911392,
                    "min": 0.24367924528301885,
                    "sum": 1.2595995766972958
                },
                "SelectThroughput": {
                    "avg": 2.6002296989354514,
                    "count": 5,
                    "max": 2.650618477644784,
                    "min": 2.5335866920025336,
                    "sum": 13.001148494677256
                },
                "SumBinaryLogSize": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "UpdateLatency": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "UpdateThroughput": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                },
                "WriteLatency": {
                    "avg": 0,
                    "count": 5,
                    "max": 0,
                    "min": 0,
                    "sum": 0
                }
            }
        }
    },
    "cloud": {
        "account": {
            "id": "428152502467",
            "name": "elastic-beats"
        },
        "provider": "aws",
        "region": "eu-west-1"
    },
    "event": {
        "dataset": "aws.cloudwatch",
        "duration": 115000,
        "module": "aws"
    },
    "metricset": {
        "name": "cloudwatch",
        "period": 10000
    },
    "service": {
        "type": "aws"
    }
}