Kubernetes module

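As one of the main pieces provided for Kubernetes monitoring, this module is capable of fetching metrics from several components, including the kubelet, kube-state-metrics, the API server, the controller manager, the scheduler and the proxy.

Some of these components run on each Kubernetes node (for example, kubelet or proxy), while others provide a single cluster-wide endpoint. This distinction matters when choosing the configuration and deployment strategy for the different metricsets included in the module.

For a complete reference on how to configure and run this module on Kubernetes as part of a DaemonSet and a Deployment, see the full example manifest in the Running Metricbeat on Kubernetes documentation.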

Kubernetes endpoints and metricsets

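The Kubernetes module is somewhat complex because its metricsets need access to a wide variety of endpoints.

This section groups the metricsets that share similar endpoint access requirements. For more details on the metricsets, see the configuration example and the metricsets sections below.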

container / node / pod / system / volume

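The default metricsets container, node, pod, system and volume require access to the kubelet endpoint on each Kubernetes node, so it is recommended to include them as part of a Metricbeat DaemonSet or of a standalone Metricbeat running on each host.

Depending on the version and configuration of the Kubernetes nodes, kubelet might provide a read-only http port (typically 10255), which is used in some configuration examples. In recent versions, however, this endpoint generally requires SSL (https) access (on port 10250 by default) and token-based authentication.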
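
As a minimal sketch of the read-only variant, assuming the kubelet on your nodes still exposes the unauthenticated http port 10255 and that a NODE_NAME environment variable holds the node name (both are assumptions about your environment), the kubelet metricsets could be configured without a token or TLS settings:

```yaml
metricbeat.modules:
# Assumption: the kubelet read-only port (10255) is enabled on this node.
- module: kubernetes
  metricsets:
    - container
    - node
    - pod
    - system
    - volume
  period: 10s
  # Plain http, so no bearer token or ssl.* settings are needed here.
  hosts: ["http://${NODE_NAME}:10255"]
```

For the secured port 10250, the example configuration later in this document adds bearer_token_file and ssl settings.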

state_* and event

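All metricsets with the state_ prefix require the hosts field to point to the kube-state-metrics service within the cluster. Because this service provides cluster-wide metrics, there is no need to fetch them per node, so it is recommended to run these metricsets as part of a Metricbeat Deployment with a single replica.

Note: kube-state-metrics is not deployed by default in Kubernetes. In that case, follow the deployment instructions provided by the kube-state-metrics project. Generally, kube-state-metrics runs as a Deployment and is accessible through a service called kube-state-metrics in the kube-system namespace, which is the service to use in our configuration.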
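
A minimal sketch of a state_* configuration for a single-replica Deployment follows. It assumes kube-state-metrics is exposed by a Service named kube-state-metrics in the kube-system namespace on port 8080 (the port is taken from the example configuration in this document; adjust both to your deployment). The namespace-qualified name is used so that it also resolves when Metricbeat runs in a different namespace, assuming default cluster DNS settings:

```yaml
metricbeat.modules:
- module: kubernetes
  enabled: true
  metricsets:
    - state_node
    - state_deployment
    - state_pod
  period: 10s
  # <service>.<namespace> form; resolved through the cluster DNS search path.
  hosts: ["kube-state-metrics.kube-system:8080"]
```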

apiserver

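The apiserver metricset requires access to the Kubernetes API, which should be readily available in all Kubernetes environments. Depending on the Kubernetes configuration, API access might require SSL (https) and token-based authentication.

To access the /metrics path of the API service, some Kubernetes environments might require the following permission to be added to a ClusterRole: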

rules:
- nonResourceURLs:
  - /metrics
  verbs:
  - get

proxy

The proxy metricset requires access to the proxy endpoint on each Kubernetes node, so it is recommended to configure it as part of a Metricbeat DaemonSet.

scheduler and controllermanager

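These metricsets require access to the Kubernetes controller-manager and scheduler endpoints. By default, these pods run only on master nodes and are not exposed through a service, but there are different strategies available for their configuration:

  • Create Kubernetes services to make kube-controller-manager and kube-scheduler available, and configure the metricsets to point to these services as part of a Metricbeat Deployment.
  • Use the Autodiscovery functionality as part of a Metricbeat DaemonSet, and include the metricsets in a conditional template applied to the specific pods.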
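
The Autodiscovery strategy can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference configuration: it assumes the scheduler pod carries the label component=kube-scheduler (as kubeadm static pods usually do), that NODE_NAME is set in the Metricbeat pod, and that the scheduler serves metrics on port 10251 as in the example configuration in this document.

```yaml
metricbeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      templates:
        # Assumption: the scheduler pod is labeled component=kube-scheduler;
        # adjust the condition to the labels used in your cluster.
        - condition:
            equals:
              kubernetes.labels.component: kube-scheduler
          config:
            - module: kubernetes
              enabled: true
              metricsets:
                - scheduler
              # ${data.host} is replaced with the address of the matched pod.
              hosts: ["${data.host}:10251"]
              period: 10s
```

A second template with a condition on the controller-manager pod and the controllermanager metricset would follow the same pattern.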

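Note: In some "as a Service" Kubernetes implementations, such as GKE, the master nodes, or even the pods running on the masters, are not visible. In these cases it is not possible to use the scheduler and controllermanager metricsets.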

Kubernetes RBAC

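Metricbeat requires certain cluster-level privileges in order to fetch the metrics. The following example creates a ServiceAccount named metricbeat with the permissions needed to run all the metricsets in the module. A ClusterRole and a ClusterRoleBinding are created for this purpose: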

apiVersion: v1
kind: ServiceAccount
metadata:
  name: metricbeat
  namespace: kube-system
  labels:
    k8s-app: metricbeat
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metricbeat
  labels:
    k8s-app: metricbeat
rules:
- apiGroups: [""]
  resources:
  - nodes
  - namespaces
  - events
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: ["batch"]
  resources:
  - jobs
  verbs: ["get", "list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - replicasets
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  - deployments
  - replicasets
  verbs: ["get", "list", "watch"]
- apiGroups:
  - ""
  resources:
  - nodes/stats
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metricbeat
subjects:
- kind: ServiceAccount
  name: metricbeat
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: metricbeat
  apiGroup: rbac.authorization.k8s.io
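
To show how the ServiceAccount is consumed, here is a trimmed DaemonSet sketch. It is not a complete manifest: configuration volumes, arguments, resource limits and security settings are omitted, the image tag VERSION is a placeholder, and the NODE_NAME variable is included on the assumption that the module configuration references ${NODE_NAME} as in the example configuration in this document.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metricbeat
  namespace: kube-system
  labels:
    k8s-app: metricbeat
spec:
  selector:
    matchLabels:
      k8s-app: metricbeat
  template:
    metadata:
      labels:
        k8s-app: metricbeat
    spec:
      # Runs the pod under the metricbeat ServiceAccount defined above.
      serviceAccountName: metricbeat
      containers:
        - name: metricbeat
          # Placeholder tag: replace VERSION with the Metricbeat version you run.
          image: docker.elastic.co/beats/metricbeat:VERSION
          env:
            # Exposes the node name so that ${NODE_NAME} resolves in the module config.
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
```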

Compatibility

The Kubernetes module is tested with the following Kubernetes versions: 1.28.x, 1.29.x, 1.30.x and 1.31.x.

Dashboard

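The Kubernetes module is shipped with default dashboards for cluster overview, apiserver, controllermanager, scheduler and proxy.

If you are using HA for these components, be aware that when data is collected from all instances, the dashboards usually show the average of the metrics. In these cases, you can filter by host or by service address.

Dashboards for controllermanager, scheduler and proxy are not compatible with Kibana versions below 7.2.0.

The cluster selector in the cluster overview dashboard helps to distinguish and filter metrics collected from multiple clusters. It can be a handy tool if you want to focus on monitoring a subset of your Kubernetes clusters for a specific scenario. Note that this selector is populated from the orchestrator.cluster.name field, which may not always be available. The value of this field comes from sources such as kube_config, the kubeadm-config configMap, and the Google Cloud metadata API for GKE. If none of these sources provides the value, Metricbeat does not report it. However, you can always use the add_fields processor to set the orchestrator.cluster.name field and use it in the cluster overview dashboard: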

processors:
  - add_fields:
      target: orchestrator.cluster
      fields:
        name: clusterName
        url: clusterURL

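Kubernetes cluster overview example:

[Screenshot: Metricbeat Kubernetes cluster overview dashboard]

If you set the collection period to a value greater than 2m, you need to increase the interval (in the panel options) of the Desired Pods, Available Pods and Unavailable Pods visualizations.

Kubernetes controller manager example:

[Screenshot: Metricbeat Kubernetes controller manager dashboard]

Kubernetes scheduler example:

[Screenshot: Metricbeat Kubernetes scheduler dashboard]

Kubernetes proxy example:

[Screenshot: Metricbeat Kubernetes proxy dashboard]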

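Example configuration

The Kubernetes module supports the standard configuration options that are described in Modules. Here is an example configuration: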

metricbeat.modules:
# Node metrics, from kubelet:
- module: kubernetes
  metricsets:
    - container
    - node
    - pod
    - system
    - volume
  period: 10s
  enabled: true
  hosts: ["https://${NODE_NAME}:10250"]
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  ssl.verification_mode: "none"
  #ssl.certificate_authorities:
  #  - /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
  #ssl.certificate: "/etc/pki/client/cert.pem"
  #ssl.key: "/etc/pki/client/cert.key"

  # Enriching parameters:
  add_metadata: true
  # If kube_config is not set, KUBECONFIG environment variable will be checked
  # and if not present it will fall back to InCluster
  #kube_config: ~/.kube/config
  #By default requests to kubeadm config map are made in order to enrich cluster name by requesting /api/v1/namespaces/kube-system/configmaps/kubeadm-config API endpoint.
  use_kubeadm: true
  #include_labels: []
  #exclude_labels: []
  #include_annotations: []
  #labels.dedot: true
  #annotations.dedot: true

  # When used outside the cluster:
  #node: node_name

  # To configure additionally node and namespace metadata `add_resource_metadata` can be defined.
  # By default all labels will be included while annotations are not added by default.
  # add_resource_metadata:
  #   namespace:
  #     include_labels: ["namespacelabel1"]
  #   node:
  #     include_labels: ["nodelabel2"]
  #     include_annotations: ["nodeannotation1"]
  #   deployment: false
  #   cronjob: false
  # Kubernetes client QPS and burst can be configured additionally
  #kube_client_options:
  #  qps: 5
  #  burst: 10

# State metrics from kube-state-metrics service:
- module: kubernetes
  enabled: true
  metricsets:
    - state_node
    - state_daemonset
    - state_deployment
    - state_replicaset
    - state_statefulset
    - state_pod
    - state_container
    - state_job
    - state_cronjob
    - state_resourcequota
    - state_service
    - state_persistentvolume
    - state_persistentvolumeclaim
    - state_storageclass
    # Uncomment this to get k8s events:
    #- event
  period: 10s
  hosts: ["kube-state-metrics:8080"]

  # Enriching parameters:
  add_metadata: true
  # If kube_config is not set, KUBECONFIG environment variable will be checked
  # and if not present it will fall back to InCluster
  #kube_config: ~/.kube/config
  #By default requests to kubeadm config map are made in order to enrich cluster name by requesting /api/v1/namespaces/kube-system/configmaps/kubeadm-config API endpoint.
  use_kubeadm: true
  #include_labels: []
  #exclude_labels: []
  #include_annotations: []
  #labels.dedot: true
  #annotations.dedot: true

  # When used outside the cluster:
  #node: node_name

  # Set the namespace to watch for resources
  #namespace: staging

  # To configure additionally node and namespace metadata `add_resource_metadata` can be defined.
  # By default all labels will be included while annotations are not added by default.
  # add_resource_metadata:
  #   namespace:
  #     include_labels: ["namespacelabel1"]
  #   node:
  #     include_labels: ["nodelabel2"]
  #     include_annotations: ["nodeannotation1"]
  #   deployment: false
  #   cronjob: false
  # Kubernetes client QPS and burst can be configured additionally
  #kube_client_options:
  #  qps: 5
  #  burst: 10

# Kubernetes Events
- module: kubernetes
  enabled: true
  metricsets:
    - event
  period: 10s
  # Skipping events older than Metricbeat's startup time is enabled by default.
  # Setting skip_older to false stops the filtering of older events.
  # This setting is also useful when event timestamps are not populated properly.
  #skip_older: false
  # If kube_config is not set, KUBECONFIG environment variable will be checked
  # and if not present it will fall back to InCluster
  #kube_config: ~/.kube/config
  #By default requests to kubeadm config map are made in order to enrich cluster name by requesting /api/v1/namespaces/kube-system/configmaps/kubeadm-config API endpoint.
  use_kubeadm: true
  # Set the namespace to watch for events
  #namespace: staging
  # Set the sync period of the watchers
  #sync_period: 10m
  # Kubernetes client QPS and burst can be configured additionally
  #kube_client_options:
  #  qps: 5
  #  burst: 10

# Kubernetes API server
# (when running metricbeat as a deployment)
- module: kubernetes
  enabled: true
  metricsets:
    - apiserver
  hosts: ["https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}"]
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  ssl.certificate_authorities:
    - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  period: 30s
  #By default requests to kubeadm config map are made in order to enrich cluster name by requesting /api/v1/namespaces/kube-system/configmaps/kubeadm-config API endpoint.
  use_kubeadm: true

# Kubernetes proxy server
# (when running metricbeat locally at hosts or as a daemonset + host network)
- module: kubernetes
  enabled: true
  metricsets:
    - proxy
  hosts: ["localhost:10249"]
  period: 10s
  #By default requests to kubeadm config map are made in order to enrich cluster name by requesting /api/v1/namespaces/kube-system/configmaps/kubeadm-config API endpoint.
  use_kubeadm: true

# Kubernetes controller manager
# (URL and deployment method should be adapted to match the controller manager deployment / service / endpoint)
- module: kubernetes
  enabled: true
  metricsets:
    - controllermanager
  hosts: ["https://127.0.0.1:10252"]
  period: 10s
  #By default requests to kubeadm config map are made in order to enrich cluster name by requesting /api/v1/namespaces/kube-system/configmaps/kubeadm-config API endpoint.
  use_kubeadm: true

# Kubernetes scheduler
# (URL and deployment method should be adapted to match scheduler deployment / service / endpoint)
- module: kubernetes
  enabled: true
  metricsets:
    - scheduler
  hosts: ["localhost:10251"]
  period: 10s
  #By default requests to kubeadm config map are made in order to enrich cluster name by requesting /api/v1/namespaces/kube-system/configmaps/kubeadm-config API endpoint.
  use_kubeadm: true

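This module supports TLS connections when the ssl config field is used, as described in SSL. It also supports the options described in Standard HTTP config options.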
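
As an illustration of the ssl fields, the sketch below verifies the kubelet certificate chain instead of disabling verification. It assumes the kubelet serving certificate is signed by the cluster CA mounted into the pod, which is not the case in every cluster; where the kubelet uses a self-signed certificate, this configuration will fail and the verification_mode shown in the example configuration applies instead.

```yaml
metricbeat.modules:
- module: kubernetes
  metricsets:
    - node
  period: 10s
  hosts: ["https://${NODE_NAME}:10250"]
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  # Assumption: the kubelet serving certificate is signed by this CA.
  ssl.certificate_authorities:
    - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  # "certificate" checks the certificate chain but skips hostname verification.
  ssl.verification_mode: "certificate"
```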

Metricsets

The following metricsets are available: