
Apache Spark Integration

Version	1.3.0
Compatible Kibana version(s)	8.13.0 or higher
Supported Serverless project types	Security, Observability
Subscription level	Basic
Level of support	Elastic

Overview

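Apache Spark is an open-source, distributed computing system that provides a fast, general-purpose cluster-computing framework. Its in-memory data processing significantly improves the performance of big data analytics applications. Spark supports several programming languages, including Scala, Python, Java, and R, and ships with built-in modules for SQL, streaming, machine learning, and graph processing. This makes it a versatile tool for a wide range of data processing and analysis tasks.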

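Use the Apache Spark integration to:

Collect metrics related to the application, driver, executor, and node.
Create visualizations to monitor, measure, and analyze usage trends and key data, and derive business insights.
Create alerts to reduce MTTD and MTTR by referencing relevant logs when troubleshooting an issue.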

Data streams

The Apache Spark integration collects metrics data.

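Metrics provide insight into the statistics of Apache Spark. The metric data streams collected by the Apache Spark integration are application, driver, executor, and node, which let users monitor and troubleshoot the performance of their Apache Spark instance.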

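Data stream list:

application: Collects information related to the number of cores used, application name, runtime in milliseconds, and current status of the application.
driver: Collects information related to the driver details, job durations, task execution, memory usage, executor status, and JVM metrics.
executor: Collects information related to the operations, memory usage, garbage collection, file handling, and threadpool activity.
node: Collects information related to the application count, waiting applications, worker metrics, executor count, core usage, and memory usage.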

Note

Users can monitor and view the metrics inside the ingested Apache Spark documents under the metrics-* index pattern in Discover.
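Outside Discover, the same documents can be selected with a search request against metrics-*. The sketch below only builds the request body and sends nothing; the helper name build_dataset_query and the 15-minute default window are illustrative assumptions, while data_stream.dataset and @timestamp are fields listed in this document.

```python
import json


def build_dataset_query(dataset: str, window: str = "now-15m") -> dict:
    """Build an Elasticsearch search body for one Apache Spark dataset.

    `dataset` is a value of data_stream.dataset, for example
    "apache_spark.driver". `window` is a date-math lower bound (assumed default).
    """
    return {
        "size": 10,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    {"term": {"data_stream.dataset": dataset}},
                    {"range": {"@timestamp": {"gte": window}}},
                ]
            }
        },
    }


# Print the body so it can be pasted into a search request against metrics-*.
print(json.dumps(build_dataset_query("apache_spark.driver"), indent=2))
```

Filtering on data_stream.dataset keeps the search limited to one of the four data streams.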

Compatibility

This integration has been tested against Apache Spark version 3.5.0.

Requirements

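You need Elasticsearch to store and search your data, and Kibana to visualize and manage it. You can use our hosted Elasticsearch Service on Elastic Cloud (recommended), or self-manage the Elastic Stack on your own hardware.

To ingest data from Apache Spark, you must know the full host addresses of the Main and Worker nodes.

To proceed with the Jolokia setup, Apache Spark should be installed as a standalone setup. Make sure the spark folder is installed under /usr/local. If it is not, specify the path of the spark folder in the steps that follow. You can install the standalone setup from the official Apache Spark download page.

To gather Spark statistics, download and enable the Jolokia JVM Agent.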

cd /usr/share/java/
wget -O jolokia-agent.jar "https://search.maven.org/remotecontent?filepath=org/jolokia/jolokia-jvm/1.3.6/jolokia-jvm-1.3.6-agent.jar"

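Now that the Jolokia JVM Agent is downloaded, configure Apache Spark to use it as a JavaAgent and expose metrics over HTTP/JSON. Edit spark-env.sh, which should be in /usr/local/spark/conf, and add the following parameter (this assumes the spark install folder is /usr/local/spark; if it is not, change the path to the one where Spark is installed):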

export SPARK_MASTER_OPTS="$SPARK_MASTER_OPTS -javaagent:/usr/share/java/jolokia-agent.jar=config=/usr/local/spark/conf/jolokia-master.properties"

Now, create the /usr/local/spark/conf/jolokia-master.properties file with the following content:

host=0.0.0.0
port=7777
agentContext=/jolokia
backlog=100

policyLocation=file:///usr/local/spark/conf/jolokia.policy
historyMaxEntries=10
debug=false
debugMaxEntries=100
maxDepth=15
maxCollectionSize=1000
maxObjects=0

Next, create /usr/local/spark/conf/jolokia.policy with the following content:

<?xml version="1.0" encoding="utf-8"?>
<restrict>
  <http>
    <method>get</method>
    <method>post</method>
  </http>
  <commands>
    <command>read</command>
  </commands>
</restrict>

Configure the Agent with the following in the conf/bigdata.ini file:

[Spark-Master]
stats: http://127.0.0.1:7777/jolokia/read

Restart the Spark master.

Follow the same steps for the Spark Worker, Driver, and Executor.
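Repeating the steps for each component mostly means writing one properties file per component with its own port. A minimal sketch under stated assumptions: ports 7777, 7779, and 7780 are taken from the service.address values in the sample events in this document, port 7778 for the worker is an assumption, and the file name pattern jolokia-<component>.properties and both function names are hypothetical extensions of the master example.

```python
# Ports 7777 (master), 7779 (driver) and 7780 (executor) appear in the sample
# events of this document; 7778 for the worker is an illustrative assumption.
COMPONENT_PORTS = {"master": 7777, "worker": 7778, "driver": 7779, "executor": 7780}


def render_jolokia_properties(component: str, spark_home: str = "/usr/local/spark") -> str:
    """Return the text of a jolokia-<component>.properties file."""
    port = COMPONENT_PORTS[component]  # raises KeyError for unknown components
    lines = [
        "host=0.0.0.0",
        f"port={port}",
        "agentContext=/jolokia",
        "backlog=100",
        "",
        f"policyLocation=file://{spark_home}/conf/jolokia.policy",
        "historyMaxEntries=10",
        "debug=false",
        "debugMaxEntries=100",
        "maxDepth=15",
        "maxCollectionSize=1000",
        "maxObjects=0",
    ]
    return "\n".join(lines) + "\n"


def java_agent_option(component: str, spark_home: str = "/usr/local/spark") -> str:
    """Return the -javaagent flag that points at the component's properties file."""
    return (
        "-javaagent:/usr/share/java/jolokia-agent.jar="
        f"config={spark_home}/conf/jolokia-{component}.properties"
    )


for name in COMPONENT_PORTS:
    print(name, java_agent_option(name))
```

Where the flag is set differs per component (the master example above uses SPARK_MASTER_OPTS in spark-env.sh), so treat the output as input to your own deployment rather than a finished configuration.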

Setup

For step-by-step instructions on how to set up an integration, see the Getting Started guide.

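Validation

After the integration is successfully configured, click the Assets tab of the Apache Spark integration to display the available dashboards. Select the dashboard for your configured data stream; it should be populated with the required data.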

Troubleshooting

If host.ip is shown as conflicted under the metrics-* data view, the issue can be resolved by reindexing the Application, Driver, Executor, and Node data streams.
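This page does not spell out the reindexing procedure, so the following is only a sketch of the request bodies a reindex of the four data streams might use. It assumes the usual <type>-<dataset>-<namespace> data stream naming (the sample events use type metrics and namespace ep); the destination namespace ep_fixed and all function names are hypothetical. Reindexing into a data stream requires op_type set to create.

```python
DATASETS = ("application", "driver", "executor", "node")


def data_stream_name(dataset: str, namespace: str = "default") -> str:
    """Compose a data stream name as <type>-<dataset>-<namespace>."""
    return f"metrics-apache_spark.{dataset}-{namespace}"


def build_reindex_body(source: str, dest: str) -> dict:
    """Build a body for POST _reindex; data stream targets need op_type=create."""
    return {
        "source": {"index": source},
        "dest": {"index": dest, "op_type": "create"},
    }


def reindex_plan(namespace: str, dest_namespace: str) -> list:
    """Return one reindex body per Apache Spark data stream."""
    return [
        build_reindex_body(
            data_stream_name(dataset, namespace),
            data_stream_name(dataset, dest_namespace),
        )
        for dataset in DATASETS
    ]


# "ep" is the namespace in the sample events; "ep_fixed" is a made-up target.
for body in reindex_plan("ep", "ep_fixed"):
    print(body)
```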

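Metrics reference

Application

The application data stream collects metrics related to the number of cores used, application name, runtime in milliseconds, and current status of the application.

Example

An example event for application looks as follows: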

{
    "@timestamp": "2023-09-28T09:24:33.812Z",
    "agent": {
        "ephemeral_id": "20d060ec-da41-4f14-a187-d020b9fbec7d",
        "id": "a6bdbb4a-4bac-4243-83cb-dba157f24987",
        "name": "docker-fleet-agent",
        "type": "metricbeat",
        "version": "8.8.0"
    },
    "apache_spark": {
        "application": {
            "cores": 8,
            "mbean": "metrics:name=application.PythonWordCount.1695893057562.cores,type=gauges",
            "name": "PythonWordCount.1695893057562"
        }
    },
    "data_stream": {
        "dataset": "apache_spark.application",
        "namespace": "ep",
        "type": "metrics"
    },
    "ecs": {
        "version": "8.11.0"
    },
    "elastic_agent": {
        "id": "a6bdbb4a-4bac-4243-83cb-dba157f24987",
        "snapshot": false,
        "version": "8.8.0"
    },
    "event": {
        "agent_id_status": "verified",
        "dataset": "apache_spark.application",
        "duration": 23828342,
        "ingested": "2023-09-28T09:24:37Z",
        "kind": "metric",
        "module": "apache_spark",
        "type": [
            "info"
        ]
    },
    "host": {
        "architecture": "x86_64",
        "containerized": true,
        "hostname": "docker-fleet-agent",
        "id": "e8978f2086c14e13b7a0af9ed0011d19",
        "ip": [
            "172.20.0.7"
        ],
        "mac": [
            "02-42-C0-A8-F5-07"
        ],
        "name": "docker-fleet-agent",
        "os": {
            "codename": "focal",
            "family": "debian",
            "kernel": "3.10.0-1160.90.1.el7.x86_64",
            "name": "Ubuntu",
            "platform": "ubuntu",
            "type": "linux",
            "version": "20.04.6 LTS (Focal Fossa)"
        }
    },
    "metricset": {
        "name": "jmx",
        "period": 60000
    },
    "service": {
        "address": "https://apache-spark-main:7777/jolokia/%3FignoreErrors=true&canonicalNaming=false",
        "type": "jolokia"
    }
}

ECS Field Reference

Refer to the following document for detailed information on ECS fields.

Exported fields

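Field	Description	Type	Metric Type
@timestamp	Event timestamp.	date
agent.id	Unique identifier of this agent (if one exists). Example: For Beats this would be beat.id.	keyword
apache_spark.application.cores	Number of cores.	long	gauge
apache_spark.application.mbean	The name of the jolokia mbean.	keyword
apache_spark.application.name	Name of the application.	keyword
apache_spark.application.runtime.ms	Time taken to run the application (ms).	long	gauge
apache_spark.application.status	Current status of the application.	keyword
cloud.account.id	The cloud account or organization ID used to identify different entities in a multi-tenant environment. Examples: AWS account ID, Google Cloud ORG ID, or other unique identifier.	keyword
cloud.availability_zone	Availability zone in which this host, resource, or service is located.	keyword
cloud.instance.id	Instance ID of the host machine.	keyword
cloud.provider	Name of the cloud provider. Example values are aws, azure, gcp, or digitalocean.	keyword
cloud.region	Region in which this host, resource, or service is located.	keyword
container.id	Unique container ID.	keyword
data_stream.dataset	Data stream dataset.	constant_keyword
data_stream.namespace	Data stream namespace.	constant_keyword
data_stream.type	Data stream type.	constant_keyword
host.name	Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host.	keyword
service.address	Address where data about this service was collected from. This should be a URI, network address (ipv4:port or [ipv6]:port), or a resource path (sockets).	keyword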
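The apache_spark.application.mbean value in the sample event packs the application name and the metric name into one string. A small parsing sketch follows; the layout application.<name>.<metric> is inferred from that single sample and may not hold for every mbean, and the function name is hypothetical.

```python
def parse_application_mbean(mbean: str) -> dict:
    """Split an application mbean name into its parts.

    Expected shape (inferred from the sample event):
    "<domain>:name=application.<app name>.<metric>,type=<type>"
    """
    domain, _, props = mbean.partition(":")
    # Turn "name=...,type=..." into {"name": "...", "type": "..."}.
    pairs = dict(item.split("=", 1) for item in props.split(","))
    name = pairs["name"]
    prefix = "application."
    if not name.startswith(prefix):
        raise ValueError(f"not an application mbean: {mbean!r}")
    # The metric is the last dot-separated token; the rest is the app name.
    app_name, _, metric = name[len(prefix):].rpartition(".")
    return {
        "domain": domain,
        "application": app_name,
        "metric": metric,
        "type": pairs.get("type"),
    }


sample = "metrics:name=application.PythonWordCount.1695893057562.cores,type=gauges"
print(parse_application_mbean(sample))
```

For the sample event this yields the same application name that the integration stores in apache_spark.application.name.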

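Driver

The driver data stream collects metrics related to the driver details, job durations, task execution, memory usage, executor status, and JVM metrics.

Example

An example event for driver looks as follows: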

{
    "@timestamp": "2023-09-29T12:04:40.050Z",
    "agent": {
        "ephemeral_id": "e3534e18-b92f-4b1b-bd39-43ff9c8849d4",
        "id": "a76f5e50-2a98-4b96-80f6-026ad822e3e8",
        "name": "docker-fleet-agent",
        "type": "metricbeat",
        "version": "8.8.0"
    },
    "apache_spark": {
        "driver": {
            "application_name": "app-20230929120427-0000",
            "jvm": {
                "cpu": {
                    "time": 25730000000
                }
            },
            "mbean": "metrics:name=app-20230929120427-0000.driver.JVMCPU.jvmCpuTime,type=gauges"
        }
    },
    "data_stream": {
        "dataset": "apache_spark.driver",
        "namespace": "ep",
        "type": "metrics"
    },
    "ecs": {
        "version": "8.11.0"
    },
    "elastic_agent": {
        "id": "a76f5e50-2a98-4b96-80f6-026ad822e3e8",
        "snapshot": false,
        "version": "8.8.0"
    },
    "event": {
        "agent_id_status": "verified",
        "dataset": "apache_spark.driver",
        "duration": 177706950,
        "ingested": "2023-09-29T12:04:41Z",
        "kind": "metric",
        "module": "apache_spark",
        "type": [
            "info"
        ]
    },
    "host": {
        "architecture": "x86_64",
        "containerized": true,
        "hostname": "docker-fleet-agent",
        "id": "e8978f2086c14e13b7a0af9ed0011d19",
        "ip": [
            "172.26.0.7"
        ],
        "mac": [
            "02-42-AC-1A-00-07"
        ],
        "name": "docker-fleet-agent",
        "os": {
            "codename": "focal",
            "family": "debian",
            "kernel": "3.10.0-1160.90.1.el7.x86_64",
            "name": "Ubuntu",
            "platform": "ubuntu",
            "type": "linux",
            "version": "20.04.6 LTS (Focal Fossa)"
        }
    },
    "metricset": {
        "name": "jmx",
        "period": 60000
    },
    "service": {
        "address": "https://apache-spark-main:7779/jolokia/%3FignoreErrors=true&canonicalNaming=false",
        "type": "jolokia"
    }
}

ECS Field Reference

Refer to the following document for detailed information on ECS fields.

Exported fields

Field	Description	Type	Metric Type
@timestamp	Event timestamp.	date
agent.id	Unique identifier of this agent (if one exists). Example: For Beats this would be beat.id.	keyword
apache_spark.driver.application_name	Name of the application.	keyword
apache_spark.driver.dag_scheduler.job.active	Number of active jobs.	long	gauge
apache_spark.driver.dag_scheduler.job.all	Total number of jobs.	long	gauge
apache_spark.driver.dag_scheduler.stages.failed	Number of failed stages.	long	gauge
apache_spark.driver.dag_scheduler.stages.running	Number of running stages.	long	gauge
apache_spark.driver.dag_scheduler.stages.waiting	Number of waiting stages.	long	gauge
apache_spark.driver.disk.space_used	Amount of disk space used, in MB.	long	gauge
apache_spark.driver.executor_metrics.gc.major.count	Total major GC count. For example, the garbage collector is one of MarkSweepCompact, PS MarkSweep, ConcurrentMarkSweep, G1 Old Generation, and so on.	long	gauge
apache_spark.driver.executor_metrics.gc.major.time	Elapsed total major GC time. The value is expressed in milliseconds.	long	gauge
apache_spark.driver.executor_metrics.gc.minor.count	Total minor GC count. For example, the garbage collector is one of Copy, PS Scavenge, ParNew, G1 Young Generation, and so on.	long	gauge
apache_spark.driver.executor_metrics.gc.minor.time	Elapsed total minor GC time. The value is expressed in milliseconds.	long	gauge
apache_spark.driver.executor_metrics.heap_memory.off.execution	Peak off-heap execution memory in use, in bytes.	long	gauge
apache_spark.driver.executor_metrics.heap_memory.off.storage	Peak off-heap storage memory in use, in bytes.	long	gauge
apache_spark.driver.executor_metrics.heap_memory.off.unified	Peak off-heap memory (execution and storage).	long	gauge
apache_spark.driver.executor_metrics.heap_memory.on.execution	Peak on-heap execution memory in use, in bytes.	long	gauge
apache_spark.driver.executor_metrics.heap_memory.on.storage	Peak on-heap storage memory in use, in bytes.	long	gauge
apache_spark.driver.executor_metrics.heap_memory.on.unified	Peak on-heap memory (execution and storage).	long	gauge
apache_spark.driver.executor_metrics.memory.direct_pool	Peak memory that the JVM is using for the direct buffer pool.	long	gauge
apache_spark.driver.executor_metrics.memory.jvm.heap	Peak memory usage of the heap that is used for object allocation.	long	counter
apache_spark.driver.executor_metrics.memory.jvm.off_heap	Peak memory usage of non-heap memory that is used by the Java virtual machine.	long	counter
apache_spark.driver.executor_metrics.memory.mapped_pool	Peak memory that the JVM is using for the mapped buffer pool.	long	gauge
apache_spark.driver.executor_metrics.process_tree.jvm.rss_memory	Resident Set Size: number of pages the process has in real memory. This counts only the pages used for text, data, or stack space. It does not include pages that have not been demand-loaded in or that are swapped out.	long	gauge
apache_spark.driver.executor_metrics.process_tree.jvm.v_memory	Virtual memory size in bytes.	long	gauge
apache_spark.driver.executor_metrics.process_tree.other.rss_memory		long	gauge
apache_spark.driver.executor_metrics.process_tree.other.v_memory		long	gauge
apache_spark.driver.executor_metrics.process_tree.python.rss_memory		long	gauge
apache_spark.driver.executor_metrics.process_tree.python.v_memory		long	gauge
apache_spark.driver.executors.all	Total number of executors.	long	gauge
apache_spark.driver.executors.decommission_unfinished	Total number of executors whose decommissioning did not finish.	long	counter
apache_spark.driver.executors.exited_unexpectedly	Total number of executors that exited unexpectedly.	long	counter
apache_spark.driver.executors.gracefully_decommissioned	Total number of executors gracefully decommissioned.	long	counter
apache_spark.driver.executors.killed_by_driver	Total number of executors killed by the driver.	long	counter
apache_spark.driver.executors.max_needed	Maximum number of executors needed.	long	gauge
apache_spark.driver.executors.pending_to_remove	Total number of executors pending removal.	long	gauge
apache_spark.driver.executors.target	Total number of target executors.	long	gauge
apache_spark.driver.executors.to_add	Total number of executors to be added.	long	gauge
apache_spark.driver.hive_external_catalog.file_cache_hits	Total number of file cache hits.	long	counter
apache_spark.driver.hive_external_catalog.files_discovered	Total number of files discovered.	long	counter
apache_spark.driver.hive_external_catalog.hive_client_calls	Total number of Hive client calls.	long	counter
apache_spark.driver.hive_external_catalog.parallel_listing_job.count	Number of jobs running in parallel.	long	counter
apache_spark.driver.hive_external_catalog.partitions_fetched	Number of partitions fetched.	long	counter
apache_spark.driver.job_duration	Duration of the job.	long	gauge
apache_spark.driver.jobs.failed	Number of failed jobs.	long	counter
apache_spark.driver.jobs.succeeded	Number of successful jobs.	long	counter
apache_spark.driver.jvm.cpu.time	Elapsed CPU time the JVM spent.	long	gauge
apache_spark.driver.mbean	The name of the jolokia mbean.	keyword
apache_spark.driver.memory.max_mem	Maximum amount of memory available for storage, in MB.	long	gauge
apache_spark.driver.memory.off_heap.max	Maximum amount of off-heap memory available, in MB.	long	gauge
apache_spark.driver.memory.off_heap.remaining	Remaining amount of off-heap memory, in MB.	long	gauge
apache_spark.driver.memory.off_heap.used	Total amount of off-heap memory used, in MB.	long	gauge
apache_spark.driver.memory.on_heap.max	Maximum amount of on-heap memory available, in MB.	long	gauge
apache_spark.driver.memory.on_heap.remaining	Remaining amount of on-heap memory, in MB.	long	gauge
apache_spark.driver.memory.on_heap.used	Total amount of on-heap memory used, in MB.	long	gauge
apache_spark.driver.memory.remaining	Remaining amount of storage memory, in MB.	long	gauge
apache_spark.driver.memory.used	Total amount of memory used for storage, in MB.	long	gauge
apache_spark.driver.spark.streaming.event_time.watermark		long	gauge
apache_spark.driver.spark.streaming.input_rate.total	Total rate of the input.	double	gauge
apache_spark.driver.spark.streaming.latency		long	gauge
apache_spark.driver.spark.streaming.processing_rate.total	Total rate of processing.	double	gauge
apache_spark.driver.spark.streaming.states.rows.total	Total number of rows.	long	gauge
apache_spark.driver.spark.streaming.states.used_bytes	Total number of bytes used.	long	gauge
apache_spark.driver.stages.completed_count	Total number of completed stages.	long	counter
apache_spark.driver.stages.failed_count	Total number of failed stages.	long	counter
apache_spark.driver.stages.skipped_count	Total number of skipped stages.	long	counter
apache_spark.driver.tasks.completed	Number of completed tasks.	long	counter
apache_spark.driver.tasks.executors.black_listed	Number of blacklisted executors for the tasks.	long	counter
apache_spark.driver.tasks.executors.excluded	Number of excluded executors for the tasks.	long	counter
apache_spark.driver.tasks.executors.unblack_listed	Number of unblacklisted executors for the tasks.	long	counter
apache_spark.driver.tasks.executors.unexcluded	Number of unexcluded executors for the tasks.	long	counter
apache_spark.driver.tasks.failed	Number of failed tasks.	long	counter
apache_spark.driver.tasks.killed	Number of killed tasks.	long	counter
apache_spark.driver.tasks.skipped	Number of skipped tasks.	long	counter
cloud.account.id	The cloud account or organization ID used to identify different entities in a multi-tenant environment. Examples: AWS account ID, Google Cloud ORG ID, or other unique identifier.	keyword
cloud.availability_zone	Availability zone in which this host, resource, or service is located.	keyword
cloud.instance.id	Instance ID of the host machine.	keyword
cloud.provider	Name of the cloud provider. Example values are aws, azure, gcp, or digitalocean.	keyword
cloud.region	Region in which this host, resource, or service is located.	keyword
container.id	Unique container ID.	keyword
data_stream.dataset	Data stream dataset.	constant_keyword
data_stream.namespace	Data stream namespace.	constant_keyword
data_stream.type	Data stream type.	constant_keyword
host.name	Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host.	keyword
service.address	Address where data about this service was collected from. This should be a URI, network address (ipv4:port or [ipv6]:port), or a resource path (sockets).	keyword
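Fields with metric type counter, such as apache_spark.driver.tasks.completed, only grow, so they are usually read as a rate between two samples. A minimal sketch, assuming two consecutive samples collected one metricset.period apart (60000 ms in the sample event); the function name and the reset handling are illustrative choices, not part of the integration.

```python
def counter_rate_per_second(previous: int, current: int, period_ms: int = 60000) -> float:
    """Return the per-second increase of a counter between two samples.

    `period_ms` defaults to the metricset.period shown in the sample event.
    A drop in value is treated as a counter reset (for example after a
    driver restart), in which case only the new value is counted.
    """
    if period_ms <= 0:
        raise ValueError("period_ms must be positive")
    delta = current - previous
    if delta < 0:
        delta = current
    return delta / (period_ms / 1000)


# 60 more completed tasks over a 60 s period is one task per second.
print(counter_rate_per_second(100, 160))
```

Gauge fields need no such treatment because each sample is already a point-in-time value.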

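Executor

The executor data stream collects metrics related to the operations, memory usage, garbage collection, file handling, and threadpool activity.

Example

An example event for executor looks as follows: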

{
    "@timestamp": "2023-09-28T09:26:45.771Z",
    "agent": {
        "ephemeral_id": "3a3db920-eb4b-4045-b351-33526910ae8a",
        "id": "a6bdbb4a-4bac-4243-83cb-dba157f24987",
        "name": "docker-fleet-agent",
        "type": "metricbeat",
        "version": "8.8.0"
    },
    "apache_spark": {
        "executor": {
            "application_name": "app-20230928092630-0000",
            "id": "0",
            "jvm": {
                "cpu_time": 20010000000
            },
            "mbean": "metrics:name=app-20230928092630-0000.0.JVMCPU.jvmCpuTime,type=gauges"
        }
    },
    "data_stream": {
        "dataset": "apache_spark.executor",
        "namespace": "ep",
        "type": "metrics"
    },
    "ecs": {
        "version": "8.11.0"
    },
    "elastic_agent": {
        "id": "a6bdbb4a-4bac-4243-83cb-dba157f24987",
        "snapshot": false,
        "version": "8.8.0"
    },
    "event": {
        "agent_id_status": "verified",
        "dataset": "apache_spark.executor",
        "duration": 2849184715,
        "ingested": "2023-09-28T09:26:49Z",
        "kind": "metric",
        "module": "apache_spark",
        "type": [
            "info"
        ]
    },
    "host": {
        "architecture": "x86_64",
        "containerized": true,
        "hostname": "docker-fleet-agent",
        "id": "e8978f2086c14e13b7a0af9ed0011d19",
        "ip": [
            "172.20.0.7"
        ],
        "mac": [
            "02-42-AC-14-00-07"
        ],
        "name": "docker-fleet-agent",
        "os": {
            "codename": "focal",
            "family": "debian",
            "kernel": "3.10.0-1160.90.1.el7.x86_64",
            "name": "Ubuntu",
            "platform": "ubuntu",
            "type": "linux",
            "version": "20.04.6 LTS (Focal Fossa)"
        }
    },
    "metricset": {
        "name": "jmx",
        "period": 60000
    },
    "service": {
        "address": "https://apache-spark-main:7780/jolokia/%3FignoreErrors=true&canonicalNaming=false",
        "type": "jolokia"
    }
}

ECS Field Reference

Refer to the following document for detailed information on ECS fields.

Exported fields

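Field	Description	Type	Metric Type
@timestamp	Event timestamp.	date
agent.id	Unique identifier of this agent (if one exists). Example: For Beats this would be beat.id.	keyword
apache_spark.executor.application_name	Name of the application.	keyword
apache_spark.executor.bytes.read	Total number of bytes read.	long	counter
apache_spark.executor.bytes.written	Total number of bytes written.	long	counter
apache_spark.executor.disk_bytes_spilled	Total number of bytes spilled to disk.	long	counter
apache_spark.executor.file_cache_hits	Total number of file cache hits.	long	counter
apache_spark.executor.files_discovered	Total number of files discovered.	long	counter
apache_spark.executor.filesystem.file.large_read_ops	Total number of large read operations from the files.	long	gauge
apache_spark.executor.filesystem.file.read_bytes	Total number of bytes read from the files.	long	gauge
apache_spark.executor.filesystem.file.read_ops	Total number of read operations from the files.	long	gauge
apache_spark.executor.filesystem.file.write_bytes	Total number of bytes written to the files.	long	gauge
apache_spark.executor.filesystem.file.write_ops	Total number of write operations to the files.	long	gauge
apache_spark.executor.filesystem.hdfs.large_read_ops	Total number of large read operations from HDFS.	long	gauge
apache_spark.executor.filesystem.hdfs.read_bytes	Total number of bytes read from HDFS.	long	gauge
apache_spark.executor.filesystem.hdfs.read_ops	Total number of read operations from HDFS.	long	gauge
apache_spark.executor.filesystem.hdfs.write_bytes	Total number of bytes written to HDFS.	long	gauge
apache_spark.executor.filesystem.hdfs.write_ops	Total number of write operations to HDFS.	long	gauge
apache_spark.executor.gc.major.count	Total major GC count. For example, the garbage collector is one of MarkSweepCompact, PS MarkSweep, ConcurrentMarkSweep, G1 Old Generation, and so on.	long	gauge
apache_spark.executor.gc.major.time	Elapsed total major GC time. The value is expressed in milliseconds.	long	gauge
apache_spark.executor.gc.minor.count	Total minor GC count. For example, the garbage collector is one of Copy, PS Scavenge, ParNew, G1 Young Generation, and so on.	long	gauge
apache_spark.executor.gc.minor.time	Elapsed total minor GC time. The value is expressed in milliseconds.	long	gauge
apache_spark.executor.heap_memory.off.execution	Peak off-heap execution memory in use, in bytes.	long	gauge
apache_spark.executor.heap_memory.off.storage	Peak off-heap storage memory in use, in bytes.	long	gauge
apache_spark.executor.heap_memory.off.unified	Peak off-heap memory (execution and storage).	long	gauge
apache_spark.executor.heap_memory.on.execution	Peak on-heap execution memory in use, in bytes.	long	gauge
apache_spark.executor.heap_memory.on.storage	Peak on-heap storage memory in use, in bytes.	long	gauge
apache_spark.executor.heap_memory.on.unified	Peak on-heap memory (execution and storage).	long	gauge
apache_spark.executor.hive_client_calls	Total number of Hive client calls.	long	counter
apache_spark.executor.id	ID of the executor.	keyword
apache_spark.executor.jvm.cpu_time	Elapsed CPU time the JVM spent.	long	gauge
apache_spark.executor.jvm.gc_time	Elapsed time the JVM spent in garbage collection while executing this task.	long	counter
apache_spark.executor.mbean	The name of the jolokia mbean.	keyword
apache_spark.executor.memory.direct_pool	Peak memory that the JVM is using for the direct buffer pool.	long	gauge
apache_spark.executor.memory.jvm.heap	Peak memory usage of the heap that is used for object allocation.	long	gauge
apache_spark.executor.memory.jvm.off_heap	Peak memory usage of non-heap memory that is used by the Java virtual machine.	long	gauge
apache_spark.executor.memory.mapped_pool	Peak memory that the JVM is using for the mapped buffer pool.	long	gauge
apache_spark.executor.memory_bytes_spilled	The number of in-memory bytes spilled by this task.	long	counter
apache_spark.executor.parallel_listing_job_count	Number of jobs running in parallel.	long	counter
apache_spark.executor.partitions_fetched	Number of partitions fetched.	long	counter
apache_spark.executor.process_tree.jvm.rss_memory	Resident Set Size: number of pages the process has in real memory. This counts only the pages used for text, data, or stack space. It does not include pages that have not been demand-loaded in or that are swapped out.	long	gauge
apache_spark.executor.process_tree.jvm.v_memory	Virtual memory size in bytes.	long	gauge
apache_spark.executor.process_tree.other.rss_memory	Resident Set Size for other kinds of processes.	long	gauge
apache_spark.executor.process_tree.other.v_memory	Virtual memory size for other kinds of processes, in bytes.	long	gauge
apache_spark.executor.process_tree.python.rss_memory	Resident Set Size for Python.	long	gauge
apache_spark.executor.process_tree.python.v_memory	Virtual memory size for Python, in bytes.	long	gauge
apache_spark.executor.records.read	Total number of records read.	long	counter
apache_spark.executor.records.written	Total number of records written.	long	counter
apache_spark.executor.result.serialization_time	Elapsed time spent serializing the task result. The value is expressed in milliseconds.	long	counter
apache_spark.executor.result.size	The number of bytes this task transmitted back to the driver as the TaskResult.	long	counter
apache_spark.executor.run_time	Elapsed time spent running this task.	long	counter
apache_spark.executor.shuffle.bytes_written	Number of bytes written in shuffle operations.	long	counter
apache_spark.executor.shuffle.client.used.direct_memory	Amount of direct memory used by the shuffle client.	long	gauge
apache_spark.executor.shuffle.client.used.heap_memory	Amount of heap memory used by the shuffle client.	long	gauge
apache_spark.executor.shuffle.fetch_wait_time	Time the task spent waiting for remote shuffle blocks.	long	counter
apache_spark.executor.shuffle.local.blocks_fetched	Number of local (as opposed to read from a remote executor) blocks fetched in shuffle operations.	long	counter
apache_spark.executor.shuffle.local.bytes_read	Number of bytes read in shuffle operations from local disk (as opposed to read from a remote executor).	long	counter
apache_spark.executor.shuffle.records.read	Number of records read in shuffle operations.	long	counter
apache_spark.executor.shuffle.records.written	Number of records written in shuffle operations.	long	counter
apache_spark.executor.shuffle.remote.blocks_fetched	Number of remote blocks fetched in shuffle operations.	long	counter
apache_spark.executor.shuffle.remote.bytes_read	Number of remote bytes read in shuffle operations.	long	counter
apache_spark.executor.shuffle.remote.bytes_read_to_disk	Number of remote bytes read to disk in shuffle operations. Large blocks are fetched to disk in shuffle read operations, as opposed to being read into memory, which is the default behavior.	long	counter
apache_spark.executor.shuffle.server.used.direct_memory	Amount of direct memory used by the shuffle server.	long	gauge
apache_spark.executor.shuffle.server.used.heap_memory	Amount of heap memory used by the shuffle server.	long	counter
apache_spark.executor.shuffle.total.bytes_read	Number of bytes read in shuffle operations (both local and remote).	long	counter
apache_spark.executor.shuffle.write.time	Time spent blocking on writes to disk or buffer cache. The value is expressed in nanoseconds.	long	counter
apache_spark.executor.succeeded_tasks	The number of tasks that succeeded.	long	counter
apache_spark.executor.threadpool.active_tasks	Number of tasks currently executing.	long	gauge
apache_spark.executor.threadpool.complete_tasks	Number of tasks that have completed in this executor.	long	gauge
apache_spark.executor.threadpool.current_pool_size	The current size of the executor's thread pool.	long	gauge
apache_spark.executor.threadpool.max_pool_size	The maximum size of the executor's thread pool.	long	counter
apache_spark.executor.threadpool.started_tasks	The number of tasks started in the executor's thread pool.	long	counter
cloud.account.id	The cloud account or organization ID used to identify different entities in a multi-tenant environment. Examples: AWS account ID, Google Cloud ORG ID, or other unique identifier.	keyword
cloud.availability_zone	Availability zone in which this host, resource, or service is located.	keyword
cloud.instance.id	Instance ID of the host machine.	keyword
cloud.provider	Name of the cloud provider. Example values are aws, azure, gcp, or digitalocean.	keyword
cloud.region	Region in which this host, resource, or service is located.	keyword
container.id	Unique container ID.	keyword
data_stream.dataset	Data stream dataset.	constant_keyword
data_stream.namespace	Data stream namespace.	constant_keyword
data_stream.type	Data stream type.	constant_keyword
host.name	Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host.	keyword
service.address	Address where data about this service was collected from. This should be a URI, network address (ipv4:port or [ipv6]:port), or a resource path (sockets).	keyword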
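The field names in the table are dotted paths into the nested event document, as the sample event shows. The helper below is a small sketch for reading such a path from a document that has already been parsed into a dict; get_field is a hypothetical name, and SAMPLE_EVENT is a trimmed copy of the executor sample event above.

```python
from typing import Any


def get_field(event: dict, dotted_path: str, default: Any = None) -> Any:
    """Walk a nested event dict along a dotted field name from the table."""
    current: Any = event
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Trimmed copy of the executor sample event shown above.
SAMPLE_EVENT = {
    "apache_spark": {
        "executor": {
            "application_name": "app-20230928092630-0000",
            "id": "0",
            "jvm": {"cpu_time": 20010000000},
        }
    },
    "data_stream": {"dataset": "apache_spark.executor"},
}

print(get_field(SAMPLE_EVENT, "apache_spark.executor.jvm.cpu_time"))
```

Because each sample event carries a single mbean, most of the fields in the table are absent from any one document, which is why the helper returns a default instead of raising.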

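Node

The node data stream collects metrics related to the application count, waiting applications, worker metrics, executor count, core usage, and memory usage.

Example

An example event for node looks as follows: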

{
    "@timestamp": "2022-04-12T04:42:49.581Z",
    "agent": {
        "ephemeral_id": "ae57925e-eeca-4bf4-ae20-38f82db1378b",
        "id": "f051059f-86be-46d5-896d-ff1b2cdab179",
        "name": "docker-fleet-agent",
        "type": "metricbeat",
        "version": "8.1.0"
    },
    "apache_spark": {
        "node": {
            "main": {
                "applications": {
                    "count": 0,
                    "waiting": 0
                },
                "workers": {
                    "alive": 0,
                    "count": 0
                }
            }
        }
    },
    "data_stream": {
        "dataset": "apache_spark.node",
        "namespace": "ep",
        "type": "metrics"
    },
    "ecs": {
        "version": "8.11.0"
    },
    "elastic_agent": {
        "id": "f051059f-86be-46d5-896d-ff1b2cdab179",
        "snapshot": false,
        "version": "8.1.0"
    },
    "event": {
        "agent_id_status": "verified",
        "dataset": "apache_spark.node",
        "duration": 8321835,
        "ingested": "2022-04-12T04:42:53Z",
        "kind": "metric",
        "module": "apache_spark",
        "type": [
            "info"
        ]
    },
    "host": {
        "architecture": "x86_64",
        "containerized": true,
        "hostname": "docker-fleet-agent",
        "ip": [
            "192.168.32.5"
        ],
        "mac": [
            "02-42-AC-14-00-07"
        ],
        "name": "docker-fleet-agent",
        "os": {
            "codename": "focal",
            "family": "debian",
            "kernel": "5.4.0-107-generic",
            "name": "Ubuntu",
            "platform": "ubuntu",
            "type": "linux",
            "version": "20.04.3 LTS (Focal Fossa)"
        }
    },
    "metricset": {
        "name": "jmx",
        "period": 60000
    },
    "service": {
        "address": "https://apache-spark-main:7777/jolokia/%3FignoreErrors=true&canonicalNaming=false",
        "type": "jolokia"
    }
}

ECS Field Reference

Refer to the following document for detailed information on ECS fields.

Exported fields

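Field	Description	Type	Metric Type
@timestamp	Event timestamp.	date
agent.id	Unique identifier of this agent (if one exists). Example: For Beats this would be beat.id.	keyword
apache_spark.node.main.applications.count	Total number of applications.	long	gauge
apache_spark.node.main.applications.waiting	Number of applications waiting.	long	gauge
apache_spark.node.main.workers.alive	Number of alive workers.	long	gauge
apache_spark.node.main.workers.count	Total number of workers.	long	gauge
apache_spark.node.worker.cores.free	Number of free cores.	long	gauge
apache_spark.node.worker.cores.used	Number of cores used.	long	gauge
apache_spark.node.worker.executors	Number of executors.	long	gauge
apache_spark.node.worker.memory.free	Amount of free memory, in MB.	long	gauge
apache_spark.node.worker.memory.used	Amount of memory used, in MB.	long	gauge
cloud.account.id	The cloud account or organization ID used to identify different entities in a multi-tenant environment. Examples: AWS account ID, Google Cloud ORG ID, or other unique identifier.	keyword
cloud.availability_zone	Availability zone in which this host, resource, or service is located.	keyword
cloud.instance.id	Instance ID of the host machine.	keyword
cloud.provider	Name of the cloud provider. Example values are aws, azure, gcp, or digitalocean.	keyword
cloud.region	Region in which this host, resource, or service is located.	keyword
container.id	Unique container ID.	keyword
data_stream.dataset	Data stream dataset.	constant_keyword
data_stream.namespace	Data stream namespace.	constant_keyword
data_stream.type	Data stream type.	constant_keyword
host.name	Name of the host. It can contain what hostname returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host.	keyword
service.address	Address where data about this service was collected from. This should be a URI, network address (ipv4:port or [ipv6]:port), or a resource path (sockets).	keyword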
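The worker fields come in used/free pairs, so a utilization ratio can be derived from them. A minimal sketch: it takes the apache_spark.node.worker.cores.used and cores.free values (or the memory pair) as plain numbers, and the function name and the zero-capacity convention are assumptions made for illustration.

```python
def utilization(used: float, free: float) -> float:
    """Return used / (used + free) as a fraction between 0 and 1.

    A worker that reports no capacity at all (used + free == 0) is
    reported as 0.0 rather than raising a division error.
    """
    total = used + free
    if total <= 0:
        return 0.0
    return used / total


# Example: 6 cores used and 2 cores free on a worker.
print(utilization(6, 2))
```

The same function applies to memory.used and memory.free as long as both values use the same unit.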

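Changelog

Version	Details	Kibana version(s)
1.3.0	Enhancement (View pull request) Add processor support for the application, driver, executor, and node data streams.	8.13.0 or higher
1.2.0	Enhancement (View pull request) ECS version updated to 8.11.0. Update the Kibana constraint to ^8.13.0. Modified the field definitions to remove ECS fields made redundant by the ecs@mappings component template.	8.13.0 or higher
1.1.0	Enhancement (View pull request) Add a global filter on data_stream.dataset to improve performance.	8.8.0 or higher
1.0.3	Enhancement (View pull request) Update README to follow documentation guidelines.	8.8.0 or higher
1.0.2	Enhancement (View pull request) Inline "by reference" visualizations.	8.8.0 or higher
1.0.1	Bug fix (View pull request) Update the link to the correct reindexing procedure.	8.8.0 or higher
1.0.0	Enhancement (View pull request) Make Apache Spark GA.	8.8.0 or higher
0.8.0	Enhancement (View pull request) Update the package format_version to 3.0.0.	-
0.7.9	Bug fix (View pull request) Add filters in visualizations.	-
0.7.8	Enhancement (View pull request) Enable time series data streams for the metrics datasets. This dramatically reduces storage for metrics and is expected to progressively improve query performance. For more details, see https://elastic.ac.cn/guide/en/elasticsearch/reference/current/tsds.html.	-
0.7.7	Enhancement (View pull request) Add metric_type for the node data stream.	-
0.7.6	Enhancement (View pull request) Add dimension mapping for the node data stream.	-
0.7.5	Enhancement (View pull request) Add metric_type mapping for the executor data stream.	-
0.7.4	Enhancement (View pull request) Add dimension mapping for the executor data stream.	-
0.7.3	Enhancement (View pull request) Add metric_type mapping for the driver data stream.	-
0.7.2	Enhancement (View pull request) Add dimension mapping for the driver data stream.	-
0.7.1	Enhancement (View pull request) Add metric type for the application data stream.	-
0.7.0	Enhancement (View pull request) Add dimension mapping for the application data stream.	-
0.6.4	Bug fix (View pull request) Fix the metric type of the input_rate field for the driver data stream.	-
0.6.3	Enhancement (View pull request) Update the Apache Spark logo.	-
0.6.2	Bug fix (View pull request) Resolve conflicts in the host.ip field.	-
0.6.1	Bug fix (View pull request) Remove incorrect filter from the visualizations.	-
0.6.0	Enhancement (View pull request) Rename ownership from obs-service-integrations to obs-infraobs-integrations.	-
0.5.0	Enhancement (View pull request) Migrate visualizations to Lens.	-
0.4.1	Enhancement (View pull request) Add categories and/or subcategories.	-
0.4.0	Enhancement (View pull request) Update ECS version to 8.5.1.	-
0.3.0	Enhancement (View pull request) Update the readme.	-
0.2.1	Bug fix (View pull request) Remove unnecessary fields from fields.yml.	-
0.2.0	Enhancement (View pull request) Add dashboards and visualizations.	-
0.1.1	Enhancement (View pull request) Refactor the "nodes" data stream to rename it to "node" (singular).	-
0.1.0	Enhancement (View pull request) Implement "executor" data stream. Enhancement (View pull request) Implement "driver" data stream. Enhancement (View pull request) Implement "application" data stream. Enhancement (View pull request) Implement "nodes" data stream.	-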
