Migrating from rollups to downsampling


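Rollups and downsampling are two different features that both let you roll up historical metrics. At a high level, rollups are more flexible than downsampling, but downsampling is the more robust and easier-to-use feature for downsampling metrics.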

The following aspects of downsampling are easier or more robust:

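  • No need to schedule jobs. Downsampling is integrated with index lifecycle management (ILM) and data stream lifecycle (DSL).
  • No separate search API. Downsampled indices can be accessed through the search API and ES|QL.
  • No separate rollup configuration. Downsampling uses the time series dimension and metric configuration from the mapping.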
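Because downsampling reads its configuration from the mapping, the dimensions and metrics of a TSDS can be derived directly from the `properties` block. The sketch below assumes a flat mapping (no nested objects); the helper name `split_time_series_fields` is hypothetical and not part of any Elasticsearch client.

```python
from typing import Any


def split_time_series_fields(
    mappings: dict[str, Any],
) -> tuple[list[str], dict[str, str]]:
    """Return (dimension fields, {metric field: metric type}) from a TSDS mapping.

    Sketch only: top-level properties are inspected, nested objects are ignored.
    """
    dimensions: list[str] = []
    metrics: dict[str, str] = {}
    for name, spec in mappings.get("properties", {}).items():
        # Fields flagged as dimensions drive the grouping.
        if spec.get("time_series_dimension") is True:
            dimensions.append(name)
        # Fields with a time_series_metric are summarized when downsampling.
        metric_type = spec.get("time_series_metric")
        if metric_type is not None:
            metrics[name] = metric_type
    return dimensions, metrics


mappings = {
    "properties": {
        "node": {"type": "keyword", "time_series_dimension": True},
        "temperature": {"type": "half_float", "time_series_metric": "gauge"},
        "voltage": {"type": "half_float", "time_series_metric": "gauge"},
        "@timestamp": {"type": "date"},
    }
}
dims, metrics = split_time_series_fields(mappings)
print(dims)     # ['node']
print(metrics)  # {'temperature': 'gauge', 'voltage': 'gauge'}
```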

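Not every rollup usage can be migrated to downsampling. The main requirement is that the data is stored in Elasticsearch as a time series data stream (TSDS). Rollup usages that essentially roll up data by time and by all dimensions can be migrated to downsampling.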
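A quick way to screen an existing rollup job is to check that it groups only by a fixed-interval `date_histogram` and by exactly the TSDS dimension fields. The following is a heuristic sketch under that assumption, not an official migration check; `looks_migratable` is a hypothetical helper, and `region` is an invented dimension used only to show a failing case.

```python
from typing import Any


def looks_migratable(rollup_job: dict[str, Any], tsds_dimensions: set[str]) -> bool:
    """Heuristic: does the job group only by time (fixed interval)
    and by exactly the TSDS dimension fields?"""
    groups = rollup_job.get("groups", {})
    # Downsampling supports fixed intervals only, not calendar intervals.
    if "fixed_interval" not in groups.get("date_histogram", {}):
        return False
    # A numeric "histogram" group has no counterpart in downsampling,
    # which groups by time and dimensions only.
    if "histogram" in groups:
        return False
    terms_fields = set(groups.get("terms", {}).get("fields", []))
    # Every dimension must be part of the grouping, and nothing else.
    return terms_fields == tsds_dimensions


sensor_job = {
    "index_pattern": "sensor-*",
    "groups": {
        "date_histogram": {"field": "timestamp", "fixed_interval": "60m"},
        "terms": {"fields": ["node"]},
    },
}
print(looks_migratable(sensor_job, {"node"}))            # True
# "region" is a hypothetical extra dimension the job does not group by.
print(looks_migratable(sensor_job, {"node", "region"}))  # False
```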

An example rollup usage that can be migrated to downsampling:

Python:

resp = client.rollup.put_job(
    id="sensor",
    index_pattern="sensor-*",
    rollup_index="sensor_rollup",
    cron="0 0 * * * *",
    page_size=1000,
    groups={
        "date_histogram": {
            "field": "timestamp",
            "fixed_interval": "60m"
        },
        "terms": {
            "fields": [
                "node"
            ]
        }
    },
    metrics=[
        {
            "field": "temperature",
            "metrics": [
                "min",
                "max",
                "sum"
            ]
        },
        {
            "field": "voltage",
            "metrics": [
                "avg"
            ]
        }
    ],
)
print(resp)

JavaScript:

const response = await client.rollup.putJob({
  id: "sensor",
  index_pattern: "sensor-*",
  rollup_index: "sensor_rollup",
  cron: "0 0 * * * *",
  page_size: 1000,
  groups: {
    date_histogram: {
      field: "timestamp",
      fixed_interval: "60m",
    },
    terms: {
      fields: ["node"],
    },
  },
  metrics: [
    {
      field: "temperature",
      metrics: ["min", "max", "sum"],
    },
    {
      field: "voltage",
      metrics: ["avg"],
    },
  ],
});
console.log(response);

Console:

PUT _rollup/job/sensor
{
  "index_pattern": "sensor-*",
  "rollup_index": "sensor_rollup",
  "cron": "0 0 * * * *", 
  "page_size": 1000,
  "groups": { 
    "date_histogram": {
      "field": "timestamp",
      "fixed_interval": "60m" 
    },
    "terms": {
      "fields": [ "node" ]
    }
  },
  "metrics": [
    {
      "field": "temperature",
      "metrics": [ "min", "max", "sum" ] 
    },
    {
      "field": "voltage",
      "metrics": [ "avg" ] 
    }
  ]
}

The equivalent time series data stream (TSDS) setup that uses downsampling through DSL:

Python:

resp = client.indices.put_index_template(
    name="sensor-template",
    index_patterns=[
        "sensor-*"
    ],
    data_stream={},
    template={
        "lifecycle": {
            "downsampling": [
                {
                    "after": "1d",
                    "fixed_interval": "1h"
                }
            ]
        },
        "settings": {
            "index.mode": "time_series"
        },
        "mappings": {
            "properties": {
                "node": {
                    "type": "keyword",
                    "time_series_dimension": True
                },
                "temperature": {
                    "type": "half_float",
                    "time_series_metric": "gauge"
                },
                "voltage": {
                    "type": "half_float",
                    "time_series_metric": "gauge"
                },
                "@timestamp": {
                    "type": "date"
                }
            }
        }
    },
)
print(resp)

Ruby:

response = client.indices.put_index_template(
  name: 'sensor-template',
  body: {
    index_patterns: [
      'sensor-*'
    ],
    data_stream: {},
    template: {
      lifecycle: {
        downsampling: [
          {
            after: '1d',
            fixed_interval: '1h'
          }
        ]
      },
      settings: {
        'index.mode' => 'time_series'
      },
      mappings: {
        properties: {
          node: {
            type: 'keyword',
            time_series_dimension: true
          },
          temperature: {
            type: 'half_float',
            time_series_metric: 'gauge'
          },
          voltage: {
            type: 'half_float',
            time_series_metric: 'gauge'
          },
          "@timestamp": {
            type: 'date'
          }
        }
      }
    }
  }
)
puts response

JavaScript:

const response = await client.indices.putIndexTemplate({
  name: "sensor-template",
  index_patterns: ["sensor-*"],
  data_stream: {},
  template: {
    lifecycle: {
      downsampling: [
        {
          after: "1d",
          fixed_interval: "1h",
        },
      ],
    },
    settings: {
      "index.mode": "time_series",
    },
    mappings: {
      properties: {
        node: {
          type: "keyword",
          time_series_dimension: true,
        },
        temperature: {
          type: "half_float",
          time_series_metric: "gauge",
        },
        voltage: {
          type: "half_float",
          time_series_metric: "gauge",
        },
        "@timestamp": {
          type: "date",
        },
      },
    },
  },
});
console.log(response);

Console:

PUT _index_template/sensor-template
{
  "index_patterns": ["sensor-*"],
  "data_stream": { },
  "template": {
    "lifecycle": {
        "downsampling": [
            {
                "after": "1d", 
                "fixed_interval": "1h" 
            }
        ]
    },
    "settings": {
      "index.mode": "time_series"
    },
    "mappings": {
      "properties": {
        "node": {
          "type": "keyword",
          "time_series_dimension": true 
        },
        "temperature": {
          "type": "half_float",
          "time_series_metric": "gauge" 
        },
        "voltage": {
          "type": "half_float",
          "time_series_metric": "gauge" 
        },
        "@timestamp": { 
          "type": "date"
        }
      }
    }
  }
}
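The mapping between the two configurations can be sketched as a small conversion function that turns the rollup job body into an index template body like the one above. This is a hypothetical helper, not an Elasticsearch API. A rollup job does not record the `after` delay, the field types, or whether a metric is a gauge or a counter, so those are caller-supplied assumptions. The per-field rollup metric lists are dropped, because a gauge is always summarized the same way.

```python
from typing import Any


def rollup_job_to_template(
    rollup_job: dict[str, Any],
    after: str,
    metric_field_type: str = "half_float",
    time_series_metric: str = "gauge",
) -> dict[str, Any]:
    """Build an index template body roughly equivalent to a rollup job (sketch)."""
    groups = rollup_job["groups"]
    properties: dict[str, Any] = {}
    # Rollup terms fields become TSDS dimensions (assumed to be keywords).
    for field in groups.get("terms", {}).get("fields", []):
        properties[field] = {"type": "keyword", "time_series_dimension": True}
    # Every rolled-up metric field becomes a time series metric field.
    for metric in rollup_job["metrics"]:
        properties[metric["field"]] = {
            "type": metric_field_type,
            "time_series_metric": time_series_metric,
        }
    # A data stream uses @timestamp, whatever the rollup date field was called.
    properties["@timestamp"] = {"type": "date"}
    return {
        "index_patterns": [rollup_job["index_pattern"]],
        "data_stream": {},
        "template": {
            "lifecycle": {
                "downsampling": [
                    {
                        "after": after,
                        "fixed_interval": groups["date_histogram"]["fixed_interval"],
                    }
                ]
            },
            "settings": {"index.mode": "time_series"},
            "mappings": {"properties": properties},
        },
    }


sensor_job = {
    "index_pattern": "sensor-*",
    "groups": {
        "date_histogram": {"field": "timestamp", "fixed_interval": "60m"},
        "terms": {"fields": ["node"]},
    },
    "metrics": [
        {"field": "temperature", "metrics": ["min", "max", "sum"]},
        {"field": "voltage", "metrics": ["avg"]},
    ],
}
template = rollup_job_to_template(sensor_job, after="1d")
print(template["template"]["lifecycle"])
# {'downsampling': [{'after': '1d', 'fixed_interval': '60m'}]}
```

Assuming the same Python client as in the examples above, the result could then be passed as `client.indices.put_index_template(name="sensor-template", **template)`. Note that the source documents would still need an `@timestamp` field in place of `timestamp`.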

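The downsampling configuration is included in the TSDS template above. Only the downsampling part is required to enable downsampling; it specifies when to downsample and to which fixed interval.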

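In the rollup job, the cron field determines when documents are rolled up. In the index template, the after field determines when downsampling rolls up documents (note that this is the time elapsed after a rollover).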
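Since `after` counts from the rollover of a backing index rather than from a wall-clock schedule, the earliest point at which downsampling can happen is simply the rollover time plus `after`. The sketch below assumes only the units `ms`, `s`, `m`, `h`, and `d`, and ignores any other conditions the lifecycle may wait for; `earliest_downsample_time` is a hypothetical name.

```python
import re
from datetime import datetime, timedelta, timezone

# Duration units accepted by this sketch (a subset of Elasticsearch time units).
_TIMEDELTA_UNIT = {
    "ms": "milliseconds",
    "s": "seconds",
    "m": "minutes",
    "h": "hours",
    "d": "days",
}


def parse_duration(value: str) -> timedelta:
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_TIMEDELTA_UNIT[unit]: int(amount)})


def earliest_downsample_time(rolled_over_at: datetime, after: str) -> datetime:
    """`after` counts from the rollover of the backing index, not from ingest time."""
    return rolled_over_at + parse_duration(after)


rolled_over_at = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
print(earliest_downsample_time(rolled_over_at, "1d"))  # 2024-05-02 12:00:00+00:00
```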

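In the rollup job, the groups field determines all the dimensions that documents are grouped by. In the index template, the fields with time_series_dimension set to true, together with the @timestamp field, determine the grouping.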
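Conceptually, the grouping key is the tuple of all dimension values plus the start of the `@timestamp` bucket. The sketch below illustrates that idea in memory, assuming timestamps are epoch milliseconds and the interval is already expressed in milliseconds; it is not how Elasticsearch implements downsampling internally.

```python
from collections import defaultdict
from typing import Any


def group_documents(
    docs: list[dict[str, Any]], dimensions: list[str], interval_ms: int
) -> dict[tuple, list[dict[str, Any]]]:
    """Group documents by all dimension values plus the @timestamp bucket."""
    groups: dict[tuple, list[dict[str, Any]]] = defaultdict(list)
    for doc in docs:
        ts = doc["@timestamp"]  # epoch milliseconds in this sketch
        bucket_start = ts - ts % interval_ms
        key = tuple(doc[d] for d in dimensions) + (bucket_start,)
        groups[key].append(doc)
    return dict(groups)


HOUR_MS = 3_600_000
docs = [
    {"@timestamp": 0, "node": "a", "temperature": 20.0},
    {"@timestamp": 60_000, "node": "a", "temperature": 21.0},
    {"@timestamp": 60_000, "node": "b", "temperature": 19.0},
    {"@timestamp": HOUR_MS, "node": "a", "temperature": 22.0},
]
for key, members in group_documents(docs, ["node"], HOUR_MS).items():
    print(key, len(members))
# ('a', 0) 2
# ('b', 0) 1
# ('a', 3600000) 1
```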

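In the rollup job, the fixed_interval field determines how timestamps are aggregated as part of the grouping. In the index template, the fixed_interval field serves the same purpose. Note that downsampling does not support calendar intervals.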
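A fixed interval always has the same length in milliseconds, which is why `60m` in the rollup job and `1h` in the template are interchangeable, while calendar units such as `1M` (month) or `1y` (year) have no fixed length. The following sketch parses a fixed interval, rejects anything else, and rounds a timestamp down to its bucket start. The helper names are hypothetical, and epoch-aligned buckets are an assumption of the sketch.

```python
import re

# Fixed-interval units and their length in milliseconds. Calendar units such as
# "1M", "1q", "1y" or "1w" have no entry here and are rejected.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def fixed_interval_ms(interval: str) -> int:
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", interval)
    if match is None:
        raise ValueError(f"not a fixed interval: {interval!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_MS[unit]


def round_down(timestamp_ms: int, interval: str) -> int:
    """Return the start of the epoch-aligned bucket containing timestamp_ms."""
    step = fixed_interval_ms(interval)
    return timestamp_ms - timestamp_ms % step


print(fixed_interval_ms("60m") == fixed_interval_ms("1h"))  # True
print(round_down(5_400_000, "1h"))  # 3600000
try:
    fixed_interval_ms("1M")
except ValueError as err:
    print(err)  # not a fixed interval: '1M'
```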

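In the rollup job, the metrics field defines the metrics and how they are stored. In the index template, every field that has a time_series_metric is a metric field. If a field's time_series_metric is gauge, the downsampled index stores the minimum, maximum, sum, and value count of that field. If it is counter, the downsampled index stores only the last value of that field.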
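The per-bucket summaries can be illustrated with a small in-memory function. This is a sketch of the semantics only: the output keys (including `last_value`) are hypothetical names chosen for readability, not the stored field format of a downsampled index, and the input values are assumed to be in timestamp order. It also shows that the rollup job's `avg` for `voltage` remains recoverable as `sum / value_count`.

```python
from typing import Any


def summarize_bucket(values: list[float], time_series_metric: str) -> dict[str, Any]:
    """Summarize one metric field's values within a single bucket.

    `values` must be in timestamp order so the last element is the latest value.
    """
    if not values:
        raise ValueError("a bucket needs at least one value")
    if time_series_metric == "gauge":
        return {
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "value_count": len(values),
        }
    if time_series_metric == "counter":
        return {"last_value": values[-1]}
    raise ValueError(f"unsupported time_series_metric: {time_series_metric!r}")


temperature = summarize_bucket([20.5, 22.0, 21.5], "gauge")
print(temperature)  # {'min': 20.5, 'max': 22.0, 'sum': 64.0, 'value_count': 3}

# The rollup job's "avg" for voltage can be derived from sum / value_count.
voltage = summarize_bucket([229.0, 231.0], "gauge")
print(voltage["sum"] / voltage["value_count"])  # 230.0

print(summarize_bucket([100, 140, 175], "counter"))  # {'last_value': 175}
```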