Watching the status of an Elasticsearch cluster

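You can configure a basic watch to monitor the health of your Elasticsearch cluster. This walkthrough schedules the watch and adds an input, adds a condition, adds an action, and finally deletes the watch.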

Schedule the watch and add an input

A watch schedule controls how often the watch is triggered. The watch input gets the data that you want to evaluate.

The simplest way to define the schedule is to specify an interval. For example, the following schedule runs every 10 seconds:

resp = client.watcher.put_watch(
    id="cluster_health_watch",
    trigger={
        "schedule": {
            "interval": "10s"
        }
    },
)
print(resp)
const response = await client.watcher.putWatch({
  id: "cluster_health_watch",
  trigger: {
    schedule: {
      interval: "10s",
    },
  },
});
console.log(response);
PUT _watcher/watch/cluster_health_watch
{
  "trigger" : {
    "schedule" : { "interval" : "10s" } 
  }
}

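Schedules are typically configured to run less frequently. This example sets the interval to 10 seconds so that you can easily see the watch being triggered. Because this watch runs so frequently, remember to delete it when you are done experimenting.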
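
To get a feel for how often an interval fires, the following sketch converts an interval string into seconds and estimates the number of executions per day. The helper names `interval_to_seconds` and `runs_per_day` are illustrative assumptions and are not part of Watcher or the Python client; the sketch only handles intervals of the form `<number><unit>`.

```python
# Illustrative helpers, not part of Watcher or the Elasticsearch client.
# Assumes intervals of the form "<number><unit>" with units s, m, h, d, or w.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def interval_to_seconds(interval: str) -> int:
    """Convert an interval string such as "10s" or "5m" to seconds."""
    value, unit = interval[:-1], interval[-1]
    if unit not in UNIT_SECONDS or not value.isdigit():
        raise ValueError(f"unsupported interval: {interval!r}")
    return int(value) * UNIT_SECONDS[unit]


def runs_per_day(interval: str) -> int:
    """Estimate how many times a watch with this interval fires in 24 hours."""
    return 86400 // interval_to_seconds(interval)


print(runs_per_day("10s"))  # 8640 executions per day
print(runs_per_day("5m"))   # 288 executions per day
```

At a 10 second interval the watch writes thousands of history records per day, which is why a longer interval is the usual choice outside of experiments.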

To get the status of your cluster, you can call the cluster health API:

resp = client.cluster.health(
    pretty=True,
)
print(resp)
response = client.cluster.health(
  pretty: true
)
puts response
const response = await client.cluster.health({
  pretty: "true",
});
console.log(response);
GET _cluster/health?pretty
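
As a rough illustration of what the response contains, the sketch below summarizes a cluster health response that has already been parsed into a dictionary. The function name `summarize_health` and the sample values are hypothetical; only the field names (`cluster_name`, `status`, `number_of_nodes`, `unassigned_shards`) come from the cluster health API.

```python
def summarize_health(health: dict) -> str:
    """Build a one-line summary from a parsed cluster health response."""
    return (
        f"{health.get('cluster_name', 'unknown')}: "
        f"status={health.get('status', 'unknown')}, "
        f"nodes={health.get('number_of_nodes', 0)}, "
        f"unassigned_shards={health.get('unassigned_shards', 0)}"
    )


# Hypothetical sample response, trimmed to the fields used above.
sample = {
    "cluster_name": "my-cluster",
    "status": "yellow",
    "number_of_nodes": 1,
    "unassigned_shards": 1,
}
print(summarize_health(sample))
```

The `status` field is the value that the watch evaluates later in this walkthrough.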

To load the health status into your watch, add an HTTP input that calls the cluster health API:

resp = client.watcher.put_watch(
    id="cluster_health_watch",
    trigger={
        "schedule": {
            "interval": "10s"
        }
    },
    input={
        "http": {
            "request": {
                "host": "localhost",
                "port": 9200,
                "path": "/_cluster/health"
            }
        }
    },
)
print(resp)
const response = await client.watcher.putWatch({
  id: "cluster_health_watch",
  trigger: {
    schedule: {
      interval: "10s",
    },
  },
  input: {
    http: {
      request: {
        host: "localhost",
        port: 9200,
        path: "/_cluster/health",
      },
    },
  },
});
console.log(response);
PUT _watcher/watch/cluster_health_watch
{
  "trigger" : {
    "schedule" : { "interval" : "10s" }
  },
  "input" : {
    "http" : {
      "request" : {
        "host" : "localhost",
        "port" : 9200,
        "path" : "/_cluster/health"
      }
    }
  }
}
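
To make it clearer which endpoint the input calls, the following sketch assembles the URL described by an `http` input. The helper `request_url` is an assumption for illustration, and the sketch assumes that `scheme` falls back to `http` when it is not set.

```python
def request_url(watch_input: dict) -> str:
    """Assemble the URL that an "http" watch input describes."""
    request = watch_input["http"]["request"]
    scheme = request.get("scheme", "http")  # assumed default when omitted
    path = request.get("path", "")
    return f"{scheme}://{request['host']}:{request['port']}{path}"


watch_input = {
    "http": {
        "request": {
            "host": "localhost",
            "port": 9200,
            "path": "/_cluster/health",
        }
    }
}
print(request_url(watch_input))  # http://localhost:9200/_cluster/health
```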

If you are using the security features, you also need to supply authentication credentials as part of the watch configuration:

resp = client.watcher.put_watch(
    id="cluster_health_watch",
    trigger={
        "schedule": {
            "interval": "10s"
        }
    },
    input={
        "http": {
            "request": {
                "host": "localhost",
                "port": 9200,
                "path": "/_cluster/health",
                "auth": {
                    "basic": {
                        "username": "elastic",
                        "password": "x-pack-test-password"
                    }
                }
            }
        }
    },
)
print(resp)
const response = await client.watcher.putWatch({
  id: "cluster_health_watch",
  trigger: {
    schedule: {
      interval: "10s",
    },
  },
  input: {
    http: {
      request: {
        host: "localhost",
        port: 9200,
        path: "/_cluster/health",
        auth: {
          basic: {
            username: "elastic",
            password: "x-pack-test-password",
          },
        },
      },
    },
  },
});
console.log(response);
PUT _watcher/watch/cluster_health_watch
{
  "trigger" : {
    "schedule" : { "interval" : "10s" }
  },
  "input" : {
    "http" : {
      "request" : {
        "host" : "localhost",
        "port" : 9200,
        "path" : "/_cluster/health",
        "auth": {
          "basic": {
            "username": "elastic",
            "password": "x-pack-test-password"
          }
        }
      }
    }
  }
}
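
For context, HTTP basic authentication sends the user name and password as a Base64-encoded `Authorization` header. The sketch below shows that encoding with the standard library; the helper `basic_auth_header` is hypothetical and only illustrates the scheme, not how Watcher builds the request internally.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header used by HTTP basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


header = basic_auth_header("elastic", "x-pack-test-password")
print(header["Authorization"].split(" ")[0])  # Basic
```

Base64 is an encoding, not encryption, so anyone who can read the watch definition or the traffic can recover the password. That is one reason to use a dedicated, low-privilege user here.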

It is a good idea to create a dedicated user with the minimal privileges required for this kind of watch configuration.

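Depending on how your cluster is configured, additional settings such as keystores, truststores, or certificates might be required before the watch can access your cluster. For more information, see Watcher settings.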

If you check the watch history, you will see that the cluster status is recorded as part of the watch_record each time the watch executes.

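For example, the following request retrieves the ten most recent watch records from the watch history (ten is the default search result size, so the request does not set it explicitly):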

resp = client.search(
    index=".watcher-history*",
    sort=[
        {
            "result.execution_time": "desc"
        }
    ],
)
print(resp)
response = client.search(
  index: '.watcher-history*',
  body: {
    sort: [
      {
        'result.execution_time' => 'desc'
      }
    ]
  }
)
puts response
const response = await client.search({
  index: ".watcher-history*",
  sort: [
    {
      "result.execution_time": "desc",
    },
  ],
});
console.log(response);
GET .watcher-history*/_search
{
  "sort" : [
    { "result.execution_time" : "desc" }
  ]
}
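
Once the search response is in hand, the recorded status can be pulled out of each watch record. The sketch below assumes that the input payload is stored under `result.input.payload` in each record; the function name `recorded_statuses` and the sample data are hypothetical.

```python
def recorded_statuses(search_response: dict) -> list:
    """Return (execution_time, status) pairs from watch history hits.

    Assumes each watch record keeps the input payload under
    result.input.payload; missing fields are returned as None.
    """
    pairs = []
    for hit in search_response["hits"]["hits"]:
        result = hit["_source"].get("result", {})
        payload = result.get("input", {}).get("payload", {})
        pairs.append((result.get("execution_time"), payload.get("status")))
    return pairs


# Hypothetical, heavily trimmed search response.
sample_response = {
    "hits": {
        "hits": [
            {"_source": {"result": {
                "execution_time": "2024-01-01T00:00:10.000Z",
                "input": {"payload": {"status": "green"}},
            }}},
            {"_source": {"result": {
                "execution_time": "2024-01-01T00:00:00.000Z",
                "input": {"payload": {"status": "yellow"}},
            }}},
        ]
    }
}
for when, status in recorded_statuses(sample_response):
    print(when, status)
```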

Add a condition

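A condition evaluates the data you have loaded into the watch and determines whether any action is required. Because you have defined an input that loads the cluster status into the watch, you can define a condition that checks that status.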

For example, you could add a condition that checks whether the status is RED:

resp = client.watcher.put_watch(
    id="cluster_health_watch",
    trigger={
        "schedule": {
            "interval": "10s"
        }
    },
    input={
        "http": {
            "request": {
                "host": "localhost",
                "port": 9200,
                "path": "/_cluster/health"
            }
        }
    },
    condition={
        "compare": {
            "ctx.payload.status": {
                "eq": "red"
            }
        }
    },
)
print(resp)
const response = await client.watcher.putWatch({
  id: "cluster_health_watch",
  trigger: {
    schedule: {
      interval: "10s",
    },
  },
  input: {
    http: {
      request: {
        host: "localhost",
        port: 9200,
        path: "/_cluster/health",
      },
    },
  },
  condition: {
    compare: {
      "ctx.payload.status": {
        eq: "red",
      },
    },
  },
});
console.log(response);
PUT _watcher/watch/cluster_health_watch
{
  "trigger" : {
    "schedule" : { "interval" : "10s" } 
  },
  "input" : {
    "http" : {
      "request" : {
       "host" : "localhost",
       "port" : 9200,
       "path" : "/_cluster/health"
      }
    }
  },
  "condition" : {
    "compare" : {
      "ctx.payload.status" : { "eq" : "red" }
    }
  }
}
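
Conceptually, the `compare` condition looks up a dotted path in the execution context and compares the value it finds with the expected one. The following is a minimal sketch of that idea, supporting only the `eq` operator; `resolve_path` and `condition_met` are hypothetical names, and this is not Watcher's actual implementation.

```python
def resolve_path(ctx: dict, path: str):
    """Follow a dotted path such as "ctx.payload.status" through nested dicts."""
    parts = path.split(".")
    if parts[0] == "ctx":
        parts = parts[1:]  # the leading "ctx" names the context itself
    value = ctx
    for part in parts:
        value = value[part]
    return value


def condition_met(condition: dict, ctx: dict) -> bool:
    """Evaluate a "compare" condition; only the "eq" operator is sketched."""
    for path, check in condition["compare"].items():
        for operator, expected in check.items():
            if operator != "eq":
                raise NotImplementedError(f"operator not sketched: {operator}")
            if resolve_path(ctx, path) != expected:
                return False
    return True


condition = {"compare": {"ctx.payload.status": {"eq": "red"}}}
print(condition_met(condition, {"payload": {"status": "red"}}))    # True
print(condition_met(condition, {"payload": {"status": "green"}}))  # False
```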

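Schedules are typically configured to run less frequently. This example sets the interval to 10 seconds so that you can easily see the watch being triggered.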

If you check the watch history, you will see that the condition result is recorded as part of the watch_record each time the watch executes.

To check whether the condition was met, you can run the following query:

resp = client.search(
    index=".watcher-history*",
    pretty=True,
    query={
        "match": {
            "result.condition.met": True
        }
    },
)
print(resp)
response = client.search(
  index: '.watcher-history*',
  pretty: true,
  body: {
    query: {
      match: {
        'result.condition.met' => true
      }
    }
  }
)
puts response
const response = await client.search({
  index: ".watcher-history*",
  pretty: "true",
  query: {
    match: {
      "result.condition.met": true,
    },
  },
});
console.log(response);
GET .watcher-history*/_search?pretty
{
  "query" : {
    "match" : { "result.condition.met" : true }
  }
}

Take action

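Recording watch_records in the watch history is useful, but the real power of Watcher is being able to do something in response to an alert. A watch's actions define what to do when the watch condition is true. You can send emails, call third-party webhooks, or write documents to an Elasticsearch index or log when the watch condition is met.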

For example, you could add an action that sends an email when the cluster status is RED:

resp = client.watcher.put_watch(
    id="cluster_health_watch",
    trigger={
        "schedule": {
            "interval": "10s"
        }
    },
    input={
        "http": {
            "request": {
                "host": "localhost",
                "port": 9200,
                "path": "/_cluster/health"
            }
        }
    },
    condition={
        "compare": {
            "ctx.payload.status": {
                "eq": "red"
            }
        }
    },
    actions={
        "send_email": {
            "email": {
                "to": "[email protected]",
                "subject": "Cluster Status Warning",
                "body": "Cluster status is RED"
            }
        }
    },
)
print(resp)
const response = await client.watcher.putWatch({
  id: "cluster_health_watch",
  trigger: {
    schedule: {
      interval: "10s",
    },
  },
  input: {
    http: {
      request: {
        host: "localhost",
        port: 9200,
        path: "/_cluster/health",
      },
    },
  },
  condition: {
    compare: {
      "ctx.payload.status": {
        eq: "red",
      },
    },
  },
  actions: {
    send_email: {
      email: {
        to: "[email protected]",
        subject: "Cluster Status Warning",
        body: "Cluster status is RED",
      },
    },
  },
});
console.log(response);
PUT _watcher/watch/cluster_health_watch
{
  "trigger" : {
    "schedule" : { "interval" : "10s" }
  },
  "input" : {
    "http" : {
      "request" : {
       "host" : "localhost",
       "port" : 9200,
       "path" : "/_cluster/health"
      }
    }
  },
  "condition" : {
    "compare" : {
      "ctx.payload.status" : { "eq" : "red" }
    }
  },
  "actions" : {
    "send_email" : {
      "email" : {
        "to" : "[email protected]",
        "subject" : "Cluster Status Warning",
        "body" : "Cluster status is RED"
      }
    }
  }
}
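
The complete watch now has all four parts: trigger, input, condition, and actions. The sketch below walks through one simulated execution to show how they fit together: given a payload, it checks the `eq` condition and returns the actions that would run. The function `actions_to_run` is a hypothetical simplification that does not contact Elasticsearch or send email.

```python
def actions_to_run(watch: dict, payload: dict) -> dict:
    """Return the actions that would run for this payload (eq conditions only)."""
    for path, check in watch["condition"]["compare"].items():
        value = payload
        for key in path.split(".")[2:]:  # skip the leading "ctx.payload"
            value = value[key]
        if value != check["eq"]:
            return {}  # condition not met, so no action runs
    return watch["actions"]


watch = {
    "condition": {"compare": {"ctx.payload.status": {"eq": "red"}}},
    "actions": {
        "send_email": {
            "email": {
                "subject": "Cluster Status Warning",
                "body": "Cluster status is RED",
            }
        }
    },
}

print(list(actions_to_run(watch, {"status": "green"})))  # []
print(list(actions_to_run(watch, {"status": "red"})))    # ['send_email']
```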

For Watcher to send email, you must configure an email account in your elasticsearch.yml configuration file and restart Elasticsearch. To add an email account, set the xpack.notification.email.account property.

For example, the following snippet configures a single Gmail account named work:

xpack.notification.email.account:
  work:
    profile: gmail
    email_defaults:
      from: <email> 
    smtp:
      auth: true
      starttls.enable: true
      host: smtp.gmail.com
      port: 587
      user: <username> 
      password: <password> 

Replace <email> with the email address from which you want to send notifications.

Replace <username> with your Gmail user name (typically your Gmail address).

Replace <password> with your Gmail password.

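If you have advanced security options enabled for your email account, you need to take additional steps to send email from Watcher. For more information, see Configuring email accounts.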

You can check the watch history or the status_index to see whether the action was performed:

resp = client.search(
    index=".watcher-history*",
    pretty=True,
    query={
        "match": {
            "result.condition.met": True
        }
    },
)
print(resp)
response = client.search(
  index: '.watcher-history*',
  pretty: true,
  body: {
    query: {
      match: {
        'result.condition.met' => true
      }
    }
  }
)
puts response
const response = await client.search({
  index: ".watcher-history*",
  pretty: "true",
  query: {
    match: {
      "result.condition.met": true,
    },
  },
});
console.log(response);
GET .watcher-history*/_search?pretty
{
  "query" : {
    "match" : { "result.condition.met" : true }
  }
}
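
To see at a glance which actions ran, you can reduce the search response to one row per action. The sketch below assumes that each watch record lists its action results under `result.actions`, with an `id` and a `status` for each; the function name `action_statuses` and the sample data are hypothetical.

```python
def action_statuses(search_response: dict) -> list:
    """Return (watch_id, action_id, status) rows from watch history hits.

    Assumes action results are stored as a list under result.actions.
    """
    rows = []
    for hit in search_response["hits"]["hits"]:
        source = hit["_source"]
        for action in source.get("result", {}).get("actions", []):
            rows.append((source.get("watch_id"), action["id"], action["status"]))
    return rows


# Hypothetical, heavily trimmed search response.
history = {
    "hits": {
        "hits": [
            {"_source": {
                "watch_id": "cluster_health_watch",
                "result": {
                    "condition": {"met": True},
                    "actions": [{"id": "send_email", "type": "email", "status": "success"}],
                },
            }}
        ]
    }
}
print(action_statuses(history))
```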

Delete the watch

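Because the cluster_health_watch is configured to run every 10 seconds, make sure you delete it when you are done experimenting. Otherwise, you will spam yourself indefinitely.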

To remove the watch, use the delete watch API:

resp = client.watcher.delete_watch(
    id="cluster_health_watch",
)
print(resp)
response = client.watcher.delete_watch(
  id: 'cluster_health_watch'
)
puts response
const response = await client.watcher.deleteWatch({
  id: "cluster_health_watch",
});
console.log(response);
DELETE _watcher/watch/cluster_health_watch
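
To confirm the deletion from a script, you can inspect the response body. The sketch below assumes the body carries a boolean `found` field; the helper `describe_delete` and the sample response are hypothetical, and a client library may instead raise an error when the watch does not exist.

```python
def describe_delete(response: dict, watch_id: str) -> str:
    """Turn a delete watch response body into a short status message."""
    if response.get("found"):
        return f"watch {watch_id} deleted"
    return f"watch {watch_id} was not found"


# Hypothetical response body for a successful deletion.
sample_delete = {"found": True, "_id": "cluster_health_watch", "_version": 2}
print(describe_delete(sample_delete, "cluster_health_watch"))
```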