Fix watermark errors

When a data node runs critically low on disk space and reaches the flood-stage disk usage watermark, Elasticsearch logs the following error: Error: disk usage exceeded flood-stage watermark, index has read-only-allow-delete block

To prevent the disk from filling up completely, when a node reaches this watermark, Elasticsearch blocks writes to every index that has a shard on that node. If the block affects related system indices, Kibana and other Elastic Stack features can become unavailable. For example, this can cause Kibana to show the Kibana Server is not Ready yet error message.

Elasticsearch automatically removes the write block once disk usage on the affected node drops below the high watermark. To make that happen, Elasticsearch tries to rebalance some of the affected node's shards to other nodes in the same data tier.

Monitor rebalancing

To verify that shards are moving off the affected node until it falls below the high watermark, use the cat shards API and the cat recovery API:

Python:

resp = client.cat.shards(
    v=True,
)
print(resp)

resp1 = client.cat.recovery(
    v=True,
    active_only=True,
)
print(resp1)

JavaScript:
const response = await client.cat.shards({
  v: "true",
});
console.log(response);

const response1 = await client.cat.recovery({
  v: "true",
  active_only: "true",
});
console.log(response1);

Console:
GET _cat/shards?v=true

GET _cat/recovery?v=true&active_only=true
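To track rebalancing over time, it can help to tally how many shards each node still holds. The sketch below parses the plain-text output of GET _cat/shards?v=true; the sample response is illustrative only (the exact column set varies across Elasticsearch versions), but any output with a "node" header column will work:

```python
from collections import Counter

def shards_per_node(cat_shards_text: str) -> Counter:
    """Count how many shards each node currently holds, given the
    plain-text output of GET _cat/shards?v=true."""
    lines = cat_shards_text.strip().splitlines()
    header = lines[0].split()
    node_idx = header.index("node")
    counts = Counter()
    for line in lines[1:]:
        cols = line.split()
        if len(cols) > node_idx:  # unassigned shards may omit the node column
            counts[cols[node_idx]] += 1
    return counts

# Illustrative sample output, not from a real cluster:
sample = """\
index     shard prirep state   docs store ip        node
my-index  0     p      STARTED 1200 5mb   10.0.0.1  node-1
my-index  0     r      STARTED 1200 5mb   10.0.0.2  node-2
my-index  1     p      STARTED  900 4mb   10.0.0.1  node-1
"""
print(shards_per_node(sample))  # node-1 holds 2 shards, node-2 holds 1
```

Running this periodically and watching the count for the affected node decrease confirms that rebalancing is making progress.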

If shards remain on the node and keep it above the high watermark, use the cluster allocation explain API to get an explanation of their allocation status:

Python:

resp = client.cluster.allocation_explain(
    index="my-index",
    shard=0,
    primary=False,
)
print(resp)

JavaScript:
const response = await client.cluster.allocationExplain({
  index: "my-index",
  shard: 0,
  primary: false,
});
console.log(response);

Console:
GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": false
}
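The interesting part of the explain response is usually which deciders voted NO. A small helper like the following can pull those out; the sample_response here is a trimmed, illustrative shape of the real response, not actual cluster output:

```python
def blocked_deciders(explain_response: dict) -> list:
    """Collect (node, decider, explanation) triples for every decider
    that voted NO in a cluster allocation explain response."""
    out = []
    for node in explain_response.get("node_allocation_decisions", []):
        for d in node.get("deciders", []):
            if d.get("decision") == "NO":
                out.append(
                    (node.get("node_name"), d.get("decider"), d.get("explanation"))
                )
    return out

# Trimmed, illustrative response shape:
sample_response = {
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_name": "node-1",
            "deciders": [
                {
                    "decider": "disk_threshold",
                    "decision": "NO",
                    "explanation": "the node is above the high watermark ...",
                }
            ],
        }
    ],
}
for node, decider, why in blocked_deciders(sample_response):
    print(f"{node}: {decider} -> {why}")
```

A disk_threshold decider voting NO on every candidate node is the typical signature of the watermark problem described in this section.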

Temporary relief

To restore write operations immediately, you can temporarily raise the disk watermarks and remove the write block:

Python:

resp = client.cluster.put_settings(
    persistent={
        "cluster.routing.allocation.disk.watermark.low": "90%",
        "cluster.routing.allocation.disk.watermark.low.max_headroom": "100GB",
        "cluster.routing.allocation.disk.watermark.high": "95%",
        "cluster.routing.allocation.disk.watermark.high.max_headroom": "20GB",
        "cluster.routing.allocation.disk.watermark.flood_stage": "97%",
        "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": "5GB",
        "cluster.routing.allocation.disk.watermark.flood_stage.frozen": "97%",
        "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": "5GB"
    },
)
print(resp)

resp1 = client.indices.put_settings(
    index="*",
    expand_wildcards="all",
    settings={
        "index.blocks.read_only_allow_delete": None
    },
)
print(resp1)

Ruby:
response = client.cluster.put_settings(
  body: {
    persistent: {
      'cluster.routing.allocation.disk.watermark.low' => '90%',
      'cluster.routing.allocation.disk.watermark.low.max_headroom' => '100GB',
      'cluster.routing.allocation.disk.watermark.high' => '95%',
      'cluster.routing.allocation.disk.watermark.high.max_headroom' => '20GB',
      'cluster.routing.allocation.disk.watermark.flood_stage' => '97%',
      'cluster.routing.allocation.disk.watermark.flood_stage.max_headroom' => '5GB',
      'cluster.routing.allocation.disk.watermark.flood_stage.frozen' => '97%',
      'cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom' => '5GB'
    }
  }
)
puts response

response = client.indices.put_settings(
  index: '*',
  expand_wildcards: 'all',
  body: {
    'index.blocks.read_only_allow_delete' => nil
  }
)
puts response

JavaScript:
const response = await client.cluster.putSettings({
  persistent: {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.low.max_headroom": "100GB",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.high.max_headroom": "20GB",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": "5GB",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom":
      "5GB",
  },
});
console.log(response);

const response1 = await client.indices.putSettings({
  index: "*",
  expand_wildcards: "all",
  settings: {
    "index.blocks.read_only_allow_delete": null,
  },
});
console.log(response1);

Console:
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.low.max_headroom": "100GB",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.high.max_headroom": "20GB",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": "5GB",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": "5GB"
  }
}

PUT */_settings?expand_wildcards=all
{
  "index.blocks.read_only_allow_delete": null
}
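To see how the percentage watermark and its max_headroom setting interact, here is a minimal sketch of the threshold check, assuming the documented semantics that max_headroom caps the free space a percentage watermark would otherwise require (so very large disks are not forced to keep a huge fraction free). The 97% / 5GB defaults below mirror the values set in the request above:

```python
GIB = 1024 ** 3

def exceeds_flood_stage(total_bytes, free_bytes,
                        watermark_pct=0.97, max_headroom_bytes=5 * GIB):
    """True if a disk is past the flood-stage watermark.
    The required free space is the watermark's headroom, total * (1 - pct),
    capped at max_headroom."""
    required_free = min(total_bytes * (1 - watermark_pct), max_headroom_bytes)
    return free_bytes < required_free

# 100 GiB disk: 3% headroom (~3 GiB) is below the 5 GiB cap, so ~3 GiB applies.
print(exceeds_flood_stage(100 * GIB, 2 * GIB))    # True: 2 GiB free is too little
# 1 TiB disk: 3% would be ~30 GiB, but the cap limits the requirement to 5 GiB.
print(exceeds_flood_stage(1024 * GIB, 10 * GIB))  # False: 10 GiB free is enough
```

This is why raising the percentages in the request above buys headroom on small disks, while the max_headroom values are what matter on large ones.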

When a long-term solution is in place, reset or reconfigure the disk watermarks:

Python:

resp = client.cluster.put_settings(
    persistent={
        "cluster.routing.allocation.disk.watermark.low": None,
        "cluster.routing.allocation.disk.watermark.low.max_headroom": None,
        "cluster.routing.allocation.disk.watermark.high": None,
        "cluster.routing.allocation.disk.watermark.high.max_headroom": None,
        "cluster.routing.allocation.disk.watermark.flood_stage": None,
        "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": None,
        "cluster.routing.allocation.disk.watermark.flood_stage.frozen": None,
        "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": None
    },
)
print(resp)

Ruby:
response = client.cluster.put_settings(
  body: {
    persistent: {
      'cluster.routing.allocation.disk.watermark.low' => nil,
      'cluster.routing.allocation.disk.watermark.low.max_headroom' => nil,
      'cluster.routing.allocation.disk.watermark.high' => nil,
      'cluster.routing.allocation.disk.watermark.high.max_headroom' => nil,
      'cluster.routing.allocation.disk.watermark.flood_stage' => nil,
      'cluster.routing.allocation.disk.watermark.flood_stage.max_headroom' => nil,
      'cluster.routing.allocation.disk.watermark.flood_stage.frozen' => nil,
      'cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom' => nil
    }
  }
)
puts response

JavaScript:
const response = await client.cluster.putSettings({
  persistent: {
    "cluster.routing.allocation.disk.watermark.low": null,
    "cluster.routing.allocation.disk.watermark.low.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.high": null,
    "cluster.routing.allocation.disk.watermark.high.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom":
      null,
  },
});
console.log(response);

Console:
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": null,
    "cluster.routing.allocation.disk.watermark.low.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.high": null,
    "cluster.routing.allocation.disk.watermark.high.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": null
  }
}

Resolve

As a long-term solution, we recommend you take whichever of the following actions best fits your use case:

  • Add nodes to the affected data tiers


    You should enable autoscaling for clusters deployed with our Elasticsearch Service, Elastic Cloud Enterprise, and Elastic Cloud on Kubernetes platforms.

  • Upgrade existing nodes to increase disk space


    On Elasticsearch Service, if cluster health reaches status:red, intervention by Elastic Support may be required.

  • Delete unneeded indices using the delete index API
  • Update the relevant ILM policies to push indices to a later data tier
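When sizing the first option, a back-of-envelope estimate of how many nodes to add can be useful. The sketch below assumes equally-sized nodes and that shards can rebalance freely across the tier; the 90% target matches the default high watermark:

```python
import math

def nodes_to_add(used_bytes_per_node, node_capacity_bytes, target_pct=0.90):
    """Estimate how many equally-sized nodes must be added so the tier's
    average disk usage falls below target_pct, assuming shards can
    rebalance freely across the tier."""
    total_used = sum(used_bytes_per_node)
    current_capacity = node_capacity_bytes * len(used_bytes_per_node)
    needed_capacity = total_used / target_pct
    extra = needed_capacity - current_capacity
    return max(0, math.ceil(extra / node_capacity_bytes))

GIB = 1024 ** 3
# Three 100 GiB nodes at 97/95/96 GiB used: the tier is ~96% full on average,
# so one more 100 GiB node brings the average below the 90% high watermark.
print(nodes_to_add([97 * GIB, 95 * GIB, 96 * GIB], 100 * GIB))  # 1
```

This is only a rough estimate: shard sizes are uneven in practice, so leave extra margin beyond the computed minimum.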