Cluster allocation explain API

Provides an explanation for a shard's current allocation.

resp = client.cluster.allocation_explain(
    index="my-index-000001",
    shard=0,
    primary=False,
    current_node="my-node",
)
print(resp)

response = client.cluster.allocation_explain(
  body: {
    index: 'my-index-000001',
    shard: 0,
    primary: false,
    current_node: 'my-node'
  }
)
puts response

const response = await client.cluster.allocationExplain({
  index: "my-index-000001",
  shard: 0,
  primary: false,
  current_node: "my-node",
});
console.log(response);

GET _cluster/allocation/explain
{
  "index": "my-index-000001",
  "shard": 0,
  "primary": false,
  "current_node": "my-node"
}

Request

GET _cluster/allocation/explain

POST _cluster/allocation/explain

Prerequisites

  • If the Elasticsearch security features are enabled, you must have the monitor or manage cluster privilege to use this API.

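Description

The purpose of the cluster allocation explain API is to provide explanations for shard allocations in the cluster. For unassigned shards, the explain API explains why the shard is unassigned. For assigned shards, the explain API explains why the shard is remaining on its current node and has not moved or rebalanced to another node. This API is useful when diagnosing why a shard is unassigned, or why a shard continues to remain on its current node when you might expect otherwise.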

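Query parameters

include_disk_info
(Optional, Boolean) If true, returns information about disk usage and shard sizes. Defaults to false.
include_yes_decisions
(Optional, Boolean) If true, returns YES decisions in the explanation. Defaults to false.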

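Request body

current_node
(Optional, string) Specifies the node ID or the name of the node currently holding the shard to explain. To explain an unassigned shard, omit this parameter.
index
(Optional, string) Specifies the name of the index that you would like an explanation for.
primary
(Optional, Boolean) If true, returns an explanation for the primary shard for the given shard ID.
shard
(Optional, integer) Specifies the ID of the shard that you would like an explanation for.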
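
The split between query parameters and body fields can be sketched as a small request builder. This is only an illustration: build_explain_request is a hypothetical helper, not part of any client library, and it only assembles the method, path, and JSON body without sending anything.

```python
import json
from urllib.parse import urlencode


def build_explain_request(index=None, shard=None, primary=None, current_node=None,
                          include_disk_info=False, include_yes_decisions=False):
    """Return (method, path, body) for a cluster allocation explain call."""
    # Query parameters: send only the flags that differ from the default (false).
    query = {}
    if include_disk_info:
        query["include_disk_info"] = "true"
    if include_yes_decisions:
        query["include_yes_decisions"] = "true"

    path = "_cluster/allocation/explain"
    if query:
        path += "?" + urlencode(query)

    # Request body: include only the fields the caller supplied.
    fields = {"index": index, "shard": shard, "primary": primary,
              "current_node": current_node}
    body = {name: value for name, value in fields.items() if value is not None}
    return "GET", path, (json.dumps(body) if body else None)


method, path, body = build_explain_request(
    index="my-index-000001", shard=0, primary=True, include_disk_info=True
)
print(method, path)
print(body)
```

Omitting current_node, as here, matches the case of explaining an unassigned shard.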

Examples

Unassigned primary shard

Conflicting settings

The following request gets an allocation explanation for an unassigned primary shard.

resp = client.cluster.allocation_explain(
    index="my-index-000001",
    shard=0,
    primary=True,
)
print(resp)

response = client.cluster.allocation_explain(
  body: {
    index: 'my-index-000001',
    shard: 0,
    primary: true
  }
)
puts response

const response = await client.cluster.allocationExplain({
  index: "my-index-000001",
  shard: 0,
  primary: true,
});
console.log(response);

GET _cluster/allocation/explain
{
  "index": "my-index-000001",
  "shard": 0,
  "primary": true
}

The API response indicates that the shard can only be allocated to a nonexistent node.

{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",                 
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",                   
    "at" : "2017-01-04T18:08:16.600Z",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",                          
  "allocate_explanation" : "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
  "node_allocation_decisions" : [
    {
      "node_id" : "8qt2rY-pT6KNZB3-hGfLnw",
      "node_name" : "node-0",
      "transport_address" : "127.0.0.1:9401",
      "roles" : ["data", "data_cold", "data_content", "data_frozen", "data_hot", "data_warm", "ingest", "master", "ml", "remote_cluster_client", "transform"],
      "node_attributes" : {},
      "node_decision" : "no",                     
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "filter",                   
          "decision" : "NO",
          "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]"  
        }
      ]
    }
  ]
}

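current_state: The current state of the shard.

unassigned_info.reason: The reason the shard originally became unassigned.

can_allocate: Whether to allocate the shard.

node_decision: Whether to allocate the shard to the particular node.

decider: The decider that led to the no decision for the node.

explanation: Why the decider returned a no decision, with a hint pointing to the setting that led to the decision. In this example, a newly created index has an index setting that requires it to be allocated only to a node named nonexistent_node. That node does not exist, so the index cannot be allocated.

See this video for a walkthrough of troubleshooting a node and index setting mismatch.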
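
To find the blocking setting quickly, the node-by-node section of the response can be flattened into a list of NO decisions. The following is a minimal sketch, assuming the response has already been parsed into a Python dict; blocking_deciders is a hypothetical helper name, and sample is a trimmed copy of the response above.

```python
def blocking_deciders(explanation):
    """Return (node_name, decider, explanation) for every NO decision."""
    rows = []
    for node in explanation.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                rows.append(
                    (node["node_name"], decider["decider"], decider["explanation"])
                )
    return rows


# Trimmed copy of the response shown above.
sample = {
    "index": "my-index-000001",
    "shard": 0,
    "primary": True,
    "current_state": "unassigned",
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_name": "node-0",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "filter",
                    "decision": "NO",
                    "explanation": "node does not match index setting "
                    "[index.routing.allocation.include] filters "
                    '[_name:"nonexistent_node"]',
                }
            ],
        }
    ],
}

for node_name, decider, reason in blocking_deciders(sample):
    print(f"{node_name}: {decider}: {reason}")
```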

Maximum number of retries exceeded

The following response contains an allocation explanation for an unassigned primary shard that has reached the maximum number of allocation retry attempts.

{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "at" : "2017-01-04T18:03:28.464Z",
    "details" : "failed shard on node [mEKjwwzLT1yJVb8UxT6anw]: failed recovery, failure RecoveryFailedException",
    "reason": "ALLOCATION_FAILED",
    "failed_allocation_attempts": 5,
    "last_allocation_status": "no"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "3sULLVJrRneSg0EfBB-2Ew",
      "node_name" : "node_t0",
      "transport_address" : "127.0.0.1:9400",
      "roles" : ["data_content", "data_hot"],
      "node_decision" : "no",
      "store" : {
        "matching_size" : "4.2kb",
        "matching_size_in_bytes" : 4325
      },
      "deciders" : [
        {
          "decider": "max_retry",
          "decision" : "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [POST /_cluster/reroute?retry_failed&metric=none] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-07-30T21:04:12.166Z], failed_attempts[5], failed_nodes[[mEKjwwzLT1yJVb8UxT6anw]], delayed=false, details[failed shard on node [mEKjwwzLT1yJVb8UxT6anw]: failed recovery, failure RecoveryFailedException], allocation_status[deciders_no]]]"
        }
      ]
    }
  ]
}

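When Elasticsearch is unable to allocate a shard, it retries allocation up to the maximum number of retries allowed. After this, Elasticsearch stops attempting to allocate the shard to prevent infinite retries, which may impact cluster performance. Run the cluster reroute API to retry allocation; it allocates the shard if the issue preventing allocation has been resolved.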
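
A script can detect this situation by looking for the max_retry decider in the parsed response. The sketch below assumes the response is a Python dict; needs_manual_retry is a hypothetical helper name, and sample keeps only the fields the check reads.

```python
def needs_manual_retry(explanation):
    """True if any node reports a NO decision from the max_retry decider."""
    return any(
        decider.get("decider") == "max_retry" and decider.get("decision") == "NO"
        for node in explanation.get("node_allocation_decisions", [])
        for decider in node.get("deciders", [])
    )


# Trimmed copy of the response shown above.
sample = {
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_name": "node_t0",
            "deciders": [{"decider": "max_retry", "decision": "NO"}],
        }
    ],
}

if needs_manual_retry(sample):
    # Assumption: with the Python client used earlier on this page, the retry
    # would be requested with client.cluster.reroute(retry_failed=True).
    print("max_retry reached: fix the underlying failure, then retry allocation")
```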

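No valid shard copy

The following response contains an allocation explanation for an unassigned primary shard that was previously allocated.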

{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-01-04T18:03:28.464Z",
    "details" : "node_left[OIWe8UhhThCK0V5XfmdrmQ]",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "Elasticsearch can't allocate this shard because there are no copies of its data in the cluster. Elasticsearch will allocate this shard when a node holding a good copy of its data joins the cluster. If no such node is available, restore this index from a recent snapshot."
}

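If a shard is unassigned with an allocation status of no_valid_shard_copy, make sure that all nodes are in the cluster. If all the nodes containing in-sync copies of a shard are lost, you can recover the data for the shard.

See this video for a walkthrough of troubleshooting no_valid_shard_copy.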

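Unassigned replica shard

Allocation delayed

The following response contains an allocation explanation for a replica that is unassigned due to delayed allocation.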

{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-01-04T18:53:59.498Z",
    "details" : "node_left[G92ZwuuaRY-9n8_tc-IzEg]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "allocation_delayed",
  "allocate_explanation" : "The node containing this shard copy recently left the cluster. Elasticsearch is waiting for it to return. If the node does not return within [%s] then Elasticsearch will allocate this shard to another node. Please wait.",
  "configured_delay" : "1m",                      
  "configured_delay_in_millis" : 60000,
  "remaining_delay" : "59.8s",                    
  "remaining_delay_in_millis" : 59824,
  "node_allocation_decisions" : [
    {
      "node_id" : "pmnHu_ooQWCPEFobZGbpWw",
      "node_name" : "node_t2",
      "transport_address" : "127.0.0.1:9402",
      "roles" : ["data_content", "data_hot"],
      "node_decision" : "yes"
    },
    {
      "node_id" : "3sULLVJrRneSg0EfBB-2Ew",
      "node_name" : "node_t0",
      "transport_address" : "127.0.0.1:9400",
      "roles" : ["data_content", "data_hot"],
      "node_decision" : "no",
      "store" : {                                 
        "matching_size" : "4.2kb",
        "matching_size_in_bytes" : 4325
      },
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[my-index-000001][0], node[3sULLVJrRneSg0EfBB-2Ew], [P], s[STARTED], a[id=eV9P8BN1QPqRc3B4PLx6cg]]"
        }
      ]
    }
  ]
}

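configured_delay: The configured delay before allocating a replica shard that is missing because the node holding it left the cluster.

remaining_delay: The remaining delay before allocating the replica shard.

store: Information about the shard data found on a node.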
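
The two delay fields can be combined to see how far into the delay window the shard is. This is a small sketch that assumes the response has been parsed into a Python dict; delay_progress is a hypothetical helper name.

```python
def delay_progress(explanation):
    """Return (elapsed_ms, remaining_ms) for a delayed replica allocation."""
    configured = explanation["configured_delay_in_millis"]
    remaining = explanation["remaining_delay_in_millis"]
    return configured - remaining, remaining


# Values taken from the response above.
elapsed, remaining = delay_progress(
    {"configured_delay_in_millis": 60000, "remaining_delay_in_millis": 59824}
)
print(f"{elapsed} ms elapsed, {remaining} ms remaining")
```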

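Allocation throttled

The following response contains an allocation explanation for a replica that is queued to allocate but is currently waiting on other queued shards.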

{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-01-04T18:53:59.498Z",
    "details" : "node_left[G92ZwuuaRY-9n8_tc-IzEg]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate": "throttled",
  "allocate_explanation": "Elasticsearch is currently busy with other activities. It expects to be able to allocate this shard when those activities finish. Please wait.",
  "node_allocation_decisions" : [
    {
      "node_id" : "3sULLVJrRneSg0EfBB-2Ew",
      "node_name" : "node_t0",
      "transport_address" : "127.0.0.1:9400",
      "roles" : ["data_content", "data_hot"],
      "node_decision" : "no",
      "deciders" : [
        {
          "decider": "throttling",
          "decision": "THROTTLE",
          "explanation": "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
        }
      ]
    }
  ]
}

This is a transient message that might appear when a large number of shards are allocating.
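
The can_allocate values seen in the unassigned-shard examples above can be mapped to the follow-up each example suggests. The sketch below is illustrative only: NEXT_STEPS and next_step are hypothetical names, the wording paraphrases this page, and values not shown in these examples fall through to a generic message.

```python
# Follow-up actions paraphrased from the examples on this page.
NEXT_STEPS = {
    "no": "Check the deciders in node_allocation_decisions and fix the blocking settings.",
    "no_valid_shard_copy": "Make sure all nodes are in the cluster; if no good copy exists, restore from a snapshot.",
    "allocation_delayed": "Wait for the departed node to return or for the delay to expire.",
    "throttled": "Wait; the shard is queued behind other recoveries.",
}


def next_step(explanation):
    """Return a suggested follow-up for an unassigned-shard explanation."""
    status = explanation.get("can_allocate")
    return NEXT_STEPS.get(status, f"No guidance recorded for can_allocate={status!r}.")


for status in ("no", "no_valid_shard_copy", "allocation_delayed", "throttled"):
    print(f"{status}: {next_step({'can_allocate': status})}")
```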

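Assigned shard

Cannot remain on current node

The following response contains an allocation explanation for an assigned shard. The response indicates that the shard is not allowed to remain on its current node and must be reallocated.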

{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "8lWJeJ7tSoui0bxrwuNhTA",
    "name" : "node_t1",
    "transport_address" : "127.0.0.1:9401",
    "roles" : ["data_content", "data_hot"]
  },
  "can_remain_on_current_node" : "no",            
  "can_remain_decisions" : [                      
    {
      "decider" : "filter",
      "decision" : "NO",
      "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]"
    }
  ],
  "can_move_to_other_node" : "no",                
  "move_explanation" : "This shard may not remain on its current node, but Elasticsearch isn't allowed to move it to another node. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
  "node_allocation_decisions" : [
    {
      "node_id" : "_P8olZS8Twax9u6ioN-GGA",
      "node_name" : "node_t0",
      "transport_address" : "127.0.0.1:9400",
      "roles" : ["data_content", "data_hot"],
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]"
        }
      ]
    }
  ]
}

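can_remain_on_current_node: Whether the shard is allowed to remain on its current node.

can_remain_decisions: The deciders that factored into the decision that the shard is not allowed to remain on its current node.

can_move_to_other_node: Whether the shard is allowed to be allocated to another node.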

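Must remain on current node

The following response contains an allocation explanation for a shard that must remain on its current node. Moving the shard to another node would not improve cluster balance.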

{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "wLzJm4N4RymDkBYxwWoJsg",
    "name" : "node_t0",
    "transport_address" : "127.0.0.1:9400",
    "roles" : ["data_content", "data_hot"],
    "weight_ranking" : 1
  },
  "can_remain_on_current_node" : "yes",
  "can_rebalance_cluster" : "yes",                
  "can_rebalance_to_other_node" : "no",           
  "rebalance_explanation" : "Elasticsearch cannot rebalance this shard to another node since there is no node to which allocation is permitted which would improve the cluster balance. If you expect this shard to be rebalanced to another node, find this node in the node-by-node explanation and address the reasons which prevent Elasticsearch from rebalancing this shard there.",
  "node_allocation_decisions" : [
    {
      "node_id" : "oE3EGFc8QN-Tdi5FFEprIA",
      "node_name" : "node_t1",
      "transport_address" : "127.0.0.1:9401",
      "roles" : ["data_content", "data_hot"],
      "node_decision" : "worse_balance",          
      "weight_ranking" : 1
    }
  ]
}

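can_rebalance_cluster: Whether rebalancing is allowed on the cluster.

can_rebalance_to_other_node: Whether the shard can be rebalanced to another node.

node_decision: The reason the shard cannot be rebalanced to the node. In this case, the node offers no better balance than the current node.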
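
The two assigned-shard responses above differ mainly in can_remain_on_current_node, which decides whether the move fields or the rebalance fields are relevant. A minimal sketch of that branching, assuming a parsed response dict; why_shard_stays is a hypothetical helper name.

```python
def why_shard_stays(explanation):
    """Summarize an assigned-shard explanation in one line."""
    node = explanation["current_node"]["name"]
    if explanation.get("can_remain_on_current_node") == "no":
        # The shard has to leave, so the move decision is the relevant one.
        movable = explanation.get("can_move_to_other_node")
        return f"must leave {node}; can_move_to_other_node={movable}"
    # The shard may stay, so only a rebalance could relocate it.
    rebalance = explanation.get("can_rebalance_to_other_node")
    return f"may stay on {node}; can_rebalance_to_other_node={rebalance}"


# Trimmed copies of the two assigned-shard responses above.
cannot_remain = {
    "current_node": {"name": "node_t1"},
    "can_remain_on_current_node": "no",
    "can_move_to_other_node": "no",
}
must_remain = {
    "current_node": {"name": "node_t0"},
    "can_remain_on_current_node": "yes",
    "can_rebalance_to_other_node": "no",
}

print(why_shard_stays(cannot_remain))
print(why_shard_stays(must_remain))
```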

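No arguments

If you call the API with no arguments, Elasticsearch retrieves an allocation explanation for an arbitrary unassigned primary or replica shard, returning any unassigned primary shards first.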

resp = client.cluster.allocation_explain()
print(resp)

response = client.cluster.allocation_explain
puts response

const response = await client.cluster.allocationExplain();
console.log(response);

GET _cluster/allocation/explain

If the cluster contains no unassigned shards, the API returns a 400 error.
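
A monitoring script that calls the API with no arguments may want to treat that 400 as "nothing to explain" rather than as a failure. The sketch below is written under assumptions: explain_first_unassigned is a hypothetical wrapper, call_explain is any zero-argument callable that performs the request, and the error raised on an HTTP failure is assumed to carry a status_code attribute.

```python
def explain_first_unassigned(call_explain):
    """Call the API with no arguments; return None when nothing is unassigned.

    Assumption: on an HTTP error, call_explain raises an exception that
    carries a status_code attribute. Other errors are re-raised unchanged.
    """
    try:
        return call_explain()
    except Exception as exc:
        if getattr(exc, "status_code", None) == 400:
            return None
        raise


# Stand-in for a real client call, used here only to show the wrapper's shape.
result = explain_first_unassigned(
    lambda: {"index": "my-index-000001", "current_state": "unassigned"}
)
print(result)
```

With the Python client shown earlier on this page, call_explain would typically be client.cluster.allocation_explain. Because a 400 can also signal a malformed request, this shortcut is only reasonable for the no-argument call.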