Cluster allocation explain API
Provides an explanation for a shard's current allocation.
resp = client.cluster.allocation_explain(
    index="my-index-000001",
    shard=0,
    primary=False,
    current_node="my-node",
)
print(resp)

response = client.cluster.allocation_explain(
  body: {
    index: 'my-index-000001',
    shard: 0,
    primary: false,
    current_node: 'my-node'
  }
)
puts response

const response = await client.cluster.allocationExplain({
  index: "my-index-000001",
  shard: 0,
  primary: false,
  current_node: "my-node",
});
console.log(response);

GET _cluster/allocation/explain
{
  "index": "my-index-000001",
  "shard": 0,
  "primary": false,
  "current_node": "my-node"
}
Description
The purpose of the cluster allocation explain API is to provide explanations for shard allocations in the cluster. For unassigned shards, the explain API explains why the shard is unassigned. For assigned shards, the explain API explains why the shard is remaining on its current node and has not moved or rebalanced to another node. This API can be very useful when attempting to diagnose why a shard is unassigned, or why a shard remains on its current node when you might expect otherwise.
Query parameters
- `include_disk_info`
  (Optional, Boolean) If `true`, returns information about disk usage and shard sizes. Defaults to `false`.
- `include_yes_decisions`
  (Optional, Boolean) If `true`, returns YES decisions in the explanation. Defaults to `false`.
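With the default `include_yes_decisions=false`, the API already omits YES decisions server-side. The sketch below illustrates the same filtering client-side on a hypothetical, trimmed response dict (shaped like the examples later on this page), which can be handy when you have stored a full response and only want the blocking deciders:

```python
# Hypothetical, trimmed allocation-explain response, as it might look
# when retrieved with include_yes_decisions=true.
explanation = {
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_name": "node-0",
            "node_decision": "no",
            "deciders": [
                {"decider": "same_shard", "decision": "YES",
                 "explanation": "..."},
                {"decider": "filter", "decision": "NO",
                 "explanation": "node does not match index setting filters"},
            ],
        }
    ],
}

# Keep only the deciders that actually block allocation.
blocking = [
    (node["node_name"], d["decider"])
    for node in explanation["node_allocation_decisions"]
    for d in node.get("deciders", [])
    if d["decision"] == "NO"
]
print(blocking)
```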
Request body
- `current_node`
  (Optional, string) Specifies the node ID or the name of the node currently holding the shard to explain. To explain an unassigned shard, omit this parameter.
- `index`
  (Optional, string) Specifies the name of the index that you would like an explanation for.
- `primary`
  (Optional, Boolean) If `true`, returns an explanation for the primary shard for the given shard ID.
- `shard`
  (Optional, integer) Specifies the ID of the shard that you would like an explanation for.
Examples
Unassigned primary shard
Conflicting settings
The following request gets an allocation explanation for an unassigned primary shard.
resp = client.cluster.allocation_explain(
    index="my-index-000001",
    shard=0,
    primary=True,
)
print(resp)

response = client.cluster.allocation_explain(
  body: {
    index: 'my-index-000001',
    shard: 0,
    primary: true
  }
)
puts response

const response = await client.cluster.allocationExplain({
  index: "my-index-000001",
  shard: 0,
  primary: true,
});
console.log(response);

GET _cluster/allocation/explain
{
  "index": "my-index-000001",
  "shard": 0,
  "primary": true
}
The API response indicates the shard can only be allocated to a nonexistent node.
{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2017-01-04T18:08:16.600Z",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
  "node_allocation_decisions" : [
    {
      "node_id" : "8qt2rY-pT6KNZB3-hGfLnw",
      "node_name" : "node-0",
      "transport_address" : "127.0.0.1:9401",
      "roles" : ["data", "data_cold", "data_content", "data_frozen", "data_hot", "data_warm", "ingest", "master", "ml", "remote_cluster_client", "transform"],
      "node_attributes" : {},
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]"
        }
      ]
    }
  ]
}
- `current_state`: The current state of the shard.
- `unassigned_info.reason`: The reason why the shard originally became unassigned.
- `can_allocate`: Whether to allocate the shard.
- `node_decision`: Whether to allocate the shard to the particular node.
- `decider`: The decider which led to the `no` decision for the node.
- `explanation`: An explanation as to why the decider returned a `no` decision, with a helpful hint pointing to the setting which led to the decision.
Watch this video for a walkthrough of troubleshooting a node and index setting mismatch.
Maximum number of retries exceeded
The following response contains an allocation explanation for an unassigned primary shard that has reached the maximum number of allocation retry attempts.
{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "at" : "2017-01-04T18:03:28.464Z",
    "details" : "failed shard on node [mEKjwwzLT1yJVb8UxT6anw]: failed recovery, failure RecoveryFailedException",
    "reason" : "ALLOCATION_FAILED",
    "failed_allocation_attempts" : 5,
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "3sULLVJrRneSg0EfBB-2Ew",
      "node_name" : "node_t0",
      "transport_address" : "127.0.0.1:9400",
      "roles" : ["data_content", "data_hot"],
      "node_decision" : "no",
      "store" : {
        "matching_size" : "4.2kb",
        "matching_size_in_bytes" : 4325
      },
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [POST /_cluster/reroute?retry_failed&metric=none] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-07-30T21:04:12.166Z], failed_attempts[5], failed_nodes[[mEKjwwzLT1yJVb8UxT6anw]], delayed=false, details[failed shard on node [mEKjwwzLT1yJVb8UxT6anw]: failed recovery, failure RecoveryFailedException], allocation_status[deciders_no]]]"
        }
      ]
    }
  ]
}
When Elasticsearch is unable to allocate a shard, it will attempt to retry allocation up to the maximum number of retries allowed. After this, Elasticsearch will stop attempting to allocate the shard in order to prevent infinite retries which may impact cluster performance. Run the cluster reroute API to retry allocation, which will allocate the shard if the issue preventing the allocation has been resolved.
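Programmatically, the max-retry condition can be detected from the explain response before deciding to retry. A minimal sketch, assuming a trimmed response dict shaped like the example above (the dict here is hypothetical); the commented-out retry call assumes the official Python client:

```python
# Hypothetical, trimmed explain response shaped like the example above.
explanation = {
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_name": "node_t0",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "max_retry",
                    "decision": "NO",
                    "explanation": "shard has exceeded the maximum number of retries [5] ...",
                }
            ],
        }
    ],
}

def hit_max_retries(explanation):
    """Return True if any node rejected allocation via the max_retry decider."""
    return any(
        d["decider"] == "max_retry" and d["decision"] == "NO"
        for node in explanation.get("node_allocation_decisions", [])
        for d in node.get("deciders", [])
    )

if hit_max_retries(explanation):
    # After fixing the underlying problem, retry with the cluster reroute API:
    # POST /_cluster/reroute?retry_failed
    # or, with the Python client: client.cluster.reroute(retry_failed=True)
    print("retry allocation with POST /_cluster/reroute?retry_failed")
```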
No valid shard copy
The following response contains an allocation explanation for an unassigned primary shard that was previously allocated.
{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-01-04T18:03:28.464Z",
    "details" : "node_left[OIWe8UhhThCK0V5XfmdrmQ]",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "Elasticsearch can't allocate this shard because there are no copies of its data in the cluster. Elasticsearch will allocate this shard when a node holding a good copy of its data joins the cluster. If no such node is available, restore this index from a recent snapshot."
}
If a shard is unassigned with an allocation status of `no_valid_shard_copy`, you should make sure that all nodes are in the cluster. If all the nodes containing in-sync copies of the shard have been lost, you can recover the shard's data.
Watch this video for a walkthrough of troubleshooting `no_valid_shard_copy`.
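If no node holding a good copy of the data can rejoin the cluster, the response above suggests restoring the index from a recent snapshot. A hedged sketch using the snapshot restore API, where `my_repository` and `my_snapshot` are placeholder names for an existing repository and snapshot:

POST _snapshot/my_repository/my_snapshot/_restore
{
  "indices": "my-index-000001"
}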
Unassigned replica shard
Allocation delayed
The following response contains an allocation explanation for a replica that's unassigned due to delayed allocation.
{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-01-04T18:53:59.498Z",
    "details" : "node_left[G92ZwuuaRY-9n8_tc-IzEg]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "allocation_delayed",
  "allocate_explanation" : "The node containing this shard copy recently left the cluster. Elasticsearch is waiting for it to return. If the node does not return within [%s] then Elasticsearch will allocate this shard to another node. Please wait.",
  "configured_delay" : "1m",
  "configured_delay_in_millis" : 60000,
  "remaining_delay" : "59.8s",
  "remaining_delay_in_millis" : 59824,
  "node_allocation_decisions" : [
    {
      "node_id" : "pmnHu_ooQWCPEFobZGbpWw",
      "node_name" : "node_t2",
      "transport_address" : "127.0.0.1:9402",
      "roles" : ["data_content", "data_hot"],
      "node_decision" : "yes"
    },
    {
      "node_id" : "3sULLVJrRneSg0EfBB-2Ew",
      "node_name" : "node_t0",
      "transport_address" : "127.0.0.1:9400",
      "roles" : ["data_content", "data_hot"],
      "node_decision" : "no",
      "store" : {
        "matching_size" : "4.2kb",
        "matching_size_in_bytes" : 4325
      },
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[my-index-000001][0], node[3sULLVJrRneSg0EfBB-2Ew], [P], s[STARTED], a[id=eV9P8BN1QPqRc3B4PLx6cg]]"
        }
      ]
    }
  ]
}
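The delay reported in `configured_delay` comes from the `index.unassigned.node_left.delayed_timeout` index setting, which defaults to `1m`. A sketch of adjusting it with the update index settings API, for example if node restarts in your environment routinely take longer than a minute:

PUT my-index-000001/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}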
Allocation throttled
The following response contains an allocation explanation for a replica that's queued to allocate but currently waiting on other queued shards.
{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-01-04T18:53:59.498Z",
    "details" : "node_left[G92ZwuuaRY-9n8_tc-IzEg]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate": "throttled",
  "allocate_explanation": "Elasticsearch is currently busy with other activities. It expects to be able to allocate this shard when those activities finish. Please wait.",
  "node_allocation_decisions" : [
    {
      "node_id" : "3sULLVJrRneSg0EfBB-2Ew",
      "node_name" : "node_t0",
      "transport_address" : "127.0.0.1:9400",
      "roles" : ["data_content", "data_hot"],
      "node_decision" : "no",
      "deciders" : [
        {
          "decider": "throttling",
          "decision": "THROTTLE",
          "explanation": "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
        }
      ]
    }
  ]
}
This is a transient message that might appear when a large number of shards are allocating.
Assigned shard
Cannot remain on current node
The following response contains an allocation explanation for an assigned shard. The response indicates the shard is not allowed to remain on its current node and must be reallocated.
{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "8lWJeJ7tSoui0bxrwuNhTA",
    "name" : "node_t1",
    "transport_address" : "127.0.0.1:9401",
    "roles" : ["data_content", "data_hot"]
  },
  "can_remain_on_current_node" : "no",
  "can_remain_decisions" : [
    {
      "decider" : "filter",
      "decision" : "NO",
      "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]"
    }
  ],
  "can_move_to_other_node" : "no",
  "move_explanation" : "This shard may not remain on its current node, but Elasticsearch isn't allowed to move it to another node. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
  "node_allocation_decisions" : [
    {
      "node_id" : "_P8olZS8Twax9u6ioN-GGA",
      "node_name" : "node_t0",
      "transport_address" : "127.0.0.1:9400",
      "roles" : ["data_content", "data_hot"],
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]"
        }
      ]
    }
  ]
}
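In this example the `filter` decider blocks both remaining and moving because of the `index.routing.allocation.include` setting. A sketch of resolving the conflict by clearing the filter with the update index settings API (setting an allocation filter to `null` removes it):

PUT my-index-000001/_settings
{
  "index.routing.allocation.include._name": null
}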
Must remain on current node
The following response contains an allocation explanation for a shard that must remain on its current node. Moving the shard to another node would not improve cluster balance.
{
  "index" : "my-index-000001",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "wLzJm4N4RymDkBYxwWoJsg",
    "name" : "node_t0",
    "transport_address" : "127.0.0.1:9400",
    "roles" : ["data_content", "data_hot"],
    "weight_ranking" : 1
  },
  "can_remain_on_current_node" : "yes",
  "can_rebalance_cluster" : "yes",
  "can_rebalance_to_other_node" : "no",
  "rebalance_explanation" : "Elasticsearch cannot rebalance this shard to another node since there is no node to which allocation is permitted which would improve the cluster balance. If you expect this shard to be rebalanced to another node, find this node in the node-by-node explanation and address the reasons which prevent Elasticsearch from rebalancing this shard there.",
  "node_allocation_decisions" : [
    {
      "node_id" : "oE3EGFc8QN-Tdi5FFEprIA",
      "node_name" : "node_t1",
      "transport_address" : "127.0.0.1:9401",
      "roles" : ["data_content", "data_hot"],
      "node_decision" : "worse_balance",
      "weight_ranking" : 1
    }
  ]
}
No arguments
If you call the API with no arguments, Elasticsearch retrieves an allocation explanation for an arbitrary unassigned primary or replica shard, returning any unassigned primary shards first.
resp = client.cluster.allocation_explain()
print(resp)

response = client.cluster.allocation_explain
puts response

const response = await client.cluster.allocationExplain();
console.log(response);

GET _cluster/allocation/explain
If the cluster contains no unassigned shards, the API returns a `400` error.