Tutorial: Disaster recovery based on uni-directional cross-cluster replication

Learn how to fail over and fail back between two clusters based on uni-directional cross-cluster replication. You can also visit Bi-directional disaster recovery to set up replicating data streams that automatically fail over and fail back without human intervention.

  • Set up uni-directional cross-cluster replication from clusterA to clusterB.
  • Failover - If clusterA goes offline, clusterB needs to "promote" the follower indices to regular indices to allow write operations. All ingestion needs to be redirected to clusterB; this is controlled by the clients (Logstash, Beats, Elastic Agents, etc.).
  • Failback - When clusterA comes back online, it assumes the role of a follower and replicates the leader indices from clusterB.
Uni-directional cross cluster replication failover and failback

Cross-cluster replication provides functionality to replicate user-generated indices only. Cross-cluster replication isn't designed for replicating system-generated indices or snapshot settings, and can't replicate ILM or SLM policies across clusters. Learn more about cross-cluster replication limitations.

Prerequisites

Before completing this tutorial, set up cross-cluster replication to connect the two clusters and configure a follower index.

In this tutorial, kibana_sample_data_ecommerce is replicated from clusterA to clusterB.

resp = client.cluster.put_settings(
    persistent={
        "cluster": {
            "remote": {
                "clusterA": {
                    "mode": "proxy",
                    "skip_unavailable": "true",
                    "server_name": "clustera.es.region-a.gcp.elastic-cloud.com",
                    "proxy_socket_connections": "18",
                    "proxy_address": "clustera.es.region-a.gcp.elastic-cloud.com:9400"
                }
            }
        }
    },
)
print(resp)
response = client.cluster.put_settings(
  body: {
    persistent: {
      cluster: {
        remote: {
          "clusterA": {
            mode: 'proxy',
            skip_unavailable: 'true',
            server_name: 'clustera.es.region-a.gcp.elastic-cloud.com',
            proxy_socket_connections: '18',
            proxy_address: 'clustera.es.region-a.gcp.elastic-cloud.com:9400'
          }
        }
      }
    }
  }
)
puts response
const response = await client.cluster.putSettings({
  persistent: {
    cluster: {
      remote: {
        clusterA: {
          mode: "proxy",
          skip_unavailable: "true",
          server_name: "clustera.es.region-a.gcp.elastic-cloud.com",
          proxy_socket_connections: "18",
          proxy_address: "clustera.es.region-a.gcp.elastic-cloud.com:9400",
        },
      },
    },
  },
});
console.log(response);
### On clusterB ###
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "clusterA": {
          "mode": "proxy",
          "skip_unavailable": "true",
          "server_name": "clustera.es.region-a.gcp.elastic-cloud.com",
          "proxy_socket_connections": "18",
          "proxy_address": "clustera.es.region-a.gcp.elastic-cloud.com:9400"
        }
      }
    }
  }
}
resp = client.ccr.follow(
    index="kibana_sample_data_ecommerce2",
    wait_for_active_shards="1",
    remote_cluster="clusterA",
    leader_index="kibana_sample_data_ecommerce",
)
print(resp)
const response = await client.ccr.follow({
  index: "kibana_sample_data_ecommerce2",
  wait_for_active_shards: 1,
  remote_cluster: "clusterA",
  leader_index: "kibana_sample_data_ecommerce",
});
console.log(response);
### On clusterB ###
PUT /kibana_sample_data_ecommerce2/_ccr/follow?wait_for_active_shards=1
{
  "remote_cluster": "clusterA",
  "leader_index": "kibana_sample_data_ecommerce"
}

Writes (such as ingestion or updates) should occur only on the leader index. Follower indices are read-only and will reject any writes.

clusterA 宕机时的故障转移

编辑
  1. clusterB 中的跟随者索引提升为常规索引,以便它们接受写入。这可以通过以下方式实现

    • First, pause indexing following for the follower index.
    • Next, close the follower index.
    • Unfollow the leader index.
    • Finally, open the follower index (which at this point is a regular index).
    resp = client.ccr.pause_follow(
        index="kibana_sample_data_ecommerce2",
    )
    print(resp)
    
    resp1 = client.indices.close(
        index="kibana_sample_data_ecommerce2",
    )
    print(resp1)
    
    resp2 = client.ccr.unfollow(
        index="kibana_sample_data_ecommerce2",
    )
    print(resp2)
    
    resp3 = client.indices.open(
        index="kibana_sample_data_ecommerce2",
    )
    print(resp3)
    response = client.ccr.pause_follow(
      index: 'kibana_sample_data_ecommerce2'
    )
    puts response
    
    response = client.indices.close(
      index: 'kibana_sample_data_ecommerce2'
    )
    puts response
    
    response = client.ccr.unfollow(
      index: 'kibana_sample_data_ecommerce2'
    )
    puts response
    
    response = client.indices.open(
      index: 'kibana_sample_data_ecommerce2'
    )
    puts response
    const response = await client.ccr.pauseFollow({
      index: "kibana_sample_data_ecommerce2",
    });
    console.log(response);
    
    const response1 = await client.indices.close({
      index: "kibana_sample_data_ecommerce2",
    });
    console.log(response1);
    
    const response2 = await client.ccr.unfollow({
      index: "kibana_sample_data_ecommerce2",
    });
    console.log(response2);
    
    const response3 = await client.indices.open({
      index: "kibana_sample_data_ecommerce2",
    });
    console.log(response3);
    ### On clusterB ###
    POST /kibana_sample_data_ecommerce2/_ccr/pause_follow
    POST /kibana_sample_data_ecommerce2/_close
    POST /kibana_sample_data_ecommerce2/_ccr/unfollow
    POST /kibana_sample_data_ecommerce2/_open
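
    The four-step promotion above can be wrapped in a small helper. This is an illustrative sketch, not part of the official tutorial; it assumes an elasticsearch-py client object and simply issues the same four calls in order:

    ```python
    # Sketch (assumes an elasticsearch-py client): promote a follower index on
    # clusterB to a regular, writable index by running the four steps in order.

    def promote_follower(client, index):
        """Pause following, close, unfollow, then reopen the index."""
        client.ccr.pause_follow(index=index)  # stop pulling changes from the leader
        client.indices.close(index=index)     # the index must be closed to unfollow
        client.ccr.unfollow(index=index)      # convert the follower into a regular index
        client.indices.open(index=index)      # reopen it; writes are now accepted
    ```

    The order matters: unfollow is only accepted on a closed index, and the index only becomes writable once it is reopened.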
  2. On the client side (Logstash, Beats, Elastic Agent), manually re-enable ingestion of kibana_sample_data_ecommerce2 and redirect traffic to clusterB. You should also redirect all search traffic to the clusterB cluster during this time. You can simulate this by ingesting a document into this index. You should notice that this index is now writable.

    resp = client.index(
        index="kibana_sample_data_ecommerce2",
        document={
            "user": "kimchy"
        },
    )
    print(resp)
    response = client.index(
      index: 'kibana_sample_data_ecommerce2',
      body: {
        user: 'kimchy'
      }
    )
    puts response
    const response = await client.index({
      index: "kibana_sample_data_ecommerce2",
      document: {
        user: "kimchy",
      },
    });
    console.log(response);
    ### On clusterB ###
    POST kibana_sample_data_ecommerce2/_doc/
    {
      "user": "kimchy"
    }

clusterA 恢复时的故障恢复

clusterA 恢复时,clusterB 成为新的领导者,clusterA 成为跟随者。

  1. clusterA 上设置远程集群 clusterB

    resp = client.cluster.put_settings(
        persistent={
            "cluster": {
                "remote": {
                    "clusterB": {
                        "mode": "proxy",
                        "skip_unavailable": "true",
                        "server_name": "clusterb.es.region-b.gcp.elastic-cloud.com",
                        "proxy_socket_connections": "18",
                        "proxy_address": "clusterb.es.region-b.gcp.elastic-cloud.com:9400"
                    }
                }
            }
        },
    )
    print(resp)
    response = client.cluster.put_settings(
      body: {
        persistent: {
          cluster: {
            remote: {
              "clusterB": {
                mode: 'proxy',
                skip_unavailable: 'true',
                server_name: 'clusterb.es.region-b.gcp.elastic-cloud.com',
                proxy_socket_connections: '18',
                proxy_address: 'clusterb.es.region-b.gcp.elastic-cloud.com:9400'
              }
            }
          }
        }
      }
    )
    puts response
    const response = await client.cluster.putSettings({
      persistent: {
        cluster: {
          remote: {
            clusterB: {
              mode: "proxy",
              skip_unavailable: "true",
              server_name: "clusterb.es.region-b.gcp.elastic-cloud.com",
              proxy_socket_connections: "18",
              proxy_address: "clusterb.es.region-b.gcp.elastic-cloud.com:9400",
            },
          },
        },
      },
    });
    console.log(response);
    ### On clusterA ###
    PUT _cluster/settings
    {
      "persistent": {
        "cluster": {
          "remote": {
            "clusterB": {
              "mode": "proxy",
              "skip_unavailable": "true",
              "server_name": "clusterb.es.region-b.gcp.elastic-cloud.com",
              "proxy_socket_connections": "18",
              "proxy_address": "clusterb.es.region-b.gcp.elastic-cloud.com:9400"
            }
          }
        }
      }
    }
  2. Existing data needs to be discarded before any index can be turned into a follower. Ensure that the most up-to-date data is available on clusterB before deleting any indices on clusterA.

    resp = client.indices.delete(
        index="kibana_sample_data_ecommerce",
    )
    print(resp)
    response = client.indices.delete(
      index: 'kibana_sample_data_ecommerce'
    )
    puts response
    const response = await client.indices.delete({
      index: "kibana_sample_data_ecommerce",
    });
    console.log(response);
    ### On clusterA ###
    DELETE kibana_sample_data_ecommerce
  3. clusterA 上创建一个跟随者索引,现在跟随 clusterB 中的领导者索引。

    resp = client.ccr.follow(
        index="kibana_sample_data_ecommerce",
        wait_for_active_shards="1",
        remote_cluster="clusterB",
        leader_index="kibana_sample_data_ecommerce2",
    )
    print(resp)
    const response = await client.ccr.follow({
      index: "kibana_sample_data_ecommerce",
      wait_for_active_shards: 1,
      remote_cluster: "clusterB",
      leader_index: "kibana_sample_data_ecommerce2",
    });
    console.log(response);
    ### On clusterA ###
    PUT /kibana_sample_data_ecommerce/_ccr/follow?wait_for_active_shards=1
    {
      "remote_cluster": "clusterB",
      "leader_index": "kibana_sample_data_ecommerce2"
    }
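    Steps 2 and 3 belong together: the stale index must be gone before the follow request can reuse its name. As a hedged sketch (again assuming an elasticsearch-py client, not part of the official tutorial), the failback of a single index could look like:

    ```python
    # Sketch (assumes an elasticsearch-py client): on clusterA, discard the stale
    # copy of an index and re-create it as a follower of the new leader on clusterB.

    def failback_index(client, index, remote_cluster, leader_index):
        """Delete the outdated index, then follow the up-to-date leader."""
        client.indices.delete(index=index)   # stale data must be removed first
        return client.ccr.follow(
            index=index,                     # recreated locally as a follower
            wait_for_active_shards="1",
            remote_cluster=remote_cluster,   # e.g. "clusterB"
            leader_index=leader_index,       # e.g. "kibana_sample_data_ecommerce2"
        )
    ```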
  4. The index on the follower cluster now contains the updated documents.

    resp = client.search(
        index="kibana_sample_data_ecommerce",
        q="kimchy",
    )
    print(resp)
    response = client.search(
      index: 'kibana_sample_data_ecommerce',
      q: 'kimchy'
    )
    puts response
    const response = await client.search({
      index: "kibana_sample_data_ecommerce",
      q: "kimchy",
    });
    console.log(response);
    ### On clusterA ###
    GET kibana_sample_data_ecommerce/_search?q=kimchy

    If a soft delete is merged away before it can be replicated to a follower, the process above will fail due to incomplete history on the leader; see index.soft_deletes.retention_lease.period for more details.
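
    If followers may lag for long periods, one mitigation is to lengthen history retention on the leader index. The snippet below is a hedged sketch (assuming an elasticsearch-py client and that the dynamic index setting index.soft_deletes.retention_lease.period applies to your version), not an official recommendation:

    ```python
    # Sketch: raise the shard history retention lease period on the leader index
    # so soft deletes survive longer, giving lagging followers more time to catch
    # up. index.soft_deletes.retention_lease.period defaults to 12h.

    def extend_retention_lease(client, index, period="24h"):
        return client.indices.put_settings(
            index=index,
            settings={"index.soft_deletes.retention_lease.period": period},
        )
    ```

    Note that a longer retention period keeps more soft-deleted documents around and therefore increases disk usage on the leader.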