教程:基于单向跨集群复制的灾难恢复

编辑

教程:基于单向跨集群复制的灾难恢复

编辑

了解如何在两个集群之间基于单向跨集群复制进行故障转移和故障恢复。您还可以访问 双向灾难恢复,设置可自动故障转移和故障恢复而无需人工干预的复制数据流。

  • 设置从 clusterA 复制到 clusterB 的单向跨集群复制。
  • 故障转移 - 如果 clusterA 脱机,clusterB 需要将“追随者”索引“提升”为常规索引,以允许写入操作。所有摄取都需要重定向到 clusterB,这由客户端(Logstash、Beats、Elastic Agents 等)控制。
  • 故障恢复 - 当 clusterA 重新联机时,它将承担追随者的角色,并从 clusterB 复制领导者索引。
Uni-directional cross cluster replication failover and failback

跨集群复制仅提供复制用户生成索引的功能。跨集群复制并非旨在复制系统生成的索引或快照设置,并且无法跨集群复制 ILM 或 SLM 策略。请在跨集群复制 限制 中了解更多信息。

先决条件

编辑

在完成本教程之前,请设置跨集群复制以连接两个集群并配置追随者索引。

在本教程中,kibana_sample_data_ecommerceclusterA 复制到 clusterB

resp = client.cluster.put_settings(
    persistent={
        "cluster": {
            "remote": {
                "clusterA": {
                    "mode": "proxy",
                    "skip_unavailable": "true",
                    "server_name": "clustera.es.region-a.gcp.elastic-cloud.com",
                    "proxy_socket_connections": "18",
                    "proxy_address": "clustera.es.region-a.gcp.elastic-cloud.com:9400"
                }
            }
        }
    },
)
print(resp)
response = client.cluster.put_settings(
  body: {
    persistent: {
      cluster: {
        remote: {
          "clusterA": {
            mode: 'proxy',
            skip_unavailable: 'true',
            server_name: 'clustera.es.region-a.gcp.elastic-cloud.com',
            proxy_socket_connections: '18',
            proxy_address: 'clustera.es.region-a.gcp.elastic-cloud.com:9400'
          }
        }
      }
    }
  }
)
puts response
const response = await client.cluster.putSettings({
  persistent: {
    cluster: {
      remote: {
        clusterA: {
          mode: "proxy",
          skip_unavailable: "true",
          server_name: "clustera.es.region-a.gcp.elastic-cloud.com",
          proxy_socket_connections: "18",
          proxy_address: "clustera.es.region-a.gcp.elastic-cloud.com:9400",
        },
      },
    },
  },
});
console.log(response);
### On clusterB ###
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "clusterA": {
          "mode": "proxy",
          "skip_unavailable": "true",
          "server_name": "clustera.es.region-a.gcp.elastic-cloud.com",
          "proxy_socket_connections": "18",
          "proxy_address": "clustera.es.region-a.gcp.elastic-cloud.com:9400"
        }
      }
    }
  }
}
resp = client.ccr.follow(
    index="kibana_sample_data_ecommerce2",
    wait_for_active_shards="1",
    remote_cluster="clusterA",
    leader_index="kibana_sample_data_ecommerce",
)
print(resp)
const response = await client.ccr.follow({
  index: "kibana_sample_data_ecommerce2",
  wait_for_active_shards: 1,
  remote_cluster: "clusterA",
  leader_index: "kibana_sample_data_ecommerce",
});
console.log(response);
### On clusterB ###
PUT /kibana_sample_data_ecommerce2/_ccr/follow?wait_for_active_shards=1
{
  "remote_cluster": "clusterA",
  "leader_index": "kibana_sample_data_ecommerce"
}

写入(例如摄取或更新)应仅在领导者索引上发生。追随者索引是只读的,并将拒绝任何写入。

clusterA 宕机时的故障转移

编辑
  1. clusterB 中的追随者索引提升为常规索引,以便它们接受写入。这可以通过以下方式实现:

    • 首先,暂停追随者索引的索引跟随。
    • 接下来,关闭追随者索引。
    • 取消跟随领导者索引。
    • 最后,打开追随者索引(此时它是一个常规索引)。
    resp = client.ccr.pause_follow(
        index="kibana_sample_data_ecommerce2",
    )
    print(resp)
    
    resp1 = client.indices.close(
        index="kibana_sample_data_ecommerce2",
    )
    print(resp1)
    
    resp2 = client.ccr.unfollow(
        index="kibana_sample_data_ecommerce2",
    )
    print(resp2)
    
    resp3 = client.indices.open(
        index="kibana_sample_data_ecommerce2",
    )
    print(resp3)
    response = client.ccr.pause_follow(
      index: 'kibana_sample_data_ecommerce2'
    )
    puts response
    
    response = client.indices.close(
      index: 'kibana_sample_data_ecommerce2'
    )
    puts response
    
    response = client.ccr.unfollow(
      index: 'kibana_sample_data_ecommerce2'
    )
    puts response
    
    response = client.indices.open(
      index: 'kibana_sample_data_ecommerce2'
    )
    puts response
    const response = await client.ccr.pauseFollow({
      index: "kibana_sample_data_ecommerce2",
    });
    console.log(response);
    
    const response1 = await client.indices.close({
      index: "kibana_sample_data_ecommerce2",
    });
    console.log(response1);
    
    const response2 = await client.ccr.unfollow({
      index: "kibana_sample_data_ecommerce2",
    });
    console.log(response2);
    
    const response3 = await client.indices.open({
      index: "kibana_sample_data_ecommerce2",
    });
    console.log(response3);
    ### On clusterB ###
    POST /kibana_sample_data_ecommerce2/_ccr/pause_follow
    POST /kibana_sample_data_ecommerce2/_close
    POST /kibana_sample_data_ecommerce2/_ccr/unfollow
    POST /kibana_sample_data_ecommerce2/_open
  2. 在客户端(Logstash、Beats、Elastic Agent)端,手动重新启用 kibana_sample_data_ecommerce2 的摄取,并将流量重定向到 clusterB。您还应在此期间将所有搜索流量重定向到 clusterB 集群。您可以通过将文档摄取到此索引来模拟此操作。您应该注意到此索引现在是可写的。

    resp = client.index(
        index="kibana_sample_data_ecommerce2",
        document={
            "user": "kimchy"
        },
    )
    print(resp)
    response = client.index(
      index: 'kibana_sample_data_ecommerce2',
      body: {
        user: 'kimchy'
      }
    )
    puts response
    const response = await client.index({
      index: "kibana_sample_data_ecommerce2",
      document: {
        user: "kimchy",
      },
    });
    console.log(response);
    ### On clusterB ###
    POST kibana_sample_data_ecommerce2/_doc/
    {
      "user": "kimchy"
    }

clusterA 恢复时的故障恢复

编辑

clusterA 恢复时,clusterB 将成为新的领导者,而 clusterA 将成为追随者。

  1. clusterA 上设置远程集群 clusterB

    resp = client.cluster.put_settings(
        persistent={
            "cluster": {
                "remote": {
                    "clusterB": {
                        "mode": "proxy",
                        "skip_unavailable": "true",
                        "server_name": "clusterb.es.region-b.gcp.elastic-cloud.com",
                        "proxy_socket_connections": "18",
                        "proxy_address": "clusterb.es.region-b.gcp.elastic-cloud.com:9400"
                    }
                }
            }
        },
    )
    print(resp)
    response = client.cluster.put_settings(
      body: {
        persistent: {
          cluster: {
            remote: {
              "clusterB": {
                mode: 'proxy',
                skip_unavailable: 'true',
                server_name: 'clusterb.es.region-b.gcp.elastic-cloud.com',
                proxy_socket_connections: '18',
                proxy_address: 'clusterb.es.region-b.gcp.elastic-cloud.com:9400'
              }
            }
          }
        }
      }
    )
    puts response
    const response = await client.cluster.putSettings({
      persistent: {
        cluster: {
          remote: {
            clusterB: {
              mode: "proxy",
              skip_unavailable: "true",
              server_name: "clusterb.es.region-b.gcp.elastic-cloud.com",
              proxy_socket_connections: "18",
              proxy_address: "clusterb.es.region-b.gcp.elastic-cloud.com:9400",
            },
          },
        },
      },
    });
    console.log(response);
    ### On clusterA ###
    PUT _cluster/settings
    {
      "persistent": {
        "cluster": {
          "remote": {
            "clusterB": {
              "mode": "proxy",
              "skip_unavailable": "true",
              "server_name": "clusterb.es.region-b.gcp.elastic-cloud.com",
              "proxy_socket_connections": "18",
              "proxy_address": "clusterb.es.region-b.gcp.elastic-cloud.com:9400"
            }
          }
        }
      }
    }
  2. 在可以将任何索引转换为追随者之前,需要丢弃现有数据。在删除 clusterA 上的任何索引之前,请确保 clusterB 上提供了最新数据。

    resp = client.indices.delete(
        index="kibana_sample_data_ecommerce",
    )
    print(resp)
    response = client.indices.delete(
      index: 'kibana_sample_data_ecommerce'
    )
    puts response
    const response = await client.indices.delete({
      index: "kibana_sample_data_ecommerce",
    });
    console.log(response);
    ### On clusterA ###
    DELETE kibana_sample_data_ecommerce
  3. clusterA 上创建一个追随者索引,现在它跟随 clusterB 中的领导者索引。

    resp = client.ccr.follow(
        index="kibana_sample_data_ecommerce",
        wait_for_active_shards="1",
        remote_cluster="clusterB",
        leader_index="kibana_sample_data_ecommerce2",
    )
    print(resp)
    const response = await client.ccr.follow({
      index: "kibana_sample_data_ecommerce",
      wait_for_active_shards: 1,
      remote_cluster: "clusterB",
      leader_index: "kibana_sample_data_ecommerce2",
    });
    console.log(response);
    ### On clusterA ###
    PUT /kibana_sample_data_ecommerce/_ccr/follow?wait_for_active_shards=1
    {
      "remote_cluster": "clusterB",
      "leader_index": "kibana_sample_data_ecommerce2"
    }
  4. 追随者集群上的索引现在包含更新的文档。

    resp = client.search(
        index="kibana_sample_data_ecommerce",
        q="kimchy",
    )
    print(resp)
    response = client.search(
      index: 'kibana_sample_data_ecommerce',
      q: 'kimchy'
    )
    puts response
    const response = await client.search({
      index: "kibana_sample_data_ecommerce",
      q: "kimchy",
    });
    console.log(response);
    ### On clusterA ###
    GET kibana_sample_data_ecommerce/_search?q=kimchy

    如果软删除在可以复制到追随者之前被合并掉,则以下过程将因领导者上的历史记录不完整而失败,有关详细信息,请参阅 index.soft_deletes.retention_lease.period