Original article: https://redis.io/commands/cluster-failover/
Available since: 3.0.0
Time complexity: O(1)
ACL categories: @admin, @slow, @dangerous
This command, which can only be sent to a Redis Cluster replica node, forces the replica to start a manual failover of its master instance.
A manual failover is a special kind of failover that is usually executed when there are no actual failures, but we wish to swap the current master with one of its replicas (which is the node we send the command to), in a safe way, without any window for data loss. It works in the following way:
1. The replica tells the master to stop processing queries from clients.
2. The master replies to the replica with the current replication offset.
3. The replica waits for the replication offset to match on its side, to make sure it has processed all the data from the master before it continues.
4. The replica starts a failover, obtains a new configuration epoch from a majority of the masters, and broadcasts the new configuration.
5. The old master receives the configuration update: it unblocks its clients and starts replying with redirection messages so that they'll continue the chat with the new master.
This way clients are moved away from the old master to the new master atomically and only when the replica that is turning into the new master has processed all of the replication stream from the old master.
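The five-step handshake can be illustrated with a toy, single-process Python model. Everything in it is an assumption made for illustration, not Redis code: the hypothetical `Master`/`Replica` classes, the list-based replication stream, the simplified `MOVED new` reply (a real Redis redirect carries a slot and an address), and the majority check in step 4. The model shows why no acknowledged write is lost. The master stops accepting writes before it reports its offset, and the replica does not take over until it has applied everything up to that offset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Master:
    name: str
    log: List[str] = field(default_factory=list)  # replication stream
    paused: bool = False                          # step 1: client queries stopped
    redirect_to: Optional[str] = None             # step 5: new master to redirect to

    def write(self, cmd: str) -> str:
        if self.redirect_to is not None:
            return f"MOVED {self.redirect_to}"  # simplified; real MOVED has slot + address
        if self.paused:
            return "BLOCKED"                    # the client would wait here
        self.log.append(cmd)
        return "OK"


@dataclass
class Replica:
    name: str
    applied: List[str] = field(default_factory=list)
    role: str = "replica"

    def replicate(self, master: Master, batch: int = 1) -> None:
        # Apply up to `batch` entries of the master's stream not yet applied.
        start = len(self.applied)
        self.applied.extend(master.log[start:start + batch])


def manual_failover(replica: Replica, master: Master,
                    votes: int, total_masters: int) -> bool:
    master.paused = True                        # 1. master stops serving clients
    offset = len(master.log)                    # 2. master reports its offset
    while len(replica.applied) < offset:        # 3. replica catches up to it
        replica.replicate(master)
    if votes < total_masters // 2 + 1:          # 4. needs a majority of masters
        master.paused = False                   #    toy model: give up and resume
        return False
    replica.role = "master"                     # 4. promoted, new config broadcast
    master.redirect_to = replica.name           # 5. old master redirects clients
    master.paused = False
    return True


old, new = Master("old"), Replica("new")
for key in ("a", "b", "c"):
    old.write(f"SET {key} 1")
new.replicate(old)                  # the replica is lagging: 1 of 3 entries applied
print(manual_failover(new, old, votes=2, total_masters=3))  # True
print(new.applied == old.log)       # True: nothing was lost
print(old.write("SET d 1"))         # MOVED new
```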
FORCE option: manual failover when the master is down
When the master is down, the replica normally starts an automatic failover once the failure is detected. The options below also let an operator start a manual failover in that situation.
The command behavior can be modified by two options: FORCE and TAKEOVER.
If the FORCE option is given, the replica does not perform any handshake with the master, which may not be reachable. Instead, it starts a failover as soon as possible, beginning at step 4. This is useful when we want to start a manual failover while the master is no longer reachable.
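The effect of FORCE can be sketched under one assumption: the step list above is the full default sequence. In that case FORCE skips the handshake (steps 1 to 3), and the replica begins at step 4. `STEPS` and `steps_for` are hypothetical names used only for illustration.

```python
from typing import Tuple

# The five manual-failover steps, paraphrased from the list above.
STEPS: Tuple[str, ...] = (
    "1. replica asks the master to stop processing client queries",
    "2. master replies with its current replication offset",
    "3. replica waits until its own offset matches",
    "4. replica gets a new config epoch from a majority of masters and broadcasts it",
    "5. old master applies the update and redirects its clients",
)


def steps_for(force: bool) -> Tuple[str, ...]:
    """Steps a replica runs for CLUSTER FAILOVER (force=False) or CLUSTER FAILOVER FORCE.

    FORCE does no handshake with the possibly unreachable master, so steps
    1-3 are skipped. If the old master is down, it can only act on step 5
    after it becomes reachable and receives the new configuration.
    """
    return STEPS[3:] if force else STEPS


for step in steps_for(force=True):
    print(step)
```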
However, even with FORCE, a majority of masters must still be available to authorize the failover and to generate a new configuration epoch for the replica that is going to become master.
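The authorization rule can be sketched as a small predicate. This is an illustration built on assumptions, not Redis source. `failover_authorized` is a hypothetical helper. The quorum is taken as `voting_masters // 2 + 1`, where `voting_masters` stands for the masters that take part in the vote (in Redis these are, to my understanding, the masters serving at least one hash slot).

```python
from typing import Optional

VALID_OPTIONS = (None, "FORCE", "TAKEOVER")


def failover_authorized(option: Optional[str], votes: int, voting_masters: int) -> bool:
    """Can the replica become master, given the votes it collected?

    Default and FORCE: a majority of the voting masters must agree.
    TAKEOVER: no cluster authorization is requested, so votes are ignored.
    """
    if option not in VALID_OPTIONS:
        raise ValueError(f"unknown option: {option!r}")
    if option == "TAKEOVER":
        return True
    quorum = voting_masters // 2 + 1
    return votes >= quorum


print(failover_authorized("FORCE", votes=2, voting_masters=3))     # True
print(failover_authorized("FORCE", votes=1, voting_masters=3))     # False
print(failover_authorized("TAKEOVER", votes=0, voting_masters=3))  # True
```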
TAKEOVER option: manual failover without cluster consensus
There are situations where this is not enough, and we want a replica to fail over without any agreement with the rest of the cluster. A real-world use case for this is to mass-promote replicas in a different data center to masters in order to perform a data center switch while all the masters are down or partitioned away.
The TAKEOVER option implies everything FORCE implies, but it also does not use any cluster authorization in order to fail over. A replica receiving CLUSTER FAILOVER TAKEOVER will instead:
- Generate a new configEpoch unilaterally, just taking the current greatest epoch available and incrementing it if its local configuration epoch is not already the greatest.
- Assign itself all the hash slots of its master, and propagate the new configuration to every node which is reachable ASAP, and eventually to every other node.
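The two TAKEOVER actions can be sketched as plain functions. The helper names and the dict-based slot table are hypothetical. The epoch rule follows the wording above: keep the local epoch if it is already the greatest, otherwise take greatest + 1. It may differ in edge cases from the actual Redis implementation. Nothing here talks to other nodes, which is exactly why the result is not guaranteed to be unique across the cluster.

```python
from typing import Dict, List


def takeover_config_epoch(my_config_epoch: int, known_epochs: List[int]) -> int:
    """Pick a new configEpoch unilaterally, as TAKEOVER does.

    known_epochs: every epoch this node knows about (its currentEpoch and the
    configEpochs it has seen). No other node is consulted.
    """
    greatest = max(known_epochs + [my_config_epoch])
    if my_config_epoch == greatest:
        return my_config_epoch          # already the greatest: keep it
    return greatest + 1                 # otherwise take greatest + 1


def take_over_slots(slot_owner: Dict[int, str], old_master: str, me: str) -> Dict[int, str]:
    """Reassign every hash slot of the old master to this replica."""
    return {slot: (me if owner == old_master else owner)
            for slot, owner in slot_owner.items()}


print(takeover_config_epoch(3, [7, 5]))  # 8
slots = {0: "A", 1: "A", 2: "B"}         # tiny illustrative slot table
print(take_over_slots(slots, old_master="A", me="A-replica"))
# {0: 'A-replica', 1: 'A-replica', 2: 'B'}
```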
Note that TAKEOVER violates the last-failover-wins principle of Redis Cluster, since the configuration epoch generated by the replica violates the normal generation of configuration epochs in several ways:
- There is no guarantee that it is actually the highest configuration epoch since, for example, we can use the TAKEOVER option within a minority, and no message exchange is performed to generate the new configuration epoch.
- If we generate a configuration epoch which happens to collide with another instance, eventually our configuration epoch, or the one of another instance with our same epoch, will be moved away using the configuration epoch collision resolution algorithm. In that algorithm, when two masters advertise the same configEpoch, the one with the lexicographically smaller node ID increments its currentEpoch and adopts it as its new configEpoch.
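Here is a sketch of the collision rule described above, seen from one node. `Node` and `resolve_collision` are hypothetical. Real node IDs are 40-character hex strings, and comparing equal-length strings with `>=` stands in for a byte-wise comparison. As a further simplification, each node in a real cluster keeps its own currentEpoch, while here one value is passed around.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str        # 40-char hex in real clusters
    config_epoch: int


def resolve_collision(me: Node, other: Node, current_epoch: int) -> int:
    """Apply the configEpoch collision rule from `me`'s side.

    Returns the (possibly incremented) currentEpoch. Only the node with
    the lexicographically smaller ID moves; the other keeps its epoch.
    """
    if me.config_epoch != other.config_epoch:
        return current_epoch                 # no collision
    if me.node_id >= other.node_id:
        return current_epoch                 # the other node will move instead
    current_epoch += 1
    me.config_epoch = current_epoch          # adopt the bumped epoch
    return current_epoch


a = Node("a" * 40, 7)
b = Node("b" * 40, 7)
epoch = resolve_collision(b, a, current_epoch=7)      # b has the larger ID: no change
epoch = resolve_collision(a, b, current_epoch=epoch)  # a has the smaller ID: it moves
print(epoch, a.config_epoch, b.config_epoch)          # 8 8 7
```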
Because of this the TAKEOVER option should be used with care.
Implementation details and notes
- CLUSTER FAILOVER, unless the TAKEOVER option is specified, does not execute a failover synchronously. It only schedules a manual failover, bypassing the failure detection stage.
- An OK reply is no guarantee that the failover will succeed.
- A replica can only be promoted to a master if it is known as a replica by a majority of the masters in the cluster. If the replica is a new node that has just been added to the cluster (for example after upgrading it), it may not yet be known to all the masters in the cluster. To check that the masters are aware of a new replica, you can send CLUSTER NODES or CLUSTER REPLICAS to each of the master nodes and check that it appears as a replica, before sending CLUSTER FAILOVER to the replica.
- To check that the failover has actually happened, you can use ROLE, INFO REPLICATION (which indicates "role:master" after a successful failover), or CLUSTER NODES to verify that the state of the cluster has changed some time after the command was sent.
- To check if the failover has failed, check the replica's log for "Manual failover timed out", which is logged if the replica has given up after a few seconds.
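The pre-check and post-check from these notes can be scripted by parsing command output. This sketch assumes you have already collected the raw CLUSTER NODES text from each master and the INFO REPLICATION text from the replica. The helper names are hypothetical. The parser relies on the CLUSTER NODES line layout (`<id> <ip:port@cport> <flags> <master-id> ...`) and on replicas being flagged `slave` in that output.

```python
from typing import Dict, List, Optional


def parse_cluster_nodes(output: str) -> List[dict]:
    """Parse CLUSTER NODES text: <id> <addr> <flags> <master-id> ..."""
    nodes = []
    for line in output.strip().splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue                          # skip malformed or empty lines
        nodes.append({
            "id": fields[0],
            "flags": set(fields[2].split(",")),
            "master_id": fields[3],           # "-" for masters
        })
    return nodes


def lists_as_replica(output: str, replica_id: str, master_id: str) -> bool:
    """Does this CLUSTER NODES output show replica_id as a replica of master_id?"""
    return any(n["id"] == replica_id and "slave" in n["flags"]
               and n["master_id"] == master_id
               for n in parse_cluster_nodes(output))


def known_by_majority(outputs: Dict[str, str], replica_id: str, master_id: str) -> bool:
    """outputs maps every master's name to the CLUSTER NODES text it returned."""
    seen = sum(lists_as_replica(text, replica_id, master_id) for text in outputs.values())
    return seen >= len(outputs) // 2 + 1


def replication_role(info_replication: str) -> Optional[str]:
    """Extract the role from INFO REPLICATION text ("master" after a successful failover)."""
    for line in info_replication.splitlines():
        line = line.strip()
        if line.startswith("role:"):
            return line.split(":", 1)[1]
    return None
```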
Return
Simple string reply: OK if the command was accepted and a manual failover is going to be attempted. An error if the operation cannot be executed, for example if we are talking with a node which is already a master.
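Because an OK reply only means the failover was scheduled, a caller that needs the outcome has to poll. This minimal sketch relies on two hypothetical callables that you wire to your own client. `send_command(*args)` sends a raw command and raises on an error reply. `get_role()` returns the node's current role, for example the first element of the ROLE reply. The 10-second default timeout is an arbitrary choice, not a Redis setting.

```python
import time
from typing import Callable, Optional


def failover_and_wait(send_command: Callable[..., object],
                      get_role: Callable[[], str],
                      option: Optional[str] = None,
                      timeout: float = 10.0,
                      interval: float = 0.5) -> bool:
    """Send CLUSTER FAILOVER [option] and poll until the node reports 'master'.

    Returns False if the role did not change before `timeout` seconds; in that
    case check the replica's log for "Manual failover timed out".
    """
    args = ["CLUSTER", "FAILOVER"] + ([option] if option else [])
    reply = send_command(*args)            # assumed to raise on an error reply
    if reply not in ("OK", b"OK", True):   # client libraries decode OK differently
        raise RuntimeError(f"unexpected reply: {reply!r}")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_role() == "master":
            return True
        time.sleep(interval)
    return False
```

With a real client, `send_command` could be the library's generic raw-command method. How that library decodes OK and the ROLE reply (strings or bytes) is library-specific, so check it before relying on the comparisons above.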