-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add StepDownOnRemoval
#79
Conversation
91e9c62
to
47f80a3
Compare
This make me worried about lack of proper backward compatibility testing. We should not depend on CockroachDB to discover such issues. What would happened if CockroachDB missies a issue that impacts etcd. Etcd has much slower development cycle, results would be really bad. Could someone on raft side take an action item to propose a test plan for backward compatibility? |
This change is arguably in the gray zone where a bug fix breaks backwards compatibility -- the fact that CockroachDB relies on a learner to commit and replicate the final conf change is problematic. Even so, it's a bug that we currently rely on. The original PR included test changes for this change in behavior, so in some sense it was already covered by tests (and this PR increases that coverage). The change in behavior was considered beneficial, but in retrospect we should have recognized that any such change in behavior has potential to break something for someone, and should have gated it on a config option in the first place. |
Makes sense. |
47f80a3
to
eaec4b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
thx
In 72d62a1, we made Raft leaders step down when they removed themselves from the group. However, this broke backwards compatibility. This patch makes the behavior opt-in via a new `StepDownOnRemoval` config option, with the intent to enable it unconditionally in the next major release. In CockroachDB, we campaign a designated voter when the leader is removed from the group. However, during a joint config change, the leader is first demoted to learner before removal. The step-down logic introduced in 72d62a1 triggers on learner demotion (which is the right behavior), but this left the CockroachDB group without a leader, having to wait out an election timeout. While it is straightforward to change this behavior in CockroachDB, we have to maintain backwards compatibility with older nodes during rolling upgrades. Signed-off-by: Erik Grinaker <[email protected]>
eaec4b0
to
ee0fe9d
Compare
Picks up - etcd-io/raft#77 - etcd-io/raft#79 (essentially an off-by-default for 77) - etcd-io/raft#82 - etcd-io/raft#81 (needed for cockroachdb#105797) Epic: none Release note: None
In 72d62a1, we made Raft leaders step down when they removed themselves from the group. However, this broke backwards compatibility. This patch makes the behavior opt-in via a new
StepDownOnRemoval
config option, with the intent to enable it unconditionally in the next major release.In CockroachDB, we campaign a designated voter when the leader is removed from the group. However, during a joint config change, the leader is first demoted to learner before removal. The step-down logic introduced in 72d62a1 triggers on learner demotion (which is the right behavior), but this left the CockroachDB group without a leader, having to wait out an election timeout. While it is straightforward to change this behavior in CockroachDB, we have to maintain backwards compatibility with older nodes during rolling upgrades.
Signed-off-by: Erik Grinaker [email protected]
cc @ahrtr @tbg @pavelkalinnikov
My bad for not testing this change properly in the first place. I'll write up an issue to enable by default in the next major release.