ZOOKEEPER-4925: Fix data loss due to propagation of discontinuous committedLog #2254
+199
−76
There are two variants of `ZooKeeperServer::processTxn`, and they have diverged significantly since ZOOKEEPER-3484. In addition to what `processTxn(TxnHeader hdr, Record txn)` does, `processTxn(Request request)` pops the outstanding change from `outstandingChanges` and adds the txn to `committedLog` for followers to sync from. Since ZOOKEEPER-4394, the `Learner` uses `processTxn(TxnHeader hdr, Record txn)` to commit txns to memory, which means it leaves `committedLog` untouched during the `SYNCHRONIZATION` phase.

As a result, a stale follower will have a hole in its `committedLog` after joining the cluster. If that follower later becomes leader, it propagates the in-memory hole to other stale nodes. This causes data loss.

The test case fails on master and 3.9.3, and passes on 3.9.2, so 3.9.3 is the only affected release.
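The hole described above can be illustrated with a toy model. This is only a sketch: `ToyServer`, `applyOnly`, `applyAndLog`, and `CommittedLogHoleDemo` are hypothetical names, zxids are plain counters within a single epoch, and none of this is ZooKeeper's actual code. `applyOnly` stands in for `processTxn(TxnHeader hdr, Record txn)` and `applyAndLog` for `processTxn(Request request)`.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a server's in-memory state (hypothetical, not ZooKeeper's real classes). */
final class ToyServer {
    final List<Long> dataTree = new ArrayList<>();      // zxids applied to memory
    final List<Long> committedLog = new ArrayList<>();  // zxids offered to syncing followers

    /** Stands in for processTxn(TxnHeader, Record): applies the txn only. */
    void applyOnly(long zxid) {
        dataTree.add(zxid);
    }

    /** Stands in for processTxn(Request): applies the txn and records it in committedLog. */
    void applyAndLog(long zxid) {
        dataTree.add(zxid);
        committedLog.add(zxid);
    }

    /** Returns true if committedLog skips a zxid (single epoch assumed). */
    boolean hasHole() {
        for (int i = 1; i < committedLog.size(); i++) {
            if (committedLog.get(i) != committedLog.get(i - 1) + 1) {
                return true;
            }
        }
        return false;
    }
}

final class CommittedLogHoleDemo {
    /** Builds a follower that fell behind, synced txns 3 and 4, then resumed normally. */
    static ToyServer staleFollower() {
        ToyServer follower = new ToyServer();
        follower.applyAndLog(1);
        follower.applyAndLog(2);
        follower.applyOnly(3);   // synced from the leader, bypassing committedLog
        follower.applyOnly(4);
        follower.applyAndLog(5); // normal broadcast after synchronization
        return follower;
    }

    public static void main(String[] args) {
        ToyServer follower = staleFollower();
        System.out.println("dataTree     = " + follower.dataTree);
        System.out.println("committedLog = " + follower.committedLog);
        System.out.println("hole         = " + follower.hasHole());
    }
}
```

In this model the data tree holds all five txns while `committedLog` holds only 1, 2, and 5. A node that later syncs from this server's `committedLog` would never see 3 and 4.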
This commit drops `processTxn(TxnHeader hdr, Record txn)`, as `processTxn(Request request)` can also be used in the `SYNCHRONIZATION` phase.

Also, this commit rejects discontinuous proposals in `syncWithLeader` and `committedLog`, to avoid possible data loss.

Refs: ZOOKEEPER-4925, ZOOKEEPER-4394, ZOOKEEPER-3484
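As an illustration of rejecting discontinuous proposals, here is a minimal sketch of a log that refuses a zxid that does not directly follow the previous one. The class `ContiguousCommittedLog` and its continuity rule are assumptions made for this example and are not taken from the patch; the only ZooKeeper fact relied on is that a zxid carries the epoch in its high 32 bits and a counter in its low 32 bits.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical log that accepts only contiguous zxids (not ZooKeeper's real code). */
final class ContiguousCommittedLog {
    private final Deque<Long> zxids = new ArrayDeque<>();

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    /**
     * Assumed continuity rule: either the same epoch with counter + 1,
     * or a later epoch whose first txn has counter 1.
     */
    static boolean follows(long prev, long next) {
        if (epochOf(next) == epochOf(prev)) {
            return counterOf(next) == counterOf(prev) + 1;
        }
        return epochOf(next) > epochOf(prev) && counterOf(next) == 1;
    }

    /** Appends the zxid, or throws if it would leave a gap after the last entry. */
    void add(long zxid) {
        Long last = zxids.peekLast();
        if (last != null && !follows(last, zxid)) {
            throw new IllegalStateException(String.format(
                    "discontinuous zxid 0x%x after 0x%x", zxid, last));
        }
        zxids.addLast(zxid);
    }

    int size() {
        return zxids.size();
    }
}
```

Failing fast at insertion time means a gap is surfaced where it is introduced, instead of being discovered only after another node has synced from the incomplete log.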