Extend the unicast based recovery algorithm to do replication policy check #11996
+26
−24
Extend the version vector/unicast-based recovery algorithm to perform the replication policy check when deciding whether a version can be recovered from the set of available log servers. This makes the algorithm compatible with the non-unicast ("main") algorithm in how it handles non-reporting log servers during recovery.
Test that exposed this issue:
build_output/bin/fdbserver -r simulation --crash -f /root/src/foundationdb/tests/slow/RyowCorrectness.toml -b off -s 29779152
A "getRange()" call was blocked because recovery was not completing: "replication_factor" log servers were not reporting during recovery. However, this set of non-reporting log servers did not satisfy the replication policy, so extending the recovery algorithm to perform the replication policy check allowed recovery to progress and the test to succeed.
Note that this extension lets recovery make progress only when the non-reporting log servers do not satisfy the replication policy. Even so, it makes the algorithm compatible with "main" when handling such scenarios.
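The reasoning above can be illustrated with a minimal sketch. It is not the FoundationDB implementation: `LogServer`, `satisfies_policy`, `can_make_progress`, and the simplified "across N distinct zones" policy are all hypothetical names and assumptions introduced for illustration, and the count-only rule is one reading of the behavior described before this change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogServer:
    # Hypothetical model of a log server: an id plus a locality zone.
    id: str
    zone: str


def satisfies_policy(servers, required_zones):
    """Assumed stand-in for a replication policy: the set must span
    at least `required_zones` distinct zones."""
    return len({s.zone for s in servers}) >= required_zones


def can_make_progress(non_reporting, replication_factor, required_zones):
    """Return True if recovery may proceed without the non-reporting servers.

    Assumption: a committed version is stored on a set of log servers that
    satisfies the replication policy. If the non-reporting set cannot satisfy
    the policy by itself, every committed version must also be present on at
    least one reporting server.
    """
    # Count-only rule: fewer than replication_factor servers are missing.
    if len(non_reporting) < replication_factor:
        return True
    # Extended rule: enough servers are missing by count, but they still
    # cannot hold a full policy-satisfying replica set on their own.
    return not satisfies_policy(non_reporting, required_zones)


# Three servers are missing, but they cover only two zones.
missing_two_zones = [
    LogServer("a", "z1"), LogServer("b", "z1"), LogServer("c", "z2"),
]
# Three servers are missing and they cover three zones.
missing_three_zones = [
    LogServer("a", "z1"), LogServer("b", "z2"), LogServer("c", "z3"),
]

progress_two_zones = can_make_progress(missing_two_zones, 3, 3)
progress_three_zones = can_make_progress(missing_three_zones, 3, 3)
```

In this sketch the first case can make progress only because of the policy check, which mirrors the failing test scenario; the second case remains blocked, matching the note that the extension helps only when the non-reporting servers do not satisfy the policy.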
Testing:
Id (with version vector disabled): 20250305-205711-sre-b53cba5eecb4dadb (started).
Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branch or main (if this is the youngest branch)