Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-17885: Enable clients to rebootstrap based on timeout or error code (KIP-1102) #17720

Open
wants to merge 3 commits into
base: trunk
Choose a base branch
from

Conversation

rajinisivaram
Copy link
Contributor

Implementation of https://cwiki.apache.org/confluence/display/KAFKA/KIP-1102%3A+Enable+clients+to+rebootstrap+based+on+timeout+or+error+code

  • Introduces rebootstrap trigger interval config metadata.recovery.rebootstrap.trigger.ms, set to 5 minutes by default
  • Makes rebootstrap the default for metadata.recovery.strategy
  • Adds new error code REBOOTSTRAP_REQUIRED, introduces top-level error code in metadata response. On this error, clients rebootstrap.
  • Configs apply to producers, consumers, share consumers, admin clients, Connect and KStreams clients.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Copy link
Collaborator

@apoorvmittal10 apoorvmittal10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @rajinisivaram. Just some questions for my understanding.

private void rebootstrap(long now) {
closeAll();
metadata.rebootstrap();
metadataAttemptStartMs = Optional.of(now);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Do we need to set the time while rebootstrapping or reset to Optional.empty? As whenever there would be successful response from metadata then the time will be updated?

Or is it more to handle that after re-bootstrap, if we never receive successful metadata response then we would like to rebootstrap again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is to handle rebootstrapping again if successful metadata response is not received after one rebootstrap (the second case you mentioned).


if (!cluster.nodes().isEmpty()) {
this.cluster = cluster;
}
}

public void initiateRebootstrap() {
requestUpdate();
this.metadataAttemptStartMs = Optional.of(0L);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why don't we set the time to now as we do in NetworkClient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In rebootstrap(), we set time to now as in NetworkClient. In both, when we receive REBOOTSTRAP_REQUIRED error code, we set to 0, so that rebootstrap is triggered on the next poll regardless of time.

logContext,
Long.MAX_VALUE,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this constructor is used be Connect Worker which gets passed to ConsumerNetworkClient. So should we not support the consumer client spun in other dependant modules? Do you think the a default value of Long.MAX_VALUE for this config makes sense so we can just get the ocnfigured vs default value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean the NetworkClient in WorkerGroupMember? This PR changes the constructor used in WorkerGroupMember to pass in the configured value. Let me know if I have missed a different one.

@rajinisivaram
Copy link
Contributor Author

@apoorvmittal10 Thanks for the review, I have responded to the comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants